WO2019041519A1

WO2019041519A1 - Target tracking device and method, and computer-readable storage medium

Info

Publication number: WO2019041519A1
Application number: PCT/CN2017/108794
Authority: WO
Inventors: 周舒意; 王建明; 肖京
Original assignee: 平安科技（深圳）有限公司
Priority date: 2017-08-29
Filing date: 2017-10-31
Publication date: 2019-03-07
Also published as: CN107679455A

Abstract

A target tracking device and method based on a convolutional neural network, and a computer-readable storage medium, these being able to improve the accuracy of target tracking. The device comprises a memory and a processor, wherein a target tracking program capable of running on the processor is stored in the memory, and when the program is executed by the processor, the following steps are implemented: collecting picture samples from a video frame image according to a sampling point distribution, and recording the position coordinates of the picture samples (S10); extracting, based on a CNN model, sample features from the picture samples, and calculating a degree of confidence of the picture samples and a tracked target according to the sample features (S20); adjusting the weight of the picture samples according to the degree of confidence, and calculating the position coordinates of the tracked target according to the position coordinates and the weight (S30); collecting a positive sample and a negative sample from the video frame image according to the position coordinates to train a CNN model with a training sample set (S40); then updating model parameters of the CNN model (S50); and repeating the above-mentioned steps until the tracking of a video is complete (S60).

Description

Target tracking device, method and computer readable storage medium

Priority claim

The present application is based on the priority of the Chinese Patent Application entitled "Target Tracking Apparatus, Method and Computer Readable Storage Medium", filed on Aug. 29, 2017, filed on Aug. 29, 2017, entitled The entire content is incorporated herein by reference.

Technical field

The present application relates to the field of image recognition technologies, and in particular, to a target tracking device, a method, and a computer readable storage medium based on a convolutional neural network.

Background technique

Computer target tracking is an important part of practical applications such as video surveillance. Target tracking refers to accurately locating and tracking moving targets (such as pedestrians, vehicles, etc.) in the video, and estimating the trajectory of the target. As an important topic in the field of computer vision, target tracking has important value in video surveillance, target recognition, and video information discovery.

With the introduction of a large number of target tracking algorithms, target tracking technology has been rapidly developed, but due to actual tracking, there are many practical difficulties in target tracking tasks, such as object occlusion, viewing angle changes, target deformation, ambient illumination changes, and unpredictable The complex background situation, and the existing target tracking algorithm mostly uses the difference between the target and the background to construct the classification model, separates the target from the background, and tracks the target, but this tracking algorithm is difficult to adapt to the above mentioned in the tracking process. Changes in the target and background, such as partial occlusion of the target, or similar background interference, cause the target to be tracked incorrectly, resulting in low target tracking accuracy.

Summary of the invention

The present application provides a target tracking device, a method and a computer readable storage medium based on a convolutional neural network, the main purpose of which is to dynamically update a model during a tracking process to adapt to changes in targets and backgrounds, and to improve target tracking. Accuracy.

To achieve the above object, the present application provides a target tracking device based on a convolutional neural network, the device comprising a memory and a processor, the memory storing a target tracking program executable on the processor, the target The following steps are implemented when the tracking program is executed by the processor:

A. Collecting a plurality of picture samples from the video frame image according to the sampling point distribution, and recording position coordinates of each picture sample;

B. Extracting a plurality of sample features from the plurality of picture samples based on the convolutional neural network CNN model, and respectively calculating a confidence level between each picture sample and the tracking target according to the extracted sample features;

C. Adjusting the weight of the corresponding picture sample according to the calculated confidence, and calculating the position of the tracking target on the video frame image according to the position coordinates of all the picture samples and the adjusted weight coordinate;

D. Collecting positive and negative samples of the tracking target from the video frame image according to the position coordinates;

E. Updating a training sample set of the CNN model according to the positive sample and the negative sample, and training the CNN model with the updated training sample set to update model parameters of the CNN model;

F. Repeat steps A through E until the tracking of the tracking target in all video frame images of the video is completed.

Optionally, the step D includes:

And acquiring a first preset number of picture samples located in a peripheral area of the position coordinate as a positive sample, wherein the peripheral area is an area formed by a point that is smaller than a first preset threshold between the position coordinates ;

And acquiring a second preset number of picture samples located in a distant area of the position coordinate as a negative sample, wherein the distant area is an area formed by a point that is greater than a second preset threshold between the position coordinates The second preset threshold is greater than the first preset threshold.

Optionally, the processor is further configured to execute the target tracking program, to perform the following steps after the step E:

G, adjusting the position of the sampling point on the video frame image according to the adjusted weight to update the sampling point distribution;

The step F includes:

Steps A through G are repeated until the tracking of the tracking target in all video frame images of the video is completed.

Optionally, the step G includes:

Adding a sampling point within a first preset range of the sampling point corresponding to the sample whose weight is greater than the first preset weight, and decreasing the sampling point in a second preset range of the sampling point corresponding to the sample whose weight is smaller than the second preset weight, The second preset weight is smaller than the first preset weight, and the number of added sampling points is equal to the number of reduced sampling points.

Optionally, the processor is further configured to execute the target tracking program to implement the following steps:

Determining whether the video frame image is the first frame image of the video;

If the video frame image is the first frame image of the video, prompting the user to manually select a tracking target on the video frame image and receive a tracking target selected by the user based on the prompt, and after determining the tracking target Initializing a sample point distribution and a training sample set of the CNN model and receiving a second frame image;

If the video image is not the first frame image of the video, the step A is performed.

In addition, to achieve the above object, the present application further provides a target tracking method based on a convolutional neural network, the method comprising:

C. Adjusting weights of the corresponding picture samples according to the calculated confidence, and calculating position coordinates of the tracking target on the video frame image according to position coordinates and weights of all picture samples;

Optionally, the step D includes:

Optionally, after step E, the method further includes:

The step F includes:

Optionally, the step G includes:

In addition, in order to achieve the above object, the present application further provides a computer readable storage medium having a target tracking program stored thereon, the target tracking program being executable by one or more processors to implement Next steps:

C. Adjust the weight of the corresponding picture sample according to the calculated confidence, and calculate the position coordinates of the tracking target on the video frame image according to the position coordinates of all the picture samples and the adjusted weight;

The object tracking device, the method and the computer readable storage medium based on the convolutional neural network proposed by the present application identify the video frame image in the video frame by frame, and collect multiple image samples from the video frame image according to the sampling point distribution, and Recording position coordinates of each picture sample, extracting a plurality of sample features correspondingly from the plurality of sample pictures based on the CNN model, calculating a confidence level between each picture sample and the tracking target according to the extracted sample features, and adjusting the sample according to the confidence level Weighting, and then calculating the position coordinates of the tracking target on the video frame image according to the position coordinates and weights of the sample, and collecting positive and negative samples of the tracking target from the video frame image according to the position coordinates, and retraining the CNN using the collected samples The model updates the model parameters, continues to track the next frame image using the model after updating the model parameters, and so on. After obtaining the tracking result of each frame image, the model is updated according to the tracking result, so that the tracking target changes. The updated model can adapt to changes in goals and backgrounds. Even when the phenomenon of partial occlusion, background interference appear in the image, it can also be successfully tracking the target, improve the accuracy of target tracking.

DRAWINGS

1 is a schematic diagram of a preferred embodiment of a target tracking device based on a convolutional neural network;

2 is a schematic diagram of a program module of a target tracking program in an embodiment of a target tracking device based on a convolutional neural network;

3 is a flow chart of a preferred embodiment of a target tracking method based on a convolutional neural network.

The implementation, functional features and advantages of the present application will be further described with reference to the accompanying drawings.

Detailed ways

It is understood that the specific embodiments described herein are merely illustrative of the application and are not intended to be limiting.

The application provides a target tracking device based on a convolutional neural network. Referring to FIG. 1, a schematic diagram of a preferred embodiment of a target tracking device based on a convolutional neural network is provided.

In this embodiment, the target tracking device based on the convolutional neural network may be a PC (Personal Computer), or may be a smart phone, a tablet computer, an e-book reader, or a portable device. A terminal device having a display function such as a computer.

The convolutional neural network based target tracking device includes a memory 11, a processor 12, a communication bus 13, and a network interface 14.

The memory 11 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (for example, an SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. Memory 11 may in some embodiments be an internal storage unit of a target tracking device based on a convolutional neural network, such as a hard disk of a target tracking device based on a convolutional neural network. The memory 11 may also be an external storage device of a target tracking device based on a convolutional neural network in other embodiments, such as a plug-in hard disk equipped on a target tracking device based on a convolutional neural network, a smart memory card (Smart Media Card) , SMC), Secure Digital (SD) card, Flash Card, etc. Further, the memory 11 may also include both an internal storage unit of the target tracking device based on the convolutional neural network and an external storage device. The memory 11 can be used not only for storing application software and various types of data installed on a target tracking device based on a convolutional neural network, such as code of a target tracking program, but also for temporarily storing data that has been output or is to be output.

The processor 12, in some embodiments, may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data processing chip for running program code or processing stored in the memory 11. Data, such as executing a target tracking program.

Communication bus 13 is used to implement connection communication between these components.

The network interface 14 can optionally include a standard wired interface, a wireless interface (such as a WI-FI interface), and is typically used to establish a communication connection between the device and other electronic devices.

Figure 1 shows only a convolutional neural network based target tracking device with components 11-14 and a target tracking program, but it should be understood that not all illustrated components may be implemented, alternative implementations may be more or more Less components.

Optionally, the device may further include a user interface, the user interface may include a display, an input unit such as a keyboard, and the optional user interface may further include a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch sensor, or the like. The display may also be suitably referred to as a display screen or display unit for displaying information processed in a target tracking device based on a convolutional neural network and a user interface for displaying visualization.

Optionally, the device may also include a touch sensor. The area provided by the touch sensor for the user to perform a touch operation is referred to as a touch area. Further, the touch sensor described herein may be a resistive touch sensor, a capacitive touch sensor, or the like. Moreover, the touch sensor includes not only a contact type touch sensor but also a proximity type touch sensor or the like. Furthermore, the touch sensor may be a single sensor or a plurality of sensors arranged, for example, in an array. The area of the display of the device may be the same as or different from the area of the touch sensor. Optionally, a display is stacked with the touch sensor to form a touch display. The device is based on a touch display The screen detects the touch operation triggered by the user.

Optionally, the device may further include a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a WiFi module, and the like. Among them, sensors such as light sensors, motion sensors, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, wherein if the device is a mobile terminal, the ambient light sensor may adjust the brightness of the display screen according to the brightness of the ambient light, and the proximity sensor may move when the mobile terminal moves to the ear. , turn off the display and / or backlight. As a kind of motion sensor, the gravity acceleration sensor can detect the magnitude of acceleration in each direction (usually three axes), and can detect the magnitude and direction of gravity when stationary, and can be used to identify the posture of the mobile terminal (such as horizontal and vertical screen switching, Related games, magnetometer attitude calibration), vibration recognition related functions (such as pedometer, tapping), etc.; of course, the mobile terminal can also be equipped with other sensors such as gyroscope, barometer, hygrometer, thermometer, infrared sensor, etc. No longer.

In the apparatus embodiment shown in FIG. 1, a target tracking program is stored in the memory 11; when the processor 12 executes the target tracking program stored in the memory 11, the following steps are implemented:

A. Collecting a plurality of picture samples from the video frame image according to a sampling point distribution.

In the embodiment of the present application, a convolutional neural network is used to perform offline training on a massive picture to obtain a CNN (Convolutional Neural Network) model, which may be a two-class model, and the model can be extracted from the image. A deeper semantic feature of the moving target and the background feature of the target.

When tracking a moving target in a video, the video image is tracked frame by frame. Specifically, a video to be subjected to target tracking is input to the device, and the device processes each video frame image in the video in accordance with the following operation.

The image samples are collected from the video frame image according to the sampling point distribution, wherein the number of sampling points can be preset by the user, for example, 100 image samples are collected, wherein when the first frame image is started to be recognized, the user can manually The tracking target is selected in the image. For example, the tracking target is selected by a frame selection manner, and the sampling point distribution is initialized based on the position of the tracking target selected by the user. Specifically, when the video frame image is received, determining whether the video frame image is the first frame image of the video; if the video frame image is the first frame image of the video, prompting the user to Manually selecting a tracking target on the video frame image and receiving a tracking target selected by the user based on the prompt; after determining the tracking target, initializing a sampling point distribution and a training sample set of the CNN model and receiving a second frame image; The video image is not the first frame image of the video, and the step A is performed. Alternatively, in other embodiments, if the user sets the target to be tracked in advance and stores it, the tracking target is directly acquired after starting the tracking, and the user is not required to manually select from the first frame image.

After obtaining the tracking target selected by the user, the color histogram of the tracking target area is calculated and used as the target feature of the tracking target, and the target feature can be represented as a vector of N*1.

B. Extracting a plurality of sample features from the plurality of picture samples based on the convolutional neural network CNN model, and separately calculating each picture sample from the tracking target according to the extracted sample features Confidence.

C. Adjust the weight of the corresponding picture sample according to the calculated confidence, and calculate the position coordinates of the tracking target on the video frame image according to the position coordinates of all the picture samples and the adjusted weight.

After the sample image is collected, the collected sample image is input into the trained CNN model for feature extraction, and the sample feature is extracted, and the sample feature can also be represented as an N*1 vector. A sample feature is extracted corresponding to each sample picture, and the confidence between each sample feature and the target feature is calculated separately. The confidence of the sample features reflects the similarity between the image sample and the tracking target. By calculating the similarity between the sample feature and the target feature, the similarity between the two N*1 vectors is calculated as a sample of the image. Confidence with tracking targets.

After obtaining the confidence of each picture sample, the weight of each picture sample is adjusted according to the confidence level, and for the sample with small confidence, the weight is reduced, and for the sample with high confidence, the weight is increased, and then for all the weights, then for all The weights of the image samples are normalized such that the sum of the weights of all samples is equal to one. The position coordinates of the tracking target on the video frame image are calculated based on the weight value of the picture sample and its position coordinates on the video frame image. Specifically, it is assumed that a total of k picture samples are collected, wherein the position coordinates of the sample P _i are (x _i , y _i ), and the confidence degree with the tracking target is S _i . Then, the position coordinates (x, y) of the tracking target can be predicted according to the following formula.

D. Collect positive and negative samples of the tracking target from the video frame image according to the position coordinates.

E. Updating a training sample set of the CNN model according to the positive sample and the negative sample, and training the CNN model with the updated training sample set to update model parameters of the CNN model.

Collecting a positive sample and a negative sample of the tracking target from the video frame image according to the position coordinate, specifically, acquiring a first preset number of picture samples located in a peripheral area of the position coordinate as a positive sample, wherein the periphery The area is an area formed by a point having a distance from the position coordinate that is smaller than a first preset threshold; and a second predetermined number of picture samples located in a distant area of the position coordinate are collected as a negative sample, wherein the The remote preset area is an area formed by a point that is greater than a second preset threshold value, and the second preset threshold is greater than the first preset threshold.

That is to say, after predicting the position of the tracking target on the image, the image samples are collected from the region closer to the tracking target, and the difference between the samples and the tracking target is small, and can be used as a positive sample from the video frame image. Capture image samples from areas farther away from the tracking target. The difference between the tracking targets is large, and can be added as a negative sample to the training sample set of the CNN model, and the CNN model is trained to update the model parameters and improve the accuracy of the model to identify the features of the moving target from the image samples. To enable the model to adapt to changes in the target and background in the video frame image. In this way, during the tracking process, the CNN model is continuously updated, and even if the tracking target is partially occluded or the background interferes with the tracking target, it does not interfere with the correct tracking of the target. After the tracking of the video frame image is completed, the next frame image is continuously tracked, and the updated CNN model is used for feature extraction. Target tracking is performed for each frame of image according to steps A through E, and after the tracking is completed, the CNN model is trained until the tracking of the target in all frame images of the video is completed. It can be understood that the first preset threshold, the second preset threshold, the first preset number, and the second preset number may be preset by use.

Further, in other embodiments, after step E, the following steps are further implemented:

The step F includes repeating steps A to G until the tracking of the tracking target in all video frame images of the video is completed.

Specifically, the distribution of the sampling points is adjusted according to the adjusted weights. Specifically, the sampling points are added in the first preset range of the sampling points corresponding to the samples whose weight is greater than the first preset weight, that is, the pictures with the weights are significant. Adding more sampling points near the sampling point corresponding to the sample, and reducing the sampling point in the second preset range of the sampling point corresponding to the sample whose weight is smaller than the second preset weight, wherein the second preset weight is smaller than the first preset The weight, that is, the sampling point near the sampling point corresponding to the picture sample with small weight reduction, wherein the number of added sampling points is equal to or greater than the number of reduced sampling points, or, when the weight is very small, the corresponding sampling point can be deleted. For example, the sampling point corresponding to the sample whose weight is smaller than the third preset weight is deleted, wherein the third preset weight is smaller than the fourth preset weight.

The target tracking device based on the convolutional neural network proposed in this embodiment performs frame-by-frame recognition on the video frame image in the video, collects multiple picture samples from the video frame image according to the sampling point distribution, and records the position coordinates of each picture sample. And extracting a plurality of sample features correspondingly from the plurality of sample pictures based on the CNN model, calculating a confidence degree between each picture sample and the tracking target according to the extracted sample features, adjusting the weight of the sample according to the confidence degree, and further according to the position of the sample Coordinates and weights calculate the position coordinates of the tracking target on the video frame image, and collect positive and negative samples of the tracking target from the video frame image according to the position coordinates, and retrain the CNN model to update the model parameters using the collected samples, using After updating the model parameters, the model continues to track the next frame image, and so on. After obtaining the tracking result of each frame image, the model is updated according to the tracking result, so that the updated model can be changed when the tracking target changes. Adapt to changes in goals and background, even if partial obscuration occurs in the image When the phenomenon of background interference, but also able to successfully track the target, improve the accuracy of target tracking.

Optionally, in other embodiments, the target tracking program may also be divided into one or more modules, one or more modules being stored in the memory 11 and being processed by one or more processors (this Embodiments are executed by processor 12) to accomplish the present application, and a module referred to herein refers to a series of computer program instructions that are capable of performing a particular function.

For example, referring to FIG. 2, it is a schematic diagram of a program module of a target tracking program in an embodiment of a target tracking device based on a convolutional neural network according to the present application. In this embodiment, the target tracking program may be divided into a preprocessing module 10, Tracking module 20, sampling module 30 and update module 40, illustratively,

The collecting module 10 is configured to: collect a plurality of picture samples from the video frame image according to the sampling point distribution, and record position coordinates of each picture sample;

The pre-processing module 20 is configured to: correspondingly extract a plurality of sample features from the plurality of picture samples based on a convolutional neural network CNN model, and respectively calculate a confidence between each picture sample and the tracking target according to the extracted sample features respectively degree;

The tracking module 30 is configured to: adjust weights of the corresponding picture samples according to the calculated confidence, and calculate position coordinates of the tracking target on the video frame image according to position coordinates and weights of all picture samples;

The sampling module 40 is configured to: collect positive and negative samples of the tracking target from the video frame image according to the position coordinates;

The updating module 50 is configured to: update the training sample set of the CNN model according to the positive sample and the negative sample, and train the CNN model to update model parameters of the CNN model by using the updated training sample set;

The acquisition module 10, the pre-processing module 20, the tracking module 30, the sampling module 40, and the update module 50 perform the above steps to track the target in the order of the video frame images in the video until the tracking target in all the video frame images in the video is completed. Tracking.

The functions or operation steps performed by the above-mentioned collection module 10, the pre-processing module 20, the tracking module 30, the sampling module 40, and the update module 50 are substantially the same as those of the foregoing embodiment, and are not described herein again.

In addition, the present application also provides a target tracking method based on a convolutional neural network. Referring to FIG. 3, it is a flowchart of a preferred embodiment of a target tracking method based on a convolutional neural network. The method can be performed by a device that can be implemented by software and/or hardware.

In this embodiment, the target tracking method based on the convolutional neural network includes:

Step S10: Collect a plurality of picture samples from the video frame image according to the sampling point distribution, and record position coordinates of each picture sample.

The image samples are collected from the video frame image according to the sampling point distribution, wherein the number of sampling points can be preset by the user, for example, 100 image samples are collected, wherein when the first frame image is started to be recognized, the user can manually The tracking target is selected in the image. For example, the tracking target is selected by a frame selection manner, and the sampling point distribution is initialized based on the position of the tracking target selected by the user. Specifically, when the video frame image is received, determining whether the video frame image is the first frame image of the video; if the video frame image is the first frame image of the video, prompting the user to Manually selecting a tracking target on the video frame image and receiving a tracking target selected by the user based on the prompt; after determining the tracking target, initializing a sampling point distribution and a training sample set of the CNN model and receiving a second frame image; The video image is not the first frame image of the video, and the step S10 is performed. Alternatively, in other embodiments, if the user sets the target to be tracked in advance and stores it, the tracking target is directly acquired after starting the tracking, and the user is not required to manually select from the first frame image.

Step S20, correspondingly extracting a plurality of sample features from the plurality of picture samples based on the convolutional neural network CNN model, and respectively calculating a confidence level between each picture sample and the tracking target according to the extracted sample features.

Step S30, adjusting the weight of the corresponding picture sample according to the calculated confidence, and calculating the position coordinates of the tracking target on the video frame image according to the position coordinates and weights of all the picture samples.

Step S40, collecting positive and negative samples of the tracking target from the video frame image according to the position coordinates.

Step S50, updating the training sample set of the CNN model according to the positive sample and the negative sample, and training the CNN model to update the model parameters of the CNN model by using the updated training sample set.

In step S60, steps S10 to S50 are repeatedly performed until the tracking of the tracking target in all the video frame images of the video is completed.

That is to say, after predicting the position of the tracking target on the image, the image samples are collected from the region closer to the tracking target, and the difference between the samples and the tracking target is small, and can be used as a positive sample from the video frame image. Image samples are taken from areas farther away from the tracking target. These samples have a large difference from the tracking targets. They can be added as negative samples to the training sample set of the CNN model, and the CNN model is trained to update the model parameters. The improved model identifies the accuracy of the features of the moving object from the image samples so that the model can adapt to changes in the target and background in the video frame image. In this way, during the tracking process, the CNN model is continuously updated, and even if the tracking target is partially occluded or the background interferes with the tracking target, it does not interfere with the correct tracking of the target. After the tracking of the video frame image is completed, the next frame image is continuously tracked, and the updated CNN model is used for feature extraction. Target tracking is performed for each frame of image in accordance with steps S10 through S40, and after the tracking is completed, the CNN model is trained until all tracking of the target in all frame images of the video is completed. It can be understood that the first preset threshold, the second preset threshold, the first preset number, and the second preset number may be preset by use.

Further, in other embodiments, after step S50, the method further includes the following steps: adjusting the distribution of the sampling points according to the adjusted weights, specifically, the sampling corresponding to the samples having the weight greater than the first preset weight Adding a sampling point within a first preset range of the point, that is, adding more sampling points near the sampling point corresponding to the image sample with a large weight, and second sampling of the sampling point corresponding to the sample whose weight is smaller than the second preset weight The sampling point is reduced in the range, wherein the second preset weight is smaller than the first preset weight, that is, the sampling point near the sampling point corresponding to the image sample with small weight is reduced, wherein The number of added sampling points is equal to or greater than the number of reduced sampling points, or, when the weight is very small, the corresponding sampling points may be deleted, for example, the sampling points corresponding to the samples whose weights are smaller than the third preset weight are deleted, wherein The third preset weight is smaller than the fourth preset weight.

The target tracking method based on the convolutional neural network proposed in this embodiment performs frame-by-frame recognition on the video frame image in the video, collects multiple picture samples from the video frame image according to the sampling point distribution, and records the position coordinates of each picture sample. And extracting a plurality of sample features correspondingly from the plurality of sample pictures based on the CNN model, calculating a confidence degree between each picture sample and the tracking target according to the extracted sample features, adjusting the weight of the sample according to the confidence degree, and further according to the position of the sample Coordinates and weights calculate the position coordinates of the tracking target on the video frame image, and collect positive and negative samples of the tracking target from the video frame image according to the position coordinates, and retrain the CNN model to update the model parameters using the collected samples, using After updating the model parameters, the model continues to track the next frame image, and so on. After obtaining the tracking result of each frame image, the model is updated according to the tracking result, so that the updated model can be changed when the tracking target changes. Adapt to changes in goals and background, even if partial obscuration occurs in the image When the phenomenon of background interference, but also able to successfully track the target, improve the accuracy of target tracking.

In addition, the embodiment of the present application further provides a computer readable storage medium, where the target readable program is stored on the computer readable storage medium, and the target tracking program can be executed by one or more processors to implement the following operations:

Further, when the target tracking program is executed by the processor, the following operations are also implemented:

Collecting a second predetermined number of picture samples located in a distant area of the position coordinates as a negative And a sample, wherein the remote area is an area formed by a point that is greater than a second preset threshold, and the second preset threshold is greater than the first preset threshold.

The position of the sampling point on the video frame image is adjusted according to the adjusted weight to update the sampling point distribution.

The specific embodiment of the computer readable storage medium of the present application is substantially the same as the foregoing embodiments of the target tracking apparatus and method based on the convolutional neural network, and is not described herein.

It should be noted that the foregoing serial numbers of the embodiments of the present application are merely for the description, and do not represent the advantages and disadvantages of the embodiments. And the terms "including", "comprising", or any other variations thereof are intended to encompass a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a plurality of elements includes not only those elements but also Other elements listed, or elements that are inherent to such a process, device, item, or method. An element that is defined by the phrase "comprising a ..." does not exclude the presence of additional equivalent elements in the process, the device, the item, or the method that comprises the element.

Through the description of the above embodiments, those skilled in the art can clearly understand that the foregoing embodiment method can be implemented by means of software plus a necessary general hardware platform, and of course, can also be through hardware, but in many cases, the former is better. Implementation. Based on such understanding, the technical solution of the present application, which is essential or contributes to the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM as described above). , a disk, an optical disk, including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the methods described in the various embodiments of the present application.

The above is only a preferred embodiment of the present application, and is not intended to limit the scope of the patent application, and the equivalent structure or equivalent process transformations made by the specification and the drawings of the present application, or directly or indirectly applied to other related technical fields. The same is included in the scope of patent protection of this application.

Claims

A target tracking device based on a convolutional neural network, the device comprising a memory and a processor, wherein the memory stores a target tracking program executable on the processor, the target tracking program is The processor implements the following steps when executed:

A. Collecting a plurality of picture samples from the video frame image according to the sampling point distribution, and recording position coordinates of each picture sample;

B. Extracting a plurality of sample features from the plurality of picture samples based on the convolutional neural network CNN model, and respectively calculating a confidence level between each picture sample and the tracking target according to the extracted sample features;

C. Adjust the weight of the corresponding picture sample according to the calculated confidence, and calculate the position coordinates of the tracking target on the video frame image according to the position coordinates of all the picture samples and the adjusted weight;

D. Collecting positive and negative samples of the tracking target from the video frame image according to the position coordinates;

E. Updating a training sample set of the CNN model according to the positive sample and the negative sample, and training the CNN model with the updated training sample set to update model parameters of the CNN model;

F. Repeat steps A through E until the tracking of the tracking target in all video frame images of the video is completed.
The object tracking device based on a convolutional neural network according to claim 1, wherein the step D comprises:

And acquiring a first preset number of picture samples located in a peripheral area of the position coordinate as a positive sample, wherein the peripheral area is an area formed by a point that is smaller than a first preset threshold between the position coordinates ;

And acquiring a second preset number of picture samples located in a distant area of the position coordinate as a negative sample, wherein the distant area is an area formed by a point that is greater than a second preset threshold between the position coordinates The second preset threshold is greater than the first preset threshold.
The convolutional neural network-based target tracking device according to claim 1, wherein the processor is further configured to execute the target tracking program, to further implement the following steps after the step E:

G, adjusting the position of the sampling point on the video frame image according to the adjusted weight to update the sampling point distribution;

The step F includes:

Steps A through G are repeated until the tracking of the tracking target in all video frame images of the video is completed.
The convolutional neural network-based target tracking device according to claim 2, wherein the processor is further configured to execute the target tracking program to further implement, after step E, Next steps:

G, adjusting the position of the sampling point on the video frame image according to the adjusted weight to update the sampling point distribution;

The step F includes:

Steps A through G are repeated until the tracking of the tracking target in all video frame images of the video is completed.
The object tracking device based on a convolutional neural network according to claim 3, wherein the step G comprises:

Adding a sampling point within a first preset range of the sampling point corresponding to the sample whose weight is greater than the first preset weight, and decreasing the sampling point in a second preset range of the sampling point corresponding to the sample whose weight is smaller than the second preset weight, The second preset weight is smaller than the first preset weight, and the number of added sampling points is equal to the number of reduced sampling points.
The convolutional neural network-based target tracking device according to claim 1, wherein the processor is further configured to execute the target tracking program to implement the following steps before the step A:

Determining whether the video frame image is the first frame image of the video;

If the video frame image is the first frame image of the video, prompting the user to manually select a tracking target on the video frame image and receive a tracking target selected by the user based on the prompt, and after determining the tracking target Initializing a sample point distribution and a training sample set of the CNN model and receiving a second frame image;

If the video image is not the first frame image of the video, the step A is performed.
The convolutional neural network-based target tracking device according to claim 2, wherein the processor is further configured to execute the target tracking program to implement the following steps before the step A:

Determining whether the video frame image is the first frame image of the video;

If the video frame image is the first frame image of the video, prompting the user to manually select a tracking target on the video frame image and receive a tracking target selected by the user based on the prompt, and after determining the tracking target Initializing a sample point distribution and a training sample set of the CNN model and receiving a second frame image;

If the video image is not the first frame image of the video, the step A is performed.
A target tracking method based on a convolutional neural network, characterized in that the method comprises:

A. Collecting a plurality of picture samples from the video frame image according to the sampling point distribution, and recording position coordinates of each picture sample;

B. Extracting a plurality of sample features from the plurality of picture samples based on the convolutional neural network CNN model, and respectively calculating each picture sample and the tracking target according to the extracted sample features Confidence between

C. Adjust the weight of the corresponding picture sample according to the calculated confidence, and calculate the position coordinates of the tracking target on the video frame image according to the position coordinates of all the picture samples and the adjusted weight;

D. Collecting positive and negative samples of the tracking target from the video frame image according to the position coordinates;

E. Updating a training sample set of the CNN model according to the positive sample and the negative sample, and training the CNN model with the updated training sample set to update model parameters of the CNN model;

F. Repeat steps A through E until the tracking of the tracking target in all video frame images of the video is completed.
The method for tracking a target based on a convolutional neural network according to claim 8, wherein the step D comprises:

And acquiring a first preset number of picture samples located in a peripheral area of the position coordinate as a positive sample, wherein the peripheral area is an area formed by a point that is smaller than a first preset threshold between the position coordinates ;

And acquiring a second preset number of picture samples located in a distant area of the position coordinate as a negative sample, wherein the distant area is an area formed by a point that is greater than a second preset threshold between the position coordinates The second preset threshold is greater than the first preset threshold.
The convolutional neural network-based target tracking method according to claim 8, wherein after the step E, the method further comprises:

G, adjusting the position of the sampling point on the video frame image according to the adjusted weight to update the sampling point distribution;

The step F includes:

Steps A through G are repeated until the tracking of the tracking target in all video frame images of the video is completed.
The convolutional neural network-based target tracking method according to claim 9, wherein after the step E, the method further comprises:

G, adjusting the position of the sampling point on the video frame image according to the adjusted weight to update the sampling point distribution;

The step F includes:

Steps A through G are repeated until the tracking of the tracking target in all video frame images of the video is completed.
The method for tracking a target based on a convolutional neural network according to claim 10, wherein the step G comprises:

Adding a first preset range of sampling points corresponding to samples whose weight is greater than the first preset weight a sampling point, where the sampling point is decreased in a second preset range of the sampling point corresponding to the sample whose weight is smaller than the second preset weight, wherein the second preset weight is smaller than the first preset weight, and the added sampling point The number is equal to the number of reduced sampling points.
The convolutional neural network-based target tracking method according to claim 8, wherein before the step A, the method further comprises the following steps:

Determining whether the video frame image is the first frame image of the video;

If the video frame image is the first frame image of the video, prompting the user to manually select a tracking target on the video frame image and receive a tracking target selected by the user based on the prompt, and after determining the tracking target Initializing a sample point distribution and a training sample set of the CNN model and receiving a second frame image;

If the video image is not the first frame image of the video, the step A is performed.
The convolutional neural network-based target tracking method according to claim 9, wherein before the step A, the method further comprises the following steps:

Determining whether the video frame image is the first frame image of the video;

If the video frame image is the first frame image of the video, prompting the user to manually select a tracking target on the video frame image and receive a tracking target selected by the user based on the prompt, and after determining the tracking target Initializing a sample point distribution and a training sample set of the CNN model and receiving a second frame image;

If the video image is not the first frame image of the video, the step A is performed.
A computer readable storage medium, characterized in that the computer readable storage medium stores a target tracking program, and the target tracking program can be executed by one or more processors to implement the following steps:

A. Collecting a plurality of picture samples from the video frame image according to the sampling point distribution, and recording position coordinates of each picture sample;

B. Extracting a plurality of sample features from the plurality of picture samples based on the convolutional neural network CNN model, and respectively calculating a confidence level between each picture sample and the tracking target according to the extracted sample features;

C. Adjust the weight of the corresponding picture sample according to the calculated confidence, and calculate the position coordinates of the tracking target on the video frame image according to the position coordinates of all the picture samples and the adjusted weight;

D. Collecting positive and negative samples of the tracking target from the video frame image according to the position coordinates;

E. Updating a training sample set of the CNN model according to the positive sample and the negative sample, and training the CNN model with the updated training sample set to update model parameters of the CNN model;

F. Repeat steps A through E until the tracking of the tracking target in all video frame images of the video is completed.
The computer readable storage medium of claim 15, wherein the step D comprises:

And acquiring a first preset number of picture samples located in a peripheral area of the position coordinate as a positive sample, wherein the peripheral area is an area formed by a point that is smaller than a first preset threshold between the position coordinates ;

And acquiring a second preset number of picture samples located in a distant area of the position coordinate as a negative sample, wherein the distant area is an area formed by a point that is greater than a second preset threshold between the position coordinates The second preset threshold is greater than the first preset threshold.
The computer readable storage medium of claim 15, wherein the target tracking program is further executable by one or more processors to further implement the following steps after step E:

G, adjusting the position of the sampling point on the video frame image according to the adjusted weight to update the sampling point distribution;

The step F includes:

Steps A through G are repeated until the tracking of the tracking target in all video frame images of the video is completed.
The computer readable storage medium of claim 16, wherein the target tracking program is further executable by one or more processors to further implement the following steps after step E:

G, adjusting the position of the sampling point on the video frame image according to the adjusted weight to update the sampling point distribution;

The step F includes:

Steps A through G are repeated until the tracking of the tracking target in all video frame images of the video is completed.
The computer readable storage medium of claim 17, wherein the step G comprises:

Adding a sampling point within a first preset range of the sampling point corresponding to the sample whose weight is greater than the first preset weight, and decreasing the sampling point in a second preset range of the sampling point corresponding to the sample whose weight is smaller than the second preset weight, The second preset weight is smaller than the first preset weight, and the number of added sampling points is equal to the number of reduced sampling points.
The computer readable storage medium of claim 15, wherein the target tracking program is further executable by one or more processors to implement the following steps prior to step A:

Determining whether the video frame image is the first frame image of the video;

If the video frame image is the first frame image of the video, prompting the user to manually select a tracking target on the video frame image and receive a tracking target selected by the user based on the prompt, and after determining the tracking target Initializing a sample point distribution and a training sample set of the CNN model and receiving a second frame image;

If the video image is not the first frame image of the video, the step A is performed.