CN114663468A - Target tracking method, device, equipment and storage medium based on neural network - Google Patents


Publication number
CN114663468A
CN114663468A (application number CN202011409335.0A)
Authority
CN
China
Prior art keywords
target region
initial
sample type
determining
candidate
Prior art date
Legal status
Pending
Application number
CN202011409335.0A
Other languages
Chinese (zh)
Inventor
赵凯莉
冯金
刘巍
李安新
陈岚
Current Assignee
NTT Docomo Inc
Original Assignee
NTT Docomo Inc
Priority date
Filing date
Publication date
Application filed by NTT Docomo Inc filed Critical NTT Docomo Inc
Priority to CN202011409335.0A (CN114663468A)
Priority to JP2021196877A (JP2022089797A)
Publication of CN114663468A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed is a target tracking method based on a neural network, comprising: acquiring a current image frame; determining an initial target region in the current image frame; determining a sample type to which the initial target region belongs, wherein the sample type comprises a positive sample type and a plurality of negative sample types; generating a plurality of candidate target regions in the current image frame based on a sample type to which the initial target region belongs; and determining a predicted target region from the plurality of candidate target regions.

Description

Target tracking method, device, equipment and storage medium based on neural network
Technical Field
The present disclosure relates to target tracking, and more particularly, to a target tracking method, apparatus, device, and storage medium based on a neural network.
Background
Conventional detection-based tracking methods treat visual tracking as a binary classification task, separating the object from interfering sources (e.g., the background) in each frame in sequence while updating the model online. The model is trained on positive and negative samples randomly drawn from locations around the target, and predicts candidate locations where the target may appear in the next frame. Compared with other tracking algorithms, detection-based tracking has two advantages: first, the target position is estimated directly from the cropped video frame rather than extracted from a cropped feature map; second, it can draw on classification models pre-trained for object detection. Detection-based tracking methods are therefore widely used and have achieved significant success in visual tracking.
However, with randomly selected samples, the imbalance between sample classes hinders improvement of the detection model's tracking accuracy, and the problem grows more severe as the model is updated. Specifically, the traditional random sampling strategy, combined with the strict definition of positive and negative samples (only samples whose IoU is greater than or equal to a threshold such as 0.7 are defined as positive, and all remaining samples as negative, where IoU is the Intersection over Union of the two regions), inevitably produces a large number of negative samples and only a small number of positive samples. This surplus of negative samples aggravates the biased class distribution and weakens the classifier's discriminative ability, thereby limiting the tracking accuracy of the detection model.
Disclosure of Invention
The present application has been made in view of the above problems. Its aim is to provide a neural-network-based target tracking method, apparatus, device, and storage medium that use a deep-learning-based multi-class network model to overcome the imbalance between positive and negative samples caused by the binary classification networks adopted in existing tracking algorithms. Moreover, the multi-class definition indicates more finely the proportion between target and background within a candidate box, and a new candidate-box generation strategy based on this proportion produces candidate boxes closer to the real target during tracking.
In one exemplary aspect, the present disclosure provides a target tracking method based on a neural network, including: acquiring a current image frame; determining an initial target region in the current image frame; determining a sample type to which the initial target region belongs, wherein the sample type comprises a positive sample type and a plurality of negative sample types; generating a plurality of candidate target regions in the current image frame based on a sample type to which the initial target region belongs; and determining a predicted target region from the plurality of candidate target regions.
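The five claimed steps can be sketched as one per-frame routine. The patent does not fix any programming interface, so everything model-dependent (the sample-type classifier, the candidate generator, and the positive-class scorer) is passed in as a callable; all names below are illustrative, not from the source.

```python
def track_frame(frame, prev_prediction, sample_type_of, generate_candidates,
                positive_score):
    """One iteration of the claimed tracking method (illustrative sketch)."""
    # Steps 1-2: the initial target region reuses the previous frame's prediction.
    initial_region = prev_prediction
    # Step 3: the multi-class model assigns a sample type (positive or one of
    # several negative types) to the initial region.
    sample_type = sample_type_of(frame, initial_region)
    # Step 4: candidate regions are generated adaptively from that sample type.
    candidates = generate_candidates(frame, initial_region, sample_type)
    # Step 5: the candidate most likely to be a positive sample is the prediction.
    return max(candidates, key=lambda c: positive_score(frame, c))
```

The per-step details referenced here are elaborated in the embodiments below.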
According to some embodiments of the present disclosure, determining an initial target region in the current image frame comprises: determining a region in the current image frame having the same position as the prediction target region determined in the previous image frame as the initial target region in the current image frame.
According to some embodiments of the disclosure, determining the sample type to which the initial target region belongs comprises: determining the sample type of the initial target region according to the proportion of target to background within the initial target region.
According to some embodiments of the disclosure, the initial target region is determined to belong to a first negative sample type if it does not contain the target and contains a portion of the background; to a second negative sample type if it contains a portion of the target and a portion of the background; to a third negative sample type if it contains all of the target and a portion of the background; or to a fourth negative sample type if it contains a portion of the target and does not contain the background.
According to some embodiments of the disclosure, determining the sample type to which the initial target region belongs comprises: determining a sample type to which the initial target region belongs according to a probability that the initial target region belongs to the positive sample type.
According to some embodiments of the present disclosure, in a case that a probability that the initial target region belongs to the positive sample type is greater than or equal to 0 and less than 0.2, determining that the initial target region belongs to a first negative sample type; or determining that the initial target region belongs to a second negative sample type if the probability that the initial target region belongs to the positive sample type is greater than or equal to 0.2 and less than 0.4; or determining that the initial target region belongs to a third negative sample type if the probability that the initial target region belongs to the positive sample type is greater than or equal to 0.4 and less than 0.6; or determining that the initial target region belongs to a fourth negative sample type if the probability that the initial target region belongs to the positive sample type is greater than or equal to 0.6 and less than 0.8.
According to some embodiments of the present disclosure, the initial target region is determined to be of a positive sample type if the overlap of the initial target region and a real target region is greater than or equal to a first threshold.
According to some embodiments of the present disclosure, generating a plurality of candidate target regions in the current image frame based on a sample type to which the initial target region belongs comprises: randomly generating a plurality of initial candidate target regions around the initial target region of the current frame; and carrying out displacement or scaling transformation on the plurality of initial candidate target regions based on the sample type of the initial target region to obtain the plurality of candidate target regions.
According to some embodiments of the present disclosure, performing a shift or scaling transformation on the plurality of initial candidate target regions based on a sample type to which the initial target region belongs to obtain the plurality of candidate target regions includes: and under the condition that the initial target region is determined to belong to the positive sample type, directly determining the initial candidate target regions as the candidate target regions without performing displacement or scaling transformation on the initial candidate target regions.
According to some embodiments of the present disclosure, performing a displacement or scaling transformation on the plurality of initial candidate target regions based on a sample type to which the initial target region belongs to obtain the plurality of candidate target regions includes: under the condition that the initial target region is determined to belong to a first negative sample type, performing displacement or scaling transformation on the plurality of initial candidate target regions by using a first displacement factor and a first scaling factor, and determining the plurality of transformed initial candidate target regions as the plurality of candidate target regions; or under the condition that the initial target region is determined to belong to a second negative sample type, performing displacement or scaling transformation on the plurality of initial candidate target regions by using a second displacement factor and a second scaling factor, and determining the plurality of transformed initial candidate target regions as the plurality of candidate target regions; or under the condition that the initial target region is determined to belong to a third negative sample type, performing displacement or scaling transformation on the plurality of initial candidate target regions by using a third displacement factor and a third scaling factor, and determining the plurality of transformed initial candidate target regions as the plurality of candidate target regions; or under the condition that the initial target region is determined to belong to a fourth negative sample type, performing displacement or scaling transformation on the plurality of initial candidate target regions by using a fourth displacement factor and a fourth scaling factor, and determining the plurality of transformed initial candidate target regions as the plurality of candidate target regions.
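The per-type transformation above can be sketched as follows. The patent gives no numeric values for the displacement and scaling factors, so the table entries below are invented placeholders, and boxes are assumed to be (cx, cy, w, h) tuples.

```python
import random

# Illustrative (displacement factor, scale factor) per negative type - these
# numbers are placeholders, not values from the patent.
TRANSFORM_FACTORS = {
    "negative_1": (0.4, 1.5),
    "negative_2": (0.3, 1.3),
    "negative_3": (0.2, 0.8),
    "negative_4": (0.1, 1.2),
}

def transform_candidates(candidates, sample_type):
    """Shift and scale initial candidate boxes according to the sample type
    of the initial target region (sketch of the claimed transformation)."""
    if sample_type == "positive":
        # Positive type: candidates are used directly, without transformation.
        return list(candidates)
    shift, scale = TRANSFORM_FACTORS[sample_type]
    out = []
    for cx, cy, w, h in candidates:
        dx = random.uniform(-shift, shift) * w   # random displacement
        dy = random.uniform(-shift, shift) * h
        out.append((cx + dx, cy + dy, w * scale, h * scale))
    return out
```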
According to some embodiments of the disclosure, determining a predicted target region from the plurality of candidate target regions comprises: calculating probabilities that the plurality of candidate target regions belong to the positive sample type; and determining the candidate target region with the highest probability as the prediction target region.
According to some embodiments of the disclosure, the method further comprises: pre-training the neural network, wherein pre-training the neural network comprises: obtaining an initial sample library based on random sampling; and inputting the samples in the initial sample library into the neural network to obtain the classifiers of the positive sample type and the negative sample types.
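The pre-training step's sample library can be sketched like this; the random box sampler and the labelling rule are passed in as callables because the patent does not specify them (all names here are assumptions).

```python
def build_initial_sample_library(gt_box, num_samples, label_fn, sampler):
    """Build the initial sample library by random sampling around the
    ground-truth box: `sampler` draws a random box near gt_box and `label_fn`
    assigns it one of the sample types (both illustrative callables)."""
    library = []
    for _ in range(num_samples):
        box = sampler(gt_box)
        library.append((box, label_fn(box, gt_box)))
    return library
```

The resulting (box, label) pairs would then be fed to the network to train the positive/negative-type classifiers.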
According to some embodiments of the disclosure, the method further comprises: updating the training of the neural network using the plurality of candidate target regions.
In yet another exemplary aspect, the present disclosure provides a neural network-based target tracking apparatus, including: an image acquisition module configured to acquire a current image frame; an initial target region determination module configured to determine an initial target region in the current image frame; a sample type determination module configured to determine a sample type to which the initial target region belongs, wherein the sample type includes a positive sample type and a plurality of negative sample types; a candidate target region generation module configured to generate a plurality of candidate target regions in the current image frame based on a sample type to which the initial target region belongs; and a prediction target region determination module configured to determine a prediction target region from the plurality of candidate target regions.
According to some embodiments of the present disclosure, determining an initial target region in the current image frame comprises: determining a region in the current image frame having the same position as the prediction target region determined in the previous image frame as the initial target region in the current image frame.
According to some embodiments of the disclosure, determining the sample type to which the initial target region belongs comprises: determining the sample type of the initial target region according to the proportion of target to background within the initial target region.
According to some embodiments of the disclosure, determining the sample type to which the initial target region belongs comprises: determining a sample type to which the initial target region belongs according to a probability that the initial target region belongs to the positive sample type.
According to some embodiments of the disclosure, determining a predicted target region from the plurality of candidate target regions comprises: calculating probabilities that the plurality of candidate target regions belong to the positive sample type; and determining the candidate target region with the highest probability as the prediction target region.
In yet another exemplary aspect, the present disclosure provides a target tracking device based on a neural network, including: a processor; a memory storing one or more computer program modules; wherein the one or more computer program modules are configured to, when executed by the processor, perform the above-described target tracking method.
In yet another exemplary aspect, the present disclosure provides a non-transitory computer readable storage medium having stored thereon computer instructions, wherein the computer instructions, when executed by a processor, perform the above-described target tracking method.
Drawings
FIG. 1 illustrates a flow chart of a target tracking method according to an embodiment of the principles of the present disclosure.
FIG. 2 shows a schematic diagram of Intersection over Union (IoU) according to an embodiment of the principles of the present disclosure.
Fig. 3 is a diagram illustrating the definition of positive and negative samples in the conventional binary method.
Fig. 4 shows a schematic diagram of positive and negative sample classification in accordance with an embodiment of the disclosed principles.
Fig. 5 shows a schematic diagram of a conventional randomly generated candidate box.
Fig. 6 illustrates a flow chart of an adaptive candidate box generation method according to an embodiment of the disclosed principles.
Fig. 7 shows a schematic diagram of candidate boxes generated by an adaptive candidate box generation method according to an embodiment of the disclosed principles.
FIG. 8 illustrates a schematic diagram of the training and updating of a target tracking model in accordance with an embodiment of the disclosed principles.
FIG. 9 illustrates a block diagram of a target tracking device in accordance with an embodiment of the disclosed principles.
FIG. 10 shows a schematic diagram of a target tracking device in accordance with an embodiment of the principles of the present disclosure.
Fig. 11 shows a schematic diagram of a non-transitory computer-readable storage medium in accordance with an embodiment of the disclosed principles.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure have been illustrated in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and the embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit certain steps.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be understood that the terms "first," "second," and the like in this disclosure are used for distinguishing different devices, modules, or units, and are not used for limiting the order or interdependence of the functions performed by the devices, modules, or units.
It is noted that the modifiers "a", "an", and "the" in this disclosure are intended to be illustrative rather than limiting; those skilled in the art will understand them to mean "one or more" unless the context clearly indicates otherwise.
The embodiment of the disclosure provides a target tracking method based on a neural network. FIG. 1 illustrates a flow diagram of a target tracking method 100 in accordance with an embodiment of the principles of the present disclosure.
As shown in fig. 1, the target tracking method 100 according to the embodiment of the present disclosure acquires a current image frame in step S101.
In the embodiments of the present disclosure, the term "current image frame" may refer to the image corresponding to the current time in a time-ordered image sequence, or to an image frame taken from a video. Alternatively, an image of the surrounding environment captured in real time by a camera at the current moment may serve as the current image frame (also referred to as the "image under test"). The image under test may be any image containing the target tracking object, acquired in a variety of ways: for example, an image captured by a network camera in a public place such as a train station or an airport, or an image of a subway-station crowd captured by station staff using a professional camera or a mobile phone. The image under test may be stored in the memory of a local computer, or retrieved from Internet platforms using a search engine or the like.
The target tracking method 100 determines an initial target region in the current image frame in step S102.
In an embodiment of the present disclosure, the term "initial target region" (also called the "reference target region") refers to a region of the current image frame, roughly determined in any of various ways, that may contain the target to be tracked. It is called "initial" because it does not represent the final prediction for the current frame: a series of candidate target regions are generated around it by a method described later, and the final predicted target region is selected from among those candidates.
In an embodiment of the present disclosure, the target to be tracked is an object designated for tracking in advance, such as a particular player in a competitive game or a particular person in a crowd. The target may be a human being, but may also be another type of object, including but not limited to moving objects such as animals, vehicles, and aircraft. The target is usually determined by manual marking or designation before tracking; for example, in frame 0, a player in a competitive game can be manually marked as the tracking object by framing it, i.e., determining a rectangle circumscribing it. It should be understood that the circumscribed rectangle described in this embodiment is merely exemplary, and other marking modes may be chosen according to the actual situation, for example an inscribed or circumscribed circle, ellipse, square, or other polygon. Similarly, the "candidate target region" and "prediction target region" described later are not limited to the rectangular shapes described herein.
According to the target tracking method, each image frame in an image sequence or video is processed in temporal order. That is, it is assumed that target tracking for the previous frame has been completed before the current image frame is acquired in step S101. In some embodiments, determining an initial target region in the current image frame comprises: determining the region of the current image frame having the same position as the predicted target region of the previous image frame as the initial target region. For example, suppose the current image frame is the t-th frame (t an integer greater than 0), target tracking has already been performed on frame t-1, and the final predicted target region of frame t-1 is a tracking box of size M×N centered at (x_{t-1}, y_{t-1}). Then in step S102, the M×N rectangle at the same position in the current frame t (i.e., centered at (x_{t-1}, y_{t-1})) is taken as the initial target region.
After determining the initial target region in the current image frame, the target tracking method 100 determines a sample type to which the initial target region belongs in step S103.
Conventional tracking algorithms model the tracking problem as a binary classification problem: for the input image there are only two classes, positive or negative. Positive and negative samples are defined by the IoU between the sample box and the ground-truth box. FIG. 2 shows a schematic diagram of Intersection over Union (IoU) according to an embodiment of the principles of the present disclosure.
As shown in fig. 2, the Intersection over Union (IoU) is a parameter reflecting how close the predicted box is to the ground-truth box; its value equals the ratio of the area of the overlap between the two boxes to the area of their union. The larger the IoU, the more the predicted box overlaps the ground-truth box, i.e., the closer the predicted position is to the real position; the smaller the IoU, the less the boxes overlap, i.e., the more the predicted position deviates from the real position. In practice, a threshold may be specified: samples with IoU at or above the threshold are classified as positive, and samples with IoU below the threshold as negative.
Fig. 3 is a schematic diagram illustrating the IoU-based definition of positive and negative samples in the conventional binary classification method. As shown in fig. 3, the threshold is set to 0.7: for a sample set containing a plurality of samples, those with IoU greater than or equal to 0.7 are classified as positive and those with IoU less than 0.7 as negative, yielding two sample types. It should be understood that the IoU threshold of 0.7 in the embodiments of the present disclosure is chosen for convenience of description only and is not limiting; other threshold values may be selected, e.g., 0.73, 0.8, or even 0.9 or 0.98. The higher the IoU threshold, the closer a positive sample must be to the true position.
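The IoU computation and the conventional binary labelling rule described above can be written out as follows, for axis-aligned boxes given as (x1, y1, x2, y2). This is a sketch of the standard definitions, not code from the patent.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_binary(sample_box, gt_box, threshold=0.7):
    """Conventional binary labelling: positive iff IoU >= threshold."""
    return "positive" if iou(sample_box, gt_box) >= threshold else "negative"
```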
However, binary classification based on an IoU threshold suffers from class imbalance: in general, negative samples outnumber positive samples several times over, so the model bias becomes severe. Moreover, although there is only one negative sample type, negative samples vary in learning difficulty; for example, some are easy to learn (e.g., pure background) and some are hard (e.g., boxes in which the target and background occupy various proportions). Existing binary classification methods ignore this semantic information.
In order to solve the above class-imbalance problem, in an embodiment of the present disclosure the sample types are divided into one positive sample type and a plurality of negative sample types. In contrast to the binary approach described above, the present application proposes a scheme with multiple negative sample types. Through this refined definition of negative samples, semantic information of the negative samples is extracted, and the imbalance among the multiple classes is naturally alleviated through multi-class learning.
Fig. 4 shows a schematic diagram of positive and negative sample classification in accordance with an embodiment of the disclosed principles.
As shown in fig. 4, for each sample in the sample set, it is not simply classified into positive and negative samples according to the conventional binary classification method, but into five different classifications S0-S4. For example, where S0 is a positive sample type, S1-S4 are a first negative sample type, a second negative sample type, a third negative sample type, and a fourth negative sample type based on different semantic information. It should be understood that the division of the negative sample types into the four negative sample types S1-S4 is only exemplary in this application, and the number of the negative sample types may be adjusted as needed, for example, the number of the negative sample types may be less, such as 2 or 3, or may be more, such as 5 or 6, so that the number of samples among the categories is as balanced as possible to avoid the problem of skewed distribution of the categories.
In an embodiment of the present disclosure, for example, samples with IoU >= 0.7 may first be classified as the positive sample type according to IoU, and samples with IoU < 0.7 assigned to one of the four negative sample types S1-S4.
In the embodiment of the present disclosure, in the multi-class model the categories are defined not only by the overlap rate between the sample and the target, but also by the relative sizes of background and target in different samples, so that the semantic features of each category become clear and the distinctions between categories more obvious. This naturally relieves the class imbalance present in binary classification models, and also reduces model bias to some extent. For example, for a sample with IoU < 0.7, the sample type can be determined by the proportion of target to background in the sample: if the sample contains no target and contains a portion of the background, it belongs to the first negative sample type S1; if it contains a portion of the target and a portion of the background, it belongs to the second negative sample type S2; if it contains all of the target and a portion of the background, it belongs to the third negative sample type S3; and if it contains a portion of the target and no background, it belongs to the fourth negative sample type S4.
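The composition-based rule can be sketched as a small decision function. Reading "a portion of the target" as 0 < coverage < 1 is our interpretation; the function name and arguments are illustrative.

```python
def sample_type_from_composition(target_coverage, has_background):
    """Assign a negative sample type from the box's content: target_coverage
    is the fraction of the target inside the box (0.0 none, 1.0 all), and
    has_background says whether the box also contains background pixels."""
    if target_coverage == 0.0 and has_background:
        return "negative_1"  # pure background
    if 0.0 < target_coverage < 1.0 and has_background:
        return "negative_2"  # part of the target plus background
    if target_coverage == 1.0 and has_background:
        return "negative_3"  # whole target, but the box is too large
    if 0.0 < target_coverage < 1.0 and not has_background:
        return "negative_4"  # box lies inside the target, no background
    raise ValueError("composition matches none of the defined negative types")
```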
For example, for samples with IoU < 0.7: samples of the first negative sample type S1 are pure background; samples of the second negative sample type S2 contain an incomplete target together with some background; samples of the third negative sample type S3 contain the whole target but the target occupies too small a portion of the box; and samples of the fourth negative sample type S4 contain part of the target and no background.
The negative sample classification method according to embodiments of the present disclosure is not limited to the above. For example, the sample type to which a sample belongs may also be determined according to the probability that the sample belongs to the positive sample type. For example, for a sample with IoU < 0.7: in the case where the probability that the sample belongs to the positive sample type is greater than or equal to 0 and less than 0.2, the sample is determined to belong to the first negative sample type S0; in the case where the probability is greater than or equal to 0.2 and less than 0.4, the sample is determined to belong to the second negative sample type S1; in the case where the probability is greater than or equal to 0.4 and less than 0.6, the sample is determined to belong to the third negative sample type S2; and in the case where the probability is greater than or equal to 0.6 and less than 0.8, the sample is determined to belong to the fourth negative sample type S3.
It should be understood that, although the classification method based on the target-to-background ratio and the classification method based on the probability that a sample belongs to the positive sample type are listed separately above, the subdivision method for negative samples is not limited to these, nor is the number of negative sample types limited to four. For example, the subdivided negative sample types may also be derived from the position of the center point or the center of gravity of the target within the sample frame. For example, assuming the sample frame is a square with side length 1 centered at (0, 0), samples in which the distance of the target's center point or center of gravity from (0, 0) in the vertical/horizontal direction falls in the ranges 0 to 0.25, 0.25 to 0.5, 0.5 to 0.75, and 0.75 to infinity, respectively, may be classified as the first to fourth negative sample types.
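The center-distance rule above can be sketched as follows. This is a hypothetical illustration, not the patent's code: the function name and the choice of the larger of the two axis offsets as the distance are assumptions.

```python
# Hypothetical sketch of the center-distance-based negative-sample
# subdivision described above. The sample frame is a unit square
# centered at (0, 0); the type is chosen from how far the target's
# center point (or center of gravity) lies from the frame center in
# the vertical/horizontal direction.

def center_distance_type(cx: float, cy: float) -> str:
    """Map a target center (cx, cy) to one of four negative types."""
    # Use the larger of the two per-axis offsets as the distance.
    d = max(abs(cx), abs(cy))
    if d < 0.25:
        return "S0"   # first negative sample type
    elif d < 0.5:
        return "S1"   # second negative sample type
    elif d < 0.75:
        return "S2"   # third negative sample type
    else:
        return "S3"   # fourth negative sample type
```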
Based on the positive sample type and the plurality of negative sample types as described above, the sample type to which the initial target region obtained in S102 belongs can be determined.
In an embodiment of the present disclosure, determining the sample type to which the initial target region belongs may include determining the sample type according to the ratio of the target to the background in the initial target region. For example, similar to the sample classification described above: in the case where the initial target region does not contain the target and contains part of the background, it is determined that the initial target region belongs to the first negative sample type; or, in the case where the initial target region contains part of the target and part of the background, it is determined that the initial target region belongs to the second negative sample type; or, in the case where the initial target region contains the whole target and part of the background, it is determined that the initial target region belongs to the third negative sample type; or, in the case where the initial target region contains part of the target and no background, it is determined that the initial target region belongs to the fourth negative sample type.
In another embodiment of the present disclosure, determining the sample type to which the initial target region belongs comprises: determining a sample type to which the initial target region belongs according to a probability that the initial target region belongs to the positive sample type. For example, similar to the second sample classification method described above, in a case where the probability that the initial target region belongs to the positive sample type is greater than or equal to 0 and less than 0.2, it is determined that the initial target region belongs to a first negative sample type; or determining that the initial target region belongs to a second negative sample type if the probability that the initial target region belongs to the positive sample type is greater than or equal to 0.2 and less than 0.4; or determining that the initial target region belongs to a third negative sample type if the probability that the initial target region belongs to the positive sample type is greater than or equal to 0.4 and less than 0.6; or determining that the initial target region belongs to a fourth negative sample type if the probability that the initial target region belongs to the positive sample type is greater than or equal to 0.6 and less than 0.8.
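The probability-interval rule above admits a compact sketch. The function name and the treatment of probabilities of 0.8 and above (left to the positive-sample branch) are assumptions for illustration.

```python
# A minimal sketch of the probability-interval rule described above:
# the probability that the initial target region belongs to the
# positive sample type is binned into the four negative types.

def negative_type_from_prob(p: float) -> str:
    """Map P(positive) in [0, 0.8) to a negative sample type."""
    if 0.0 <= p < 0.2:
        return "S0"  # first negative sample type
    if 0.2 <= p < 0.4:
        return "S1"  # second negative sample type
    if 0.4 <= p < 0.6:
        return "S2"  # third negative sample type
    if 0.6 <= p < 0.8:
        return "S3"  # fourth negative sample type
    # Assumption: p >= 0.8 is handled by the positive-sample branch.
    raise ValueError("p >= 0.8 is not a negative sample")
```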
In addition, the sample type to which the initial target region belongs may be determined according to other methods, depending on which classification method was used to train the classification model used in the present application. For example, if a classifier is trained using the method based on the position of the center point or center of gravity of the target as described above, the classifier classifies the initial target region according to the same rule.
After determining the sample type to which the initial target region belongs in step S103, the method 100 proceeds to step S104, and a plurality of candidate target regions are generated in the current image frame based on the sample type to which the initial target region belongs.
As previously mentioned, an "initial target region" (which may also be referred to as a "reference target region") refers to a roughly determined region of the current image frame that may contain the target to be tracked. In the present disclosure, the "initial target region" is determined from the prediction result of the previous frame; that is, the "initial target region" is the region in the current frame that has the same position and size as the prediction result of the previous frame. Based on the initial target region, in combination with the sample type determined for the initial target region, the method 100 generates a plurality of candidate target regions in the current image frame in S104.
In this context, the term "candidate target region" refers to a series of regions generated around the initial target region from which the final predicted region is determined. In actual target tracking, the target tends to move continuously over time. Therefore, the "initial target region" can only roughly locate the approximate position of the target in the current frame, and to further obtain the accurate position of the target in the current frame, a large number of candidate regions need to be generated around the initial target region, and then a region with the highest probability of existence of the target is determined from the candidate regions as the final predicted position of the target in the current frame.
Herein, a "candidate target region" may refer to a rectangular box that may contain a target. Hereinafter, "candidate target region" may be used interchangeably with "candidate box".
In the conventional candidate region generation strategy, candidate frames are generally generated at random around the initial target region, the classification model is then used to score them, and the candidate frame with the highest positive-sample score is taken as the position of the target in the current frame. The existing random candidate frame generation strategy contains two parameters: a displacement factor v and a scaling factor r. The displacement factor v determines the distribution range of the centers of the candidate frames in the current frame, with the target position of the previous frame as a reference, and generally simulates the movement of the target; in the conventional method it is usually set to a fixed empirical value. The scaling factor r determines the extent of variation of the candidate frames with respect to the target size of the previous frame; in prior-art methods the size of the initial target region is typically multiplied by one of {r^-1, r^0, r^1} to simulate the size change of the target. In conventional approaches, the scaling factor r is likewise set to a fixed empirical value.
When the displacement and scaling factors are fixed values, the sampling process lacks a self-adjustment capability, and the number of sampled positive samples may be small. To increase the probability that a candidate is a positive sample, embodiments of the present disclosure adaptively reset the displacement and scaling factors of the samples according to the negative sample type of the candidate. Specifically, the search area of the target is adjusted according to the ratio of the target to the background: the smaller the proportion of the target, the larger the displacement/scaling factor, so that the generated candidates contain more of the target. With the proposed candidate frame generation method, more accurate candidate targets can be obtained, providing more positive samples for the next model update.
Fig. 5 shows a schematic diagram of a conventional randomly generated candidate box. As shown in fig. 5, three candidate boxes are randomly generated around the initial target region according to a fixed displacement factor v and a fixed scaling factor r, the three candidate boxes being at equal distances from the initial target region and having the same size.
However, the conventional random candidate box generation has a problem that the existing tracking algorithm mainly follows gaussian random distribution when generating the candidate box. When the tracking result of the previous frame is not accurate enough, the candidate frame generated according to the result also deviates from the real frame, which causes error accumulation, so that the longer the tracking time is, the larger the deviation of the tracking result from the real frame is. As shown in fig. 5, the randomly generated candidate box does not completely cover the target (player in the competitive game), which will result in a deviation of the final predicted result from the reality, and the longer the tracking time, the larger the deviation.
In order to solve the above problem, an embodiment according to the principles of the present disclosure proposes an adaptive candidate box generation method. Fig. 6 illustrates a flow diagram of an adaptive candidate box generation method 600 according to an embodiment of the disclosed principles.
As shown in fig. 6, the adaptive candidate box generation method 600 according to an embodiment of the disclosed principles includes: in step S601, randomly generating a plurality of initial candidate target regions around the initial target region of the current frame; then, in step S602, performing displacement or scaling transformation on the initial candidate target regions based on the sample type to which the initial target region belongs, so as to obtain the plurality of candidate target regions.
An embodiment of a corresponding candidate box generation policy will be described below based on a negative sample classification method based on the ratio of target and background as described above. It should be understood that "first negative example type", "second negative example type", "third negative example type", and "fourth negative example type" hereinafter respectively correspond to the first to fourth negative example types S0-S3 classified by the above negative example classification method based on the ratio of the target and the background. Also, the "positive sample" type corresponds to a positive sample type classified based on the IoU threshold.
In an embodiment of the present disclosure, performing displacement or scaling transformation on the plurality of initial candidate target regions based on the sample type to which the initial target region belongs to obtain the plurality of candidate target regions includes: in the case where the initial target region is determined to belong to the positive sample type, directly determining the plurality of initial candidate target regions as the plurality of candidate target regions, without performing displacement or scaling transformation on them; in the case where the initial target region is determined to belong to the first negative sample type, performing displacement or scaling transformation on the plurality of initial candidate target regions using a first displacement factor and a first scaling factor, and determining the transformed initial candidate target regions as the plurality of candidate target regions; or, in the case where the initial target region is determined to belong to the second negative sample type, performing displacement or scaling transformation on the plurality of initial candidate target regions using a second displacement factor and a second scaling factor, and determining the transformed initial candidate target regions as the plurality of candidate target regions; or, in the case where the initial target region is determined to belong to the third negative sample type, performing displacement or scaling transformation on the plurality of initial candidate target regions using a third displacement factor and a third scaling factor, and determining the transformed initial candidate target regions as the plurality of candidate target regions; or, in the case where the initial target region is determined to belong to the fourth negative sample type, performing displacement or scaling transformation on the plurality of initial candidate target regions using a fourth displacement factor and a fourth scaling factor, and determining the transformed initial candidate target regions as the plurality of candidate target regions.
Wherein the first to fourth shifting factors are shifting factors corresponding to the first to fourth negative example types, respectively, and the first to fourth scaling factors are scaling factors corresponding to the first to fourth negative example types, respectively.
For example, the adaptive candidate frame generation strategy according to the principles of the present disclosure may utilize a multi-classification model to perform a class determination for the image content in the base frame (i.e., the initial target region) in the current frame, and then dynamically update the existing target candidate frame according to the class to which the image block in the base frame belongs.
1) When the basic frame belongs to the first negative sample type, the initial target region does not contain the target and the search range needs to be expanded; the displacement factor v is therefore multiplied by a coefficient u to increase the displacement. In this case, no scaling factor need be used.
2) When the basic frame belongs to the second negative sample type, the initial target region contains part of the target and part of the background; the displacement factor is kept at v, and the scaling factor is taken from {r^-1, r^0, r^1}.
3) When the basic frame belongs to the third negative sample type, the basic frame is large relative to the target; when generating candidate frames, the size is multiplied by one of {r^-2, r^-1, r^0} to generate more smaller candidate frames and improve the target hit probability.
4) When the basic frame belongs to the fourth negative sample type, the basic frame is small relative to the target; when generating candidate frames, the size is multiplied by one of {r^0, r^1, r^2} to generate more larger candidate frames and improve the target hit probability.
Here, v and r are empirically determined constants, and the coefficient u is a value greater than 1. For example, the displacement factor v may be a scalar corresponding to the variance of the Gaussian distribution. Specifically, the distribution of the center coordinates of the generated candidate frames is centered on the predicted center position of the target in the previous frame, and the coordinates of the candidate frames generated in the next frame follow a Gaussian distribution. Here we make an assumption: the movement of the target between two frames is independent in the horizontal (X) and vertical (Y) directions, and the sampling range in the two directions is the same. That is, the candidate frame center coordinates obey two one-dimensional Gaussian distributions of the same variance in the X and Y directions, respectively:
p(x) = N(x_{t-1}, v)

p(y) = N(y_{t-1}, v)

where p(x) and p(y) are the distributions of the horizontal and vertical coordinates, respectively, of the center position of a candidate frame, (x_{t-1}, y_{t-1}) is the predicted center position of the target in the previous frame, and v is the variance of the Gaussian distribution.
When it is determined that the initial target region belongs to the first negative sample type S0, i.e., there is no target in the initial target region, we wish to expand the search range of the candidate box, so the variance of the gaussian distribution is multiplied by a coefficient u (e.g., an empirical value is taken to be 1.5), i.e., the above equation becomes:
p(x) = N(x_{t-1}, u·v)

p(y) = N(y_{t-1}, u·v)
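Combining rules 1)-4) with the Gaussian sampling above, a minimal sketch might look as follows. This is an illustration under assumptions, not the patent's implementation: the function names, the return formats, and treating v as the variance of the distributions p(x) and p(y) follow the surrounding text.

```python
import random

# Sketch of the adaptive generation step: choose displacement/scaling
# parameters from the sample type of the initial region per rules
# 1)-4) above, then draw candidate centers from the Gaussian
# distributions p(x) and p(y) centered on the previous prediction.

def candidate_params(sample_type, v, r, u):
    """Return (displacement factor, scale multipliers) for a type."""
    if sample_type == "S0":            # no target: widen the search
        return v * u, [1.0]            # no scaling factor used
    if sample_type == "S1":            # part target, part background
        return v, [r ** -1, 1.0, r]
    if sample_type == "S2":            # frame large vs. target
        return v, [r ** -2, r ** -1, 1.0]  # favour smaller candidates
    if sample_type == "S3":            # frame small vs. target
        return v, [1.0, r, r ** 2]     # favour larger candidates
    return v, [1.0]                    # positive type: leave as-is

def sample_centers(x_prev, y_prev, variance, n, seed=None):
    """Draw n candidate centers around the previous prediction."""
    rng = random.Random(seed)
    sigma = variance ** 0.5            # std. deviation from variance
    return [(rng.gauss(x_prev, sigma), rng.gauss(y_prev, sigma))
            for _ in range(n)]
```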
Fig. 7 shows a schematic diagram of candidate boxes generated by an adaptive candidate box generation method according to an embodiment of the disclosed principles. It can be seen that the candidate boxes generated in fig. 7 better cover the target (e.g., the player in the competitive game). That is, compared with the candidate frames of fig. 5, which are generated according to a random distribution and therefore have a lower probability of hitting the target, the candidate frames generated according to the proposed strategy benefit from the multi-classification model: when the candidate frames are smaller than the real target frame, larger candidate frames can be generated and the displacement can be adjusted to improve the probability of hitting the target.
Embodiments of the corresponding candidate box generation strategy are described above for a negative sample classification method based on target and background ratios. It should be understood that the adaptive candidate block generation method disclosed herein is not limited to the above-described method.
For example, a similar method or another adaptive candidate box generation method may also be employed for the classification method based on the probability that a sample belongs to the positive sample type. By way of example and not limitation: in the case where the probability that the initial target region belongs to the positive sample type is greater than or equal to 0 and less than 0.2, the initial target region is determined to belong to the first negative sample type; a large coefficient u (e.g., u = 2) is adopted, and the size is multiplied by one of {r^-2, r^-1, r^0} when generating candidate frames to generate more smaller candidate frames and improve the target hit probability. Or, in the case where the probability is greater than or equal to 0.2 and less than 0.4, the initial target region is determined to belong to the second negative sample type; a medium coefficient u (e.g., u = 1.25) is adopted, and the size is multiplied by one of {r^-1, r^0, r^1} when generating candidate frames. Or, in the case where the probability is greater than or equal to 0.4 and less than 0.6, the initial target region is determined to belong to the third negative sample type; it is scaled with {r^-1, r^0, r^1} and no additional displacement coefficient u is used. Or, in the case where the probability is greater than or equal to 0.6 and less than 0.8, the initial target region is determined to belong to the fourth negative sample type; since the probability is higher, a smaller coefficient u is adopted and no scaling is performed.
Alternatively, for a classification method based on the probability that a sample belongs to a positive sample type, a conventional random candidate box generation method may also be employed.
In addition, for the above-described negative sample classification method based on the center point or center-of-gravity position of the target, an adaptive candidate frame generation method similar to the above, or a conventional random candidate frame generation method, may be adopted. The above combinations are not intended to limit the scope of the present disclosure; those skilled in the art can combine the various negative sample classification methods and candidate box generation methods as needed without departing from the spirit and scope of the present disclosure.
After generating a plurality of candidate target regions according to the method, the method 100 determines a prediction target region from the plurality of candidate target regions in step S105.
In some embodiments, determining a predicted target region from the plurality of candidate target regions comprises: calculating probabilities that the plurality of candidate target regions belong to the positive sample type; and determining the candidate target region with the highest probability as the prediction target region.
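This selection step can be sketched in one line; the `score_positive` callback stands in for the multi-class classifier's positive-type probability and is hypothetical.

```python
# A minimal sketch of step S105 as described above: score every
# candidate region with the classifier's positive-type probability
# and keep the highest-scoring one.

def pick_prediction(candidates, score_positive):
    """Return the candidate with the highest positive-sample score."""
    return max(candidates, key=score_positive)
```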
It should be understood that when each of the plurality of candidate target regions according to the present disclosure is input into the multi-classification-based classifier, the classifier outputs a probability score that the candidate target region belongs to each classification. For example, if one of the candidate target regions, Cnd1, is input to the classifier, the output is the probability scores (ranging between 0 and 1) that Cnd1 belongs to the respective classifications. Taking the above five-classification classifier as an example, the output of the classifier is:
1) the score of Cnd1 belonging to the positive sample type is: 0.75;
2) the score of Cnd1 belonging to the first negative sample type is: 0.01;
3) the score of Cnd1 belonging to the second negative sample type is: 0.35;
4) the score of Cnd1 belonging to the third negative sample type is: 0.25;
5) the score of Cnd1 belonging to the fourth negative sample type is: 0.4.
Taking the three candidate boxes (Cnd1, Cnd2, and Cnd3) generated in fig. 7 as an example, if Cnd1 has a probability score of 0.75 of belonging to a positive sample, Cnd2 has a probability score of 0.9, and Cnd3 has a probability score of 0.65, then Cnd2, which has the highest score, is determined as the prediction target region.
It should be understood that, for simplicity of explanation, only an example with three candidate boxes is shown in the above embodiment. In practical applications, in order to cover the target to the maximum extent and improve the accuracy of prediction, the number of candidate boxes may be greater than or equal to three, for example, several tens or even several hundreds. More or fewer candidate boxes may be selected as desired.
At this point, the prediction of the position of the target in the current image frame is complete. It should be understood that tracking the target is a continuous process: when the predicted position of the target in the current image frame is obtained, the predicted position can be used as an input to predict the position of the target in the next image frame using a process similar to S101-S105 described above. In this case, the prediction target region in the current image frame determined by the method described above is used to determine the initial target region in the next frame, i.e., the region located at the same position and having the same size in the next frame is determined as the initial target region.
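The frame-to-frame hand-off described above can be sketched as follows; `track_one_frame`, standing in for steps S102-S105, and the (x, y, w, h) region format are assumptions for illustration.

```python
# A sketch of the continuous-tracking loop described above: the
# region predicted in frame t seeds the initial region (same
# position, same size) of frame t+1.

def track_sequence(frames, first_region, track_one_frame):
    region = first_region
    trajectory = []
    for frame in frames:
        # the previous prediction becomes the current initial region
        region = track_one_frame(frame, region)
        trajectory.append(region)
    return trajectory
```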
Further alternatively, the motion trajectory of the object may be generated based on the prediction result in each frame. For example, each frame of images may be stitched together to form a panorama, and then the determined predicted target areas may be connected in time series to form a trajectory representing a movement trajectory or a movement route of the target in the environment. Alternatively, geographical position information of each frame of image taken may be recorded (for example, using GPS), and then the prediction target area may be mapped onto a corresponding map position according to the prediction result, and a movement route obtained by tracking the target may be indicated by connecting chronological geographical position information into a trajectory.
A tracking method according to an embodiment of the disclosed principles is implemented by a neural network-based tracking model. FIG. 8 illustrates a schematic diagram of training and updating of a target tracking model, in accordance with an embodiment of the disclosed principles.
As shown in fig. 8, the classifier 801 includes a plurality of feature extraction layers (e.g., C1, C2, and C3) and a plurality of fully connected layers (e.g., f1, f2, and f3). Before being used for target tracking, the classifier 801 is initialized, i.e., the tracking model is trained in advance. Pre-training the tracking model includes: obtaining an initial sample library based on random sampling; and inputting the samples in the initial sample library into the tracking model to obtain a classifier for the positive sample type and the plurality of negative sample types. For example, the training set required to train the tracking model includes five types of samples: S0, S1, S2, S3, and S4. The sample types S0-S4 correspond to the positive sample type and the first to fourth negative sample types, respectively, as described above. The samples of the five types S0-S4 are obtained by random sampling, e.g., by randomly sampling several samples of the same size around the target, and the type to which each sample belongs has been labeled. The sample set is input to the classifier 801 to initially train it into a multi-class classifier.
For simplicity of illustration, the tracking model shown in fig. 8 is functionally divided into a training model and an online-update model, where the training model performs the initialization process of the tracking model as described above. Furthermore, after the classifier 801 has been initialized and put into use, once the tracking model has classified a plurality of candidate target regions (e.g., 200) according to the method described in fig. 1 and the position of the target in the t-th frame (i.e., the position of the prediction target region) has been determined from the probability scores of the candidate target regions, a set of labeled (i.e., classified) samples is produced, and these samples (e.g., 200) may be placed in the training set for updating the parameters of the classifier. The tracking model of the present application therefore adopts an online-update strategy: the positive and negative samples generated during tracking are synchronously used for further learning of the tracking model, so that the parameters of the model are continuously updated, the model adapts to changes over time, and the tracking accuracy is improved.
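The online-update strategy can be sketched as follows; the `OnlineUpdater` class, the `retrain` callback, and the update interval are hypothetical stand-ins for the fine-tuning step described above.

```python
# A sketch of the online-update strategy: samples labeled during
# tracking of each frame are added to the training set, and the
# classifier parameters are periodically refreshed.

class OnlineUpdater:
    def __init__(self, retrain, update_every=10):
        self.samples = []            # accumulated labeled samples
        self.retrain = retrain
        self.update_every = update_every
        self.frames_seen = 0

    def add_frame_samples(self, labeled_samples):
        self.samples.extend(labeled_samples)
        self.frames_seen += 1
        if self.frames_seen % self.update_every == 0:
            self.retrain(list(self.samples))  # refresh the classifier
```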
The embodiment of the disclosure also provides a target tracking device based on the neural network. FIG. 9 illustrates a block diagram of a target tracking device 900 in accordance with an embodiment of the principles of the present disclosure.
The target tracking device 900 includes an image acquisition module 901 configured to acquire a current image frame; an initial target region determination module 902 configured to determine an initial target region in the current image frame; a sample type determining module 903 configured to determine a sample type to which the initial target region belongs, wherein the sample type includes a positive sample type and a plurality of negative sample types; a candidate target region generation module 904 configured to generate a plurality of candidate target regions in the current image frame based on a sample type to which the initial target region belongs; and a prediction target region determination module 905 configured to determine a prediction target region from the plurality of candidate target regions. These modules may be implemented in software, hardware, firmware, or any combination thereof.
In the embodiment of the present disclosure, for example, the image acquisition module 901 may be implemented as a separate camera (such as a stereo camera, an infrared camera, etc.), and the image acquisition module 901 may also be integrated in various electronic terminals, including but not limited to a mobile phone, a tablet computer, a drone, a navigator, etc., for capturing an image sequence of targets of various places as images to be measured. The image acquisition module 901 may be any one of network cameras located in public places such as airports, train stations, subway stations, and the like.
The term "current image frame" may be an image in a time-sequential image sequence that corresponds to the current time instant, or an image frame that is cut from a piece of video. Alternatively, an image frame captured in real time by a camera of the surrounding environment at the current time may be used as the current image frame (also referred to as "image under test"). The image to be detected can be stored in a memory of a local computer, and a required image to be detected can be searched in various platforms of the internet by utilizing a search engine and the like.
In embodiments of the present disclosure, the initial target area determination module 902 may be implemented as software, hardware, or a combination of both. The term "initial target region" (which may also be referred to as "reference target region") refers to a region in the current image frame that may contain a target to be tracked, which is roughly determined according to various ways. The initial target region is referred to as an "initial target region" because the initial target region does not represent a final prediction result of the current image frame, and a series of candidate target regions are generated around the initial target region based on a method described later, and a final prediction target region is selected from the plurality of candidate target regions.
According to the embodiment of the present disclosure, the target tracking device performs target tracking on each image frame in an image sequence or video segment in time order. That is, it is assumed that target tracking for the previous frame has been completed before the current image frame is acquired. In some embodiments, determining the initial target region in the current image frame includes: determining the region in the current image frame that has the same position as the prediction target region determined in the previous image frame as the initial target region in the current image frame. For example, assuming the current image frame is the t-th frame (t is an integer greater than 0), and assuming that target tracking has been performed for the (t-1)-th frame and the final prediction target region in the (t-1)-th frame is a tracking frame of size MxN centered at (x_{t-1}, y_{t-1}), an MxN rectangular frame at the same position in the current image frame t (i.e., centered at (x_{t-1}, y_{t-1})) is taken as the initial target region in the process of S102.
In an embodiment of the present disclosure, after determining an initial target region in the current image frame, a sample type to which the initial target region belongs is determined using the sample type determination module 903. The sample type determination module 903 (also referred to as a "classifier" in this application) is implemented by a deep learning neural network, such as a convolutional neural network, which includes a plurality of feature extraction layers and a plurality of fully-connected layers.
In the present application, samples are not simply divided into positive and negative examples as in the conventional binary classification approach, but into five different classes S0-S4, where S0 is the positive sample type and S1-S4 are first through fourth negative sample types distinguished by different semantic information. It should be understood that dividing the negative samples into the four types S1-S4 is only exemplary in this application; the number of negative sample types may be adjusted as needed, for example reduced to 2 or 3 or increased to 5 or 6, so that the number of samples across the categories is as balanced as possible and the problem of skewed class distribution is avoided.
In an embodiment of the present disclosure, for example, samples may first be divided by IoU: a sample with IoU ≥ 0.7 may be classified as the positive sample type, and a sample with IoU < 0.7 may be classified into one of the four negative sample types S1-S4.
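The IoU-based split above can be illustrated with a minimal sketch. The corner-coordinate box format `(x1, y1, x2, y2)` and the helper names are assumptions for illustration; only the 0.7 threshold comes from the description.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def is_positive(sample_box, gt_box, threshold=0.7):
    """Samples with IoU >= 0.7 against the ground truth are positive;
    all others fall into one of the negative sample types."""
    return iou(sample_box, gt_box) >= threshold
```

For example, two unit-overlap boxes of area 4 each give IoU = 1/7 ≈ 0.14, well below the positive threshold.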
In the embodiment of the disclosure, in the definition of the multi-classification model, classes are defined not only by the overlap rate between the sample and the target but also by the relative sizes of the background and the target in different samples, so that the semantic features of each class become clear and the distinction between classes more pronounced. This naturally alleviates the class-imbalance problem of binary classification models and, to some extent, also reduces bias within the model. For example, for a sample with IoU < 0.7, the sample type may be determined according to the target-to-background ratio in the sample. For example, if the sample contains no target and contains part of the background, it may be determined that the sample belongs to the first negative sample type S1; or if the sample contains part of the target and part of the background, the second negative sample type S2; or if the sample contains the whole target and part of the background, the third negative sample type S3; or if the sample contains part of the target and no background, the fourth negative sample type S4.
For example, for samples with IoU < 0.7, samples of the first negative sample type S1 are pure background; samples of the second negative sample type S2 contain an incomplete target together with some background; samples of the third negative sample type S3 contain the target but the target is too small relative to the box; and samples of the fourth negative sample type S4 contain part of the target and no background.
The negative-sample classification method according to embodiments of the disclosed principles is not limited to the above. For example, the sample type may also be determined from the probability that the sample belongs to the positive sample type. For a sample with IoU < 0.7: if the probability that the sample belongs to the positive sample type is greater than or equal to 0 and less than 0.2, the sample belongs to the first negative sample type S1; or if greater than or equal to 0.2 and less than 0.4, the second negative sample type S2; or if greater than or equal to 0.4 and less than 0.6, the third negative sample type S3; or if greater than or equal to 0.6 and less than 0.8, the fourth negative sample type S4.
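The probability binning just described maps directly to code. A minimal sketch follows; the labels S1-S4 denote the first through fourth negative sample types, and the handling of probabilities of 0.8 and above (returned as the positive label) is an assumption, since the description only defines the bins up to 0.8.

```python
def negative_type_from_probability(p_positive):
    """Map P(sample is positive) to a negative sample type using the
    0.2-wide bins of the description: [0, 0.2) -> S1, ..., [0.6, 0.8) -> S4."""
    bins = [(0.0, 0.2, "S1"), (0.2, 0.4, "S2"), (0.4, 0.6, "S3"), (0.6, 0.8, "S4")]
    for lo, hi, label in bins:
        if lo <= p_positive < hi:
            return label
    return "S0"  # assumption: p >= 0.8 is treated as the positive type
```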
It should be understood that, although the classification method based on the target-to-background ratio and the method based on the probability of belonging to the positive sample type are listed separately above for negative samples, the subdivision of negative samples is limited neither to these methods nor to four types. For example, subdivided negative sample types may also be derived from the position of the target's center point or center of gravity within the sample frame: assuming the sample frame is a square of side length 1 centered at (0, 0), samples whose target center point or center of gravity lies at a vertical/horizontal distance from (0, 0) of 0 to 0.25, 0.25 to 0.5, 0.5 to 0.75, or 0.75 to infinity may be classified as the first through fourth negative sample types, respectively.
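The center-position variant can be sketched as follows. This is an illustrative reading of the example: the description speaks of the distance "in the vertical/horizontal direction", which is interpreted here as the larger of the two axis distances, and that interpretation, like the function name, is an assumption.

```python
def negative_type_from_center(cx, cy):
    """Hypothetical classification by how far the target's center point
    (or center of gravity) lies from the center (0, 0) of a unit-square
    sample frame, using the 0.25-wide distance bands of the example."""
    d = max(abs(cx), abs(cy))  # assumed: larger of the horizontal/vertical distances
    if d < 0.25:
        return "S1"
    if d < 0.5:
        return "S2"
    if d < 0.75:
        return "S3"
    return "S4"  # 0.75 to infinity
```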
Based on the positive sample type and the plurality of negative sample types as described above, a sample type to which the initial target region belongs may be determined.
In an embodiment of the present disclosure, determining the sample type to which the initial target region belongs may include: determining the sample type according to the ratio of target to background in the initial target region. For example, similar to the sample classification described above, if the initial target region does not contain the target and contains part of the background, it is determined to belong to a first negative sample type; or if the initial target region contains part of the target and part of the background, a second negative sample type; or if the initial target region contains all of the target and part of the background, a third negative sample type; or if the initial target region contains part of the target and no background, a fourth negative sample type.
In another embodiment of the present disclosure, determining the sample type to which the initial target region belongs comprises: determining a sample type to which the initial target region belongs according to a probability that the initial target region belongs to the positive sample type. For example, similar to the second sample classification method described above, in a case where the probability that the initial target region belongs to the positive sample type is greater than or equal to 0 and less than 0.2, it is determined that the initial target region belongs to a first negative sample type; or determining that the initial target region belongs to a second negative sample type if the probability that the initial target region belongs to the positive sample type is greater than or equal to 0.2 and less than 0.4; or determining that the initial target region belongs to a third negative sample type if the probability that the initial target region belongs to the positive sample type is greater than or equal to 0.4 and less than 0.6; or determining that the initial target region belongs to a fourth negative sample type if the probability that the initial target region belongs to the positive sample type is greater than or equal to 0.6 and less than 0.8.
In addition, the type of the sample to which the initial target region belongs may be determined according to other methods, depending on which classification method is used for training the classification model used in the present application. For example, if the present application trains a classifier using the method based on the positions of the center point and the center of gravity of the target as described above, the classifier classifies the initial target region according to the same rule.
In an embodiment of the present disclosure, the candidate target region generation module 904 will generate a plurality of candidate target regions in the current image frame based on the sample type to which the initial target region determined as described above belongs.
Herein, the "initial target region" is determined according to a prediction result of a previous frame, that is, the "initial target region" is a region in the current frame that is positioned the same as and is equal in size to the prediction result of the previous frame. Based on the initial target region, in conjunction with the sample type described for the initial target region, the candidate target region generation module 904 generates a plurality of candidate target regions in the current image frame.
In this context, the term "candidate target region" refers to a series of regions generated around the initial target region from which the final predicted region is determined. In actual target tracking, the target tends to move continuously over time. Therefore, the "initial target region" can only roughly locate the approximate position of the target in the current frame, and to further obtain the accurate position of the target in the current frame, a large number of candidate regions need to be generated around the initial target region, and then a region with the highest probability of existence of the target is determined from the candidate regions as the final predicted position of the target in the current frame.
Herein, a "candidate target region" may refer to a rectangular box that may contain a target. Hereinafter, "candidate target region" may be used interchangeably with "candidate box".
To address the large deviations caused by conventional random candidate-box generation, an embodiment according to the principles of the present disclosure provides an adaptive candidate-box generation method. Specifically, the candidate target region generation module 904 first randomly generates a plurality of initial candidate target regions around the initial target region of the current frame, and then, based on the sample type to which the initial target region belongs, applies displacement or scaling transformations to these initial candidate regions to obtain the plurality of candidate target regions.
For example, the adaptive candidate frame generation strategy according to the principles of the present disclosure may utilize a multi-classification model to perform a class determination for the image content in the base frame (i.e., the initial target region) in the current frame, and then dynamically update the existing target candidate frame according to the class to which the image block in the base frame belongs.
1) When the base frame belongs to the first negative sample type, the initial target region contains no target and the search range needs to be expanded, so the displacement factor v is multiplied by a coefficient u to increase the displacement amount. In this case no scaling factor need be used.
2) When the base frame belongs to the second negative sample type, the initial target region contains part of the target and part of the background; the displacement factor is taken as v and the scaling factors as {r^-1, r^0, r^1}.
3) When the base frame belongs to the third negative sample type, the base frame is large relative to the target, so the scaling factors {r^-2, r^-1, r^0} are applied when generating candidate boxes, producing more smaller candidate boxes to improve the probability of hitting the target.
4) When the base frame belongs to the fourth negative sample type, the base frame is small relative to the target, so the scaling factors {r^0, r^1, r^2} are applied when generating candidate boxes, producing more larger candidate boxes to improve the probability of hitting the target.
Here v and r are empirically determined constants, and the coefficient u is a value greater than 1. For example, the displacement factor v may be a scalar corresponding to the variance of a Gaussian distribution. Specifically, the distribution of the center coordinates of the generated candidate boxes is centered on the predicted center position (fixed point) of the target in the previous frame, and the candidate-box coordinates generated in the next frame follow a Gaussian distribution. Here we make an assumption: the movement of the target between two frames is independent in the horizontal (X) and vertical (Y) directions, and the sampling range is the same in both directions. That is, the candidate-box center coordinates obey two one-dimensional Gaussian distributions with the same variance in the X and Y directions, respectively:

p(x) = (1/√(2πv)) · exp(−(x − x_{t−1})² / (2v))

p(y) = (1/√(2πv)) · exp(−(y − y_{t−1})² / (2v))

where p(x) and p(y) are the probability densities of the horizontal and vertical coordinates, respectively, of the candidate box's center position, and (x_{t−1}, y_{t−1}) is the predicted target center in the previous frame.
When it is determined that the initial target region belongs to the first negative sample type S1, i.e., there is no target in the initial target region, we wish to expand the search range of the candidate boxes, so the variance of the Gaussian distribution is multiplied by the coefficient u (for example, an empirical value of 1.5), i.e., the above equations become:

p(x) = (1/√(2πuv)) · exp(−(x − x_{t−1})² / (2uv))

p(y) = (1/√(2πuv)) · exp(−(y − y_{t−1})² / (2uv))
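The adaptive strategy above, Gaussian center sampling plus type-dependent scale sets, can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the default values of v, r, and u are invented, `random.gauss` takes a standard deviation while the description calls v a variance, and the labels S1-S4 denote the first through fourth negative sample types.

```python
import random

def generate_candidates(base, sample_type, n=100, v=5.0, r=1.2, u=1.5):
    """Adaptive candidate-box sketch: centers are drawn from Gaussians
    centered on the base box (cx, cy); the scale set depends on the
    negative sample type. v, r, u are assumed empirical constants."""
    cx, cy, w, h = base
    # First negative type (no target): widen the search by the factor u.
    spread = v * (u if sample_type == "S1" else 1.0)
    scales = {
        "S1": [1.0],                 # pure background: shift only, no scaling
        "S2": [r**-1, 1.0, r],       # part target, part background
        "S3": [r**-2, r**-1, 1.0],   # box too large: favor smaller boxes
        "S4": [1.0, r, r**2],        # box too small: favor larger boxes
    }.get(sample_type, [1.0])
    boxes = []
    for _ in range(n):
        s = random.choice(scales)
        boxes.append((random.gauss(cx, spread), random.gauss(cy, spread), w * s, h * s))
    return boxes
```

For a base box classified as S3, every generated box is no wider than the base box; for S4, no narrower.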
in an embodiment of the present disclosure, after a plurality of candidate target regions are generated according to the method, the prediction target region determination module 905 determines a prediction target region from the plurality of candidate target regions.
In some embodiments, determining a predicted target region from the plurality of candidate target regions comprises: calculating probabilities that the plurality of candidate target regions belong to the positive sample type; and determining the candidate target region with the highest probability as the prediction target region.
It should be understood that when each of a plurality of candidate target regions according to the present disclosure is input into the multi-classification-based classifier, the classifier outputs a probability score for each classification. For example, if one candidate target region Cnd1 is input to the classifier, its output is the set of probability scores (each ranging between 0 and 1) that Cnd1 belongs to the respective classes. Taking the above five-class classifier as an example, the output is:
1) Score that Cnd1 is a positive sample: 0.75;
2) Score that Cnd1 belongs to the first negative sample type: 0.01;
3) Score that Cnd1 belongs to the second negative sample type: 0.35;
4) Score that Cnd1 belongs to the third negative sample type: 0.25;
5) Score that Cnd1 belongs to the fourth negative sample type: 0.4.
Taking the three candidate boxes (Cnd1, Cnd2, and Cnd3) generated in fig. 7 as an example, if the probability score of Cnd1 belonging to a positive sample is 0.75, that of Cnd2 is 0.9, and that of Cnd3 is 0.65, then Cnd2, having the highest score, is determined as the prediction target region.
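The selection step is a simple argmax over the positive-class scores. A minimal sketch follows; the dict-returning `classifier` interface is an assumption for illustration.

```python
def select_prediction(candidates, classifier):
    """Pick the candidate target region with the highest positive-class
    probability. `classifier(box)` is assumed to return a dict mapping
    class labels to probability scores."""
    return max(candidates, key=lambda box: classifier(box)["positive"])
```

Reproducing the Cnd1/Cnd2/Cnd3 example with a stub classifier selects Cnd2, whose positive score of 0.9 is the highest.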
It should be understood that, for simplicity of explanation, only examples of three candidate boxes are shown in the above embodiments. However, in practical applications, the number of candidate boxes may be greater than or equal to three, for example, several tens or even several hundreds, in order to cover the target to the maximum extent and improve the accuracy of prediction. More or fewer candidate boxes may be selected as desired.
It should be understood that in the embodiment of the present disclosure, the target tracking apparatus 900 may further include more modules, and is not limited to the image acquisition module 901, the initial target region determination module 902, the sample type determination module 903, the candidate target region generation module 904, and the predicted target region determination module 905 described above.
For example, the target tracking device 900 may include a training data set construction module for constructing a training data set. For example, the target tracking device 900 may include a training control module adapted to train the target tracking device 900 using the training data set. Additionally, the target tracking device 900 may include an online update module adapted to update the target tracking device 900 with data generated during tracking.
For example, after the target tracking device 900 has been trained and put into service, once it has classified a plurality of candidate target regions (e.g., 200 of them) according to the method described in FIG. 1 and determined the position of the target in the t-th frame (i.e., the position of the predicted target region) based on the probability scores of those candidate regions, a set of labeled (i.e., classified) samples is thereby generated and can be placed into the training set for updating the parameters of the target tracking device 900. The target tracking device 900 of the present application thus adopts an online update strategy: the positive and negative samples generated during tracking are synchronously used for further learning by the device, continuously updating its parameters so that it adapts as conditions change over time, thereby improving tracking accuracy.
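The online-update idea can be sketched as a sample buffer feeding periodic fine-tuning. This is a sketch under stated assumptions: the buffer capacity, FIFO eviction, and method names are inventions for illustration, not details from the patent.

```python
class OnlineUpdateBuffer:
    """Buffers (region, label) samples produced during tracking so they can
    later be used to fine-tune the classifier. Capacity limit and oldest-first
    eviction are assumptions, not specified by the description."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.samples = []

    def add(self, region, label):
        self.samples.append((region, label))
        if len(self.samples) > self.capacity:
            self.samples.pop(0)  # evict the oldest sample first

    def recent_batch(self, size):
        return self.samples[-size:]  # newest samples for a fine-tuning step
```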
In addition, the target tracking apparatus 900 may further include a trajectory generation module (not shown) that generates the motion trajectory of the target based on the prediction result in each frame. For example, the image frames may be stitched together into a panorama, and the determined predicted target regions connected in time order to form a trajectory representing the target's movement route through the environment. Alternatively, the geographical position at which each frame was captured (obtained, for example, via GPS) may be recorded; the predicted target regions are then mapped onto the corresponding map positions according to the prediction results, and the chronological position information is connected into a trajectory indicating the movement route obtained by tracking the target.
Optionally, the target tracking apparatus 900 may also include a communication module to communicate with a server or other device, either wired or wirelessly. Additionally, the target tracking device 900 may also include an input/output module to, for example, receive input from a user. Additionally, for example, the target tracking device 900 may also include additional display modules to display the final generated predicted position or motion trajectory to a user, for example.
The embodiment of the disclosure also provides target tracking equipment based on the neural network. Fig. 10 illustrates a schematic diagram of a target tracking device 1000 in accordance with an embodiment of the principles of the present disclosure.
As shown in fig. 10, the object tracking device 1000 according to the present embodiment includes a processor 1010, a storage section 1020, a communication section 1030, an input/output section 1040, and a display section 1050, which are coupled to an I/O interface 1060.
The processor 1010 is a program control device such as a microprocessor, for example, which operates according to a program installed in the object tracking device 1000. The storage portion 1020 is, for example, a storage element such as a ROM or a RAM. A program to be executed by the processor 1010 or the like is stored in the storage section 1020. The communication section 1030 is, for example, a communication interface such as a wireless LAN module. The input/output portion 1040 is, for example, an input/output port such as an HDMI (registered trademark) (high definition multimedia interface) port, a USB (universal serial bus) port, or an AUX (auxiliary) port. The display portion 1050 is, for example, a display such as a liquid crystal display or an organic EL (electroluminescence) display.
The object tracking device 1000 shown in fig. 10 may be used to implement the object tracking method disclosed herein. For example, the target tracking method according to embodiments of the present disclosure may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program comprising program code for performing the above-described object tracking method. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 1030 or installed from the storage section 1020. When executed by the object tracking device 1000, the computer program may perform the functions defined in the object tracking method provided by the embodiments of the present disclosure. The target tracking method has been described in detail above with reference to the drawings, and is not described herein again.
Embodiments of the present disclosure also provide a non-transitory computer-readable storage medium. Fig. 11 shows a schematic diagram of a computer-readable storage medium 1100 according to an embodiment of the disclosed principles. The computer-readable storage medium 1100 has stored thereon computer program instructions 1101, wherein the computer program instructions 1101 are executed by a processor to perform a target tracking method provided by an embodiment of the present disclosure.
In the above description, the present invention has been described based on the embodiments. The present embodiment is merely illustrative, and those skilled in the art will understand that the combination of the constituent elements and processes of the present embodiment can be modified in various ways, and such modifications are also within the scope of the present invention.

Claims (20)

1. A target tracking method based on a neural network comprises the following steps:
acquiring a current image frame;
determining an initial target region in the current image frame;
determining a sample type to which the initial target region belongs, wherein the sample type comprises a positive sample type and a plurality of negative sample types;
generating a plurality of candidate target regions in the current image frame based on a sample type to which the initial target region belongs; and
determining a predicted target region from the plurality of candidate target regions.
2. The method of claim 1, wherein determining an initial target region in the current image frame comprises:
determining a region in the current image frame having the same position as the prediction target region determined in the previous image frame as the initial target region in the current image frame.
3. The method of claim 1, wherein determining a sample type to which the initial target region belongs comprises:
and determining the sample type of the initial target area according to the ratio of the target to the background in the initial target area.
4. The method of claim 3, wherein,
determining that the initial target region belongs to a first negative example type if the initial target region does not include the target and includes a portion of the background; or
Determining that the initial target region belongs to a second negative example type if the initial target region includes a portion of the target and includes a portion of the background; or
Determining that the initial target region belongs to a third negative example type if the initial target region includes all of the target and includes a portion of the background; or
Determining that the initial target region belongs to a fourth negative example type if the initial target region includes a portion of the target and does not include the background.
5. The method of claim 1, wherein determining a sample type to which the initial target region belongs comprises:
determining a sample type to which the initial target region belongs according to a probability that the initial target region belongs to the positive sample type.
6. The method of claim 5, wherein,
determining that the initial target region belongs to a first negative sample type if the probability that the initial target region belongs to the positive sample type is greater than or equal to 0 and less than 0.2; or
Determining that the initial target region belongs to a second negative sample type if the probability that the initial target region belongs to the positive sample type is greater than or equal to 0.2 and less than 0.4; or
Determining that the initial target region belongs to a third negative sample type if the probability that the initial target region belongs to the positive sample type is greater than or equal to 0.4 and less than 0.6; or
Determining that the initial target region belongs to a fourth negative sample type if the probability that the initial target region belongs to the positive sample type is greater than or equal to 0.6 and less than 0.8.
7. The method of any one of claims 1 to 6,
determining that the initial target region belongs to a positive sample type if an overlap of the initial target region and a true target region is greater than or equal to a first threshold.
8. The method of claim 1, wherein generating a plurality of candidate target regions in the current image frame based on a sample type to which the initial target region belongs comprises:
randomly generating a plurality of initial candidate target regions around the initial target region of the current frame;
and carrying out displacement or scaling transformation on the plurality of initial candidate target regions based on the sample type of the initial target region to obtain the plurality of candidate target regions.
9. The method of claim 8, wherein performing a shift or scale transformation on the plurality of initial candidate target regions based on the sample type to which the initial target region belongs to obtain the plurality of candidate target regions comprises:
and under the condition that the initial target region is determined to belong to the positive sample type, directly determining the initial candidate target regions as the candidate target regions without performing displacement or scaling transformation on the initial candidate target regions.
10. The method according to claim 8 or 9, wherein performing a shift or scale transformation on the plurality of initial candidate target regions based on the sample type to which the initial target region belongs to obtain the plurality of candidate target regions comprises:
under the condition that the initial target region is determined to belong to a first negative sample type, performing displacement or scaling transformation on the plurality of initial candidate target regions by using a first displacement factor and a first scaling factor, and determining the plurality of transformed initial candidate target regions as the plurality of candidate target regions; or
Under the condition that the initial target region is determined to belong to a second negative sample type, performing displacement or scaling transformation on the plurality of initial candidate target regions by using a second displacement factor and a second scaling factor, and determining the plurality of transformed initial candidate target regions as the plurality of candidate target regions; or
Under the condition that the initial target region is determined to belong to a third negative sample type, performing displacement or scaling transformation on the plurality of initial candidate target regions by using a third displacement factor and a third scaling factor, and determining the plurality of transformed initial candidate target regions as the plurality of candidate target regions; or
And under the condition that the initial target region is determined to belong to a fourth negative sample type, performing displacement or scaling transformation on the plurality of initial candidate target regions by using a fourth displacement factor and a fourth scaling factor, and determining the plurality of transformed initial candidate target regions as the plurality of candidate target regions.
11. The method of any of claims 1 to 10, wherein determining a predicted target region from the plurality of candidate target regions comprises:
calculating probabilities that the plurality of candidate target regions belong to the positive sample type; and
and determining the candidate target region with the highest probability as the prediction target region.
12. The method of any of claims 1 to 11, further comprising:
the neural network is trained in advance and,
wherein pre-training the neural network comprises:
obtaining an initial sample library based on random sampling;
and inputting the samples in the initial sample library into the neural network to obtain the classifiers of the positive sample type and the negative sample types.
13. The method of any of claims 1 to 12, further comprising:
updating the training of the neural network using the plurality of candidate target regions.
14. A neural network-based target tracking apparatus, comprising:
an image acquisition module configured to acquire a current image frame;
an initial target region determination module configured to determine an initial target region in the current image frame;
a sample type determination module configured to determine a sample type to which the initial target region belongs, wherein the sample type includes a positive sample type and a plurality of negative sample types;
a candidate target region generation module configured to generate a plurality of candidate target regions in the current image frame based on a sample type to which the initial target region belongs; and
a prediction target region determination module configured to determine a prediction target region from the plurality of candidate target regions.
15. The apparatus of claim 14, wherein determining an initial target region in the current image frame comprises:
determining a region in the current image frame having the same position as the prediction target region determined in the previous image frame as the initial target region in the current image frame.
16. The apparatus of claim 14, wherein determining a sample type to which the initial target region belongs comprises:
and determining the sample type of the initial target area according to the ratio of the target to the background in the initial target area.
17. The apparatus of claim 14, wherein determining a sample type to which the initial target region belongs comprises:
determining a sample type to which the initial target region belongs according to a probability that the initial target region belongs to the positive sample type.
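Claim 17's alternative types the region by its positive-class probability rather than a geometric ratio. A sketch with hypothetical probability bins (the claim fixes neither the bin edges nor their count):

```python
def sample_type_from_probability(p_positive, bins=(0.25, 0.5, 0.75)):
    """Map the classifier's positive-type probability to a sample type.
    The bin edges are illustrative assumptions, not taken from the claim."""
    if p_positive >= bins[2]:
        return "positive"
    if p_positive >= bins[1]:
        return "negative_hard"    # resembles the target closely
    if p_positive >= bins[0]:
        return "negative_medium"
    return "negative_easy"
```

Grading the negatives this way lets the later sampling step treat near-misses differently from obvious background, which is presumably the point of having a plurality of negative types.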
18. The apparatus of claim 14, wherein determining a predicted target region from the plurality of candidate target regions comprises:
calculating probabilities that the plurality of candidate target regions belong to the positive sample type; and
determining the candidate target region with the highest probability as the prediction target region.
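Claim 18's selection rule is an argmax over the candidates' positive-type probabilities. A sketch, assuming per-candidate logits from some classifier head (the logit values below are made up for illustration):

```python
import numpy as np

def predict_target_region(candidates, logits, positive_class=0):
    """Softmax the per-candidate logits and return the candidate whose
    positive-type probability is highest (claim 18's rule)."""
    z = np.asarray(logits, float)
    p = np.exp(z - z.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return candidates[int(np.argmax(p[:, positive_class]))]

# three candidate boxes with hypothetical logits over (positive, 3 negative types)
cands = [(10, 10, 20, 20), (12, 11, 20, 20), (40, 40, 20, 20)]
logits = [[0.2, 0.1, 0.0, 0.0],
          [2.0, 0.1, 0.0, 0.0],
          [-1.0, 0.5, 0.3, 0.2]]
best = predict_target_region(cands, logits)  # → (12, 11, 20, 20)
```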
19. A neural network-based target tracking device, comprising:
a processor;
a memory storing one or more computer program modules;
wherein the one or more computer program modules are configured to, when executed by the processor, perform the object tracking method of any one of claims 1-13.
20. A non-transitory computer readable storage medium having stored thereon computer instructions, wherein the computer instructions, when executed by a processor, perform the object tracking method of any one of claims 1-13.
CN202011409335.0A 2020-12-04 2020-12-04 Target tracking method, device, equipment and storage medium based on neural network Pending CN114663468A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011409335.0A CN114663468A (en) 2020-12-04 2020-12-04 Target tracking method, device, equipment and storage medium based on neural network
JP2021196877A JP2022089797A (en) 2020-12-04 2021-12-03 Target tracking method, device, apparatus, and storage medium based on neural network

Publications (1)

Publication Number Publication Date
CN114663468A (en) 2022-06-24

Family

ID=81989114

Country Status (2)

Country Link
JP (1) JP2022089797A (en)
CN (1) CN114663468A (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115797849B (en) * 2023-02-03 2023-04-28 以萨技术股份有限公司 Data processing system for determining abnormal behavior based on image
CN117268403B (en) * 2023-11-21 2024-01-30 山东工商学院 Improved GBNN dynamic path planning method based on optimized deployment sensing technology

Also Published As

Publication number Publication date
JP2022089797A (en) 2022-06-16

Similar Documents

Publication Publication Date Title
US10970558B2 (en) People flow estimation device, people flow estimation method, and recording medium
CN109147254B (en) Video field fire smoke real-time detection method based on convolutional neural network
US20200374600A1 (en) Method for Embedding Advertisement in Video and Computer Device
JP6709283B2 (en) Detection and analysis of moving vehicles using low resolution remote sensing images
CN111709310B (en) Gesture tracking and recognition method based on deep learning
CN104700099B (en) The method and apparatus for recognizing traffic sign
CN106156778B (en) The method of known object in the visual field of NI Vision Builder for Automated Inspection for identification
US10140508B2 (en) Method and apparatus for annotating a video stream comprising a sequence of frames
JP2019514123A (en) Remote determination of the quantity stored in containers in geographical areas
CN108229397A (en) Method for text detection in image based on Faster R-CNN
CN110781836A (en) Human body recognition method and device, computer equipment and storage medium
US20120027252A1 (en) Hand gesture detection
US10943352B2 (en) Object shape regression using wasserstein distance
CN102831382A (en) Face tracking apparatus and method
CN102576259A (en) Real time hand tracking, pose classification, and interface control
WO2021218671A1 (en) Target tracking method and device, and storage medium and computer program
CN106991147A (en) A kind of Plant identification and recognition methods
CN111008576B (en) Pedestrian detection and model training method, device and readable storage medium
CN114663468A (en) Target tracking method, device, equipment and storage medium based on neural network
CN106407978B (en) Method for detecting salient object in unconstrained video by combining similarity degree
CN116051953A (en) Small target detection method based on selectable convolution kernel network and weighted bidirectional feature pyramid
CN111339976A (en) Indoor positioning method, device, terminal and storage medium
CN109215047A (en) Moving target detecting method and device based on deep-sea video
CN117011932A (en) Running behavior detection method, electronic device and storage medium
CN109000634B (en) Navigation object traveling route reminding method and system

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20220624