CN111199189A - Target object tracking method and system, electronic equipment and storage medium

Info

Publication number: CN111199189A
Application number: CN201911314566.0A
Authority: CN
Inventors: 谷宇章; 邱守猛; 袁泽强; 阮有志; 杨洪业; 张晓林
Original assignee: Shanghai Institute of Microsystem and Information Technology of CAS
Current assignee: Shanghai Institute of Microsystem and Information Technology of CAS
Priority date: 2019-12-18
Filing date: 2019-12-18
Publication date: 2020-05-26

Abstract

The method comprises: determining a plurality of search areas of a sample object from a comparison picture according to a first position area of the sample object in a sample picture and a plurality of preset scale values; determining a first feature set of the sample object from the first position area; determining a second feature set of the sample object in each search area from the plurality of search areas; determining a plurality of matching value sets according to the first feature set and the plurality of second feature sets; and determining a plurality of loss values according to the differences between the plurality of matching value sets and a preset matching value, the loss values being used to adjust the parameters of the tracking model during training so as to obtain the trained tracking model. Tracking a target object based on the trained tracking model can improve tracking accuracy and robustness.

Description

Target object tracking method and system, electronic equipment and storage medium

Technical Field

The invention relates to the technical field of computer vision, in particular to a target object tracking method, a target object tracking system, electronic equipment and a storage medium.

Background

Target object tracking is a research hotspot in the field of computer vision and is widely applied in civil and commercial fields. In particular, it is used to analyze video content automatically in intelligent video surveillance, autonomous driving, unmanned supermarkets, and human-computer interaction, which can save a large amount of manpower and material resources and bring substantial economic benefits.

Traditional target object tracking trains a classifier on color, texture, or other feature information from different frames of a video. In consecutive frames of a video, however, the interval is short and the feature information changes little, so the position region determined for the target object has low accuracy and robustness and cannot meet increasing accuracy requirements. With the rise of deep learning algorithms in recent years, more and more target object tracking technologies have introduced deep learning algorithms to improve the accuracy and robustness of target object tracking.

Disclosure of Invention

The embodiment of the application provides a target object tracking method, which comprises the following steps:

acquiring a reference picture and a picture to be tracked;

determining an identification object in a reference picture;

tracking the identification object in the reference picture and the picture to be tracked based on the trained tracking model to obtain the position of the identification object in the picture to be tracked;

the training step of the tracking model comprises the following steps:

acquiring a sample picture and a comparison picture; the sample picture contains a sample object, and at least part of the area of the comparison picture contains the sample object;

determining a first position area of a sample object in a sample picture;

determining a plurality of search areas of the sample object from the comparison picture according to the first position area and a plurality of preset scale values; the plurality of preset scale values comprise at least two scale values;

constructing a preset machine learning model, and determining the preset machine learning model as a current machine learning model;

determining a first set of features of the sample object from the first location region, a second set of features of the sample object in each search region from the plurality of search regions, based on the current machine learning model;

determining a plurality of sets of matching values from the first set of features and the plurality of second sets of features; each set of matching values in the plurality of sets of matching values and each second set of features in the plurality of second sets of features are in one-to-one correspondence;

determining a plurality of loss values according to the plurality of matching value sets and a preset matching value;

when the loss values are larger than a preset threshold value, performing back propagation based on the loss values, updating the current machine learning model to obtain an updated machine learning model, and re-determining the updated machine learning model as the current machine learning model; repeating the steps: determining a first set of features of the sample object from the first location region, a second set of features of the sample object in each search region from the plurality of search regions, based on the current machine learning model;

and when the target loss value in the loss values is smaller than or equal to the preset threshold value, determining the current machine learning model as a tracking model, and determining model parameters corresponding to the target loss value as parameters of the tracking model.

Further, the plurality of preset scale values comprise a first scale value and a second scale value;

determining a plurality of search regions of the sample object from the comparison picture according to the first position region and a plurality of preset scale values, comprising:

determining a first search area of the sample object from the comparison picture according to the first position area and the first scale value; the first search area includes the first position area;

determining a second search area of the sample object from the comparison picture according to the first position area and the second scale value; the second search area includes the first position area.

Further, determining a plurality of sets of matching values from the first set of features and the plurality of second sets of features includes:

and performing convolution operation on the first feature set and each second feature set in the plurality of second feature sets to obtain a plurality of matching value sets.

Further, determining a plurality of loss values according to the plurality of matching value sets and a preset matching value includes:

determining a largest matching value of each set of matching values of the plurality of sets of matching values;

and determining a plurality of loss values according to the difference value between the maximum matching value and the preset matching value.

Correspondingly, the embodiment of the present application further provides a target object tracking system, including:

the first acquisition module is used for acquiring a reference picture and a picture to be tracked;

a first determination module for determining an identification object in a reference picture;

and the tracking module is used for tracking the identification object in the reference picture and the picture to be tracked based on the trained tracking model to obtain the position of the identification object in the picture to be tracked.

Further, the system also includes a training module;

the training module includes:

the second acquisition module is used for acquiring a sample picture and a comparison picture; the sample picture contains a sample object, and at least part of the area of the comparison picture contains the sample object;

a second determining module, configured to determine a first position region of the sample object in the sample picture;

a third determining module, configured to determine a plurality of search regions of the sample object from the comparison picture according to the first position region and a plurality of preset scale values; the plurality of preset scale values comprise at least two scale values;

the building module is used for building a preset machine learning model and determining the preset machine learning model as a current machine learning model;

a fourth determination module for determining a first set of features of the sample object from the first location region based on the current machine learning model, and determining a second set of features of the sample object in each search region from the plurality of search regions;

a fifth determining module for determining a plurality of sets of matching values from the first set of features and the plurality of second sets of features; each set of matching values in the plurality of sets of matching values and each second set of features in the plurality of second sets of features are in one-to-one correspondence;

the sixth determining module is used for determining a plurality of loss values according to the plurality of matching value sets and a preset matching value;

the updating module is used for performing back propagation on the basis of the loss values when the loss values are larger than a preset threshold value, updating the current machine learning model to obtain an updated machine learning model, and determining the updated machine learning model as the current machine learning model again; repeating the steps: determining a first set of features of the sample object from the first location region, a second set of features of the sample object in each search region from the plurality of search regions, based on the current machine learning model;

and the seventh determining module is used for determining the current machine learning model as the tracking model, and determining the model parameters corresponding to the target loss value as the parameters of the tracking model, when a target loss value among the loss values is less than or equal to the preset threshold value.

Further, the third determining module includes:

a first determination unit, configured to determine a first search region of the sample object from the comparison picture according to the first position region and the first scale value; the first search region includes the first position region;

a second determination unit, configured to determine a second search region of the sample object from the comparison picture according to the first position region and the second scale value; the second search region includes the first position region.

Further, the fifth determining module includes:

and the convolution unit is used for performing convolution operation on the first feature set and each second feature set in the plurality of second feature sets to obtain a plurality of matching value sets.

Accordingly, an embodiment of the present application further provides an electronic device, which includes a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or a set of instructions, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement the target object tracking method.

Accordingly, an embodiment of the present application further provides a computer-readable storage medium, in which at least one instruction, at least one program, a code set, or a set of instructions is stored, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by a processor to implement the target object tracking method.

The embodiment of the application has the following beneficial effects:

The embodiments of the application disclose a target object tracking method, a target object tracking system, electronic equipment, and a storage medium. The method comprises: acquiring a reference picture and a picture to be tracked; determining an identification object in the reference picture; and tracking the identification object in the reference picture and the picture to be tracked based on a trained tracking model to obtain the position of the identification object in the picture to be tracked. The training step of the tracking model comprises: acquiring a sample picture and a comparison picture, wherein the sample picture contains a sample object and at least part of the area of the comparison picture contains the sample object; determining a first position area of the sample object in the sample picture; determining a plurality of search areas of the sample object from the comparison picture according to the first position area and a plurality of preset scale values, the plurality of preset scale values comprising at least two scale values; constructing a preset machine learning model and determining the preset machine learning model as the current machine learning model; based on the current machine learning model, determining a first feature set of the sample object from the first position area and determining a second feature set of the sample object in each search area from the plurality of search areas; determining a plurality of matching value sets according to the first feature set and the plurality of second feature sets, each matching value set in the plurality of matching value sets being in one-to-one correspondence with a second feature set in the plurality of second feature sets; determining a plurality of loss values according to the plurality of matching value sets and a preset matching value; when the plurality of loss values are greater than a preset threshold value, performing back propagation based on the plurality of loss values, updating the current machine learning model to obtain an updated machine learning model, re-determining the updated machine learning model as the current machine learning model, and repeating the step of determining the first feature set and the second feature sets based on the current machine learning model; and when a target loss value among the plurality of loss values is less than or equal to the preset threshold value, determining the current machine learning model as the tracking model and determining the model parameters corresponding to the target loss value as the parameters of the tracking model.

Based on the embodiments of the application, a plurality of search regions of a sample object are determined from a comparison picture according to a first position region of the sample object in a sample picture and a plurality of preset scale values; a first feature set of the sample object is determined from the first position region; a second feature set of the sample object in each search region is determined from the plurality of search regions; a plurality of matching value sets are determined according to the first feature set and the plurality of second feature sets; and a plurality of loss values are determined according to the differences between the plurality of matching value sets and the preset matching value, the loss values being used to adjust the parameters of the tracking model during training so as to obtain the trained tracking model. Tracking the target object based on the trained tracking model can improve tracking accuracy and robustness.

Drawings

To illustrate the technical solutions and advantages of the embodiments of the present application or of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of an application environment provided by an embodiment of the present application;

FIG. 2 is a schematic flowchart of a target object tracking method according to an embodiment of the present application;

FIG. 3 is a schematic flowchart of a method for training a tracking model according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a target object tracking system provided in an embodiment of the present application;

FIG. 5 is a schematic diagram of a system for training a tracking model according to an embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings. It should be apparent that the described embodiment is only one embodiment of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

An "embodiment" as referred to herein relates to a particular feature, structure, or characteristic that may be included in at least one implementation of the present application. The terms "first," "second," "third," "fourth," "fifth," "sixth," and "seventh" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, features defined as "first," "second," "third," "fourth," "fifth," "sixth," and "seventh" may explicitly or implicitly include one or more of the features. Moreover, these terms are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances, such that the embodiments of the application described herein are capable of operation in sequences other than those described or illustrated herein. Furthermore, the terms "comprises" and "comprising," as well as any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps, units, or modules is not necessarily limited to those steps, units, or modules expressly listed, but may include other steps, units, or modules not expressly listed or inherent to such process, method, system, article, or apparatus.

Please refer to FIG. 1, which is a schematic diagram of an application environment according to an embodiment of the present application. The application environment includes a server 101 and a client 103. The server 101 acquires a reference picture and a picture to be tracked and determines an identification object in the reference picture. The server 101 then tracks the identification object in the reference picture and the picture to be tracked based on a trained tracking model to obtain the position of the identification object in the picture to be tracked.

The target object tracking system in the server 101 acquires a sample picture and a comparison picture, where the sample picture contains a sample object and at least a partial region of the comparison picture contains the sample object. The system determines a first position region of the sample object in the sample picture and determines a plurality of search regions of the sample object from the comparison picture according to the first position region and a plurality of preset scale values, the plurality of preset scale values comprising at least two scale values. The system constructs a preset machine learning model and determines the preset machine learning model as the current machine learning model. Based on the current machine learning model, the system determines a first feature set of the sample object from the first position region and a second feature set of the sample object in each search region from the plurality of search regions, and determines a plurality of matching value sets according to the first feature set and the plurality of second feature sets, where each matching value set in the plurality of matching value sets is in one-to-one correspondence with a second feature set in the plurality of second feature sets. The system determines a plurality of loss values according to the plurality of matching value sets and a preset matching value. When the plurality of loss values are greater than a preset threshold value, the system performs back propagation based on the plurality of loss values, updates the current machine learning model to obtain an updated machine learning model, re-determines the updated machine learning model as the current machine learning model, and repeats the step of determining the first feature set and the second feature sets based on the current machine learning model. When a target loss value among the plurality of loss values is less than or equal to the preset threshold value, the system determines the current machine learning model as the tracking model and determines the model parameters corresponding to the target loss value as the parameters of the tracking model.

A specific embodiment of a target object tracking method according to the present application is described below, and fig. 2 is a schematic flow chart of a target object tracking method according to the present application, where the present specification provides the method operation steps as shown in the embodiment or the flow chart, but more or less operation steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is only one of many possible orders of execution and does not represent the only order of execution, and in actual execution, the steps may be performed sequentially or in parallel as in the embodiments or methods shown in the figures (e.g., in the context of parallel processors or multi-threaded processing). Specifically, as shown in fig. 2, the method includes:

S201: Acquiring a reference picture and a picture to be tracked.

In the embodiment of the application, the server receives video data sent by the client and acquires a reference picture and a picture to be tracked from the video data. The reference picture and the picture to be tracked may be consecutive or non-consecutive image frames in the video. The reference picture contains the identification object under study, and at least part of the area of the picture to be tracked contains that identification object, so that the purpose of the study can be achieved: the target object to be tracked is the identification object as it appears in the picture to be tracked.

S203: an identification object in a reference picture is determined.

In the embodiment of the application, the server determines the identification object from the reference picture based on the preset feature information of the identification object, wherein the reference picture may only contain the identification object or may contain a plurality of objects, the plurality of objects include the identification object and a non-identification object, and the feature information of the non-identification object is not matched with the preset feature information.

S205: Tracking the identification object in the reference picture and the picture to be tracked based on the trained tracking model to obtain the position of the identification object in the picture to be tracked.

In the embodiment of the application, the server tracks the identification object in the reference picture and the picture to be tracked based on the trained tracking model to obtain the position of the identification object in the picture to be tracked.
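As a rough illustration of step S205, the sketch below slides a small patch taken from the reference picture over the picture to be tracked and reports the offset with the highest matching value. Raw pixel values stand in for the features that the trained tracking model would extract, and the names `match_score` and `locate` are hypothetical; they are not taken from the application.

```python
from typing import List, Tuple

Grid = List[List[float]]


def match_score(template: Grid, image: Grid, top: int, left: int) -> float:
    """Sum of element-wise products between the template and one image window."""
    return sum(
        template[i][j] * image[top + i][left + j]
        for i in range(len(template))
        for j in range(len(template[0]))
    )


def locate(template: Grid, image: Grid) -> Tuple[int, int]:
    """Return the (row, col) of the window that best matches the template."""
    rows = len(image) - len(template) + 1
    cols = len(image[0]) - len(template[0]) + 1
    candidates = [(r, c) for r in range(rows) for c in range(cols)]
    return max(candidates, key=lambda rc: match_score(template, image, rc[0], rc[1]))


# Toy data: a bright 2x2 object on a dark background.
reference_patch = [[1.0, 1.0], [1.0, 1.0]]
picture_to_track = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]
position = locate(reference_patch, picture_to_track)
```

Here `position` is the top-left corner of the best-matching window in the toy picture. A real tracker would compare learned feature maps rather than pixels.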

In the embodiment of the present application, a specific implementation manner of training a tracking model is introduced, as shown in fig. 3, which is a schematic flow chart of a method for training a tracking model, where the training step of the tracking model shown in the diagram includes:

S301: Acquiring a sample picture and a comparison picture; the sample picture contains a sample object, and at least a part of the area of the comparison picture contains the sample object.

In this embodiment, the system obtains a sample picture and a comparison picture. The sample picture contains a sample object, and at least a partial region of the comparison picture contains that sample object; that is, the sample object may occupy the entire region of the comparison picture or only a partial region of it. The position of the sample object in the sample picture and its position in the comparison picture may be relatively consistent or completely inconsistent; that is, the sample object in the comparison picture may have undergone a rotational motion or a curvilinear motion relative to the sample object in the sample picture.

S303: a first location region of the sample object in the sample picture is determined.

In the embodiment of the application, the system determines the first position area of the sample object from the sample picture based on the preset characteristic information of the sample object.

S305: determining a plurality of search areas of the sample object from the comparison picture according to the first position area and a plurality of preset scale values; the plurality of preset scale values includes at least two scale values.

In the embodiment of the application, because the sample picture and the comparison picture are different image frames in the same video, and at least part of the area in the comparison picture contains the sample object, the system determines a plurality of search areas of the sample object from the comparison picture according to the first position area where the sample object is located in the sample picture and a plurality of preset scale values, wherein the scale values in the plurality of preset scale values correspond to the search areas in the plurality of search areas one to one. And the plurality of preset scale values comprises at least two scale values. The second features in the second feature set of the sample object in the search area determined according to the relatively small scale value in the plurality of preset scale values are finer, and the number of the second features in the second feature set of the sample object in the search area determined according to the relatively large scale value in the plurality of preset scale values is larger. It should be noted that the size value corresponding to each of the plurality of search regions determined according to the scale value of the plurality of preset scale values is larger than the corresponding size value of the first position region. For example, the system determines, according to a first position region of the sample object in the sample picture, a relative position region of the first position region in the comparison picture as a search region, wherein a size value corresponding to the first position region is 127 × 127, and size values corresponding to a plurality of search regions determined based on the plurality of preset scale values are 199 × 199 and 255 × 255.
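The relationship between the first position region and the search regions can be sketched as follows. The application does not define how a scale value maps to a region size, so this sketch assumes that a scale value is a multiplier on the side length of the first position region and that every search region is centered on the first position region. The `Region` tuple layout and the function names are hypothetical.

```python
from typing import List, Tuple

# (center_x, center_y, width, height) in pixels
Region = Tuple[float, float, float, float]


def search_regions(first_region: Region, scale_values: List[float]) -> List[Region]:
    """Build one search region per scale value, centered on the first position region."""
    if len(scale_values) < 2:
        raise ValueError("at least two scale values are required")
    cx, cy, w, h = first_region
    return [(cx, cy, w * s, h * s) for s in scale_values]


def contains(outer: Region, inner: Region) -> bool:
    """True if `inner` lies entirely within `outer`."""
    ocx, ocy, ow, oh = outer
    icx, icy, iw, ih = inner
    return (
        ocx - ow / 2 <= icx - iw / 2
        and ocy - oh / 2 <= icy - ih / 2
        and ocx + ow / 2 >= icx + iw / 2
        and ocy + oh / 2 >= icy + ih / 2
    )


first_region = (320.0, 240.0, 127.0, 127.0)
# Assumed scale values, chosen only to reproduce the 199 and 255 side lengths.
regions = search_regions(first_region, [199 / 127, 255 / 127])
sizes = [(round(w), round(h)) for _, _, w, h in regions]
```

With these assumed scale values, `sizes` holds the 199 × 199 and 255 × 255 regions from the example, and `contains` confirms that each search region includes the first position region. Clipping to the picture boundary is omitted for brevity.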

In an optional embodiment of determining the plurality of search regions of the sample object from the comparison picture according to the first position region and the plurality of preset scale values, the plurality of preset scale values comprises a first scale value and a second scale value. The system determines a first search region of the sample object from the comparison picture based on the first position region and the first scale value, and determines a second search region of the sample object from the comparison picture based on the first position region and the second scale value, wherein the first search region includes the first position region and the second search region also includes the first position region.

In the embodiment of the present application, determining the plurality of search regions of the sample object from the comparison picture according to the first position region, the first scale value, and the second scale value includes, but is not limited to:

Determining search regions of the sample object from the comparison picture based on a repeating sequence of the first scale value and the second scale value. For example, a search region of the sample object is determined according to the first position region and the first scale value, then according to the first position region and the second scale value, then again according to the first position region and the first scale value, then again according to the first position region and the second scale value, and so on, to obtain the plurality of search regions.

Determining search regions of the sample object from the comparison picture based on a random sequence of the first scale value and the second scale value. For example, a search region of the sample object is determined according to the first position region and the first scale value, and then according to the first position region and the second scale value, to obtain the plurality of search regions. As another example, a search region of the sample object is determined according to the first position region and the first scale value, then according to the first position region and the second scale value, and then according to the first position region and the second scale value again, to obtain the plurality of search regions.
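The two orderings described above can be sketched with the standard library. The scale values used here are placeholders, and the function names are hypothetical; the application does not say how the random sequence is drawn, so independent uniform draws are an assumption.

```python
import random
from itertools import cycle, islice
from typing import List


def repeating_order(scale_values: List[float], count: int) -> List[float]:
    """Alternate through the scale values in a fixed, repeating order."""
    return list(islice(cycle(scale_values), count))


def random_order(scale_values: List[float], count: int, seed: int = 0) -> List[float]:
    """Draw a scale value independently at random for each search region."""
    rng = random.Random(seed)
    return [rng.choice(scale_values) for _ in range(count)]


first_scale, second_scale = 1.5, 2.0  # placeholder values, not from the application
fixed = repeating_order([first_scale, second_scale], 4)
shuffled = random_order([first_scale, second_scale], 4)
```

`fixed` follows the first, second, first, second pattern, while `shuffled` may repeat a scale value back to back, as in the second random-sequence example.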

S307: Constructing a preset machine learning model, and determining the preset machine learning model as the current machine learning model.

In the embodiment of the application, the system adopts an AlexNet network to construct the preset machine learning model, and the overall architecture of the network is SiamFC.

S309: based on the current machine learning model, a first set of features of the sample object is determined from the first location region, and a second set of features of the sample object in each search region is determined from the plurality of search regions.

In the embodiment of the application, feature extraction is performed on the first position area based on the current machine learning model to obtain a first feature set of the sample object in the first position area, and feature extraction is performed on the plurality of search areas based on the current machine learning model to obtain a second feature set of the sample object in each search area of the plurality of search areas. Each search area of the plurality of search areas comprises a search subarea set, the second characteristic set comprises a second characteristic subset, and the search subareas in the search subarea set are in one-to-one correspondence with the second characteristic subset.

S311: determining a plurality of sets of matching values from the first set of features and the plurality of second sets of features; each set of matching values of the plurality of sets of matching values has a one-to-one correspondence with each set of second features of the plurality of sets of second features.

In an embodiment of the application, the system determines a plurality of sets of matching values from the first set of features and the plurality of second sets of features. In an optional embodiment of determining the plurality of sets of matching values, the system performs a convolution operation on the first feature set and each of the plurality of second feature sets to obtain the plurality of sets of matching values.
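A minimal sketch of this convolution step is given below, assuming that each feature set can be represented as a single-channel 2D grid. As in common deep learning layers, the "convolution" is computed as a sliding cross-correlation without flipping the kernel. The function names are hypothetical, and real feature maps would have many channels.

```python
from typing import List

Grid = List[List[float]]


def cross_correlate(first_features: Grid, second_features: Grid) -> Grid:
    """Slide `first_features` over `second_features` and return the map of matching values."""
    kh, kw = len(first_features), len(first_features[0])
    out_h = len(second_features) - kh + 1
    out_w = len(second_features[0]) - kw + 1
    return [
        [
            sum(
                first_features[i][j] * second_features[r + i][c + j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def matching_value_sets(first_features: Grid, second_feature_sets: List[Grid]) -> List[Grid]:
    """One matching value set per second feature set, in the same order."""
    return [cross_correlate(first_features, s) for s in second_feature_sets]


first = [[1.0, 0.0], [0.0, 1.0]]
second_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
second_b = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
sets = matching_value_sets(first, [second_a, second_b])
```

Because the output list is built in the same order as the input list, each matching value set corresponds to exactly one second feature set.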

S313: determining a plurality of loss values according to the plurality of matching value sets and the preset matching value.

In the embodiment of the application, the system performs a convolution operation on the first feature set and each second feature set to obtain the matching value between the first feature set and each second feature subset in that second feature set. The system then determines the maximum matching value in each of the plurality of matching value sets and, using a cross-entropy loss function, determines the plurality of loss values according to the difference between each maximum matching value and the preset matching value.
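As a rough illustration of this step, the sketch below takes the maximum matching value of each set and scores it against the preset matching value with a binary cross-entropy. The application does not give the exact form of the loss, so the sigmoid squashing, the preset value of 1.0 and the names `max_matching_value` and `cross_entropy_loss` are all assumptions made for the example.

```python
import math


def max_matching_value(match_set):
    """Largest matching value in one matching value set (2-D list)."""
    return max(max(row) for row in match_set)


def cross_entropy_loss(max_value, preset_value=1.0):
    """Binary cross-entropy between sigmoid(max_value) and the preset value.

    Assumption: the raw matching value is mapped to (0, 1) with a sigmoid
    before being compared with the preset matching value.
    """
    p = 1.0 / (1.0 + math.exp(-max_value))
    eps = 1e-12  # guards against log(0)
    return -(preset_value * math.log(p + eps)
             + (1.0 - preset_value) * math.log(1.0 - p + eps))


# Toy matching value sets, one per search area (assumed values).
match_sets = [
    [[0.5, 3.0], [-1.0, 0.2]],
    [[0.0, -2.0], [0.1, 0.4]],
]
losses = [cross_entropy_loss(max_matching_value(m)) for m in match_sets]
```

One loss value is produced per matching value set, and a larger maximum matching value yields a smaller loss when the preset matching value is 1.0.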

In an alternative embodiment, the training data set of the system is GOT10k, the batch size is 8, and 30 complete epochs are iterated; the training is repeated based on two preset scale values, and the scale value is changed every 20 epochs.
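One possible reading of this schedule is sketched below: the scale value in use is selected from the epoch index and switches after every 20 epochs. The concrete scale values (2.0 and 4.0) and the helper name `scale_for_epoch` are hypothetical; the application states only that two preset scale values are used.

```python
def scale_for_epoch(epoch, scale_values, switch_every=20):
    """Pick the preset scale value used in a given (0-based) epoch."""
    return scale_values[(epoch // switch_every) % len(scale_values)]


SCALE_VALUES = (2.0, 4.0)  # hypothetical preset scale values
EPOCHS = 30                # number of complete epochs stated in the text
BATCH_SIZE = 8             # batch size stated in the text

schedule = [scale_for_epoch(e, SCALE_VALUES) for e in range(EPOCHS)]
```

Under this reading, the first 20 epochs use the first scale value and the remaining 10 epochs use the second.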

S315: when the loss values are larger than a preset threshold value, performing back propagation based on the loss values, updating the current machine learning model to obtain an updated machine learning model, and re-determining the updated machine learning model as the current machine learning model; repeating the steps: based on the current machine learning model, a first set of features of the sample object is determined from the first location region, and a second set of features of the sample object in each search region is determined from the plurality of search regions.

S316: when a target loss value among the loss values is smaller than or equal to the preset threshold value, determining the current machine learning model as the tracking model, and determining the model parameters corresponding to the target loss value as the parameters of the tracking model.
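The loop formed by steps S309 to S316 can be sketched with a toy model. The single scalar weight, the sigmoid cross-entropy loss, the learning rate and the threshold below are assumptions chosen only to make the control flow runnable; they do not represent the network of the application.

```python
import math


def loss_and_grad(weight, feature, preset_value=1.0):
    """Loss and its gradient w.r.t. `weight` for a toy model whose
    matching value is weight * feature."""
    p = 1.0 / (1.0 + math.exp(-weight * feature))
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
    loss = -(preset_value * math.log(p)
             + (1.0 - preset_value) * math.log(1.0 - p))
    grad = (p - preset_value) * feature
    return loss, grad


def train(weight, features, threshold=0.1, lr=0.5, max_iters=10000):
    """Repeat feature matching and back propagation until a loss value
    is at or below the preset threshold."""
    for step in range(max_iters):
        results = [loss_and_grad(weight, f) for f in features]
        losses = [loss for loss, _ in results]
        if min(losses) <= threshold:
            # S316: a target loss value has reached the threshold.
            return weight, losses, step
        # S315: all loss values exceed the threshold, so update the model.
        weight -= lr * sum(g for _, g in results) / len(results)
    raise RuntimeError("threshold not reached within max_iters")


# One toy feature per search area (assumed values).
weight, losses, steps = train(0.0, [1.0, 2.0])
```

The returned weight plays the role of the model parameters corresponding to the target loss value.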

The trained tracking model is obtained by the above method for training the tracking model. The tracking accuracy of the tracking model before training and after training was verified on the OTB50, OTB100 and VOT2018 data sets respectively, as shown in Table 1 below:

TABLE 1

	OTB50	OTB100	VOT2018
Before training	0.781	0.765	0.502
After training	0.857	0.803	0.516
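The size of the improvement on each data set follows from simple subtraction of the Table 1 values; the short sketch below only reproduces that arithmetic, and the variable names are illustrative.

```python
# (before training, after training) accuracy values copied from Table 1.
accuracy = {
    "OTB50": (0.781, 0.857),
    "OTB100": (0.765, 0.803),
    "VOT2018": (0.502, 0.516),
}

# Absolute gain per data set, rounded to the precision of the table.
gains = {
    name: round(after - before, 3)
    for name, (before, after) in accuracy.items()
}
```

The resulting gains are 0.076 on OTB50, 0.038 on OTB100 and 0.014 on VOT2018.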

With the target object tracking method provided by the embodiment of the application, a plurality of search areas of a sample object are determined from a comparison picture according to a first position area of the sample object in a sample picture and a plurality of preset scale values. A first feature set of the sample object is determined from the first position area, and a second feature set of the sample object in each search area is determined from the plurality of search areas. A plurality of matching value sets are determined according to the first feature set and the plurality of second feature sets, and a plurality of loss values are determined according to the differences between the plurality of matching value sets and the preset matching value so as to adjust the parameters of the tracking model during training. The trained tracking model is thereby obtained, and the accuracy and robustness of tracking the target object can be improved based on the trained tracking model.

Fig. 4 is a schematic structural diagram of a target object tracking system provided in an embodiment of the present application, and as shown in fig. 4, the system includes:

the first obtaining module 401 is configured to obtain a reference picture and a picture to be tracked;

the first determining module 403 is used for determining an identification object in a reference picture;

the tracking module 405 is configured to track the identification object in the reference picture and the picture to be tracked based on the trained tracking model, so as to obtain a position of the identification object in the picture to be tracked.
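The cooperation of the three modules can be sketched as a simple composition of callables. The class name `TargetTrackingSystem`, the `(x, y, width, height)` box format and the stub functions are hypothetical; in particular, the stub passed as `track` merely stands in for the trained tracking model.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Box = Tuple[int, int, int, int]  # assumed format: (x, y, width, height)


@dataclass
class TargetTrackingSystem:
    obtain: Callable[[], Tuple[object, object]]   # first obtaining module 401
    determine_object: Callable[[object], Box]     # first determining module 403
    track: Callable[[object, Box, object], Box]   # tracking module 405

    def run(self) -> Box:
        reference, to_track = self.obtain()
        box = self.determine_object(reference)
        # Position of the identification object in the picture to be tracked.
        return self.track(reference, box, to_track)


# Stub modules with made-up file names and coordinates.
system = TargetTrackingSystem(
    obtain=lambda: ("reference.png", "frame_0002.png"),
    determine_object=lambda ref: (10, 20, 50, 40),
    track=lambda ref, box, frame: (box[0] + 3, box[1] - 1, box[2], box[3]),
)
position = system.run()
```

Keeping each module behind a plain callable makes it straightforward to swap the stub tracker for a trained model without changing the surrounding flow.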

In an embodiment of the present application, the system further includes a training module. Fig. 5 is a schematic structural diagram of a system for training the tracking model, and as shown in fig. 5, the system includes:

the second obtaining module 501 is configured to obtain a sample picture and a comparison picture; the sample picture contains a sample object, and at least part of the area of the contrast picture contains the sample object;

the second determining module 503 is configured to determine a first position region of the sample object in the sample picture;

the third determining module 505 is configured to determine a plurality of search regions of the sample object from the comparison picture according to the first position region and a plurality of preset scale values; the plurality of preset scale values comprise at least two scale values;

the building module 507 is configured to build a preset machine learning model, and determine the preset machine learning model as a current machine learning model;

the fourth determining module 509 is configured to determine a first feature set of the sample object from the first location region and a second feature set of the sample object in each of the plurality of search regions based on the current machine learning model;

the fifth determining module 511 is configured to determine a plurality of sets of matching values according to the first feature set and the plurality of second feature sets; each set of matching values in the plurality of sets of matching values and each second set of features in the plurality of second sets of features are in one-to-one correspondence;

the sixth determining module 513 is configured to determine a plurality of loss values according to the plurality of matching value sets and a preset matching value;

the updating module 515 is configured to, when the plurality of loss values are greater than the preset threshold, perform back propagation based on the plurality of loss values, update the current machine learning model to obtain an updated machine learning model, and determine the updated machine learning model as the current machine learning model again; repeating the steps: determining a first set of features of the sample object from the first location region, a second set of features of the sample object in each search region from the plurality of search regions, based on the current machine learning model;

the seventh determining module 516 is configured to determine the current machine learning model as the tracking model and determine the model parameters corresponding to the target loss value as the parameters of the tracking model when the target loss value is smaller than or equal to the preset threshold value among the plurality of loss values.
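The work of the third determining module 505 can be illustrated with a minimal sketch that enlarges the first position region by each preset scale value. The `(cx, cy, w, h)` region format, the clipping to the picture bounds and the function names are assumptions; scale values of at least 1 are assumed so that each search region contains the first position region.

```python
def search_region(first_region, scale, picture_size):
    """Return (left, top, right, bottom) of a search region centred on the
    first position region and enlarged by `scale`, clipped to the picture."""
    cx, cy, w, h = first_region
    pic_w, pic_h = picture_size
    half_w, half_h = scale * w / 2.0, scale * h / 2.0
    left = max(0.0, cx - half_w)
    top = max(0.0, cy - half_h)
    right = min(float(pic_w), cx + half_w)
    bottom = min(float(pic_h), cy + half_h)
    return (left, top, right, bottom)


def search_regions(first_region, scale_values, picture_size):
    """One search region per preset scale value (at least two values)."""
    if len(scale_values) < 2:
        raise ValueError("at least two preset scale values are required")
    return [search_region(first_region, s, picture_size) for s in scale_values]


# Hypothetical first position region, scale values and picture size.
regions = search_regions((100.0, 80.0, 40.0, 20.0), (2.0, 4.0), (640, 480))
```

With these toy inputs the two search regions are nested, and both contain the original 40 x 20 region centred at (100, 80).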

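In this embodiment of the present application, the third determining module 505 includes:

a first determining unit, configured to determine a first search region of the sample object from the comparison picture according to the first position region and a first scale value; the first search region includes the first position region;

a second determining unit, configured to determine a second search region of the sample object from the comparison picture according to the first position region and a second scale value; the second search region includes the first position region.

In this embodiment of the application, the fifth determining module 511 includes:

a convolution unit, configured to perform a convolution operation on the first feature set and each of the plurality of second feature sets to obtain the plurality of matching value sets.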

The system and method embodiments in the embodiments of the present application are based on the same application concept.

The present application further provides an electronic device, which may be disposed in a server. The electronic device includes a processor and a memory; the memory stores at least one instruction, at least one program, a code set, or an instruction set related to implementing the target object tracking method in the method embodiments, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded from the memory and executed by the processor to implement the target object tracking method.

The present application further provides a storage medium, which may be disposed in a server to store at least one instruction, at least one program, a code set, or a set of instructions related to implementing a target object tracking method in the method embodiments, where the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement the target object tracking method.

Optionally, in this embodiment, the storage medium may be located in at least one of a plurality of network servers of a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a removable hard disk, a magnetic disk, or an optical disk.

As can be seen from the above embodiments of the target object tracking method, system, electronic device, or storage medium provided in the present application, the method in the present application includes: acquiring a reference picture and a picture to be tracked; determining an identification object in the reference picture; and tracking the identification object in the reference picture and the picture to be tracked based on a trained tracking model to obtain the position of the identification object in the picture to be tracked. The training step of the tracking model includes: obtaining a sample picture and a comparison picture, where the sample picture contains a sample object and at least part of the comparison picture contains the sample object; determining a first position area of the sample object in the sample picture; determining a plurality of search areas of the sample object from the comparison picture according to the first position area and a plurality of preset scale values, the plurality of preset scale values including at least two scale values; constructing a preset machine learning model and determining the preset machine learning model as the current machine learning model; based on the current machine learning model, determining a first feature set of the sample object from the first position area and determining a second feature set of the sample object in each search area from the plurality of search areas; determining a plurality of matching value sets according to the first feature set and the plurality of second feature sets, each matching value set in the plurality of matching value sets being in one-to-one correspondence with each second feature set in the plurality of second feature sets; determining a plurality of loss values according to the plurality of matching value sets and a preset matching value; when the plurality of loss values are greater than a preset threshold value, performing back propagation based on the plurality of loss values, updating the current machine learning model to obtain an updated machine learning model, re-determining the updated machine learning model as the current machine learning model, and repeating the step of determining, based on the current machine learning model, the first feature set of the sample object from the first position area and the second feature set of the sample object in each search area from the plurality of search areas; and when a target loss value among the plurality of loss values is smaller than or equal to the preset threshold value, determining the current machine learning model as the tracking model and determining the model parameters corresponding to the target loss value as the parameters of the tracking model.

Based on the embodiment of the application, a plurality of search regions of a sample object are determined from a comparison picture according to a first position region of the sample object in a sample picture and a plurality of preset scale values; a first feature set of the sample object is determined from the first position region, and a second feature set of the sample object in each search region is determined from the plurality of search regions; a plurality of matching value sets are determined according to the first feature set and the plurality of second feature sets; and a plurality of loss values are determined according to the differences between the plurality of matching value sets and the preset matching value so as to adjust the parameters of the tracking model during training. The trained tracking model is thereby obtained, and the accuracy and robustness of tracking the target object can be improved based on the trained tracking model.

It should be noted that the foregoing order of the embodiments of the present application is for description only and does not represent the relative merits of the embodiments. Specific embodiments are described in this specification, and other embodiments are also within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve the desired results; in some embodiments, multitasking and parallel processing are also possible or may be advantageous.

All the embodiments in this specification are described in a progressive manner; for the same or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively brief, and for relevant points reference may be made to the corresponding description of the method embodiment.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims

1. A target object tracking method, comprising:

acquiring a reference picture and a picture to be tracked;

determining an identification object in the reference picture;

the training step of the tracking model comprises the following steps:

determining a first position region of the sample object in the sample picture;

determining a plurality of search areas of the sample object from the comparison picture according to the first position area and a plurality of preset scale values; the plurality of preset scale values comprises at least two scale values;

determining a plurality of sets of matching values from the first set of features and a plurality of the second sets of features; each set of matching values of the plurality of sets of matching values and each set of second features of the plurality of sets of second features are in one-to-one correspondence;

when the loss values are larger than the preset threshold value, performing back propagation based on the loss values, updating the current machine learning model to obtain an updated machine learning model, and determining the updated machine learning model as the current machine learning model again; repeating the steps: determining a first set of features of the sample object from the first location region, a second set of features of the sample object in each search region from the plurality of search regions, based on the current machine learning model;

and when a target loss value in the loss values is smaller than or equal to the preset threshold value, determining the current machine learning model as the tracking model, and determining model parameters corresponding to the target loss value as parameters of the tracking model.

2. The method of claim 1, wherein the plurality of preset scale values comprises a first scale value and a second scale value;

the determining a plurality of search regions of the sample object from the comparison picture according to the first position region and a plurality of preset scale values comprises:

determining a first search region of the sample object from the comparison picture according to the first position region and the first scale value; the first search area comprises the first location area;

and determining a second search region of the sample object from the comparison picture according to the first position region and the second scale value; the second search area includes the first location area.

3. The method of claim 1, wherein determining a plurality of sets of matching values from the first set of features and a plurality of the second sets of features comprises:

and performing convolution operation on the first feature set and each second feature set in the plurality of second feature sets to obtain the plurality of matching value sets.

4. The method of claim 1, wherein determining a plurality of loss values from the plurality of sets of match values and a preset match value comprises:

and determining the loss values according to the difference value between the maximum matching value and the preset matching value.

5. A target object tracking system, comprising:

a first determination module to determine an identification object in the reference picture;

6. The system of claim 5, further comprising: a training module;

the training module comprises:

a third determining module, configured to determine a plurality of search regions of the sample object from the comparison picture according to the first position region and a plurality of preset scale values; the plurality of preset scale values comprises at least two scale values;

a fourth determination module to determine a first set of features of the sample object from the first location region and a second set of features of the sample object in each search region from the plurality of search regions based on the current machine learning model;

a fifth determining module for determining a plurality of sets of matching values from the first set of features and a plurality of the second sets of features; each set of matching values of the plurality of sets of matching values and each set of second features of the plurality of sets of second features are in one-to-one correspondence;

a sixth determining module, configured to determine a plurality of loss values according to the plurality of matching value sets and a preset matching value;

an updating module, configured to perform back propagation based on the plurality of loss values when the plurality of loss values are greater than the preset threshold, update the current machine learning model to obtain an updated machine learning model, and determine the updated machine learning model as the current machine learning model again; repeating the steps: determining a first set of features of the sample object from the first location region, a second set of features of the sample object in each search region from the plurality of search regions, based on the current machine learning model;

a seventh determining module, configured to determine the current machine learning model as the tracking model and determine a model parameter corresponding to the target loss value as a parameter of the tracking model when a target loss value exists in the loss values and is less than or equal to the preset threshold.

7. The system of claim 5, wherein the third determination module comprises:

a first determining unit, configured to determine a first search region of the sample object from the comparison picture according to the first position region and the first scale value; the first search area comprises the first location area;

a second determining unit, configured to determine a second search region of the sample object from the comparison picture according to the first position region and the second scale value; the second search area includes the first location area.

8. The system of claim 5, wherein the fifth determination module comprises:

a convolution unit, configured to perform a convolution operation on the first feature set and each of the plurality of second feature sets to obtain the plurality of sets of matching values.

9. An electronic device, comprising a processor and a memory, wherein at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and wherein the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the target object tracking method of any one of claims 1-4.

10. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the target object tracking method according to any one of claims 1 to 4.