CN112906580B - Target tracking method and related device - Google Patents

Target tracking method and related device

Info

Publication number
CN112906580B
CN112906580B (application number CN202110204499.8A)
Authority
CN
China
Prior art keywords
target
feature map
matrix
current frame
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110204499.8A
Other languages
Chinese (zh)
Other versions
CN112906580A (en)
Inventor
陈亚松
杨坤兴
黄鹏
吴忠人
潘武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202110204499.8A priority Critical patent/CN112906580B/en
Publication of CN112906580A publication Critical patent/CN112906580A/en
Application granted granted Critical
Publication of CN112906580B publication Critical patent/CN112906580B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/30 Noise filtering

Abstract

The application discloses a target tracking method and a related device. The target tracking method comprises the following steps: performing feature extraction on a target image and a current frame image to obtain a corresponding target feature map and a corresponding current frame feature map, respectively; obtaining a target reorganization feature map and a current frame reorganization feature map from the target feature map and the current frame feature map by using a high-level semantic feature reorganization module; completing feature interaction by using a size-angle adaptive module based on the target reorganization feature map and the current frame reorganization feature map, so as to obtain a current frame output feature map; and obtaining a target tracking result based on the current frame output feature map. In this way, the accuracy of the target tracking result can be improved.

Description

Target tracking method and related device
Technical Field
The application belongs to the technical field of target tracking, and particularly relates to a target tracking method and a related device.
Background
Video object tracking, also known as single object tracking, is the task of locating a specified target, as a position or a segmentation mask, in every image of a video sequence. The key to the technique is to extract features of the given target and of the candidate targets in each frame of the video image sequence, establish the relation between them, and select the best-matching candidate as the tracked target. It is a challenging research task in the field of computer vision. Determining the position of the target object throughout the video sequence enables further analysis and understanding of the target's behavior in the video, and facilitates higher-level analysis and processing of video content.
At present, some target tracking methods rely on the previous frame image, which implicitly imposes a precondition: the target in the next frame is expected to appear near its position in the previous frame. If the target position in the next frame does not satisfy this premise, the input image of that branch contains no target, no useful features can be extracted, and tracking may fail. In addition, because existing tracking results are mainly obtained from a correlation established by depth-wise convolution over high-level semantic features, noise in those features may degrade the tracking result.
Disclosure of Invention
The application provides a target tracking method and a related device, which are used for improving the accuracy of a target tracking result.
In order to solve the above technical problem, one technical solution adopted by the application is to provide a target tracking method, including: performing feature extraction on a target image and a current frame image to obtain a corresponding target feature map and a corresponding current frame feature map, respectively; obtaining a target reorganization feature map and a current frame reorganization feature map from the target feature map and the current frame feature map by using a high-level semantic feature reorganization module; completing feature interaction by using a size-angle adaptive module based on the target reorganization feature map and the current frame reorganization feature map, so as to obtain a current frame output feature map after feature interaction; and obtaining a target tracking result based on the current frame output feature map.
Wherein the step of obtaining a target reorganization feature map and a current frame reorganization feature map by using a high-level semantic feature reorganization module, based on the target feature map and the current frame feature map, comprises: performing high-level semantic feature reorganization on the target feature map to obtain a corresponding target reorganization feature map; and performing high-level semantic feature reorganization on the current frame feature map based on the target reorganization feature map to obtain a current frame reorganization feature map.
Wherein the step of performing high-level semantic feature reorganization on the target feature map to obtain a corresponding target reorganization feature map comprises: obtaining a first matrix and a second matrix from the target feature map, wherein the feature points of each channel of the target feature map are arranged, row by row, in one column of the first matrix and in one row of the second matrix; normalizing the first matrix by rows and the second matrix by columns; obtaining a first cosine similarity matrix from the normalized first and second matrices; normalizing the first cosine similarity matrix by rows to obtain a first reorganization weight matrix; and obtaining the target reorganization feature map from the first reorganization weight matrix and the first matrix.
Wherein the step of performing high-level semantic feature reorganization on the current frame feature map based on the target reorganization feature map to obtain the current frame reorganization feature map comprises: obtaining a third matrix and a fourth matrix from the target reorganization feature map and a fifth matrix from the current frame feature map, wherein the feature points of each channel of the target reorganization feature map are arranged, row by row, in one column of the third matrix and in one row of the fourth matrix, and the feature points of each channel of the current frame feature map are arranged, row by row, in one column of the fifth matrix; normalizing the fifth matrix by rows and the fourth matrix by columns; obtaining a second cosine similarity matrix from the normalized fifth and fourth matrices; normalizing the second cosine similarity matrix by rows to obtain a second reorganization weight matrix; and obtaining the current frame reorganization feature map from the second reorganization weight matrix and the third matrix.
Wherein the normalization processing comprises: processing by means of an L2 norm normalization method.
Wherein the step of completing feature interaction by using a size-angle adaptive module based on the target reorganization feature map and the current frame reorganization feature map, so as to obtain a current frame output feature map after feature interaction, comprises: transforming the target reorganization feature map at different angles and/or different sizes; and performing depth-wise convolution on the current frame reorganization feature map with the transformed target reorganization feature map to obtain the current frame output feature map.
Wherein the step of transforming the target reorganization feature map at different angles and different sizes comprises: performing convolution processing on the target reorganization feature map with different rotation matrices to obtain corresponding target rotation feature maps, wherein different rotation matrices correspond to different rotation angles; reducing the width and height of the target rotation feature maps by bilinear-interpolation downsampling; normalizing each channel of each target rotation feature map; and applying different dilation processing to each normalized target rotation feature map, wherein different dilation processing corresponds to different dilation rates.
Wherein the step of performing depth-wise convolution on the current frame reorganization feature map with the transformed target reorganization feature map to obtain the current frame output feature map comprises: for each dilated target rotation feature map, taking the features of each channel as a convolution kernel and performing depth-wise convolution processing on the current frame reorganization feature map with that kernel to obtain a corresponding current frame processing sub-result; and taking, for each channel, the maximum over all current frame processing sub-results as the output result of that channel, the output results of all channels forming the current frame output feature map.
Wherein the rotation angles include -60°, -30°, 0°, 30° and 60°; and/or the dilation rates are positive integers, including 1, 2 and 3.
In order to solve the above technical problem, another technical solution adopted by the present application is: provided is a target tracking device including: a processor and a memory, wherein the processor is coupled to the memory for implementing the target tracking method in any of the above embodiments.
In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided an apparatus having a storage function, on which program data is stored, the program data being executable by a processor to implement the object tracking method described in any of the above embodiments.
Different from the prior art, the beneficial effects of the present application are as follows. The target tracking method performs tracking based on the target image and the current frame image, without iterating over consecutive frame images, which reduces the propagation and accumulation of errors. The target position is not restricted: the target may appear anywhere in consecutive frames, so the target tracking method provided in the present application has wider applicability. In addition, during feature interaction the target feature map and the current frame feature map are first denoised (i.e., by high-level semantic feature reorganization), so the correlation between the target image and the target in the current frame image is established more accurately, improving the accuracy of the target tracking result. Furthermore, an angle and/or scale adaptive module is introduced, which improves the tracking of moving targets undergoing rotation and scale changes; the method is based on an end-to-end twin network model, reducing the involvement of empirical parameters during model inference; and when the tracking result is finally obtained, localization and segmentation of the tracked target can be completed simultaneously, with individual branch results usable according to the granularity required by the specific tracking task.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort. In the drawings:
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a target tracking method according to the present application;
FIG. 2 is a schematic diagram of a frame of an embodiment corresponding to the target tracking method in FIG. 1;
FIG. 3 is a flowchart illustrating an embodiment corresponding to step S102 in FIG. 1;
FIG. 4 is a flowchart illustrating an embodiment corresponding to step S201 in FIG. 3;
FIG. 5 is a schematic diagram of a frame of the embodiment corresponding to step S102 in FIG. 1;
FIG. 6 is a flowchart illustrating an embodiment corresponding to step S202 in FIG. 3;
FIG. 7 is a flowchart illustrating an embodiment corresponding to step S103 in FIG. 1;
FIG. 8 is a schematic diagram of a frame of an embodiment corresponding to step S103 in FIG. 1;
FIG. 9 is a flowchart illustrating an embodiment corresponding to step S501 in FIG. 7;
FIG. 10 is a flowchart illustrating an embodiment corresponding to step S502 in FIG. 7;
FIG. 11 is a schematic structural diagram of an embodiment of a target tracking framework according to the present application;
FIG. 12 is a schematic diagram of an embodiment of a target tracking device according to the present application;
fig. 13 is a schematic structural diagram of an embodiment of a device with a storage function according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic flow diagram of an embodiment of a target tracking method of the present application, and fig. 2 is a schematic frame diagram of an embodiment corresponding to the target tracking method in fig. 1, where the target tracking method specifically includes:
S101: and performing feature extraction on the target image and the current frame image to respectively obtain a corresponding target feature map and a corresponding current frame feature map.
Specifically, step S101 may be implemented as follows. First, a target image containing the target is cropped from a template image, and its size is adjusted to a first preset size (for example, 127 × 127 × 3 as shown in fig. 2) by bilinear interpolation; the size of the current frame image is adjusted to a second preset size (e.g., 255 × 255 × 3 as shown in fig. 2). The target image and the current frame image are then fed, in parallel or separately, into the ResNet-50-based feature extraction network of SiamMask. To ensure channel-wise correspondence of the high-level semantic features, the two branches of the twin network share parameters; the extracted target feature map and current frame feature map both have 256 channels, and their width and height are roughly 1/8 of the width and height of the input image of each branch. For example, in fig. 2 the size of the target feature map is 15 × 15 × 256 and the size of the current frame feature map is 31 × 31 × 256. Of course, in other embodiments other networks may be used for feature extraction, in which case the first and second preset sizes are determined by the input requirements of that network.
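For illustration only, the following PyTorch sketch mirrors this twin-branch extraction. The one-layer `backbone`, the helper name `extract_features`, and the kernel/stride choices are assumptions of the sketch, not the ResNet-50 network of SiamMask:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-in for the SiamMask ResNet-50 extractor: any fully
# convolutional module with 256 output channels fits the sketch. Kernel 15
# with stride 8 happens to reproduce the 127 -> 15 and 255 -> 31 sizes of fig. 2.
backbone = nn.Conv2d(3, 256, kernel_size=15, stride=8)

def extract_features(target_img, frame_img):
    """Twin-branch feature extraction with shared parameters (step S101)."""
    # Resize to the preset input sizes using bilinear interpolation.
    z = F.interpolate(target_img, size=(127, 127), mode="bilinear", align_corners=False)
    x = F.interpolate(frame_img, size=(255, 255), mode="bilinear", align_corners=False)
    # One module processes both branches, so the parameters are shared and the
    # channels of the two high-level feature maps correspond one to one.
    return backbone(z), backbone(x)

z_feat, x_feat = extract_features(torch.rand(1, 3, 90, 70), torch.rand(1, 3, 480, 640))
print(z_feat.shape, x_feat.shape)  # torch.Size([1, 256, 15, 15]) torch.Size([1, 256, 31, 31])
```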
S102: and using a high-level semantic Feature Reorganization (FRM) module to obtain a target reorganization feature map and a current frame reorganization feature map based on the target feature map and the current frame feature map.
Specifically, referring to fig. 3, fig. 3 is a schematic flowchart illustrating an embodiment corresponding to step S102 in fig. 1, where step S102 specifically includes:
S201: and performing high-level semantic Feature Reorganization (FRM) on the target feature map to obtain a corresponding target reorganization feature map.
Specifically, please refer to fig. 4 and 5, wherein fig. 4 is a flowchart illustrating an embodiment corresponding to step S201 in fig. 3, and fig. 5 is a frame diagram illustrating an embodiment corresponding to step S102 in fig. 1. The specific implementation process of the step S201 may be:
S301: and obtaining a first matrix and a second matrix from the target feature map, wherein the feature points of each channel of the target feature map are arranged, row by row, in one column of the first matrix and in one row of the second matrix.
Specifically, the target feature map may be represented as h1 × w1 × c, h1 representing the height of the target feature map, w1 representing the width of the target feature map, and c representing the number of channels (e.g., 256 in fig. 2) of the target feature map.
The specific implementation process of step S301 may be: a reshape operation is performed on the target feature map to convert its dimensions from h1 × w1 × c to c × (h1w1), where h1w1 denotes h1 × w1; this yields the second matrix, in which each row corresponds to all feature points of one channel of the target feature map. A transpose operation is then performed on the second matrix to exchange the dimensions to (h1w1) × c, yielding the first matrix, in which each column corresponds to all feature points of one channel of the target feature map. Of course, in other embodiments, the reshape operation may first produce the first matrix, and a transpose of the first matrix then yields the second matrix.
For example, suppose the feature points of channel n of the target feature map are a_n, b_n, c_n, where n = 1, ..., c indexes the channels. The first matrix can then be represented as

$$\begin{pmatrix} a_1 & a_2 & \cdots & a_c \\ b_1 & b_2 & \cdots & b_c \\ c_1 & c_2 & \cdots & c_c \end{pmatrix},$$

and the second matrix as

$$\begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ \vdots & \vdots & \vdots \\ a_c & b_c & c_c \end{pmatrix}.$$
S302: the first matrix is normalized by rows and the second matrix is normalized by columns.
Specifically, in this embodiment, the normalization is performed by an L2 norm normalization method.
S303: and obtaining a first cosine similarity matrix according to the first matrix and the second matrix after normalization processing.
Specifically, in the present embodiment, the first matrix (h1w1) × c and the second matrix c × (h1w1) may be matrix-multiplied; since their rows and columns have been L2-normalized, the product directly gives the cosine values, yielding a first cosine similarity matrix of size (h1w1) × (h1w1).
S304: and normalizing the first cosine similarity matrix by rows to obtain a first recombination weight matrix.
Specifically, in this embodiment, an L2 norm normalization method is used to perform the normalization.
S305: and obtaining the target reorganization feature map from the first reorganization weight matrix and the first matrix.
Specifically, in this embodiment, the first reorganization weight matrix (h1w1) × (h1w1) and the first matrix (h1w1) × c may be matrix-multiplied to obtain a first intermediate matrix of size (h1w1) × c, and a reshape operation on the first intermediate matrix then restores its dimensions to h1 × w1 × c, yielding the target reorganization feature map.
Of course, in other embodiments, the second matrix c × (h1w1) and the first reorganization weight matrix (h1w1) × (h1w1) may instead be matrix-multiplied to obtain a first intermediate matrix of size c × (h1w1), and a reshape operation on the first intermediate matrix then restores its dimensions to h1 × w1 × c, yielding the target reorganization feature map.
The above process of high-level semantic Feature Reorganization (FRM) on the target feature map can reduce noise to a certain extent: because the target image is dominated by the target and contains few background pixels, the features in the resulting target reorganization feature map are more compact. Moreover, the target reorganization feature map has the same size as the target feature map; that is, the FRM process does not change the feature map size (as shown in figs. 2 and 5).
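As an illustration of steps S301-S305, here is a minimal PyTorch sketch of the FRM computation. The function name `feature_reorganize` and the (c, h, w) tensor layout are assumptions of the sketch; writing it over a query/key pair also anticipates step S202 below, where the current frame feature map is the query and the target reorganization feature map is the key:

```python
import torch
import torch.nn.functional as F

def feature_reorganize(query_fm, key_fm):
    """High-level semantic feature reorganization (FRM), sketching S301-S305.

    query_fm: map to be reorganized, shape (c, hq, wq)
    key_fm:   map supplying the reorganization weights, shape (c, hk, wk)
    feature_reorganize(fm, fm) is the target-branch FRM of step S201.
    """
    c, hk, wk = key_fm.shape
    second = key_fm.reshape(c, hk * wk)        # c x (hkwk): one channel per row
    first = query_fm.reshape(c, -1).t()        # (hqwq) x c: one channel per column
    # S302: L2-normalize the first matrix by rows and the second by columns,
    # so the matrix product below directly yields cosine similarities.
    q = F.normalize(first, p=2, dim=1)
    k = F.normalize(second, p=2, dim=0)
    cos = q @ k                                # S303: (hqwq) x (hkwk)
    weight = F.normalize(cos, p=2, dim=1)      # S304: reorganization weights
    # S305: weighted recombination of the key features, then reshape back.
    out = weight @ key_fm.reshape(c, hk * wk).t()   # (hqwq) x c
    return out.t().reshape(query_fm.shape)

z_fm = torch.rand(256, 15, 15)             # stand-in for the target feature map (batch dim dropped)
z_reorg = feature_reorganize(z_fm, z_fm)   # target reorganization feature map
print(z_reorg.shape)                       # torch.Size([256, 15, 15]): size unchanged
```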
S202: and performing high-level semantic Feature Reorganization (FRM) on the current frame feature map based on the target reorganization feature map to obtain the current frame reorganization feature map.
Specifically, referring to fig. 6, fig. 6 is a flowchart illustrating an embodiment corresponding to step S202 in fig. 3, where the specific implementation process of step S202 may include:
S401: and obtaining a third matrix and a fourth matrix from the target reorganization feature map, and a fifth matrix from the current frame feature map, wherein the feature points of each channel of the target reorganization feature map are arranged, row by row, in one column of the third matrix and in one row of the fourth matrix, and the feature points of each channel of the current frame feature map are arranged, row by row, in one column of the fifth matrix.
Specifically, the specific implementation process of step S401 may be: a reshape operation is performed on the target reorganization feature map to convert its dimensions from h1 × w1 × c to c × (h1w1), yielding the fourth matrix, in which each row corresponds to all feature points of one channel of the target reorganization feature map. A transpose operation is then performed on the fourth matrix to exchange the dimensions to (h1w1) × c, yielding the third matrix, in which each column corresponds to all feature points of one channel of the target reorganization feature map. Of course, in other embodiments, the reshape operation may first produce the third matrix, and a transpose of the third matrix then yields the fourth matrix.
Similarly, a reshape operation may be performed on the current frame feature map to convert its dimensions from h2 × w2 × c to (h2w2) × c, yielding the fifth matrix, in which each column corresponds to all feature points of one channel of the current frame feature map.
S402: and normalizing the fifth matrix by rows and normalizing the fourth matrix by columns.
Specifically, in this embodiment, the normalization is performed by an L2 norm normalization method.
S403: and obtaining a second cosine similarity matrix according to the fifth matrix and the fourth matrix after normalization processing.
Specifically, in the present embodiment, the fifth matrix (h2w2) × c and the fourth matrix c × (h1w1) may be matrix-multiplied to compute the cosine values, yielding a second cosine similarity matrix of size (h2w2) × (h1w1).
S404: and normalizing the second cosine similarity matrix by rows to obtain a second reorganization weight matrix.
Specifically, in this embodiment, the normalization is performed by an L2 norm normalization method.
S405: and obtaining the current frame reorganization feature map from the second reorganization weight matrix and the third matrix.
Specifically, in this embodiment, the second reorganization weight matrix (h2w2) × (h1w1) and the third matrix (h1w1) × c may be matrix-multiplied to obtain a second intermediate matrix of size (h2w2) × c, and a reshape operation on the second intermediate matrix then restores its dimensions to h2 × w2 × c, yielding the current frame reorganization feature map.
The current frame feature map may contain more background pixels, and the compact target features in the target reorganization feature map can filter out some of this background noise. Moreover, the current frame reorganization feature map has the same size as the current frame feature map; that is, the FRM process does not change the feature map size (as shown in figs. 2 and 5).
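Continuing the sketch from step S201, the current-frame FRM of steps S401-S405 is the same computation with the roles swapped, reusing the hypothetical `feature_reorganize` above:

```python
x_fm = torch.rand(256, 31, 31)             # stand-in for the current frame feature map from S101
# S401-S405: fifth matrix comes from x_fm, third/fourth matrices from z_reorg.
x_reorg = feature_reorganize(x_fm, z_reorg)
print(x_reorg.shape)                       # torch.Size([256, 31, 31]): size unchanged
```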
S103: and completing feature interaction by using a size-angle adaptive module based on the target reorganization feature map and the current frame reorganization feature map, so as to obtain a current frame output feature map after feature interaction.
Specifically, in some cases the angle and/or size of the target may change as it moves. In order to improve the tracking effect in such cases, please refer to figs. 7-8: fig. 7 is a flowchart of an embodiment corresponding to step S103 in fig. 1, and fig. 8 is a frame diagram of an embodiment corresponding to step S103 in fig. 1. Step S103 specifically includes:
S501: the target reorganization feature map is transformed at different angles and/or different sizes (i.e., by the size-angle adaptive module SAM illustrated in fig. 2).
Specifically, referring to fig. 9, fig. 9 is a flowchart of an embodiment corresponding to step S501 in fig. 7. When the target reorganization feature map needs to be transformed at both different angles and different sizes, step S501 specifically includes:
S601: and performing convolution processing on the target reorganization feature map with different rotation matrices, respectively, to obtain corresponding target rotation feature maps, wherein different rotation matrices correspond to different rotation angles.
Specifically, in the present embodiment, the rotation angles include -60°, -30°, 0°, 30° and 60°. These five rotation angles cover a wide range of possible changes in the target's angle. Of course, in other embodiments, more or fewer rotation angles may be used, which is not limited in the present application.
S602: and reducing the width and height of the target rotation feature maps by bilinear-interpolation downsampling.
Specifically, as shown in fig. 8, each target rotation feature map may be downsampled by bilinear interpolation to reduce its width and height to 1/2 of the input, i.e., to h1/2 × w1/2 × c. Of course, in other embodiments, each target rotation feature map may be reduced to other sizes.
S603: and carrying out normalization processing on each channel of each target rotation feature map.
Specifically, in the present embodiment, the processing is performed by using the L2 norm normalization method.
S604: and applying different dilation processing to each normalized target rotation feature map, wherein different dilation processing corresponds to different dilation rates.
Specifically, in the present embodiment, the dilation rates are positive integers; for example, they include 1, 2 and 3. Positive-integer dilation rates keep the size of the target rotation feature map well controlled and allow the target to better match the target in the current frame reorganization feature map. In other embodiments, other dilation rates may be used, which is not limited in this application.
In other embodiments, the target reorganization feature map may need only angle changes, in which case step S501 may include only step S601, or steps S601 to S602. Likewise, only size changes may be needed, in which case step S501 may include only steps S602 to S604, or steps S603 to S604.
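As an illustration of steps S601-S603, the following PyTorch sketch builds one rotated, downsampled, channel-normalized kernel per angle. Reading the rotation step as resampling the map under an explicit rotation matrix (via `affine_grid`/`grid_sample`) is an assumption of this sketch, and the dilation of step S604 is deferred to the `dilation` argument of the depth-wise convolution in step S502:

```python
import math
import torch
import torch.nn.functional as F

def transform_kernels(z_reorg, angles_deg=(-60, -30, 0, 30, 60)):
    """Size-angle adaptation of the target reorganization map (S601-S603)."""
    z = z_reorg.unsqueeze(0)                        # (1, c, h1, w1)
    kernels = []
    for a in angles_deg:
        t = math.radians(a)
        # S601: rotate the feature map using an explicit rotation matrix.
        theta = torch.tensor([[math.cos(t), -math.sin(t), 0.0],
                              [math.sin(t),  math.cos(t), 0.0]]).unsqueeze(0)
        grid = F.affine_grid(theta, list(z.shape), align_corners=False)
        rot = F.grid_sample(z, grid, mode="bilinear", align_corners=False)
        # S602: halve width and height by bilinear-interpolation downsampling.
        rot = F.interpolate(rot, scale_factor=0.5, mode="bilinear", align_corners=False)
        # S603: L2-normalize each channel of the rotated, downsampled kernel.
        rot = F.normalize(rot.flatten(2), p=2, dim=2).reshape(rot.shape)
        kernels.append(rot.squeeze(0))              # (c, h1/2, w1/2)
    return kernels

kernels = transform_kernels(z_reorg)                # z_reorg from the FRM sketch above
print(len(kernels), kernels[0].shape)               # 5 torch.Size([256, 7, 7])
```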
S502: and performing depth-wise convolution on the current frame reorganization feature map with the transformed target reorganization feature map to obtain the current frame output feature map.
Specifically, referring to fig. 10, fig. 10 is a schematic flowchart illustrating an embodiment corresponding to step S502 in fig. 7, where step S502 specifically includes:
S701: and for each dilated target rotation feature map, taking the features of each channel as a convolution kernel and performing depth-wise convolution processing on the current frame reorganization feature map with that kernel to obtain a corresponding current frame processing sub-result.
Specifically, for each dilated target rotation feature map, each layer of the dilated pyramid-shaped features is taken as a convolution kernel and depth-wise convolved with the current frame reorganization feature map h2 × w2 × c. This process is equivalent to computing, in each channel, the projection of the neighborhood feature vectors around each position of the current frame reorganization feature map onto the dilated target rotation feature map, as the likelihood that the target center lies at that position. Thus, each dilation rate corresponds to one correlation matrix of size h2 × w2 × c, and each rotation angle corresponds to a set of correlation matrices G (i.e., current frame processing sub-results) of size h2 × w2 × c × N1, where N1 is the number of dilation rates; the full set of correlation matrices obtained in step S701 therefore has size h2 × w2 × c × N1 × N2, where N2 is the number of rotation angles.
Taking the five rotation angles and three dilation rates of fig. 8 as an example, each rotation angle corresponds to a set of correlation matrices G (i.e., current frame processing sub-results) of size h2 × w2 × c × 3, and the full set of correlation matrices obtained in step S701 has size h2 × w2 × c × 15.
S702: and taking the maximum over all current frame processing sub-results under each channel as the output result of the current channel, the output results of all channels forming the current frame output feature map.
Specifically, taking fig. 8 as an example, 15 correlation matrices (i.e., current frame processing sub-results) are obtained for each channel in total; the maximum over these 15 correlation matrices is taken as the output result of the current channel, and the output results of all channels form the current frame output feature map. In this way, the correlation at the angle and scale that best match between the current frame reorganization feature map and the target reorganization feature map is obtained.
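Completing the sketch, steps S701-S702 can be read as a grouped (depth-wise) convolution over every angle/dilation combination followed by a per-channel maximum. The "same" padding that keeps every sub-result at h2 × w2, the odd kernel sides it assumes, and the element-wise reading of the per-channel maximum are choices of this sketch:

```python
import torch
import torch.nn.functional as F

def adaptive_depthwise_correlation(x_reorg, kernels, dilations=(1, 2, 3)):
    """Depth-wise correlation over all angles and dilation rates (S701-S702)."""
    c, h2, w2 = x_reorg.shape
    x = x_reorg.unsqueeze(0)                     # (1, c, h2, w2)
    sub_results = []                             # the N1 * N2 correlation matrices G
    for k in kernels:                            # one kernel per rotation angle
        _, kh, kw = k.shape
        assert kh % 2 == 1 and kw % 2 == 1, "odd kernel sides assumed for same padding"
        weight = k.unsqueeze(1)                  # (c, 1, kh, kw) depth-wise weights
        for d in dilations:                      # S604: one dilation rate per pass
            pad = ((kh - 1) * d // 2, (kw - 1) * d // 2)
            g = F.conv2d(x, weight, padding=pad, dilation=d, groups=c)
            sub_results.append(g)                # each g is (1, c, h2, w2)
    # S702: per channel, keep the maximum over all angle/dilation sub-results.
    return torch.stack(sub_results).max(dim=0).values.squeeze(0)

out = adaptive_depthwise_correlation(x_reorg, kernels)   # from the sketches above
print(out.shape)                                 # torch.Size([256, 31, 31])
```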
S104: and obtaining a target tracking result based on the current frame output feature map.
Specifically, in this embodiment, the target tracking result includes the position information of the target and/or a segmentation mask. As shown in fig. 2, the box branch and the mask branch of the SiamMask method may be adopted to generate the position information and the segmentation mask of the tracked target, respectively. In other embodiments, only one of the branches may be used, to reduce model inference time according to the specific application scenario.
Referring to fig. 11, fig. 11 is a schematic structural diagram of an embodiment of a target tracking framework according to the present application. The target tracking framework includes a feature extraction module 10, a feature interaction module 12 and a result acquisition module 14. The feature extraction module 10 is configured to perform feature extraction on the target image and the current frame image to obtain a target feature map and a current frame feature map, respectively. The feature interaction module 12 is coupled to the feature extraction module 10 and configured to obtain a target reorganization feature map and a current frame reorganization feature map from the target feature map and the current frame feature map by using the high-level semantic feature reorganization module, and to complete feature interaction by using the size-angle adaptive module based on the target reorganization feature map and the current frame reorganization feature map, so as to obtain a current frame output feature map. The result acquisition module 14 is coupled to the feature interaction module 12 and configured to obtain a target tracking result based on the current frame output feature map.
Referring to fig. 12, fig. 12 is a schematic structural diagram of an embodiment of a target tracking device according to the present application. The target tracking device comprises a processor 20 and a memory 22 coupled to each other, which cooperate to implement the target tracking method described in any of the above embodiments. In the present embodiment, the processor 20 may also be referred to as a CPU (Central Processing Unit). The processor 20 may be an integrated circuit chip having signal processing capabilities. The processor 20 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
In addition, the target tracking device provided by the present application may further include other structures, such as a common display screen, a communication circuit, etc., which are not described in the present application.
Referring to fig. 13, fig. 13 is a schematic structural diagram of an embodiment of a device with a storage function according to the present application. The storage-enabled device 30 has stored thereon program data 300, the program data 300 being executable by a processor to implement the object tracking method described in any of the above embodiments. The program data 300 may be stored in the storage device in the form of a software product, and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage device includes: various media capable of storing program codes, such as a usb disk, a mobile hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, or terminal devices, such as a computer, a server, a mobile phone, and a tablet.
In summary, the target tracking method provided by the application performs tracking based on the target image and the current frame image, without iterating over consecutive frame images, which reduces error propagation and accumulation. The target position is not restricted: the target may appear anywhere in consecutive frames, so the target tracking method provided in the present application has wider applicability. In addition, during feature interaction the target feature map is first denoised (i.e., by high-level semantic feature reorganization), and the current frame feature map is then reorganized and denoised based on the denoised target feature map, so the correlation between the target image and the target in the current frame image is established more accurately, improving the accuracy of the target tracking result.
Furthermore, an angle and/or scale adaptive module is introduced, which improves the tracking of moving targets undergoing rotation and scale changes; the method is based on an end-to-end twin network model, reducing the involvement of empirical parameters during model inference; and when the tracking result is finally obtained, localization and segmentation of the tracked target can be completed simultaneously, with individual branch results usable according to the granularity required by the specific tracking task.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings, or which are directly or indirectly applied to other related technical fields, are intended to be included within the scope of the present application.

Claims (9)

1. A target tracking method, comprising:
extracting features of a target image and a current frame image to obtain a corresponding target feature map and a corresponding current frame feature map, respectively;
performing high-level semantic feature reorganization on the target feature map to obtain a corresponding target reorganization feature map; performing high-level semantic feature reorganization on the current frame feature map based on the target reorganization feature map to obtain a current frame reorganization feature map;
transforming the target reorganization feature map at different angles and/or different sizes, and performing depth-wise convolution on the current frame reorganization feature map with the transformed target reorganization feature map to obtain a current frame output feature map;
and obtaining a target tracking result based on the current frame output feature map.
2. The target tracking method according to claim 1, wherein the step of performing high-level semantic feature reorganization on the target feature map to obtain a corresponding target reorganization feature map comprises:
obtaining a first matrix and a second matrix from the target feature map, wherein the feature points of each channel of the target feature map are arranged, row by row, in one column of the first matrix and in one row of the second matrix;
normalizing the first matrix by rows and normalizing the second matrix by columns;
obtaining a first cosine similarity matrix according to the first matrix and the second matrix after normalization processing;
normalizing the first cosine similarity matrix by rows to obtain a first reorganization weight matrix;
and obtaining the target reorganization feature map from the first reorganization weight matrix and the first matrix.
3. The target tracking method according to claim 2, wherein the step of performing high-level semantic feature reorganization on the current frame feature map based on the target reorganization feature map to obtain the current frame reorganization feature map comprises:
obtaining a third matrix and a fourth matrix from the target reorganization feature map, and a fifth matrix from the current frame feature map, wherein the feature points of each channel of the target reorganization feature map are arranged, row by row, in one column of the third matrix and in one row of the fourth matrix, and the feature points of each channel of the current frame feature map are arranged, row by row, in one column of the fifth matrix;
normalizing the fifth matrix by rows and normalizing the fourth matrix by columns;
obtaining a second cosine similarity matrix according to the fifth matrix and the fourth matrix after normalization processing;
normalizing the second cosine similarity matrix by rows to obtain a second reorganization weight matrix;
and obtaining the current frame reorganization feature map from the second reorganization weight matrix and the third matrix.
4. The target tracking method according to claim 2 or 3, wherein the normalization processing comprises: processing by means of an L2 norm normalization method.
5. The target tracking method according to claim 1, wherein the step of transforming the target reorganization feature map at different angles and different sizes comprises:
performing convolution processing on the target reorganization feature map with different rotation matrices, respectively, to obtain corresponding target rotation feature maps, wherein different rotation matrices correspond to different rotation angles;
reducing the width and height of the target rotation feature maps by bilinear-interpolation downsampling;
normalizing each channel of each target rotation feature map;
and applying different dilation processing to each normalized target rotation feature map, wherein different dilation processing corresponds to different dilation rates.
6. The target tracking method according to claim 5, wherein the step of performing depth-wise convolution on the current frame reorganization feature map with the transformed target reorganization feature map to obtain a current frame output feature map comprises:
for each dilated target rotation feature map, taking the features of each channel as a convolution kernel and performing depth-wise convolution processing on the current frame reorganization feature map with that kernel to obtain a corresponding current frame processing sub-result;
and taking, for each channel, the maximum over all current frame processing sub-results as the output result of that channel, the output results of all channels forming the current frame output feature map.
7. The target tracking method according to claim 5, wherein
the rotation angles include -60°, -30°, 0°, 30° and 60°; and/or,
the dilation rates are positive integers, including 1, 2 and 3.
8. An object tracking device, comprising:
a processor and a memory, wherein the processor is coupled to the memory for implementing the object tracking method of any of claims 1-7.
9. An apparatus having a storage function, characterized in that program data are stored thereon, which program data are executable by a processor to implement the object tracking method as claimed in any one of claims 1-7.
CN202110204499.8A 2021-02-23 2021-02-23 Target tracking method and related device Active CN112906580B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110204499.8A CN112906580B (en) 2021-02-23 2021-02-23 Target tracking method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110204499.8A CN112906580B (en) 2021-02-23 2021-02-23 Target tracking method and related device

Publications (2)

Publication Number Publication Date
CN112906580A CN112906580A (en) 2021-06-04
CN112906580B (en) 2023-04-07

Family

ID=76106854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110204499.8A Active CN112906580B (en) 2021-02-23 2021-02-23 Target tracking method and related device

Country Status (1)

Country Link
CN (1) CN112906580B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111161317A (en) * 2019-12-30 2020-05-15 北京工业大学 Single-target tracking method based on multiple networks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666960B (en) * 2019-03-06 2024-01-19 南京地平线机器人技术有限公司 Image recognition method, device, electronic equipment and readable storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111161317A (en) * 2019-12-30 2020-05-15 北京工业大学 Single-target tracking method based on multiple networks

Also Published As

Publication number Publication date
CN112906580A (en) 2021-06-04

Similar Documents

Publication Publication Date Title
US11403838B2 (en) Image processing method, apparatus, equipment, and storage medium to obtain target image features
US20220284547A1 (en) Super-resolution image reconstruction method based on deep convolutional sparse coding
US11756170B2 (en) Method and apparatus for correcting distorted document image
EP3469520B1 (en) Superpixel methods for convolutional neural networks
US11609968B2 (en) Image recognition method, apparatus, electronic device and storage medium
US9824431B2 (en) Image synthesis apparatus, image synthesis method, and recording medium
EP3937123A1 (en) Image processing method and apparatus, and computer device and storage medium
CN103985085A (en) Image super-resolution amplifying method and device
CN111951167B (en) Super-resolution image reconstruction method, super-resolution image reconstruction device, computer equipment and storage medium
US11308598B2 (en) Quality assessment of an image
US20200226740A1 (en) Quality assessment of a video
CN112488923A (en) Image super-resolution reconstruction method and device, storage medium and electronic equipment
CN113222855A (en) Image recovery method, device and equipment
CN111951203A (en) Viewpoint synthesis method, apparatus, device and computer readable storage medium
CN113674172A (en) Image processing method, system, device and storage medium
CN114420135A (en) Attention mechanism-based voiceprint recognition method and device
CN112906580B (en) Target tracking method and related device
Pesteh et al. Favorable properties of interior point method and generalized correntropy in power system state estimation
CN113936163A (en) Image processing method, terminal and storage medium
US20230135109A1 (en) Method for processing signal, electronic device, and storage medium
US20220318950A1 (en) Video enhancement method and apparatus, and electronic device and storage medium
CN111598781B (en) Image super-resolution method based on hybrid high-order attention network
Liao et al. TransRef: Multi-Scale Reference Embedding Transformer for Reference-Guided Image Inpainting
Qiu et al. Nested Dense Attention Network for Single Image Super-Resolution
CN111899161A (en) Super-resolution reconstruction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant