CN112037262A

CN112037262A - Target tracking method and device and electronic equipment

Info

Publication number: CN112037262A
Application number: CN202010915821.3A
Authority: CN
Inventors: 邓练兵; 余大勇; 朱俊
Original assignee: Zhuhai Dahengqin Technology Development Co Ltd
Current assignee: Zhuhai Dahengqin Technology Development Co Ltd
Priority date: 2020-09-03
Filing date: 2020-09-03
Publication date: 2020-12-04

Abstract

The invention discloses a target tracking method, a target tracking device and electronic equipment, and relates in particular to the technical field of video monitoring. The target tracking method comprises the following steps: acquiring a first preprocessed image frame and a second preprocessed image frame captured at the same moment; performing feature extraction based on the first preprocessed image frame and the second preprocessed image frame; and determining the same image frames by using the extracted features, carrying out image fusion on the same image frames, and outputting a target tracking result. Image frames are collected by the cameras and processed into a uniform format; local regions and point features are extracted from the image frames and matched so that the same feature points are matched accurately; finally, the same image frames are fused to obtain a final fusion result. This reduces the need to process multiple copies of the same image frame during target tracking, simplifies the target tracking process, and improves target tracking accuracy and efficiency.

Description

Target tracking method and device and electronic equipment

Technical Field

The invention relates to the technical field of video monitoring, in particular to a target tracking method and device and electronic equipment.

Background

With the continuous development of video monitoring technology, the field-of-view areas of adjacent cameras are generally arranged to overlap in order to ensure the tracking effect for a target object. However, when the fields of view of adjacent cameras overlap and a target object enters the overlapping area, each of the adjacent cameras tracks and detects the target object. As a result, multiple frames of the same image frame need to be processed when a target tracking result is output, which reduces the efficiency of acquiring the target tracking result. How to realize high-efficiency target tracking therefore remains a problem to be solved urgently.

Disclosure of Invention

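In view of this, embodiments of the present invention provide a target tracking method, an apparatus and an electronic device, so as to solve the prior-art problem that multiple frames of the same image frame need to be processed when a target tracking result is output, which lowers target tracking efficiency.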

According to a first aspect, an embodiment of the present invention provides a target tracking method, including: acquiring a first image frame acquired by a first camera and a second image frame acquired by a second camera at the same moment, and performing batch normalized processing on the first image frame and the second image frame to obtain a first preprocessed image frame and a second preprocessed image frame; performing multi-scale region feature detection based on the first and second pre-processed image frames to obtain a local region of the first image frame and a local region of the second image frame; extracting image feature points by using the local region of the first image frame and the local region of the second image frame to obtain the local region point feature of the first image frame and the local region point feature of the second image frame; performing same feature matching according to the local region point feature of the first image frame and the local region point feature of the second image frame to determine the same image frame in the first image frame and the second image frame; and carrying out image fusion on the same image frame in the first image frame and the second image frame, and outputting a target tracking result.

Image frames are collected by the cameras and processed into a uniform format; local regions and point features are extracted from the image frames and matched so that the same feature points are matched accurately; finally, the same image frames are fused to obtain a final fusion result. This reduces the need to process multiple copies of the same image frame during target tracking, simplifies the target tracking process, and improves target tracking accuracy and efficiency.

With reference to the first aspect, in a first implementation manner of the first aspect, acquiring a first image frame acquired by a first camera and a second image frame acquired by a second camera, and performing batch normalization on the first image frame and the second image frame to obtain a first pre-processed image frame and a second pre-processed image frame includes: converting the first image frame acquired by the first camera and the second image frame acquired by the second camera into a uniform format, and outputting a first pre-processed image frame and a second pre-processed image frame in the uniform format.

The first image frame and the second image frame are subjected to batch normalization processing, so that the output formats of the first image frame and the second image frame are consistent, and the target tracking result is prevented from being influenced due to non-uniform image formats during feature extraction and image fusion.

With reference to the first aspect, in a second implementation manner of the first aspect, performing multi-scale region feature detection based on the first and second pre-processed image frames to obtain a local region of the first image frame and a local region of the second image frame includes: performing region framing based on image features in the first pre-processed image frame and the second pre-processed image frame, and outputting an image framing region in the first pre-processed image frame and an image framing region in the second pre-processed image frame; and carrying out multi-scale feature detection on the image framing area in the first pre-processing image frame and the image framing area in the second pre-processing image frame to determine a local area of the first image frame and a local area of the second image frame.

The outline range of the target tracking object is determined by carrying out multi-scale regional feature detection on the preprocessed image frame, and the target detection efficiency and the target tracking speed are further improved.

With reference to the second implementation manner of the first aspect, in a third implementation manner of the first aspect, before performing image feature point extraction using the local region of the first image frame and the local region of the second image frame to obtain a local region point feature of the first image frame and a local region point feature of the second image frame, the method further includes: performing scale transformation of local area features on the local area of the first image frame and on the local area of the second image frame respectively, so as to determine the local area image frame of the first image and the local area image frame of the second image.

The determined local area is subjected to scale transformation, so that scale deformation is prevented from being generated in the process of feature extraction, scale information of the local area is unified, and a basis is provided for correctly extracting feature points subsequently.

With reference to the first aspect or the third implementation manner of the first aspect, in a fourth implementation manner of the first aspect, performing image feature point extraction using the local region of the first image frame and the local region of the second image frame to obtain a local region point feature of the first image frame and a local region point feature of the second image frame includes: detecting the inner point features and the outer point features of the local area of each image frame, based on the local area of the first image frame and the local area of the second image frame, to determine the local area point feature of the first image frame and the local area point feature of the second image frame.

Comparing the point features inside the local area allows the same image frame to be acquired more accurately, while the point features outside the local area guard against feature points missed by the frame selection. Detecting point features both inside and outside the local area therefore allows the same feature points to be acquired accurately and improves target tracking efficiency.

With reference to the first aspect, in a fifth implementation manner of the first aspect, the determining, according to the same feature matching between the local area point feature of the first image frame and the local area point feature of the second image frame, the same image frame in the first image frame and the second image frame further includes: judging whether the position and quantity information of the local area point features of the first image frame is the same as the position and quantity information of the local area point features of the second image frame; and if the position and quantity information of the local area point features of the first image frame are the same as those of the local area point features of the second image frame, determining that the first image frame and the second image frame are the same image frame.

By judging the position relation and the point feature quantity of the first image frame and the second image frame, whether the same image frame exists in the first image frame and the second image frame or not is accurately identified, preparation is made for subsequent image frame fusion, and target tracking efficiency is indirectly improved.

With reference to the first aspect, in a sixth implementation manner of the first aspect, outputting a target tracking result based on image fusion performed on the same image frame of the first image frame and the second image frame includes: fusing the same image frames by using a recursive filter according to the same image frames to determine a unique image frame; and performing cascade output on the motion trail of the tracking target based on the unique image frame and the image frames except the unique image frame.

The same image frames are fused, and only the unique image frame obtained by the fusion is retained, so that multiple frames of the same image frame no longer need to be processed during target tracking, the target tracking process is simplified, and target tracking precision and efficiency are improved. In addition, the cascade output makes the motion trail of the tracking target directly visible, which provides reference information for predicting the motion direction of the tracking target.

According to a second aspect, an embodiment of the present invention provides a target tracking apparatus, including: an acquisition module, configured to acquire a first image frame acquired by a first camera and a second image frame acquired by a second camera at the same moment, and to perform batch normalization processing on the first image frame and the second image frame to obtain a first preprocessed image frame and a second preprocessed image frame; a detection module, configured to perform multi-scale region feature detection based on the first and second pre-processed image frames to obtain a local region of the first image frame and a local region of the second image frame; an extraction module, configured to extract image feature points by using the local region of the first image frame and the local region of the second image frame to obtain the local region point feature of the first image frame and the local region point feature of the second image frame; a matching module, configured to perform same feature matching according to the local region point feature of the first image frame and the local region point feature of the second image frame so as to determine the same image frame in the first image frame and the second image frame; and an output module, configured to carry out image fusion on the same image frame in the first image frame and the second image frame and to output a target tracking result.

The acquisition module obtains the first preprocessed image frame and the second preprocessed image frame and sends them to the detection module, which performs local area feature detection to obtain a local area of the first image frame and a local area of the second image frame. The obtained local areas are sent to the extraction module, which extracts the local area point features of the first image frame and of the second image frame. The matching module determines the same image frames from these point features, and finally the same image frames in the first image frame and the second image frame are sent to the output module to obtain a target tracking result. This reduces the need to process multiple frames of the same image frame during target tracking, simplifies the target tracking process, and improves target tracking precision and efficiency.

According to a third aspect, an embodiment of the present invention provides an electronic device, including: a memory and a processor, the memory and the processor being communicatively connected to each other, the memory storing therein computer instructions, and the processor executing the computer instructions to perform the target tracking method according to the first aspect or any one of the embodiments of the first aspect.

According to a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer instructions for causing a computer to execute the target tracking method described in the first aspect or any one of the implementation manners of the first aspect.

Drawings

The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and not to be construed as limiting the invention in any way, and in which:

FIG. 1 is a flowchart of a target tracking method according to an embodiment of the present invention;

FIG. 2 is a flowchart of step S4 in a target tracking method according to an embodiment of the present invention;

FIG. 3 is a block diagram of a target tracking apparatus according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Reference numerals

1-an acquisition module; 2-a detection module; 3-an extraction module; 4-a matching module; 5-an output module; 6-a memory; 7-a processor; 8-bus.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention discloses a target tracking method. Referring to FIG. 1, which is a flowchart of the target tracking method provided by the embodiment of the invention, the method includes:

S1, acquiring a first image frame acquired by the first camera and a second image frame acquired by the second camera at the same time, and performing batch normalization processing on the first image frame and the second image frame to obtain a first preprocessed image frame and a second preprocessed image frame. In this embodiment, the first camera and the second camera may be adjacent cameras (for example, cameras arranged along a shoreline) whose fields of view overlap, so that both cameras capture the same tracking target at the same time and place. Optionally, the first camera and the second camera may also be opposite cameras (for example, cameras arranged on the two sides of a river channel); cameras arranged opposite to each other often have an overlapping field of view so that the cameras on both sides can acquire an optimal picture of the river channel. In this embodiment, the batch normalization processing of the first image frame and the second image frame may include operations such as unifying the image format, equalization, and image denoising. This ensures the consistency of the image frames and indirectly improves target tracking efficiency. The image frames may be video image frames captured by consecutive cameras or opposite cameras arranged along a specified tracking route, and the tracking target may be a person or an object.
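The preprocessing in step S1 can be illustrated with a minimal sketch. It assumes grayscale frames stored as nested Python lists; the function names, the nearest-neighbour resampling, the linear intensity rescaling and the 3x3 mean filter are all illustrative choices that the embodiment does not prescribe.

```python
from typing import List, Tuple

Frame = List[List[float]]  # grayscale image stored as rows of pixel values


def resize_nearest(frame: Frame, height: int, width: int) -> Frame:
    """Resample a frame to (height, width) with nearest-neighbour lookup."""
    src_h, src_w = len(frame), len(frame[0])
    return [
        [frame[r * src_h // height][c * src_w // width] for c in range(width)]
        for r in range(height)
    ]


def rescale_intensity(frame: Frame) -> Frame:
    """Map pixel values linearly onto [0, 1]; a flat frame maps to zeros."""
    lo = min(min(row) for row in frame)
    hi = max(max(row) for row in frame)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in frame]
    return [[(v - lo) / span for v in row] for row in frame]


def box_denoise(frame: Frame) -> Frame:
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(frame), len(frame[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            vals = [
                frame[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            ]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def preprocess_pair(first: Frame, second: Frame,
                    height: int = 4, width: int = 4) -> Tuple[Frame, Frame]:
    """Bring two same-moment frames to one size and one value range."""
    pre = [
        box_denoise(rescale_intensity(resize_nearest(f, height, width)))
        for f in (first, second)
    ]
    return pre[0], pre[1]


# Two frames with different sizes and value ranges (hypothetical data).
first_frame = [[0, 50], [100, 200]]
second_frame = [[0.0, 0.1, 0.2], [0.3, 0.4, 0.5], [0.6, 0.7, 0.8]]
pre1, pre2 = preprocess_pair(first_frame, second_frame)
```

After this step both frames share the same dimensions and value range, which is the property the later feature extraction and fusion steps rely on.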

S2, performing multi-scale region feature detection based on the first pre-processed image frame and the second pre-processed image frame to obtain a local region of the first image frame and a local region of the second image frame. In this embodiment, the multi-scale region feature detection may consist of extracting contour information of the tracking target or of selecting a region in software. Optionally, step S2 further includes:

performing region framing based on image features in the first pre-processed image frame and the second pre-processed image frame, and outputting an image framing region in the first pre-processed image frame and an image framing region in the second pre-processed image frame. Specifically, the tracking target in each preprocessed image frame is framed by software selection. The frame may be square or circular; taking the field of view of the camera into account, a circular frame is preferred, because it captures the tracking target information to the greatest extent.

carrying out multi-scale feature detection on the image framing area in the first pre-processed image frame and the image framing area in the second pre-processed image frame to determine a local area of the first image frame and a local area of the second image frame. Multi-scale detection of the framing areas determines the scale information of the different framing areas; the area of each framing area is calculated from this scale information, and the local area of the first image frame and the local area of the second image frame are output, which further improves target detection efficiency and target tracking speed. The calculated areas can be used to compare whether the image framing area in the first pre-processed image frame is equal to the image framing area in the second pre-processed image frame. If they are equal, it can be preliminarily determined that the same image frame exists in the first image frame and the second image frame, which improves target tracking precision; if they are not equal, the corresponding image frame can be excluded directly. This prepares for accurately acquiring the same image frame in the subsequent steps.
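The area comparison described above could look like the following sketch. It assumes circular framing regions detected on rescaled copies of the frames; the `CircularRegion` type, the `scale` field and the 5% relative tolerance are hypothetical, since the embodiment only states that the areas are compared for equality.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CircularRegion:
    cx: float      # centre x, in pixels of the rescaled image
    cy: float      # centre y, in pixels of the rescaled image
    radius: float  # radius, in pixels of the rescaled image
    scale: float   # detection scale: rescaled size / original size


def area_at_full_scale(region: CircularRegion) -> float:
    """Area of the framing region expressed in original-image pixels."""
    full_radius = region.radius / region.scale
    return math.pi * full_radius ** 2


def areas_match(a: CircularRegion, b: CircularRegion,
                rel_tol: float = 0.05) -> bool:
    """Preliminary same-frame check: areas equal within an assumed tolerance."""
    return math.isclose(area_at_full_scale(a), area_at_full_scale(b),
                        rel_tol=rel_tol)


# Region found at half scale in camera 1, at full scale in camera 2.
region_a = CircularRegion(cx=50, cy=50, radius=20, scale=0.5)
region_b = CircularRegion(cx=200, cy=120, radius=40.5, scale=1.0)
region_c = CircularRegion(cx=200, cy=120, radius=60, scale=1.0)

candidate = areas_match(region_a, region_b)   # kept for point-feature matching
excluded = not areas_match(region_a, region_c)  # dropped before later steps
```

Converting every region back to original-image pixels before comparing keeps detections made at different scales comparable.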

S3, performing image feature point extraction using the local region of the first image frame and the local region of the second image frame to obtain local region point features of the first image frame and local region point features of the second image frame. In this embodiment, corner or blob information of the tracking target is extracted from the local region of the first image frame and the local region of the second image frame, thereby determining the local region point features. For example, a point feature may be a contour blob of the target tracking object (a person or a vessel).

Optionally, before executing step S3, the method further includes: performing scale transformation of local area features on the local area of the first image frame and on the local area of the second image frame respectively, so as to determine the local area image frame of the first image and the local area image frame of the second image. Specifically, local regions of different scales may be scaled so that the local regions in the image frames have the same scale, which facilitates feature point extraction and improves the efficiency of detecting the same feature points. Extracting the point features of the tracking target in the local area of the first image frame and the local area of the second image frame realizes high-efficiency target tracking detection, and the scale transformation of the local areas provides a basis for extracting the feature points correctly.
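One way to picture the scale transformation is to map point coordinates from each local region into a shared canonical square. The sketch below assumes axis-aligned rectangular regions given as `(x, y, width, height)` and a 64-pixel canonical size; both are illustrative assumptions rather than details from the embodiment.

```python
from typing import Tuple

Point = Tuple[float, float]
Region = Tuple[float, float, float, float]  # (x, y, width, height)


def to_canonical(point: Point, region: Region, size: float = 64.0) -> Point:
    """Map an image point into a size-by-size square spanned by its region."""
    x, y = point
    rx, ry, rw, rh = region
    if rw <= 0 or rh <= 0:
        raise ValueError("region must have positive width and height")
    return ((x - rx) * size / rw, (y - ry) * size / rh)


# The same physical point seen at two different scales (hypothetical data).
region_cam1 = (10.0, 20.0, 40.0, 40.0)
region_cam2 = (100.0, 100.0, 80.0, 80.0)
p1 = to_canonical((30.0, 40.0), region_cam1)
p2 = to_canonical((140.0, 140.0), region_cam2)
```

Once both regions are expressed in the same canonical coordinates, points from the two cameras can be compared directly even though the regions were captured at different scales.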

Optionally, step S3 may further include: detecting the inner point features and the outer point features of the local area of each image frame, based on the local area of the first image frame and the local area of the second image frame, to determine the local area point feature of the first image frame and the local area point feature of the second image frame. Specifically, feature point extraction may be performed on both the acquired image frame and the scaled local region. Detecting and extracting point features inside and outside the local area allows the same image frame to be acquired more accurately: detecting the point features outside the local area guards against feature points missed by the frame selection, and detecting the point features inside the local area ensures that the same feature points are acquired accurately, which improves target tracking efficiency.
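The split into inner and outer point features can be sketched as follows, assuming a circular local region and feature points that have already been detected by some conventional detector. The `margin` band that defines which outside points are still kept is a hypothetical parameter; the embodiment does not say how far outside the region points are considered.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def split_points(points: List[Point], centre: Point, radius: float,
                 margin: float) -> Tuple[List[Point], List[Point]]:
    """Separate points inside the region from those just outside it.

    Points farther than radius + margin from the centre are discarded.
    """
    inner, outer = [], []
    for x, y in points:
        distance = math.hypot(x - centre[0], y - centre[1])
        if distance <= radius:
            inner.append((x, y))
        elif distance <= radius + margin:
            outer.append((x, y))
    return inner, outer


detected = [(5.0, 5.0), (9.0, 5.0), (12.0, 5.0), (20.0, 5.0)]
inner_points, outer_points = split_points(
    detected, centre=(5.0, 5.0), radius=5.0, margin=3.0
)
```

Keeping the outer band separate lets a later matching step decide whether those points belong to the target after all, instead of losing them at the framing stage.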

S4, performing same feature matching according to the local region point feature of the first image frame and the local region point feature of the second image frame to determine the same image frame in the first image frame and the second image frame. In this embodiment, whether the local area point feature of the first image frame is the same as the local area point feature of the second image frame may be determined by relational mapping, frame-by-frame image comparison, or image frame superposition, so as to determine whether the same image frame exists in the first image frame and the second image frame and to prepare for the subsequent fusion of the same image frames. For example, the feature points in the first image frame and the feature points in the second image frame are obtained and the two image frames are mapped onto each other; if feature points in a local area or partial area overlap each other, or feature points in the second image frame have corresponding feature points in the first image frame, they are determined to be the same feature points, and the same image frame is thereby determined.
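The mutual mapping of feature points might be sketched as below. The sketch assumes that the relation between the two camera views is a known pure translation (`offset_a_to_b`) and that two points count as overlapping when they lie within `tol` pixels; a real deployment would need a calibrated mapping, and both parameters are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def nearest_index(p: Point, candidates: List[Point]) -> int:
    """Index of the candidate closest to p (candidates must be non-empty)."""
    return min(range(len(candidates)), key=lambda k: math.dist(p, candidates[k]))


def mutual_matches(points_a: List[Point], points_b: List[Point],
                   offset_a_to_b: Point, tol: float = 2.0) -> List[Tuple[int, int]]:
    """Pairs (i, j) where mapped a[i] and b[j] are mutual nearest neighbours
    and lie within tol of each other."""
    if not points_a or not points_b:
        return []
    mapped = [(x + offset_a_to_b[0], y + offset_a_to_b[1]) for x, y in points_a]
    pairs = []
    for i, p in enumerate(mapped):
        j = nearest_index(p, points_b)
        if math.dist(p, points_b[j]) > tol:
            continue  # no overlapping point in the other frame
        if nearest_index(points_b[j], mapped) == i:
            pairs.append((i, j))  # the correspondence holds in both directions
    return pairs


frame_a_points = [(10.0, 10.0), (20.0, 15.0), (30.0, 40.0)]
frame_b_points = [(110.5, 10.2), (130.0, 40.0), (500.0, 500.0)]
matches = mutual_matches(frame_a_points, frame_b_points, offset_a_to_b=(100.0, 0.0))
```

Requiring the correspondence in both directions is one simple reading of "mapping the two image frames with each other"; it prevents several points in one frame from claiming the same point in the other.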

S5, image fusion is performed based on the same image frame in the first image frame and the second image frame, and the target tracking result is output. Optionally, step S5 may further include:

fusing the same image frames by using a recursive filter to determine a unique image frame. The same image frames may also be normalized by using Kalman filtering to obtain a unique image frame.

outputting the motion trail of the tracking target in a cascade mode based on the unique image frame and the image frames other than the unique image frame. The unique image frame and the image frames other than the unique image frame may be spliced to obtain the motion trail of the tracking target.
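As an illustration of step S5, the sketch below fuses same image frames with a running mean, which is one simple recursive filter (each new frame updates the current estimate with gain 1/k), and then emits one frame per timestamp in time order. The frame representation as nested lists, the choice of gain and the `(timestamp, frames)` timeline structure are assumptions made for the example; the embodiment does not specify the filter coefficients.

```python
from typing import List, Tuple

Frame = List[List[float]]


def fuse_recursive(frames: List[Frame]) -> Frame:
    """Fuse equally sized frames pixel by pixel with a recursive running mean."""
    fused = [[float(v) for v in row] for row in frames[0]]
    for k, frame in enumerate(frames[1:], start=2):
        gain = 1.0 / k  # weight given to the k-th frame
        for r, row in enumerate(frame):
            for c, value in enumerate(row):
                fused[r][c] += gain * (value - fused[r][c])
    return fused


def cascade(timeline: List[Tuple[float, List[Frame]]]) -> List[Tuple[float, Frame]]:
    """Return one (timestamp, frame) entry per moment, ordered by time.

    Moments seen by several cameras are reduced to a single fused frame;
    moments seen by one camera pass through unchanged (as floats).
    """
    ordered = sorted(timeline, key=lambda entry: entry[0])
    return [(t, fuse_recursive(frames)) for t, frames in ordered]


# Moment 1 was captured by both cameras, moment 2 by only one of them.
timeline = [
    (2.0, [[[5, 5]]]),
    (1.0, [[[10, 20]], [[30, 40]]]),
]
track = cascade(timeline)
```

The resulting sequence contains exactly one frame per moment, so a downstream step that reads the target position from each frame sees no duplicates when it assembles the motion trail.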

In this embodiment, the methods used for local region extraction and point feature extraction are both conventional extraction methods and are not described herein again.

As an optional implementation manner of the embodiment of the present invention, as shown in FIG. 2, step S4 of the target tracking method further includes:

S40, determining whether the position and quantity information of the local area point features of the first image frame is the same as the position and quantity information of the local area point features of the second image frame. Specifically, the position and quantity information of the local area point features of the first image frame and the second image frame may be obtained, and the point features of the first image frame and the point features of the second image frame compared by mutual mapping. Whether the point features of the two image frames overlap under the mapping is judged first, to determine that they are the same point features in position; whether their quantities are the same is then judged, to further confirm that the feature points in the image frames are the same.

S41, if the position and quantity information of the local area point features of the first image frame are both the same as the position and quantity information of the local area point features of the second image frame, determining that the first image frame and the second image frame are the same image frame. Image frames determined to be the same may also be marked or stored to facilitate their subsequent extraction, which improves target tracking efficiency.

S42, if the position and quantity information of the local area point features of the first image frame are different from the position and quantity information of the local area point features of the second image frame, returning to step S40 to perform same-image-frame matching on the next frame. Specifically, when the position information and the quantity information are different, the determination result may be output directly and detection proceeds to the next frame of images.
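The decision flow of steps S40 to S42 can be summarised in a short sketch. It assumes the point features of both frames have already been mapped into a common coordinate system, that positions count as the same when they lie within `tol` of each other, and that partners are assigned greedily; all three are simplifying assumptions made for the example.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def same_position(points_a: List[Point], points_b: List[Point],
                  tol: float = 1.0) -> bool:
    """True if every point of A finds its own partner in B within tol."""
    remaining = list(points_b)
    for p in points_a:
        partner = next((q for q in remaining if math.dist(p, q) <= tol), None)
        if partner is None:
            return False
        remaining.remove(partner)  # each point of B may be used only once
    return True


def is_same_frame(points_a: List[Point], points_b: List[Point],
                  tol: float = 1.0) -> bool:
    """S40: check position first, then quantity."""
    return same_position(points_a, points_b, tol) and len(points_a) == len(points_b)


def find_same_frames(pairs: Iterable[Tuple[int, List[Point], List[Point]]],
                     tol: float = 1.0) -> List[int]:
    """Return the ids of frame pairs judged to be the same image frame."""
    same_ids = []
    for frame_id, points_a, points_b in pairs:
        if is_same_frame(points_a, points_b, tol):
            same_ids.append(frame_id)  # S41: mark the pair for later fusion
        # S42: otherwise simply continue with the next frame pair
    return same_ids


pairs = [
    (0, [(1.0, 1.0), (5.0, 5.0)], [(5.2, 5.1), (1.1, 0.9)]),   # same
    (1, [(1.0, 1.0)], [(5.2, 5.1), (1.1, 0.9)]),               # quantity differs
    (2, [(1.0, 1.0), (50.0, 50.0)], [(5.2, 5.1), (1.1, 0.9)]), # position differs
]
same_ids = find_same_frames(pairs)
```

Storing only the identifiers of matching pairs mirrors the marking described in step S41 and keeps the fusion step independent of the matching logic.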

Correspondingly, referring to fig. 3, an embodiment of the present invention further provides a target tracking apparatus, including:

the acquiring module 1 is configured to acquire a first image frame acquired by a first camera and a second image frame acquired by a second camera at the same time, and perform batch normalization processing on the first image frame and the second image frame to obtain a first preprocessed image frame and a second preprocessed image frame, where specific contents may refer to step S1;

a detecting module 2, configured to perform multi-scale region feature detection based on the first pre-processed image frame and the second pre-processed image frame to obtain a local region of the first image frame and a local region of the second image frame, where specific contents may refer to step S2;

an extracting module 3, configured to perform image feature point extraction by using a local region of the first image frame and a local region of the second image frame to obtain a local region point feature of the first image frame and a local region point feature of the second image frame, where specific contents may refer to step S3;

a matching module 4, configured to perform same feature matching according to the local area point feature of the first image frame and the local area point feature of the second image frame to determine an image frame that is the same as the first image frame in the second image frame, where the specific content may refer to step S4;

and an output module 5, configured to perform image fusion based on the same image frame in the first image frame and the second image frame, and output a target tracking result, where specific contents may refer to those in step S5.
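The data flow between the five modules can be pictured as a simple pipeline. The sketch below is only a structural illustration: the class name, the field names and the stub stages are hypothetical, and each stage would be replaced by an implementation of the corresponding step S1 to S5.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class TargetTrackingApparatus:
    acquire: Callable[[Any, Any], Any]   # acquisition module 1 (step S1)
    detect: Callable[[Any], Any]         # detection module 2 (step S2)
    extract: Callable[[Any], Any]        # extraction module 3 (step S3)
    match_features: Callable[[Any], Any] # matching module 4 (step S4)
    fuse_output: Callable[[Any], Any]    # output module 5 (step S5)

    def run(self, first_frame: Any, second_frame: Any) -> Any:
        """Pass the result of each module to the next one in order."""
        preprocessed = self.acquire(first_frame, second_frame)
        regions = self.detect(preprocessed)
        point_features = self.extract(regions)
        same_frames = self.match_features(point_features)
        return self.fuse_output(same_frames)


calls: List[str] = []


def stage(name: str) -> Callable[..., Any]:
    """Build a stub stage that records its name and forwards its input."""
    def run_stage(*args: Any) -> Any:
        calls.append(name)
        return args
    return run_stage


apparatus = TargetTrackingApparatus(
    acquire=stage("acquire"),
    detect=stage("detect"),
    extract=stage("extract"),
    match_features=stage("match"),
    fuse_output=stage("output"),
)
result = apparatus.run("frame_a", "frame_b")
```

Passing the stages in as callables keeps each module replaceable without touching the others.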

An embodiment of the present invention further provides an electronic device. As shown in FIG. 4, the electronic device may include a processor 7 and a memory 6, where the processor 7 and the memory 6 may be connected by a bus 8 or in another manner; FIG. 4 takes connection by a bus as an example.

The processor 7 may be a Central Processing Unit (CPU). The Processor 7 may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, or combinations thereof.

The memory 6, as a non-transitory computer readable storage medium, may be used for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the acquisition module 1, the detection module 2, the extraction module 3, the matching module 4, and the output module 5 shown in fig. 3) corresponding to the object tracking method in the embodiment of the present invention. The processor 7 executes various functional applications and data processing of the processor by running non-transitory software programs, instructions and modules stored in the memory 6, namely, implements the target tracking method in the above method embodiments.

The memory 6 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor 7, and the like. Further, the memory 6 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 6 may optionally include memory located remotely from the processor 7, and these remote memories may be connected to the processor 7 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The one or more modules are stored in the memory 6 and, when executed by the processor 7, perform the target tracking method in the embodiments shown in FIG. 1 and FIG. 2.

The specific details of the electronic device and the target tracking apparatus provided above may be understood by referring to the corresponding related descriptions and effects in the embodiments shown in FIG. 1 and FIG. 2, and are not described herein again.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, abbreviated as HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kind described above.

Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims

1. A target tracking method, comprising:

acquiring a first image frame acquired by a first camera and a second image frame acquired by a second camera at the same moment, and performing batch normalized processing on the first image frame and the second image frame to obtain a first preprocessed image frame and a second preprocessed image frame;

performing multi-scale region feature detection based on the first and second pre-processed image frames to obtain a local region of the first image frame and a local region of the second image frame;

extracting image feature points by using the local region of the first image frame and the local region of the second image frame to obtain the local region point feature of the first image frame and the local region point feature of the second image frame;

performing same feature matching according to the local region point feature of the first image frame and the local region point feature of the second image frame to determine the same image frame in the first image frame and the second image frame;

and carrying out image fusion on the same image frame in the first image frame and the second image frame, and outputting a target tracking result.

2. The method of claim 1, wherein the acquiring a first image frame acquired by a first camera and a second image frame acquired by a second camera at the same moment, and performing batch normalized processing on the first image frame and the second image frame to obtain a first preprocessed image frame and a second preprocessed image frame comprises:

converting the first image frame acquired by the first camera and the second image frame acquired by the second camera into a uniform format, and outputting the first preprocessed image frame and the second preprocessed image frame in the uniform format.

3. The method of claim 1, wherein the performing multi-scale region feature detection based on the first and second pre-processed image frames to obtain local regions of the first and second image frames comprises:

performing region framing based on image features in the first pre-processed image frame and the second pre-processed image frame, and outputting an image framing region in the first pre-processed image frame and an image framing region in the second pre-processed image frame;

and carrying out multi-scale feature detection on the image framing area in the first pre-processed image frame and the image framing area in the second pre-processed image frame to determine the local area of the first image frame and the local area of the second image frame.

4. The method of claim 3, wherein before the image feature point extraction using the local region of the first image frame and the local region of the second image frame to obtain the local region point feature of the first image frame and the local region point feature of the second image frame, the method further comprises: and respectively carrying out scale transformation of local area features on the basis of the local area of the first image frame and the local area of the second image frame so as to determine the local area image frame of the first image and the local area image frame of the second image.

5. The method according to claim 1 or 4, wherein the image feature point extraction using the local region of the first image frame and the local region of the second image frame to obtain the local region point feature of the first image frame and the local region point feature of the second image frame comprises:

and detecting the inner point feature and the outer point feature of the local area of the image frame based on the local area of the first image frame and the local area of the second image frame to determine the local area point feature of the first image frame and the local area point feature of the second image frame.

6. The method of claim 1, wherein the performing same feature matching according to the local area point features of the first image frame and the local area point features of the second image frame to determine the same image frame in the first image frame and the second image frame comprises:

judging whether the position and quantity information of the local area point features of the first image frame is the same as the position and quantity information of the local area point features of the second image frame;

and if the position and quantity information of the local area point features of the first image frame are the same as those of the local area point features of the second image frame, determining that the first image frame and the second image frame are the same image frame.

7. The method of claim 1, wherein the carrying out image fusion on the same image frame in the first image frame and the second image frame, and outputting a target tracking result comprises:

fusing the same image frames by using a recursive filter to determine a unique image frame;

and performing cascade output on the motion trail of the tracking target based on the unique image frame and the image frames except the unique image frame.

8. A target tracking device, comprising:

an acquisition module, configured to acquire a first image frame acquired by a first camera and a second image frame acquired by a second camera at the same moment, and to perform batch normalized processing on the first image frame and the second image frame to obtain a first preprocessed image frame and a second preprocessed image frame;

a detection module, configured to perform multi-scale region feature detection based on the first and second pre-processed image frames to obtain a local region of the first image frame and a local region of the second image frame;

an extraction module, configured to extract image feature points by using the local region of the first image frame and the local region of the second image frame to obtain the local region point feature of the first image frame and the local region point feature of the second image frame;

a matching module, configured to perform same feature matching according to the local region point feature of the first image frame and the local region point feature of the second image frame so as to determine the same image frame in the first image frame and the second image frame;

and an output module, configured to carry out image fusion on the same image frame in the first image frame and the second image frame and to output a target tracking result.

9. An electronic device, comprising:

a memory and a processor, the memory and the processor being communicatively coupled to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the target tracking method of any one of claims 1-7.

10. A computer-readable storage medium storing computer instructions for causing a computer to perform the target tracking method of any one of claims 1-7.
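
As a non-limiting illustration, the same-frame judgment of claim 6 and the recursive-filter fusion of claim 7 might be sketched as follows. The sketch rests on assumptions that are not stated in the claims: point features are taken to be (x, y) positions already expressed in a common coordinate system, frames are taken to be small grayscale arrays of equal size, and the recursive filter is taken to be a first-order filter y_k = alpha * x_k + (1 - alpha) * y_(k-1). The names is_same_frame, fuse_recursive and alpha are hypothetical and are introduced only for this illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]    # assumed (x, y) position of one local-region point feature
Frame = List[List[float]]  # assumed grayscale frame: rows of pixel intensities


def is_same_frame(points_a: Sequence[Point], points_b: Sequence[Point]) -> bool:
    """Return True when both frames have the same number of point features
    at the same positions (order of detection is ignored)."""
    if len(points_a) != len(points_b):
        return False
    return sorted(points_a) == sorted(points_b)


def fuse_recursive(frames: Sequence[Frame], alpha: float = 0.5) -> Frame:
    """Fuse equally sized frames with an assumed first-order recursive filter:
    fused = alpha * current_frame + (1 - alpha) * previous_fused."""
    if not frames:
        raise ValueError("at least one frame is required")
    fused = [row[:] for row in frames[0]]  # start from a copy of the first frame
    for frame in frames[1:]:
        fused = [
            [alpha * x + (1.0 - alpha) * y for x, y in zip(row_x, row_y)]
            for row_x, row_y in zip(frame, fused)
        ]
    return fused


# Hypothetical data for two cameras observing the overlapping field of view.
points_cam1 = [(10, 12), (40, 8), (25, 30)]
points_cam2 = [(25, 30), (10, 12), (40, 8)]
frame_cam1 = [[100.0, 110.0], [120.0, 130.0]]
frame_cam2 = [[102.0, 108.0], [118.0, 134.0]]

unique_frame = None
if is_same_frame(points_cam1, points_cam2):
    # Only frames judged to be the same image frame are fused into one.
    unique_frame = fuse_recursive([frame_cam1, frame_cam2])
```

In this sketch the two frames are fused only after the position and quantity check succeeds, so a single unique image frame replaces the duplicated frames before the motion trail is output. A practical implementation would likely compare positions within a tolerance rather than by exact equality; that refinement is omitted here.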