CN111598923B - Target tracking method and device, computer equipment and storage medium


Info

Publication number
CN111598923B
CN111598923B
Authority
CN
China
Prior art keywords
image
target
image group
group
chain
Prior art date
Legal status
Active
Application number
CN202010382627.3A
Other languages
Chinese (zh)
Other versions
CN111598923A (en)
Inventor
彭瑾龙
王昌安
罗泽坤
李剑
邰颖
王亚彪
汪铖杰
李季檩
吴永坚
黄飞跃
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010382627.3A
Publication of CN111598923A
Application granted
Publication of CN111598923B
Status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion
    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10016: Video; Image sequence

Abstract

The embodiments of the present application disclose a target tracking method and apparatus, a computer device, and a storage medium, belonging to the field of computer technology. The method comprises the following steps: acquiring a plurality of sequentially arranged image frames; obtaining from them a plurality of sequentially arranged image groups; acquiring the feature vectors of a preset number of image frames in each image group; performing target detection on the feature vectors of the preset number of image frames in each image group to respectively obtain at least one candidate region chain of each image group and a corresponding first target feature map; adjusting the at least one candidate region chain of each image group to respectively obtain at least one target region chain of each image group; and creating a target region combination chain of at least one target according to the target region chains of the plurality of image groups and the arrangement order of the plurality of image groups. Because the target region chains are acquired in units of image groups, the accuracy of target detection is improved, and the accuracy of the target region combination chain is improved accordingly.

Description

Target tracking method and device, computer equipment and storage medium
Technical Field
The embodiments of the present application relate to the field of computer technology, and in particular to a target tracking method and apparatus, a computer device, and a storage medium.
Background
With the development of computer technology, target tracking technology has found increasingly wide application. Target tracking determines the region in which a target is located in a plurality of image frames, thereby tracking the target, and can be applied to fields such as video people-flow statistics, video surveillance, and suspect tracking.
In the related art, a target tracking method is provided that performs target detection according to the feature vector of each image frame and determines the target region of each image frame, so that the movement track of a target can be determined from the target regions of a plurality of image frames. However, this method determines the target region of each image frame separately, in units of a single image frame, and is therefore not accurate.
Disclosure of Invention
The embodiments of the present application provide a target tracking method and apparatus, a computer device, and a storage medium, which can improve the accuracy of the determined target region combination chain. The technical solution is as follows:
in one aspect, a target tracking method is provided, and the method includes:
acquiring a plurality of image frames arranged in sequence;
taking every preset number of image frames in the plurality of image frames as an image group to obtain a plurality of image groups which are arranged in sequence, wherein any two adjacent image groups comprise at least one same image frame and at least one different image frame;
acquiring the feature vectors of the preset number of image frames in each image group;
performing target detection on the feature vectors of the preset number of image frames in each image group to respectively obtain at least one candidate region chain of each image group and a corresponding first target feature map, wherein the candidate region chain comprises a plurality of candidate regions located in different image frames of the corresponding image group, and the first target feature map comprises the probability that the plurality of candidate regions contained in each candidate region chain of the corresponding image group belong to the same target;
adjusting the at least one candidate region chain of each image group according to the feature vectors of the preset number of image frames in each image group and the corresponding first target feature map to respectively obtain at least one target region chain of each image group, wherein the target region chain comprises a plurality of target regions which are located in different image frames of the corresponding image group and belong to the same target;
and creating a target region combination chain of at least one target according to the target region chains of the plurality of image groups and the arrangement order of the plurality of image groups.
In one possible implementation, the method further includes:
adding, to the target region matching relationship between the second image group and the first image group, a matching relationship between the interrupted target region and the predicted target region.
In another possible implementation, the acquiring a plurality of image frames in a sequential order includes:
acquiring video data, and performing frame extraction processing on the video data to obtain the plurality of image frames.
In another possible implementation manner, after creating a target region combination chain of at least one target according to the target region chains of the plurality of image groups and the arrangement order of the plurality of image groups, the method further includes:
adding the movement track of the at least one target to the video data according to the target region combination chain of the at least one target, to obtain updated video data;
and playing the updated video data to display a picture of the at least one target moving along the corresponding movement track.
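Purely as an illustration of this implementation (not part of the claimed method; the region format, the function name, and the use of OpenCV are assumptions of this sketch), overlaying such a movement track on video frames might look like:

```python
import cv2

def overlay_track(frames, combination_chain):
    # combination_chain: list of (frame_index, (x1, y1, x2, y2)) target regions
    # of one target, ordered by frame (an assumed representation).
    centers = []
    for frame_index, (x1, y1, x2, y2) in combination_chain:
        centers.append(((x1 + x2) // 2, (y1 + y2) // 2))
        # Draw the track accumulated so far onto the frame the target appears in.
        for a, b in zip(centers, centers[1:]):
            cv2.line(frames[frame_index], a, b, (0, 255, 0), 2)
    return frames
```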
In another aspect, an apparatus for tracking a target is provided, the apparatus comprising:
the image frame acquisition module is used for acquiring a plurality of image frames which are arranged in sequence;
the image group acquisition module is used for taking every preset number of image frames in the plurality of image frames as an image group to obtain a plurality of image groups which are arranged in sequence, wherein any two adjacent image groups comprise at least one same image frame and at least one different image frame;
the feature vector acquisition module is used for acquiring feature vectors of the preset number of image frames in each image group;
a target detection module, configured to perform target detection on the feature vectors of the preset number of image frames in each image group, to respectively obtain at least one candidate region chain of each image group and a corresponding first target feature map, where the candidate region chain includes a plurality of candidate regions located in different image frames of the corresponding image group, and the first target feature map includes the probability that the plurality of candidate regions contained in each candidate region chain of the corresponding image group belong to the same target;
the adjustment processing module is configured to perform adjustment processing on at least one candidate region chain of each image group according to the feature vectors of the preset number of image frames in each image group and the corresponding first target feature map, so as to obtain at least one target region chain of each image group, where the target region chain includes a plurality of target regions that are located in different image frames of the corresponding image group and belong to the same target;
and a region combination chain creating module, configured to create a target region combination chain of at least one target according to the target region chains of the plurality of image groups and the arrangement order of the plurality of image groups.
In one possible implementation manner, the feature vector obtaining module includes:
a first determining unit, configured to determine the feature vector of a specified image frame in a second image group as the feature vector of the corresponding specified image frame in a first image group, where the second image group is the previous image group of the first image group in the plurality of image groups, and the specified image frame is the same image frame in the first image group and the second image group;
and a feature extraction unit, configured to perform feature extraction on the image frames other than the specified image frame in the first image group, to obtain the feature vectors of those other image frames.
In another possible implementation manner, the target detection module includes:
and a target detection unit, configured to perform target detection on the feature vectors of the preset number of image frames in each image group, to respectively obtain at least one candidate region chain of each image group, a corresponding first target feature map, and a second target feature map corresponding to the first image frame of each image group, where the second target feature map includes the probability that at least one candidate region of the first image frame of the corresponding image group contains a target.
In another possible implementation manner, the adjusting processing module includes:
an adjusting processing unit, configured to perform adjustment processing on at least one candidate region chain of each image group according to the feature vectors of the preset number of image frames in each image group, the first target feature map corresponding to each image group, and the second target feature map corresponding to the first image frame of each image group, to obtain at least one target region chain of each image group, respectively, where the second target feature map includes a probability that at least one candidate region of the first image frame of the corresponding image group contains a target.
In another possible implementation manner, the adjustment processing unit is further configured to fuse the feature vectors of the preset number of image frames in each image group, the first target feature map corresponding to each image group, and the second target feature map corresponding to the first image frame of each image group, so as to obtain an aggregate feature vector of each image group; and adjusting at least one candidate region chain of each image group according to the aggregation feature vector of each image group to respectively obtain at least one target region chain of each image group.
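The fusion operation itself is not fixed by the description above; purely as an assumed illustration, channel-wise concatenation could serve as the fusion step (the array shapes and names are hypothetical):

```python
import numpy as np

def aggregate_features(frame_features, first_target_map, second_target_map):
    # frame_features: list of (C, H, W) feature maps, one per frame of the group.
    # The fusion is assumed here to be channel-wise concatenation; the
    # application does not prescribe the fusion operation.
    stacked = np.concatenate(frame_features, axis=0)  # (num_frames * C, H, W)
    return np.concatenate([stacked, first_target_map, second_target_map], axis=0)
```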
In another possible implementation manner, the target detection module includes:
a target detection unit, configured to perform target detection according to the feature vector of the first image frame in any image group, to obtain at least one candidate region of the first image frame in the image group;
a candidate region processing unit, configured to process the at least one candidate region according to the feature vectors of the preset number of image frames in the image group, to respectively obtain the candidate regions of the other image frames in the image group;
and a candidate region association unit, configured to associate the matched candidate regions belonging to different image frames in the preset number of image frames, to obtain at least one candidate region chain of the image group.
In another possible implementation manner, the target detection unit is further configured to perform target detection according to the feature vector of the first image frame, to obtain a plurality of candidate regions of the first image frame and the probabilities corresponding to the plurality of candidate regions; and to select, from the plurality of candidate regions, at least one candidate region whose probability is greater than a preset threshold.
In another possible implementation manner, the feature vector obtaining module includes:
the feature vector acquisition unit is used for calling a feature extraction model and acquiring feature vectors of the preset number of image frames in each image group;
the target detection module comprises:
and a target detection unit, configured to call a feature detection model to perform target detection on the feature vectors of the preset number of image frames in each image group, respectively obtaining at least one candidate region chain of each image group and a corresponding first target feature map.
In another possible implementation manner, the region combination chain creating module includes:
a second determining unit, configured to, for any two adjacent image groups, take the same image frame in the two image groups as a specified image frame, and take a target region of the specified image frame as a specified target region;
a region matching relationship determining unit, configured to determine a target region matching relationship between the two image groups according to the specified image frames and the specified target regions in the two image groups, where the target region matching relationship includes a matching relationship between any target region of any specified image frame in the first image group and the target region belonging to the same target in the same specified image frame in the second image group;
a region chain matching relationship determining unit, configured to determine a target region chain matching relationship between the two image groups according to the target region chains of the two image groups and the target region matching relationship, where the target region chain matching relationship includes a matching relationship between any target region chain of the first image group and the target region chain belonging to the same target in the second image group;
and a combining unit, configured to combine, according to the target region chain matching relationship, the target regions other than the specified target region in any target region chain with the other target region chain matched with that target region chain, to obtain the target region combination chain.
In another possible implementation manner, the region matching relationship determining unit is further configured to determine multiple groups of candidate matching relationships according to the specified image frames and the specified target regions in the two image groups, where each group of candidate matching relationships includes a matching relationship between each specified target region in any specified image frame in the first image group and a specified target region in the same specified image frame in the second image group, and the multiple groups of candidate matching relationships differ from one another; to determine, for each group of candidate matching relationships, the sum of the similarities of every two matched specified target regions as the matching degree of that group; and to select, from the multiple groups of candidate matching relationships, the candidate matching relationship with the largest matching degree as the target region matching relationship.
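Selecting, among all candidate matching relationships, the one with the largest sum of similarities is an instance of the assignment problem. A minimal sketch using SciPy (illustrative only; the application does not prescribe this algorithm, and how the similarity is computed is left open here as in the description):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_target_regions(similarity: np.ndarray):
    # similarity[i, j]: similarity between specified target region i of the
    # first image group and specified target region j of the second image group.
    # linear_sum_assignment minimizes total cost, so negate the similarities
    # to maximize their sum over all matched pairs.
    rows, cols = linear_sum_assignment(-similarity)
    return list(zip(rows.tolist(), cols.tolist()))

# Example: similarity [[0.9, 0.1], [0.2, 0.8]] yields the matching [(0, 0), (1, 1)].
```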
In another possible implementation manner, the second image frame of the second image group is the same as the first image frame of the first image group, and the second image group is a previous image group of the first image group in the plurality of image groups;
the adjustment processing module further includes:
and a target region prediction unit, configured to, if a first target region chain of the second image group includes an interrupted target region in the second image frame of the second image group, and the interrupted target region does not match any specified target region in the first image frame of the first image group, predict a predicted target region in the first image frame of the first image group that belongs to the same target as the interrupted target region.
In another possible implementation manner, the apparatus further includes:
a target region prediction module, configured to map the predicted target region according to the feature vectors of the preset number of image frames in the first image group, to respectively obtain the predicted target regions of the other image frames in the first image group;
and a target region association module, configured to associate the predicted target regions belonging to different image frames in the preset number of image frames, to obtain a predicted target region chain of the first image group.
In another possible implementation manner, the apparatus further includes:
a matching relationship adding module, configured to add, to the target region matching relationship between the second image group and the first image group, a matching relationship between the interrupted target region and the predicted target region.
In another possible implementation manner, the image frame acquisition module includes:
and the image frame acquisition unit is used for acquiring video data and performing frame extraction processing on the video data to obtain the plurality of image frames.
In another possible implementation manner, the apparatus further includes:
a video data updating module, configured to add the movement track of the at least one target to the video data according to the target region combination chain of the at least one target, to obtain updated video data;
and a video data playing module, configured to play the updated video data, to display a picture of the at least one target moving along the corresponding movement track.
In another aspect, a computer device is provided, the computer device comprising a processor and a memory, the memory having stored therein at least one instruction, the at least one instruction being loaded and executed by the processor to implement the target tracking method as described in the above aspect.
In another aspect, a computer-readable storage medium is provided, having stored therein at least one instruction, which is loaded and executed by a processor, to implement the target tracking method according to the above aspect.
The beneficial effects of the technical solutions provided by the embodiments of the present application include at least the following:
According to the method, apparatus, computer device, and storage medium provided by the embodiments of the present application, a plurality of image groups is created from a plurality of image frames, and target region chains are acquired in units of image groups, so that the target regions of different image frames in each image group are associated with one another, which improves the accuracy of target detection. The candidate region chains of each image group are adjusted using the acquired first target feature map of that image group, so that the target regions contained in the resulting target region chains belong to the same target, which improves the accuracy of the acquired target region chains. Because different image groups include the same image frame, the target region chains of the image groups are associated with one another and can be combined into a target region combination chain, thereby realizing tracking of the target and improving the accuracy of the target region combination chain.
Moreover, at least one target region chain of each image group is determined from the feature vectors of the preset number of image frames of that image group through the feature extraction model and the feature detection model, which improves the accuracy of the determined target region chains, and therefore the accuracy of the target region combination chain and of the determined movement track of the target.
In addition, when the target region chains of any two image groups are combined, the target region matching relationship between the specified target regions of the same specified image frames in the two image groups is determined first, and the target region chain matching relationship between the two image groups is determined from it, so that the target region chains of different image groups can be combined according to the determined target region chain matching relationship, which improves the accuracy of the obtained target region combination chain and of the determined movement track of the target.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application;
Fig. 2 is a flowchart of a target tracking method provided by an embodiment of the present application;
Fig. 3 is a flowchart of a target tracking method provided by an embodiment of the present application;
Fig. 4 is a schematic diagram of a target region chain provided by an embodiment of the present application;
Fig. 5 is a schematic diagram of a target region matching relationship provided by an embodiment of the present application;
Fig. 6 is a schematic diagram of a target movement track provided by an embodiment of the present application;
Fig. 7 is a flowchart of a target tracking method provided by an embodiment of the present application;
Fig. 8 is a flowchart for determining a target region matching relationship provided by an embodiment of the present application;
Fig. 9 is a flowchart for acquiring a target region chain of an image group provided by an embodiment of the present application;
Fig. 10 is a flowchart for acquiring a target region chain of an image group provided by an embodiment of the present application;
Fig. 11 is a flowchart for acquiring target region chains of a plurality of image groups provided by an embodiment of the present application;
Fig. 12 is a schematic structural diagram of a target tracking apparatus provided by an embodiment of the present application;
Fig. 13 is a schematic structural diagram of a target tracking apparatus provided by an embodiment of the present application;
Fig. 14 is a schematic structural diagram of a terminal provided by an embodiment of the present application;
Fig. 15 is a schematic structural diagram of a server provided by an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
The terms "first," "second," and the like as used herein may be used herein to describe various concepts that are not limited by these terms unless otherwise specified. These terms are only used to distinguish one concept from another. For example, a first image group may be referred to as a second image group, and similarly, a second image group may be referred to as a first image group, without departing from the scope of the present application.
As used herein, the terms "at least one," "a plurality," "each," and "any," at least one of which includes one, two, or more than two, and a plurality of which includes two or more than two, each of which refers to each of the corresponding plurality, and any of which refers to any of the plurality. For example, the plurality of elements includes 3 elements, each of which refers to each of the 3 elements, and any one of the 3 elements refers to any one of the 3 elements, which may be a first one, a second one, or a third one.
Artificial Intelligence (AI) is a theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, at both the hardware level and the software level. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer simulates or realizes human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
According to the scheme provided by the embodiment of the application, the feature extraction model and the feature detection model can be trained based on the machine learning technology of artificial intelligence, and the target tracking method is realized by utilizing the trained feature extraction model and feature detection model.
The target tracking method provided by the embodiments of the present application can be used in a computer device. The computer device acquires a plurality of sequentially arranged image frames, and takes every preset number of those image frames as an image group to obtain a plurality of sequentially arranged image groups. It then acquires the feature vectors of the preset number of image frames in each image group, performs target detection on those feature vectors, and respectively obtains at least one candidate region chain of each image group and a corresponding first target feature map. It adjusts the at least one candidate region chain of each image group according to the feature vectors of the preset number of image frames in the image group and the corresponding first target feature map, respectively obtaining at least one target region chain of each image group, and finally creates a target region combination chain of at least one target according to the target region chains of the plurality of image groups and the arrangement order of the plurality of image groups.
The computer device includes a terminal or a server. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the present application.
Fig. 1 is a schematic structural diagram of an implementation environment provided in an embodiment of the present application, and as shown in fig. 1, the implementation environment includes a terminal 101 and a server 102. The terminal 101 establishes a communication connection with the server 102, and performs interaction through the established communication connection.
The terminal 101 shoots a target to obtain a plurality of sequentially arranged image frames and sends the plurality of image frames to the server 102. The server 102 acquires the plurality of image frames, takes every preset number of the image frames as an image group to obtain a plurality of sequentially arranged image groups, acquires the feature vectors of the preset number of image frames in each image group, and performs target detection on those feature vectors to respectively obtain at least one candidate region chain of each image group and a corresponding first target feature map. The server 102 then adjusts the at least one candidate region chain of each image group according to the feature vectors of the preset number of image frames in the image group and the corresponding first target feature map, respectively obtaining at least one target region chain of each image group, and creates a target region combination chain of at least one target according to the target region chains of the plurality of image groups and the arrangement order of the plurality of image groups. The server 102 adds the movement track of the at least one target to the video data according to the target region combination chain of the at least one target to obtain updated video data, and sends the updated video data to the terminal 101. The terminal 101 plays the updated video data to display a picture of the at least one target moving along the corresponding movement track.
The method provided by the embodiment of the application can be used for a target tracking scene.
For example, in a people stream statistics scenario:
the method comprises the steps that a terminal shoots an intersection, video data are obtained, frame extraction processing is carried out on the video data, after a plurality of image frames are obtained, a target area combination chain of at least one target is created by adopting the target tracking method provided by the embodiment of the application, the number of the targets appearing in the video data is determined according to the target area combination chain of the at least one target appearing in the plurality of image frames, and therefore people flow statistics of the intersection is achieved.
As another example, in a suspected-infected-person tracking scenario:
During epidemic prevention and control, a terminal acquires a plurality of image frames of an epidemic prevention and control area, and a target region combination chain of at least one target is created using the target tracking method provided by the embodiments of the present application. The movement track of the at least one target is determined according to its target region combination chain, so that the movement track of the at least one target in the epidemic prevention and control area can be determined. From the determined movement track, it can be determined that the at least one target has been in the epidemic prevention and control area and is therefore a suspected infected person, so that the at least one target can subsequently be isolated, avoiding the spread of the virus and improving the effectiveness of epidemic prevention and control.
Fig. 2 is a flowchart of a target tracking method provided by an embodiment of the present application. The method is applied to a computer device and, as shown in Fig. 2, includes:
201. A plurality of sequentially arranged image frames is acquired.
An image frame is an image including a target, and may be obtained by shooting the target or by performing frame extraction processing on video data. The plurality of image frames may be arranged in the order of their shooting times, in the order of their sequence numbers, or in another order.
202. Every preset number of image frames in the plurality of image frames is taken as an image group, to obtain a plurality of sequentially arranged image groups.
The preset number may be any set number, such as 2, 3, or 5. The plurality of image groups may be arranged in the order of their generation times or in the order of the image frames they include, for example in the chronological order of the image frames included in the image groups.
In the plurality of image groups, image frames included in different image groups are not completely the same, and any two adjacent image groups include at least one same image frame and at least one different image frame. For example, the plurality of image frames arranged in sequence include an image frame 1, an image frame 2, an image frame 3, and an image frame 4, and the preset number is 2, then the image frame 1 and the image frame 2 may be regarded as an image group a, the image frame 2 and the image frame 3 may be regarded as an image group B, the image frame 3 and the image frame 4 may be regarded as an image group C, and the obtained plurality of image groups are arranged according to the order of the image frames, that is, the order of the plurality of image groups is: image group A, image group B and image group C.
203. The feature vectors of the preset number of image frames in each image group are acquired.
A feature vector is a vector representing the features of an image frame; different image frames have different feature vectors.
204. Target detection is performed on the feature vectors of the preset number of image frames in each image group, to respectively obtain at least one candidate region chain of each image group and a corresponding first target feature map.
Each candidate region chain represents the association between candidate regions that may belong to the same target in a plurality of image frames, and includes a plurality of candidate regions located in different image frames of the corresponding image group.
The first target feature map includes the probability that the plurality of candidate regions contained in each candidate region chain of the corresponding image group belong to the same target: the higher the probability, the more likely the plurality of candidate regions contained in the corresponding candidate region chain belong to the same target; the lower the probability, the less likely they do.
The feature vectors of the preset number of image frames in any image group are processed to obtain at least one candidate region in each of the preset number of image frames, and the candidate regions that may belong to the same target in the preset number of image frames are associated to obtain at least one candidate region chain of the image group. The contents of the plurality of candidate regions contained in each candidate region chain are then identified, and the probability that the plurality of candidate regions in each candidate region chain belong to the same target is determined, thereby obtaining the first target feature map of the image group.
205. The at least one candidate region chain of each image group is adjusted according to the feature vectors of the preset number of image frames in the image group and the corresponding first target feature map, to respectively obtain at least one target region chain of each image group.
A target region chain includes a plurality of target regions that are located in different image frames of the corresponding image group and belong to the same target. A target region is the region in an image frame where a target is located; the target may be a person, a car, or the like.
For any image group, any image frame of the image group may include at least one target, and any target may appear in at least one image frame of the image group. According to the feature vectors of the preset number of image frames of the image group and the corresponding first target feature map, the candidate regions contained in each candidate region chain of the image group are adjusted so that each target region obtained after the adjustment contains a target. By adjusting the candidate regions contained in the at least one candidate region chain of the image group, at least one target region chain of the image group is obtained, where each target has one target region chain.
206. A target region combination chain of at least one target is created according to the target region chains of the plurality of image groups and the arrangement order of the plurality of image groups.
Each target region combination chain includes a plurality of target regions that belong to the same target and are located in different image frames. For example, the target region combination chain of target 1 includes 4 target regions: the first is located in the fourth of the plurality of image frames, the second in the fifth, the third in the sixth, and the fourth in the seventh.
Since each image group has at least one target region chain, and any two adjacent image groups share at least one image frame while also containing at least one different image frame, the target region chains of any target in the multiple image groups can be associated according to the arrangement order of the multiple image groups, yielding the target regions of the target in different image frames and thereby creating the target region combination chain of the target.
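As a simplified, illustrative sketch of step 206 (the representation of a chain as a frame-index-to-region mapping, and matching chains by identical regions in shared frames, are assumptions of this example, not the claimed procedure):

```python
def combine_region_chains(group_chains):
    # group_chains: for each image group (in arrangement order), a list of
    # target region chains, each a dict mapping frame_index -> target region.
    combined = [dict(chain) for chain in group_chains[0]]
    for chains in group_chains[1:]:
        for chain in chains:
            for comb in combined:
                shared = set(comb) & set(chain)
                # Chains of the same target agree on the shared image frames.
                if shared and all(comb[f] == chain[f] for f in shared):
                    comb.update(chain)  # extend with regions from the new frames
                    break
            else:
                combined.append(dict(chain))  # a target that newly appears
    return combined
```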
According to the method provided by the embodiments of the present application, a plurality of image groups is created from a plurality of image frames, and target region chains are acquired in units of image groups, so that the target regions of different image frames in each image group are associated with one another, which improves the accuracy of target detection. The candidate region chains of each image group are adjusted using the acquired first target feature map of that image group, so that the target regions contained in the acquired target region chains belong to the same target, which improves the accuracy of the acquired target region chains. Because different image groups include the same image frame, the target region chains of the image groups are associated with one another and can be combined into a target region combination chain, thereby realizing tracking of the target and improving the accuracy of the target region combination chain.
Fig. 3 is a flowchart of a target tracking method provided by an embodiment of the present application. The method is applied to a computer device and, as shown in Fig. 3, includes:
301. The computer device acquires video data and performs frame extraction processing on the video data to obtain a plurality of image frames.
The video data is a continuous image sequence comprising a plurality of successive image frames, and may be obtained by shooting with the computer device or by receiving video data sent by another device.
The video data is acquired for the purpose of target tracking: one or more image frames in the video data include the target to be tracked, where the target may be a pedestrian, an automobile, or the like. Frame extraction processing extracts image frames from the continuous image sequence, thereby obtaining a plurality of image frames, which can be arranged in chronological order so that the target can subsequently be tracked through the image frames.
In one possible implementation, this step 301 may include: acquiring video data, and performing frame extraction processing on the video data at a preset time interval to obtain the plurality of image frames.
The preset time interval may be any time length, such as 0.1 second or 0.2 second. In the plurality of image frames obtained by frame extraction at the preset time interval, the interval between any two adjacent image frames is the preset time interval.
It should be noted that, in the embodiment of the present application, a plurality of image frames are obtained by performing frame extraction processing on video data, and in another embodiment, step 301 does not need to be performed, and a plurality of image frames in sequence may be obtained in other manners. For example, a computer device acquires video data, acquires each image frame included in the video data, and obtains a plurality of image frames.
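For illustration only, a minimal Python sketch of the frame extraction in step 301 (the use of OpenCV, the function name, and the sampling scheme are assumptions of this example, not part of the application):

```python
import cv2

def extract_frames(video_path: str, interval_sec: float = 0.2):
    """Sample one frame from the video every interval_sec seconds."""
    capture = cv2.VideoCapture(video_path)
    fps = capture.get(cv2.CAP_PROP_FPS)
    step = max(1, round(fps * interval_sec))  # source frames between samples
    frames = []
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)  # sampled frames stay in chronological order
        index += 1
    capture.release()
    return frames
```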
302. The computer device takes every preset number of image frames in the plurality of image frames as an image group to obtain a plurality of image groups arranged in sequence.
The preset number may be any set number, such as 2, 3, or 5. The plurality of image groups may be arranged in the order of their generation times or in the order of the image frames they include, for example in the chronological order of the image frames included in the image groups.
In the plurality of image groups, image frames included in different image groups are not completely identical, and any two adjacent image groups include at least one identical image frame and at least one different image frame. For example, the plurality of image frames arranged in sequence include an image frame 1, an image frame 2, an image frame 3, and an image frame 4, and the preset number is 2, then the image frame 1 and the image frame 2 may be regarded as an image group a, the image frame 2 and the image frame 3 may be regarded as an image group B, the image frame 3 and the image frame 4 may be regarded as an image group C, and the obtained plurality of image groups are arranged according to the sequence of the image frames, that is, the sequence of the plurality of image groups is: image group A, image group B and image group C.
In one possible implementation, this step 302 may include: according to the arrangement sequence of the image frames, every preset number of adjacent image frames are used as an image group, and a plurality of image groups arranged in sequence are obtained.
Here, adjacent image frames are image frames that are adjacent to one another in the arrangement order of the plurality of image frames. For example, the plurality of image frames includes image frames 1, 2, 3, 4, 5, and 6, and the preset number is 3; the preset number of adjacent image frames may then be image frames 1, 2, and 3; image frames 2, 3, and 4; image frames 3, 4, and 5; or image frames 4, 5, and 6.
When the plurality of image groups is obtained, each image frame is taken in turn as a starting image frame, beginning from the first of the plurality of image frames according to their arrangement order, and the preset number of image frames starting from that image frame is selected as an image group, thereby obtaining a plurality of image groups. The preset number of image frames included in each image group are adjacent to one another in the sequentially arranged image frames, and the image groups can be arranged according to the position of each group's starting image frame in the plurality of image frames. For example, the plurality of image frames includes image frames 1, 2, 3, and 4, and the preset number is 2; taking image frames 1, 2, and 3 respectively as starting image frames yields a plurality of image groups: image group A comprising image frame 1 and image frame 2, image group B comprising image frame 2 and image frame 3, and image group C comprising image frame 3 and image frame 4.
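A minimal sketch of step 302 (the stride-1 sliding window matches the example above; the function name is hypothetical):

```python
def make_image_groups(frames, group_size: int = 2):
    """Slide a window of group_size over the frames with stride 1.

    Adjacent groups share group_size - 1 frames, so any two adjacent groups
    contain at least one identical and at least one different image frame.
    """
    if len(frames) < group_size:
        return []
    return [frames[i:i + group_size] for i in range(len(frames) - group_size + 1)]

# With frames [1, 2, 3, 4] and group_size 2 this yields
# [[1, 2], [2, 3], [3, 4]], matching image groups A, B, and C above.
```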
303. The computer device calls a feature extraction model to acquire the feature vectors of the preset number of image frames in each image group.
The feature extraction model is a model for extracting the feature vectors of image frames. A feature vector is a vector representing the features of an image frame; different image frames have different feature vectors. The preset number of image frames of any image group is input into the feature extraction model, and the feature extraction model outputs the feature vector of each image frame in the image group, so that the feature vectors of the preset number of image frames in each of the plurality of image groups can be acquired. The feature extraction model may be ResNet50 (a 50-layer residual network), FPN (Feature Pyramid Network), GoogLeNet, ShuffleNet, or a network composed of ResNet50 and FPN.
In one possible implementation manner, the computer device sequentially acquires the feature vectors of the image frames in each image group according to the arrangement order of the image groups, and this step 303 may include the following two modes:
The first mode: in response to the first image group being the initial image group of the plurality of image groups, the computer device calls the feature extraction model to perform feature extraction on each image frame in the first image group, obtaining the feature vector of each image frame in the first image group.
The second mode: in response to the first image group not being the initial image group of the plurality of image groups, the feature vector of the specified image frame in the second image group is determined as the feature vector of the corresponding specified image frame in the first image group, and the feature extraction model is called to perform feature extraction on the image frames other than the specified image frame in the first image group, obtaining the feature vectors of those other image frames.
The second image group is the previous image group of the first image group in the plurality of image groups, and the specified image frame is the same image frame in the first image group and the second image group. Since the first image group and the second image group include at least one identical image frame, each identical image frame in the two groups can be regarded as a specified image frame. The feature vector of the specified image frame in the second image group was obtained by calling the feature extraction model, that is, when the feature extraction model was called to perform feature extraction on the specified image frame in the second image group. Because the computer device sequentially acquires the feature vectors of the image frames in each image group in the arrangement order of the plurality of image groups, the feature vector of each image frame in the second image group, including that of the specified image frame, has already been acquired before the feature vectors of the image frames in the first image group are acquired.
In the embodiments of the present application, the computer device calls the feature extraction model according to the arrangement order of the plurality of image groups and sequentially acquires the feature vectors of the image frames in each image group. Because any two adjacent image groups include at least one identical image frame, in order to avoid repeatedly extracting the feature vector of the same image frame, when the feature vectors of the image frames in the first image group are acquired, the feature vector of the specified image frame in the second image group can be reused as the feature vector of the specified image frame in the first image group. Therefore, when the feature vectors of the preset number of image frames in the first image group are acquired, only the feature vectors of the image frames other than the specified image frame need to be extracted, which saves the time for acquiring the feature vectors of the image frames and realizes a feature vector sharing mechanism.
In a possible implementation manner, the computer device calls the feature extraction model according to the arrangement order of the plurality of image groups and sequentially acquires the feature vectors of the image frames in each image group. After the feature vectors of the preset number of image frames in the current image group are obtained, they are stored in the memory; in response to the feature extraction model being called and the feature vectors of the preset number of image frames in the next image group being obtained, those feature vectors are stored in the memory and the feature vectors of the preset number of image frames in the previous image group are deleted.
When the computer device sequentially acquires the feature vectors of the preset number of image frames in the plurality of image groups according to their arrangement order, the feature vectors of the image frames in the current image group are stored in the memory each time they are acquired, so that the feature vector of the specified image frame shared by two image groups can be reused when the feature vectors of the image frames in the next image group are acquired. This avoids repeated acquisition of the feature vector of the same image frame and saves the time for acquiring the feature vectors of the image frames; and because only the feature vectors of the image frames in the most recent image group are kept in the memory, a memory sharing mechanism is realized and the time consumed in acquiring the feature vectors of the image frames is reduced.
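A minimal sketch of the feature vector sharing and memory sharing mechanisms described above (extract_feature stands in for the feature extraction model; it and the keying of frames by object identity are assumptions of this example):

```python
def group_feature_vectors(image_groups, extract_feature):
    cache = {}  # id(frame) -> feature vector, kept for the previous group only
    for group in image_groups:
        features = {}
        for frame in group:
            key = id(frame)
            if key in cache:
                features[key] = cache[key]  # shared frame: reuse, no re-extraction
            else:
                features[key] = extract_feature(frame)
        cache = features  # features of frames no longer shared are released
        yield [features[id(frame)] for frame in group]
```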
304. The computer device calls a feature detection model to perform target detection on the feature vectors of the preset number of image frames in each image group, respectively obtaining at least one candidate region chain of each image group, a corresponding first target feature map, and a second target feature map corresponding to the first image frame of each image group.
The feature detection model is used to acquire, from the feature vectors of the image frames in an image group, at least one candidate region chain of the image group, a corresponding first target feature map, and a second target feature map corresponding to the first image frame of the image group. A candidate region chain represents the association between candidate regions that may belong to the same target in a plurality of image frames, and includes a plurality of candidate regions located in different image frames of the corresponding image group. A candidate region is a region in an image frame where a target is located; the target may be a person, a car, or the like.
The first target feature map includes the probability that the plurality of candidate regions contained in each candidate region chain of the corresponding image group belong to the same target: the higher the probability, the more likely the plurality of candidate regions contained in the corresponding candidate region chain belong to the same target; the lower the probability, the less likely they do.
The second target feature map includes the probability that at least one candidate region of the first image frame of the corresponding image group contains a target: the higher the probability, the more likely the corresponding candidate region contains a target; the lower the probability, the less likely it does.
The feature detection model is called to process the feature vectors of the preset number of image frames in any image group, obtaining at least one candidate region in each of the preset number of image frames; the candidate regions that may belong to the same target in the preset number of image frames are associated into candidate region chains, thereby obtaining at least one candidate region chain of the image group. The contents of the plurality of candidate regions contained in each candidate region chain are identified to determine whether the candidate regions of different image frames contain the same target, yielding the probability that the plurality of candidate regions in each candidate region chain belong to the same target and thereby the first target feature map of the image group. The content contained in the candidate regions of the first image frame in the image group is identified to determine whether each candidate region of the first image frame contains a target, thereby obtaining the second target feature map corresponding to the first image frame of the image group.
In a possible implementation manner, the process of obtaining at least one candidate region chain of each image group may include the following steps 3041 to 3043:
3041. A feature detection model is called, and target detection is carried out according to the feature vector of the first image frame in any image group to obtain at least one candidate region of the first image frame in the image group.
The candidate area is used for representing the area where the target is located in the first image frame. When the feature detection model is called to detect the target, the target in the image frame can be detected in the form of a detection frame, and when the target is included in the detection frame, the area where the detection frame is located is determined as the alternative area, so that at least one alternative area in the first image frame is obtained.
In one possible implementation, this step 3041 may include: calling a feature detection model, carrying out target detection according to the feature vector of the first image frame to obtain a plurality of candidate regions of the first image frame and corresponding probabilities of the candidate regions, and selecting at least one candidate region with the probability larger than a preset threshold value from the candidate regions.
The probability corresponding to a candidate region represents the probability that the candidate region includes the target: the higher the probability, the more likely the candidate region includes the target; the lower the probability, the less likely it does. The preset threshold may be any set value, such as 0.5 or 0.7. By setting the preset threshold, a candidate region whose probability is greater than the preset threshold may be determined to include the target, and a candidate region whose probability is less than the preset threshold may be determined not to include the target.
By selecting, from the multiple candidate regions of the first image frame, at least one candidate region whose probability is greater than the preset threshold, the accuracy of the determined candidate regions is improved, which in turn improves the accuracy of the candidate region chains subsequently obtained from them.
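A minimal sketch of this selection step, assuming each detection is a (region, probability) pair and a preset threshold such as 0.5; the function name is hypothetical:

```python
def select_candidate_regions(detections, threshold=0.5):
    """Keep only candidate regions whose probability of containing a
    target exceeds the preset threshold (e.g. 0.5 or 0.7)."""
    return [region for region, prob in detections if prob > threshold]
```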
3042. The at least one candidate region is processed according to the feature vectors of the preset number of image frames in the image group to respectively obtain the candidate regions of the other image frames in the image group.
Because the first image frame of the image group includes at least one candidate region, the at least one candidate region is processed according to the feature vectors of the preset number of image frames in the image group to obtain the candidate regions of the other image frames in the image group, and the candidate regions in each of the other image frames correspond one-to-one to the candidate regions in the first image frame.
In one possible implementation, this step 3042 may include: and mapping the at least one alternative region according to the feature vectors of a preset number of image frames in the image group to respectively obtain alternative regions with the same position as the at least one alternative region in other image frames of the image group.
The candidate region may be represented in the form of a detection box or in the form of coordinates, and if the candidate region is a rectangle, the candidate region may be represented in the form of coordinates of the upper left corner of the rectangle and the lower right corner of the rectangle, or in the form of coordinates of the lower right corner of the rectangle and the upper left corner of the rectangle.
After at least one candidate region of a first image frame of any image group is determined, mapping at least one candidate region into other image frames of the image group according to the position of the at least one candidate region, and enabling the position of the candidate region in the other image frames to be the same as the position of a corresponding candidate region in the first image frame, thereby obtaining the candidate regions of the other image frames of the image group.
3043. The matched candidate regions belonging to different image frames in the preset number of image frames are associated to obtain at least one candidate region chain of the image group.
Because each image frame includes at least one candidate region and the candidate regions in different image frames correspond one-to-one, the matched candidate regions in different image frames are associated according to this correspondence to obtain at least one candidate region chain.
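Steps 3041 to 3043 can be summarized in a short sketch. It assumes each candidate region is an axis-aligned rectangle; because the mapping in step 3042 copies positions unchanged, a candidate region chain is simply the same rectangle associated across all frames of the group. The function name is hypothetical.

```python
def build_candidate_chains(first_frame_regions, group_size):
    """Map each candidate region of the first frame to the remaining
    frames of the group at the same position, and associate the matched
    regions into one candidate region chain per target hypothesis."""
    chains = []
    for region in first_frame_regions:
        # One entry per frame in the group; positions are identical because
        # adjacent frames in a group differ only slightly.
        chains.append([region] * group_size)
    return chains
```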
305. The computer device calls the feature detection model and adjusts the at least one candidate region chain of each image group according to the feature vectors of the preset number of image frames in each image group, the first target feature map corresponding to each image group, and the second target feature map corresponding to the first image frame of each image group, to respectively obtain at least one target region chain of each image group.
The target area chain comprises a plurality of target areas which are positioned in different image frames of the corresponding image group and belong to the same target. The target area is an area where a target is located in the image frame, and the target may be a person, a car, or the like.
For any image group, any image frame of the image group may include at least one target, and any target may appear in at least one image frame of the image group. According to the feature vectors of the preset number of image frames of the image group, the corresponding first target feature map, and the second target feature map corresponding to the first image frame of the image group, the candidate regions included in each candidate region chain of the image group are adjusted so that each target region obtained after adjustment contains a target. Adjusting the candidate regions included in any candidate region chain yields the target region chain corresponding to that candidate region chain, and the target regions in a target region chain contain the same target, so at least one target region chain of the image group is obtained. A target region indicates the region where a target is located in an image frame; it may be indicated in the form of a detection frame or in the form of coordinates. If the target region is a rectangle, it may be indicated by the coordinates of the rectangle's upper left and lower right corners.
Because the multiple image frames in the same image group are adjacent, a target's region changes little across the different image frames of the group. At least one candidate region chain of the image group can therefore be determined from the candidate regions in the first image frame, and the candidate regions in each chain can then be scaled and translated to determine the target region of the target in each image frame, yielding the target region chain. As shown in fig. 4, the image group includes image frame 1 and image frame 2, each of which includes two target regions; the target regions belonging to the same target in image frame 1 and image frame 2 are associated to obtain two target region chains.
Because the first target feature map includes the probability that the multiple candidate regions in each candidate region chain of the corresponding image group belong to the same target, and the second target feature map includes the probability that at least one candidate region of the first image frame of the corresponding image group contains a target, adjusting the candidate region chains through the two feature maps ensures that the multiple target regions in each target region chain all contain a target and that those targets are the same target.
In one possible implementation, this step 305 may include: calling the feature detection model, fusing the feature vectors of the preset number of image frames in each image group, the first target feature map corresponding to each image group, and the second target feature map corresponding to the first image frame of each image group to obtain the aggregated feature vector of each image group, and adjusting the at least one candidate region chain of each image group according to the aggregated feature vector of each image group to respectively obtain at least one target region chain of each image group.
For any image group, when fusing the feature vectors of a preset number of image frames in the image group, the first target feature map corresponding to the image group, and the second target feature map corresponding to the first image frame of the image group, the feature vectors of each image frame may be sequentially spliced according to the arrangement order of the preset number of image frames, and the spliced feature vectors, the first target feature map corresponding to the image group, and the second target feature map corresponding to the first image frame of the image group are fused, so as to obtain the aggregated feature vector of the image group.
Then, the feature detection model is called to adjust the at least one candidate region chain of any image group according to the aggregated feature vector of the image group, so that each target region in the obtained target region chains contains a target, the target regions contained in each target region chain belong to the same target, and the target region in each image frame is associated with the target regions of the other image frames, thereby improving the accuracy of detecting the target regions and the accuracy of the target region chains.
In addition, the first target feature map and the second target feature map can both be represented by vectors. Therefore, when acquiring the aggregated feature vector of the image group, the feature vectors of the image frames are spliced sequentially according to the arrangement order of the preset number of image frames, the elements located at the same position in the spliced feature vector, the vector of the first target feature map, and the vector of the second target feature map are multiplied, and the products are combined to form the aggregated feature vector of the image group. That is, the aggregated feature vector can be obtained by an element-wise product or other methods.
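As a minimal numeric sketch of this element-wise fusion, assuming the spliced feature vector and the two target feature maps have already been brought to the same shape (the shapes here are assumptions; the embodiment only requires element-wise multiplication):

```python
import numpy as np

def aggregate_features(spliced_features, first_target_map, second_target_map):
    """Element-wise product of the spliced feature vector with the two
    target feature maps, yielding the aggregated feature vector."""
    return spliced_features * first_target_map * second_target_map

# Usage: feature vectors of frame 1 and frame 2 spliced in arrangement order.
f1, f2 = np.random.rand(8), np.random.rand(8)
spliced = np.concatenate([f1, f2])          # spliced feature vector, shape (16,)
m1 = np.random.rand(16)                     # first target feature map
m2 = np.random.rand(16)                     # second target feature map
aggregated = aggregate_features(spliced, m1, m2)
```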
In addition, the feature detection model includes a classification layer, a verification layer, and a regression layer. Through the classification layer, target detection is performed according to the feature vector of the first image frame in the image group, and the targets and the background in the first image frame are distinguished, yielding the second target feature map, which includes the probability that each candidate region in the first image frame contains a target. Through the verification layer, at least one candidate region chain is obtained according to the feature vectors of the preset number of image frames in the image group, and whether the candidate regions in each candidate region chain contain the same target is verified, yielding the first target feature map of the image group. Through the regression layer, the at least one candidate region chain of the image group is processed according to the feature vectors of the preset number of image frames in the image group, the first target feature map of the image group, and the second target feature map of the first image frame of the image group, yielding at least one target region chain of the image group.
It should be noted that, in the embodiment of the present application, the target region chain of each image group is obtained through the at least one candidate region chain of each image group, the corresponding first target feature map, and the second target feature map corresponding to the first image frame of each image group. In another embodiment, without performing steps 304-305, a feature detection model may be invoked to perform target detection on the feature vectors of the preset number of image frames in each image group to respectively obtain at least one candidate region chain and the corresponding first target feature map of each image group, and the at least one candidate region chain of each image group may be adjusted according to the feature vectors of the preset number of image frames in each image group and the corresponding first target feature map to respectively obtain at least one target region chain of each image group.
In addition, before calling the feature detection model, the feature detection model needs to be trained, and the feature detection model may be trained by the computer device or may be sent to the computer device after being trained by another device.
When training the feature detection model, the computer device obtains a plurality of sample sets, each sample set comprises feature vectors of a preset number of image frames of a sample image group and a sample target area chain of the sample image group, the feature vectors of the preset number of image frames of the sample image group are used as the input of the feature detection model, the sample target area chains of the sample image group are used as the output of the feature detection model, and the feature detection model is trained.
In the process of training the feature detection model, calling the feature detection model, processing feature vectors of a preset number of image frames of any sample image group to obtain a prediction target area chain of the sample image group, determining the difference between the prediction target area chain and the sample target area chain according to the prediction target area chain of the sample image group and the sample target area chain of the sample image group, and adjusting the feature detection model according to the difference to reduce the difference between the prediction target area chain and the sample target area chain.
In the training process of the feature detection model, determining an output value of a loss function of the feature detection model according to a prediction target area chain of the sample image group and a sample target area chain of the sample image group, adjusting the feature detection model according to the output value of the loss function, and stopping training the feature detection model to obtain the trained feature detection model in response to the fact that the output value of the loss function of the feature detection model is smaller than a preset loss threshold. The output value of the loss function is a value calculated according to a difference between the prediction target region chain and the sample target region chain, and the preset loss threshold may be any set value, such as 0.3, 0.2, and the like.
In addition, the feature detection model may include a classification layer, a verification layer, and a regression layer. When the feature detection model is trained, a Focal Loss function may be used for both the classification layer and the verification layer, and an absolute loss function (L1 loss) may be used for the regression layer; the output values of the loss functions of the classification layer, the verification layer, and the regression layer are weighted and summed to serve as the output value of the loss function of the feature detection model. The weights of the loss functions of the classification layer, the verification layer, and the regression layer may be set arbitrarily.
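A sketch of such a combined training loss, assuming PyTorch; the focal-loss hyperparameters alpha and gamma and the layer weights are assumptions, since the embodiment does not fix them:

```python
import torch
import torch.nn.functional as F

def focal_loss(pred, target, alpha=0.25, gamma=2.0):
    """Focal loss on sigmoid probabilities (classification/verification layers)."""
    bce = F.binary_cross_entropy(pred, target, reduction="none")
    p_t = target * pred + (1 - target) * (1 - pred)
    alpha_t = target * alpha + (1 - target) * (1 - alpha)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

def model_loss(cls_pred, cls_gt, ver_pred, ver_gt, reg_pred, reg_gt,
               w_cls=1.0, w_ver=1.0, w_reg=1.0):
    """Weighted sum of the three layer losses as the model's loss output."""
    l_cls = focal_loss(cls_pred, cls_gt)   # classification layer, focal loss
    l_ver = focal_loss(ver_pred, ver_gt)   # verification layer, focal loss
    l_reg = F.l1_loss(reg_pred, reg_gt)    # regression layer, absolute (L1) loss
    return w_cls * l_cls + w_ver * l_ver + w_reg * l_reg
```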
It should be noted that, in the embodiment of the present application, the at least one target region chain of each image group is obtained through the feature extraction model and the feature detection model, but in another embodiment, when steps 303 to 305 are executed, the corresponding steps may be directly executed by the computer device without invoking the feature extraction model and the feature detection model.
306. The computer device regards the same image frame in any two adjacent image groups as a designated image frame and regards the target area of the designated image frame as a designated target area.
Since any two adjacent image groups include at least one same image frame and at least one different image frame, the same image frame can be used as a designated image frame, and a target area included in the designated image frame can be used as a designated target area.
307. The computer device determines a target area matching relationship between any two image groups according to the designated image frame and the designated target area in any two image groups.
The target area matching relationship is used for representing the matching relationship between the designated target areas of any one of the same designated image frames in any two image groups, and the target area matching relationship comprises the matching relationship between any one of the target areas of any one of the designated image frames in the first image group and the target area belonging to the same target in the same designated image frame in the second image group. When determining the target region matching relationship between any two image groups, the target region matching relationship may be determined by a KM (Kuhn-Munkres, bipartite graph maximum weight matching) algorithm, a bipartite graph matching algorithm, a greedy method, or the like.
For example, if the designated image frame in the first image group includes target area 1, target area 2, and target area 3, and the same designated image frame in the second image group includes target area 4, target area 5, and target area 6, the target area matching relationship between the two image groups is shown in Table 1: target area 1 matches target area 4, target area 2 matches target area 5, and target area 3 matches target area 6.
TABLE 1

    First image group    Second image group
    Target area 1        Target area 4
    Target area 2        Target area 5
    Target area 3        Target area 6
In any two image groups, the same designated image frame comprises the same target, designated target areas in the same designated image frame in the two image groups are matched, and the matching relation between the designated target areas belonging to the same target is determined.
For example, image group a includes image frame 1 and image frame 2, image group B includes image frame 2 and image frame 3, image group a includes two chains of target regions, and image group B includes two chains of target regions. The image frame 2 of the image group a and the image frame 2 of the image group B each include two target regions, and the determined matching relationship between the image frame 2 of the image group a and the target regions of the image frame 2 of the image group B is as shown in fig. 5.
In one possible implementation, the step 307 may include the following steps 3071-3073:
3071. Multiple groups of candidate matching relations are determined according to the specified image frames and the specified target regions in any two image groups.
The candidate matching relations comprise matching relations between each designated target region of any designated image frame in the first image group and any designated target region of the same designated image frame in the second image group, and the multiple groups of candidate matching relations are different.
The same designated image frame may include a plurality of designated target regions, and when the plurality of designated target regions of the designated image frames of the two image groups are matched, there are a plurality of matching manners, and a plurality of matching relationships may be obtained, and then the plurality of matching relationships are all used as candidate matching relationships. For example, if any one of the designated image frames in the first image group includes the designated target area 1 and the designated target area 2, and the same designated image frame in the second image group includes the designated target area 3 and the designated target area 4, 2 sets of matching relationships can be determined, where the first set of matching relationships is: the designated target area 1 is matched with the designated target area 3, and the designated target area 2 is matched with the designated target area 4; the second group of matching relations are: the designated target area 1 matches the designated target area 4, and the designated target area 2 matches the designated target area 3.
3072. And respectively determining the sum of the similarity of every two matched specified target areas in each group of candidate matching relations as the matching degree of each group of candidate matching relations.
The similarity of two matched specified target regions represents how similar the two specified target regions are: the greater the similarity, the more likely the two regions belong to the same target; the smaller the similarity, the less likely they do. The matching degree of a candidate matching relation represents the degree of match between the specified target regions paired in that relation: the greater the matching degree, the more accurate the pairwise matching of the specified target regions; the smaller the matching degree, the less accurate it is.
In one possible implementation, determining the similarity of two matched specified target regions may include: determining the ratio of the intersection area of the two specified target regions to the union area of the two specified target regions as the similarity between the two specified target regions.
The ratio of the intersection area of the two designated target regions to their union area is the intersection-over-union of the two regions. Since the region where a target is located in a given designated image frame is fixed, if two designated target regions belong to the same target, they should overlap completely, that is, their intersection area equals their union area; if they do not belong to the same target, the intersection area is small and the union area is large. The similarity of two designated target regions can therefore be determined through their intersection and union areas.
In one possible implementation, determining the intersection area and the union area of the two specified target regions may include: and determining the intersection area and the union area of the two specified target areas according to the coordinates of the two specified target areas.
For example, if both the two specified target regions are rectangles, the intersection area between the two rectangles is the intersection area of the two specified target regions, and the union area between the two rectangles is the union area of the two specified target regions. The coordinates of the two designated target areas are the coordinates of the four corners of the two rectangles, so that the overlapping area of the two rectangles can be determined through the coordinates of the four corners of the two rectangles, the overlapping area is used as the intersection area of the two designated target areas, and the difference between the sum of the areas of the two rectangles and the overlapping area is used as the union area of the two designated target areas.
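A sketch of this intersection-over-union computation for rectangles given as (x1, y1, x2, y2) corner coordinates; the function name is hypothetical:

```python
def iou(box_a, box_b):
    """Ratio of intersection area to union area of two rectangles,
    each given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlapping rectangle (may be empty).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union > 0 else 0.0
```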
3073. And selecting the candidate matching relationship with the maximum matching degree from the multiple groups of candidate matching relationships, and determining the candidate matching relationship as the target region matching relationship.
Among the multiple candidate matching relations, the larger the matching degree, the more accurate the pairwise matching of the designated target regions in that relation. Therefore, the candidate matching relation with the largest matching degree is selected from the multiple groups of candidate matching relations and determined as the target region matching relationship, which improves the accuracy of the target region matching relationship.
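Steps 3071 to 3073 amount to choosing the assignment with the largest summed similarity. The brute-force sketch below enumerates all candidate matching relations, which is practical only for a handful of regions; the KM algorithm mentioned above solves the same problem efficiently. It assumes the two groups contain equally many specified target regions and takes the similarity function (for example, the IoU sketched above) as a parameter.

```python
from itertools import permutations

def best_matching(regions_a, regions_b, similarity):
    """Enumerate the candidate matching relations between the specified
    target regions of two image groups and return the one whose matching
    degree (sum of pairwise similarities) is largest."""
    best_pairs, best_degree = None, float("-inf")
    for perm in permutations(range(len(regions_b))):
        pairs = list(zip(range(len(regions_a)), perm))
        degree = sum(similarity(regions_a[i], regions_b[j]) for i, j in pairs)
        if degree > best_degree:
            best_pairs, best_degree = pairs, degree
    return best_pairs, best_degree
```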
308. And the computer equipment determines the target area chain matching relationship between any two image groups according to the target area chain and the target area matching relationship of any two image groups.
The target region chain matching relationship is used to represent the matching relationship between the target region chains of any two image groups; it includes the matching relationship between any target region chain of the first image group and the target region chain of the second image group that belongs to the same target, indicating that the two matched target region chains belong to the same target. For example, the first image group includes target region chain 1 and target region chain 2, the second image group includes target region chain 3 and target region chain 4, and the target region chain matching relationship between the first image group and the second image group is as follows: target region chain 1 matches target region chain 3, and target region chain 2 matches target region chain 4.
Since the target region matching relationship represents the matching relationship between the specified target regions of the same specified image frame in any two image groups, and the different target region chains of each image group each contain one specified target region, the matching relationship between the target region chains of the two image groups can be determined from the matching relationship between the specified target regions of the same specified image frame, so that any two target region chains containing matched specified target regions are matched.
For example, the first image group includes an object region chain 1 and an object region chain 2, the second image group includes an object region chain 3 and an object region chain 4, the specified image frame of the first image group includes a specified object region a and a specified object region B, the specified image frame of the second image group includes a specified object region C and a specified object region D, the object region chain 1 includes the specified object region a, the object region chain 2 includes the specified object region B, the object region chain 3 includes the specified object region C, the object region chain 4 includes the specified object region D, and the object region matching relationship is: matching the designated target area A with the designated target area C, and matching the designated target area B with the designated target area D, wherein the determined target area chain matching relationship is as follows: target region chain 1 matches target region chain 3, and target region chain 2 matches target region chain 4.
In addition, the same target identifier may be set for the target region chains matched in any two image groups, where the target identifier is used to indicate the target to which the target region chain belongs, so that the target region chains belonging to the same target may be determined subsequently according to the target identifier.
309. And the computer equipment combines other target areas except the specified target area in any target area chain with another target area chain matched with the any target area chain according to the matching relation of the target area chains to obtain a target area combined chain.
Wherein the other target area is a target area included in other image frames except the designated image frame in the image group. The target area combination chain represents a combination chain formed by associating target areas belonging to the same target in a plurality of image frames.
Because two adjacent image groups include at least one same designated image frame and at least one different image frame, the designated target regions included in the same designated image frame may be the same, and two matched target region chains may include the same designated target region. Therefore, when the two matched target region chains of the two image groups are combined, only the target regions other than the designated target region in either target region chain are combined with the other target region chain, yielding a target region chain formed by the target regions in the union of the image frames of the two image groups. In this manner, the target region chains of the plurality of image groups are combined, thereby obtaining a target region combination chain.
In addition, for any two adjacent image groups in the plurality of image groups, the second image frame of the first image group is the same as the first image frame of the second image group, after the target area chains of the plurality of image groups and the target area chain matching relationship between every two adjacent image groups are obtained, the matching relationship between the target area chains of the plurality of image groups is determined, and the target areas of the first image frames of other image groups except the last image group in the plurality of image groups are combined with the target area chain of the last image group to obtain a target area combination chain.
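A sketch of this combination rule for two matched chains of adjacent image groups, assuming each chain is a list of target regions ordered by frame and that adjacent groups overlap by one designated image frame:

```python
def combine_chains(chain_a, chain_b, overlap=1):
    """Combine two matched target region chains of adjacent image groups.

    chain_a ends with the specified target region(s) that chain_b starts
    with, so the shared regions are kept once and only the remaining
    target regions of chain_b are appended."""
    return chain_a + chain_b[overlap:]

# Usage: image groups of two frames overlapping by one frame.
chain_a = ["region@frame1", "region@frame2"]
chain_b = ["region@frame2", "region@frame3"]
combined = combine_chains(chain_a, chain_b)  # regions at frames 1, 2, 3
```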
When the target region chains of the plurality of image groups are combined, for any two adjacent image groups, if a target region chain of the second image group contains an interruption target region in a designated image frame, that target region chain is used independently as a target region combination chain; when the chains of the image group following the second image group are combined, this target region combination chain is combined with the matched target region chain of that following image group to obtain a new target region combination chain.
It should be noted that, in the embodiment of the present application, the target region combination chain is obtained according to the target region chain matching relationship, but in another embodiment, steps 306 to 309 need not be executed, and the target region combination chain of at least one target can be created according to the target region chains of the plurality of image groups and the arrangement order of the plurality of image groups.
It should be noted that the embodiment of the present application is described only in terms of the designated target regions of the two image groups that can be matched with each other; in another embodiment, the designated image frames of the two image groups may further include designated target regions that cannot be matched with each other. When the second image frame of the second image group is identical to the first image frame of the first image group, determining the target region combination chain may further include the following steps 3091-3094:
3091. When a first target region chain of the second image group includes an interruption target region in the second image frame of the second image group, and the interruption target region matches none of the specified target regions in the first image frame of the first image group, a predicted target region belonging to the same target as the interruption target region is predicted in the first image frame of the first image group.
The second image group is a previous image group of the first image group in the plurality of image groups.
That is, the interruption target region is detected in the second image frame of the second image group based on the feature vectors of the preset number of image frames of the second image group, and a target region belonging to the same target as the interruption target region is not detected in the first image frame of the first image group based on the feature vectors of the preset number of image frames of the first image group.
In the first image frame of the first image group, the target to which the interruption target region belongs may be blocked by another object so that it cannot be detected, or the target may no longer be present in that image frame. When the target is blocked and therefore undetected, it may reappear in image frames after the first image frame of the first image group; to ensure the continuity of the target across the target regions of different image frames, prediction may be performed for the interruption target region to determine the predicted target region of the target in the next image frame.
In one possible implementation, obtaining the predicted target region may include: determining the interval duration, in the video data, between the first image frame of the second image group and the first image frame of the first image group; determining the moving direction of the target to which the interruption target region belongs; calling a speed model to determine the moving distance of the target according to the interval duration; determining, in the first image frame of the second image group, a first target region belonging to the same target as the interruption target region; translating the first target region according to the moving direction and the moving distance to obtain a translated second target region; and adding, in the first image frame of the first image group, a predicted target region with the same coordinates as the second target region.
In one possible implementation, determining the moving direction of the target to which the interruption target region belongs may include: determining, in the first image frame of the image group preceding the second image group, a third target region belonging to the same target as the interruption target region, and using the direction of the line connecting the reference point of the third target region to the reference point of the first target region as the moving direction of the target to which the interruption target region belongs. The reference point is any point of the target region; for example, if the target region is a rectangle, the upper left corner or the center point of the rectangle may be used as the reference point.
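A sketch of this prediction, assuming rectangles as (x1, y1, x2, y2), the reference point at the upper left corner, and a speed model reduced to distance = speed × interval duration; the speed value is treated as given, since the embodiment does not specify the speed model's internals:

```python
import math

def predict_target_region(earlier_region, later_region, speed, interval):
    """Translate the last known region along the target's moving direction.

    earlier_region / later_region: regions of the same target in two
    earlier frames; the line through their reference points (upper left
    corners here) gives the moving direction."""
    dx = later_region[0] - earlier_region[0]
    dy = later_region[1] - earlier_region[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        return later_region            # target did not move; reuse the region
    distance = speed * interval        # moving distance from the speed model
    sx, sy = distance * dx / norm, distance * dy / norm
    x1, y1, x2, y2 = later_region
    return (x1 + sx, y1 + sy, x2 + sx, y2 + sy)
```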
3092. In the target region matching relationship between the second image group and the first image group, the matching relationship of the interruption target region and the prediction target region is added.
By adding the matching relationship between the interrupt target region and the prediction target region in the target region matching relationship between the second image group and the first image group, the matching relationship between the target region chain containing the interrupt target region in the first image group and the prediction target region chain in the second image group can be determined subsequently through the matching relationship, so that the target region chain and the prediction target region chain can be combined subsequently, and a target region combination chain is obtained.
3093. According to the feature vectors of a preset number of image frames in the first image group, mapping the prediction target area to respectively obtain the prediction target areas of other image frames in the first image group, and associating the prediction target areas belonging to different image frames in the preset number of image frames to obtain a prediction target area chain of the first image group.
After determining the prediction target region in the first image frame in the first image group, the other image frames except the first image frame in the first image group may be predicted in the same manner as in step 304, so as to obtain the prediction target regions in the other image frames.
A predicted target region chain is obtained by associating the predicted target regions that belong to the same target and are located in different image frames of the first image group, so that the predicted target region chain of the first image group matches the target region chain containing the interruption target region.
And after the prediction target region chain is obtained through prediction in the first image group, the prediction target region chain of the first image group can be used as a real target region chain to be matched with the target region chain of the next image group, and the target region chain of the first image group and the target region chain of the next image group are combined through the matching relationship of the first image group and the target region chain of the next image group to obtain a target region combination chain.
3094. And if the third image group comprises a second target area chain matched with the prediction target area chain, combining the first target area chain, the prediction target area chain and the second target area chain according to the matching relation between the interrupt target area and the prediction target area to obtain a target area combined chain.
Wherein the third image group is a subsequent image group of the first image group.
The same designated image frame of the second image group and the first image group is used as a first designated image frame, and the same designated image frame of the first image group and the third image group is used as a second designated image frame. When the first target region chain, the predicted target region chain, and the second target region chain are combined, the target regions of the first target region chain in image frames other than the first designated image frame are combined with the predicted target region chain to obtain a first target region combination chain; the target regions of the first target region combination chain in image frames other than the second designated image frame are then combined with the second target region chain to obtain a second target region combination chain, that is, the target region combination chain formed by combining the first target region chain, the predicted target region chain, and the second target region chain.
It should be noted that the embodiment of the present application is described only for the case where the third image group includes a second target region chain matching the predicted target region chain. In another embodiment, where the third image group does not include such a chain, a predicted target region chain associating the predicted target regions of the different image frames of the third image group is obtained according to steps 3091 to 3093 above, and the above steps are repeated; when a fourth image group includes a third target region chain matching the predicted target region chain of the previous image group, the first target region chain, the third target region chain, and the predicted target region chains predicted in the image groups between the first image group and the fourth image group are associated to obtain a target region combination chain. The fourth image group is within a first number of image groups after the second image group, the first number being no more than a preset number, which may be any set value, such as 10 or 8.
In addition, if none of the preset number of image groups following the second image group includes a target region chain matching the predicted target region chain of the previous image group, the interruption target region is no longer predicted in the image groups after that preset number.
310. And the computer equipment adds the moving track of at least one target in the video data according to the target area combination chain of the at least one target to obtain the updated video data.
Since the computer device determines the object region combination chain of the at least one object, i.e. determines the position of the at least one object in the plurality of image frames, the movement trajectory of the at least one object may be determined, adding the movement trajectory of the at least one object in the video data.
In addition, when the movement trajectory of any target is determined, the reference points of the target's target region combination chain in the target regions of the image frames are determined, yielding a plurality of image frames containing the target, each of which contains one reference point of the target. In each image frame, a line connecting the reference points of the preceding image frames is added according to their positions, so as to simulate the effect of the target moving along the line across the plurality of image frames; the line represents the movement trajectory of the target. The reference point may be any point of the target region; for example, if the target region is a rectangle, the midpoint of the rectangle's lower side may be used as the reference point.
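A sketch of the trajectory drawing, assuming OpenCV and frames stored as numpy arrays, with the reference point at the midpoint of the rectangle's lower side; frames are modified in place:

```python
import numpy as np
import cv2

def draw_trajectory(frames, region_chain, color=(0, 255, 0)):
    """Draw, on each frame, the line through the reference points of all
    earlier frames, so the target appears to move along the line."""
    points = []
    for frame, (x1, y1, x2, y2) in zip(frames, region_chain):
        # Reference point: midpoint of the rectangle's lower side.
        points.append((int((x1 + x2) / 2), int(y2)))
        if len(points) > 1:
            cv2.polylines(frame, [np.array(points, dtype=np.int32)],
                          isClosed=False, color=color, thickness=2)
        # Target region shown as a detection box, as described in step 311.
        cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), color, 2)
```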
311. And playing the updated video data by the computer equipment to display a picture of at least one target moving according to the corresponding moving track.
And displaying the moving track of at least one target when the updated video data is played, and displaying a picture of at least one target moving according to the corresponding moving track.
In addition, the target area may be displayed in the form of a detection box, and when the updated video data is played, the target area of the currently played video frame may be displayed in the form of a detection box, and the moving track of the target in the video frame before the currently played video frame may be displayed. As shown in fig. 6, fig. 6 shows any video frame in the played video data, and in fig. 6, the target areas of a plurality of targets in the video frame and the moving track of each target are displayed.
According to the method provided by the embodiment of the application, a plurality of image groups are created from a plurality of image frames, and the target region chains are acquired with the image group as the unit, so that association relationships exist among the target regions of different image frames in each image group, which improves the accuracy of target detection. The candidate region chains of each image group are adjusted through the obtained first target feature map of each image group, so that the target regions contained in each obtained target region chain belong to the same target, which improves the accuracy of the obtained target region chains. Different image groups include the same image frame, so association relationships exist among the target region chains of the image groups, and a target region combination chain can be formed, thereby tracking the target and improving the accuracy of the target region combination chain.
At least one target area chain of each image group is determined according to the feature vectors of the preset number of image frames of each image group through the feature extraction model and the feature detection model, and the accuracy of the determined target area chain is improved, so that the accuracy of the target area combination chain is improved, and the accuracy of the determined moving track of the target is also improved.
When the target region chains of any two image groups are combined, the target region matching relationship between the designated target regions of the same designated image frame in the two image groups is determined first, and the target region chain matching relationship between the two image groups is determined from it, so that the target region chains of different image groups can be combined according to the determined target region chain matching relationship. This improves the accuracy of the obtained target region combination chain and thus the accuracy of the determined movement trajectory of the target.
As shown in fig. 7, it is a flowchart of a target tracking method provided in the embodiment of the present application, where the method includes:
1. a plurality of image frames of video data are acquired in real time.
2. And taking each preset number of image frames in the plurality of image frames as an image group to obtain a plurality of image groups which are arranged in sequence.
3. And calling a feature extraction model to obtain feature vectors of a preset number of image frames in each image group.
4. And calling a feature detection model, and carrying out target detection on feature vectors of a preset number of image frames in each image group to obtain at least one target area chain of each image group.
5. And combining the target region chains of the multiple image groups according to the target region chains of the multiple image groups and the target region chain matching relationship between every two adjacent image groups to obtain at least one target region combination chain.
6. And detecting whether the video data is finished; if the video data is not finished, repeating steps 1-4.
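The flow of fig. 7 corresponds to a simple driver loop. The sketch below is schematic only: the four callables are hypothetical stand-ins for the feature extraction model, the feature detection model, and the matching and combining logic described above.

```python
def track(groups, extract_features, detect_chains, match_chains, merge_chains):
    """Schematic driver loop for the flow of fig. 7."""
    combination_chains, previous_chains = [], None
    for group in groups:                                   # step 2 output
        vectors = extract_features(group)                  # step 3
        chains = detect_chains(vectors)                    # step 4
        if previous_chains is None:
            combination_chains = list(chains)
        else:                                              # step 5
            matches = match_chains(previous_chains, chains)
            combination_chains = merge_chains(combination_chains, chains, matches)
        previous_chains = chains
    return combination_chains                              # loop ends with the video
```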
As shown in fig. 8, for two adjacent image groups A and B, image group A includes image frame 3 and image frame 4, and image group B includes image frame 4 and image frame 5. Each image group is processed by the feature extraction model and the feature detection model to obtain three target region chains per image group. The target regions in the same image frame 4 of image group A and image group B are matched, and the target region matching relationship between the target regions of image frame 4 in image group A and image frame 4 in image group B is thereby determined.
FIG. 9 is a flowchart of invoking the feature extraction model and the feature detection model to obtain the target region chains of an image group. For any image group including image frame 1 and image frame 2, image frame 1 and image frame 2 are respectively input into the feature extraction model, which outputs the feature vector of each frame. The two feature vectors are combined into the feature vector combination of the image group, which is input into the feature detection model. The classification layer of the feature detection model determines the second target feature map from image frame 1; the verification layer obtains at least one candidate region chain and the first target feature map of the image group from the feature vector combination; and the regression layer determines the target region chains of the image group from the feature vector combination, the first target feature map of the image group, and the second target feature map of the first image frame of the image group.

In addition, as shown in fig. 10, the classification layer, the verification layer, and the regression layer may each consist of four convolutional layers: the classification layer includes convolutional layer 1, convolutional layer 2, convolutional layer 3, and convolutional layer 4; the verification layer includes convolutional layer 5, convolutional layer 6, convolutional layer 7, and convolutional layer 8; and the regression layer includes convolutional layer 9, convolutional layer 10, convolutional layer 11, and convolutional layer 12. The convolution kernel of each convolutional layer is 3 × 3; the activation functions used by convolutional layer 4, convolutional layer 8, and convolutional layer 12 are Sigmoid (logistic regression) activation functions, and the activation functions used by the remaining convolutional layers are Rectified Linear Unit (ReLU) activation functions.
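Under the layer description of fig. 10, a minimal PyTorch sketch of the three heads follows; the channel counts and the shape of the shared input features are assumptions, since the embodiment fixes only the 3 × 3 kernels and the activation functions.

```python
import torch.nn as nn

def head(in_ch, mid_ch, out_ch):
    """Four 3x3 convolutions; ReLU inside, Sigmoid on the last layer."""
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(mid_ch, out_ch, kernel_size=3, padding=1), nn.Sigmoid(),
    )

class FeatureDetectionModel(nn.Module):
    """Classification, verification and regression layers of fig. 10."""

    def __init__(self, in_ch=64, mid_ch=64):
        super().__init__()
        self.classification = head(in_ch, mid_ch, 1)  # convolutional layers 1-4
        self.verification = head(in_ch, mid_ch, 1)    # convolutional layers 5-8
        self.regression = head(in_ch, mid_ch, 4)      # convolutional layers 9-12

    def forward(self, features):
        return (self.classification(features),  # second target feature map
                self.verification(features),    # first target feature map
                self.regression(features))      # region chain adjustments
```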
As shown in fig. 11, a plurality of image frames are acquired, every two adjacent image frames are used as one image group, the last image frame is copied, and the last image frame together with its copy is used as one image group, so that a plurality of image groups are obtained. Each image group includes two image frames, and in any two adjacent image groups, the second image frame of the first image group is the same as the first image frame of the second image group.
By using the last image frame and its copy as an image group, each of the plurality of image frames serves as the first image frame of one of the obtained image groups, so the number of image groups obtained equals the number of image frames. The target region combination chain can therefore be obtained by extracting and combining the target regions of the first image frame of each image group.
When the target area chains of the plurality of image groups are acquired, calling a feature extraction model to acquire the feature vector of the image frame of each image group in sequence according to the arrangement sequence of the plurality of image groups. When the feature vector of the image frame of the first image group is obtained, calling a feature extraction model, and respectively extracting features of two image frames of the first image group to obtain the feature vector of each image frame; when the feature vector of the image frame of the second image group is obtained, taking the feature vector of the second image frame of the first image group as the feature vector of the first image frame of the second image group, calling a feature extraction model, and performing feature extraction on the second image frame of the second image group to obtain the feature vector of the second image frame; when a plurality of subsequent image groups are acquired, the feature vector of the second image frame of the previous image group is taken as the feature vector of the first image frame of the current image group each time, only the feature extraction model is called, and feature extraction is carried out on the second image frame of the current image group, so that the feature vectors of the two image frames of the current image group are acquired. For the last image group in the plurality of image groups, only calling a feature extraction model because two image frames included in the image group are the same, and performing feature extraction on the first image frame of the last image group to obtain feature vectors of the two image frames of the last image group.
When the feature extraction model is called to sequentially obtain feature vectors of image frames of a plurality of image groups, the feature vectors of two image frames of each image group output by the feature extraction model are combined to obtain the feature vector of the corresponding image group, the feature vector of each image group is input into the feature detection model to obtain a target area chain of each image group, and a target area of a first image frame of each image group is extracted according to the matching relation of the target area chains between every two adjacent image groups to be combined to obtain a target area combination chain.
Fig. 12 is a schematic structural diagram of an object tracking apparatus according to an embodiment of the present application, and as shown in fig. 12, the apparatus includes:
an image frame obtaining module 1201, configured to obtain a plurality of image frames arranged in sequence;
the image group acquiring module 1202 is configured to use every preset number of image frames in the plurality of image frames as an image group to obtain a plurality of image groups arranged in sequence, where any two adjacent image groups include at least one same image frame and at least one different image frame;
a feature vector acquiring module 1203, configured to acquire feature vectors of a preset number of image frames in each image group;
the object detection module 1204 is configured to perform object detection on feature vectors of a preset number of image frames in each image group, and obtain at least one candidate region chain of each image group and a corresponding first object feature map, where the candidate region chain includes multiple candidate regions located in different image frames of the corresponding image group, and the first object feature map includes a probability that multiple candidate regions included in each candidate region chain of the corresponding image group belong to the same object;
an adjustment processing module 1205, configured to perform adjustment processing on at least one candidate region chain of each image group according to the feature vectors of a preset number of image frames in each image group and the corresponding first target feature map, to obtain at least one target region chain of each image group respectively, where the target region chain includes multiple target regions that are located in different image frames of the corresponding image group and belong to the same target;
a region combination chain creating module 1206, configured to create a target region combination chain of at least one target according to the target region chains of the multiple image groups and the arrangement order of the multiple image groups.
In one possible implementation, as shown in fig. 13, the feature vector obtaining module 1203 includes:
a first determining unit 1231 configured to determine a feature vector of a designated image frame in a second image group as a feature vector of a corresponding designated image frame in the first image group, the second image group being a previous image group of the first image group in the plurality of image groups, the designated image frame being a same image frame in the first image group and the second image group;
the feature extraction unit 1232 is configured to perform feature extraction on other image frames in the first image group except the specified image frame to obtain feature vectors of the other image frames.
In another possible implementation, as shown in fig. 13, the object detection module 1204 includes:
the object detection unit 1241 is configured to perform object detection on feature vectors of a preset number of image frames in each image group, and obtain at least one candidate region chain and a corresponding first object feature map of each image group, and a second object feature map corresponding to a first image frame of each image group, where the second object feature map includes a probability that at least one candidate region of the first image frame of the corresponding image group contains an object.
In another possible implementation manner, as shown in fig. 13, the adjustment processing module 1205 includes:
the adjusting processing unit 1251 is configured to perform adjustment processing on at least one candidate region chain of each image group according to feature vectors of a preset number of image frames in each image group, a first target feature map corresponding to each image group, and a second target feature map corresponding to a first image frame of each image group, to obtain at least one target region chain of each image group, where the second target feature map includes a probability that at least one candidate region of the first image frame of the corresponding image group contains a target.
In another possible implementation manner, the adjustment processing unit 1251 is further configured to fuse feature vectors of a preset number of image frames in each image group, a first target feature map corresponding to each image group, and a second target feature map corresponding to a first image frame in each image group, so as to obtain an aggregated feature vector of each image group; and adjusting at least one candidate region chain of each image group according to the aggregation feature vector of each image group to respectively obtain at least one target region chain of each image group.
In another possible implementation manner, as shown in fig. 13, the target detection module 1204 includes:
a target detection unit 1241, configured to perform target detection according to the feature vector of the first image frame in any image group, to obtain at least one candidate region of the first image frame in the image group;
a candidate region processing unit 1242, configured to process the at least one candidate region according to the feature vectors of the preset number of image frames in the image group, to respectively obtain candidate regions of the other image frames in the image group;
a target region associating unit 1243, configured to associate the matched candidate regions belonging to different image frames among the preset number of image frames, to obtain at least one candidate region chain of the image group.
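A hedged sketch of this three-step construction, with hypothetical detect and propagate helpers standing in for the detection and mapping operations:

def build_candidate_chains(frame_feats, detect, propagate):
    # frame_feats: ordered list of per-frame feature vectors in one group
    first, rest = frame_feats[0], frame_feats[1:]
    chains = []
    for region in detect(first):
        # A chain is the first-frame region plus its matched counterpart
        # mapped into each of the remaining frames of the group.
        chains.append([region] + [propagate(region, f) for f in rest])
    return chains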
In another possible implementation manner, the target detection unit 1241 is further configured to perform target detection according to the feature vector of the first image frame, to obtain a plurality of candidate regions of the first image frame and the probabilities corresponding to the plurality of candidate regions; and select, from the plurality of candidate regions, at least one candidate region whose probability is greater than a preset threshold.
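The probability filtering itself reads directly as below; the 0.5 threshold is illustrative, as the application only specifies "a preset threshold":

def select_candidates(regions, probs, threshold=0.5):
    # Keep only regions whose detection probability exceeds the threshold.
    return [r for r, p in zip(regions, probs) if p > threshold]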
In another possible implementation manner, the feature vector obtaining module 1203 includes:
a feature vector obtaining unit 1233, configured to invoke a feature extraction model, and obtain feature vectors of a preset number of image frames in each image group;
the target detection module 1204 includes:
the target detection unit 1241 is configured to invoke a feature detection model, perform target detection on feature vectors of a preset number of image frames in each image group, and obtain at least one candidate region chain and a corresponding first target feature map of each image group respectively.
In another possible implementation manner, as shown in fig. 13, the area combination chain creating module 1206 includes:
a second determining unit 1261, configured to, for any two adjacent image groups, take the image frame shared by the two image groups as a designated image frame, and take a target region of the designated image frame as a designated target region;
a region matching relationship determining unit 1262, configured to determine a target region matching relationship between any two image groups according to a designated image frame and a designated target region in any two image groups, where the target region matching relationship includes a matching relationship between any target region in any one designated image frame in a first image group and a target region belonging to the same target in the same designated image frame in a second image group;
a region chain matching relationship determining unit 1263, configured to determine a target region chain matching relationship between the two image groups according to the target region chains and the target region matching relationship of the two image groups, where the target region chain matching relationship includes a matching relationship between any target region chain of the first image group and the target region chain of the second image group that belongs to the same target;
a combining unit 1264, configured to combine, according to the target region chain matching relationship, the target regions in any target region chain other than the designated target region with the other target region chain matched with that target region chain, to obtain a target region combination chain.
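Once two chains are paired, the stitching is simple concatenation that drops the duplicated designated region, as in this sketch assuming one shared frame between adjacent image groups:

def combine_chains(chain_a, chain_b, shared_frames=1):
    # chain_b starts with the designated region(s) that chain_a already
    # ends with, so skip them to avoid duplicating the shared frame.
    return chain_a + chain_b[shared_frames:]

# A full combined chain can be built by folding combine_chains over the
# matched chains of consecutive image groups in their arrangement order.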
In another possible implementation manner, the region matching relationship determining unit 1262 is further configured to determine multiple groups of candidate matching relationships according to the designated image frames and the designated target regions in any two image groups, where the candidate matching relationships include a matching relationship between each designated target region in any designated image frame in the first image group and any designated target region in the same designated image frame in the second image group, and the multiple groups of candidate matching relationships are different from one another; respectively determine the sum of the similarities of every two matched designated target regions in each group of candidate matching relationships as the matching degree of that group of candidate matching relationships; and select, from the multiple groups of candidate matching relationships, the candidate matching relationship with the maximum matching degree, and determine it as the target region matching relationship.
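Choosing the candidate matching relationship with the maximum summed similarity is an instance of the assignment problem. Rather than enumerating every candidate grouping, the sketch below solves it with the Hungarian algorithm; this is one standard realization, not necessarily the one intended by the application (scipy's linear_sum_assignment minimizes cost, so the similarity matrix is negated):

import numpy as np
from scipy.optimize import linear_sum_assignment

def match_regions(similarity):
    # similarity[i][j]: similarity between designated region i (first
    # group) and designated region j (second group) in the shared frame.
    rows, cols = linear_sum_assignment(-np.asarray(similarity))
    return list(zip(rows.tolist(), cols.tolist()))  # max-similarity pairs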
In another possible implementation, as shown in fig. 13, the second image frame of the second image group is the same as the first image frame of the first image group, and the second image group is a previous image group of the first image group in the plurality of image groups;
the adjustment processing module 1205 further includes:
a target region prediction unit 1252, configured to, if a first target region chain of the second image group includes an interruption target region in the second image frame of the second image group and the interruption target region does not match any designated target region in the first image frame of the first image group, predict, in the first image frame of the first image group, a prediction target region belonging to the same target as the interruption target region.
In another possible implementation manner, the apparatus shown in fig. 13 further includes:
a target region prediction module 1207, configured to map the prediction target region according to the feature vectors of the preset number of image frames in the first image group, to respectively obtain prediction target regions of the other image frames in the first image group;
a target region associating module 1208, configured to associate the prediction target regions belonging to different image frames among the preset number of image frames, to obtain a prediction target region chain of the first image group.
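The application leaves the prediction model open. A hedged sketch assuming simple constant-velocity extrapolation over (x, y, w, h) boxes:

def predict_region(last_box, prev_box):
    # Extrapolate the interrupted target one frame ahead, assuming
    # constant velocity of the box's top-left corner.
    (x, y, w, h), (px, py, _, _) = last_box, prev_box
    return (x + (x - px), y + (y - py), w, h)

def predict_chain(last_box, prev_box, num_frames):
    # Propagate the predicted region through the remaining frames of the
    # first image group to form the prediction target region chain.
    chain, prev, cur = [], prev_box, last_box
    for _ in range(num_frames):
        nxt = predict_region(cur, prev)
        chain.append(nxt)
        prev, cur = cur, nxt
    return chain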
In another possible implementation manner, as shown in fig. 13, the apparatus further includes:
a matching relationship adding module 1209, configured to add, to the target region matching relationship between the second image group and the first image group, a matching relationship between the interruption target region and the prediction target region.
In another possible implementation, as shown in fig. 13, the image frame acquiring module 1201 includes:
the image frame acquiring unit 12011 is configured to acquire video data, and perform frame extraction processing on the video data to obtain a plurality of image frames.
In another possible implementation manner, as shown in fig. 13, the apparatus further includes:
a video data updating module 1210, configured to add the movement track of at least one target to the video data according to the target region combination chain of the at least one target, to obtain updated video data;
a video data playing module 1211, configured to play the updated video data, so as to display a picture in which the at least one target moves along its corresponding movement track.
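One way to render such a picture, sketched with OpenCV and assuming each combined chain is a list of (x, y, w, h) regions indexed by frame; the color and line width are illustrative:

import cv2
import numpy as np

def draw_trajectories(frames, combined_chains):
    for idx, frame in enumerate(frames):
        for chain in combined_chains:
            # Reduce the regions seen so far to their box centers and
            # draw the accumulated track as a polyline on this frame.
            centers = [(int(x + w / 2), int(y + h / 2))
                       for x, y, w, h in chain[:idx + 1]]
            if len(centers) > 1:
                pts = np.array(centers, dtype=np.int32)
                cv2.polylines(frame, [pts], False, (0, 255, 0), 2)
    return frames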
Fig. 14 is a schematic structural diagram of a terminal according to an embodiment of the present application, which can implement the operations performed by the computer device in the foregoing embodiments. The terminal 1400 may be a portable mobile terminal, such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, a desktop computer, a head-mounted device, a smart television, a smart sound box, a smart remote controller, a smart microphone, or any other smart terminal. Terminal 1400 may also be referred to as user equipment, a portable terminal, a laptop terminal, a desktop terminal, or by other names.
In general, terminal 1400 includes: a processor 1401, and a memory 1402.
Processor 1401 may include one or more processing cores, such as a 4-core processor or an 8-core processor. Memory 1402 may include one or more computer-readable storage media, which may be non-transitory, for storing at least one instruction to be executed by processor 1401 to implement the target tracking method provided by the method embodiments of the present application.
In some embodiments, terminal 1400 may further optionally include: a peripheral device interface 1403 and at least one peripheral device. The processor 1401, the memory 1402, and the peripheral interface 1403 may be connected by buses or signal lines. Each peripheral device may be connected to the peripheral device interface 1403 by a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1404, display 1405 and audio circuitry 1406.
The Radio Frequency circuit 1404 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1404 communicates with a communication network and other communication devices by electromagnetic signals.
The display screen 1405 is used to display a UI (user interface). The UI may include graphics, text, icons, video, and any combination thereof. The display 1405 may be a touch display and may also be used to provide virtual buttons and/or a virtual keyboard.
The audio circuitry 1406 may include a microphone and a speaker. The microphone is used for collecting audio signals of a user and the environment, converting the audio signals into electric signals, and inputting the electric signals to the processor 1401 for processing or inputting the electric signals to the radio frequency circuit 1404 to realize voice communication. For stereo capture or noise reduction purposes, multiple microphones may be provided, each at a different location of terminal 1400. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is then used to convert the electrical signals from the processor 1401 or the radio frequency circuit 1404 into audio signals.
Those skilled in the art will appreciate that the configuration shown in fig. 14 is not intended to be limiting with respect to terminal 1400 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be employed.
Fig. 15 is a schematic structural diagram of a server 1500 according to an embodiment of the present application. The server 1500 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 1501 and one or more memories 1502, where at least one instruction is stored in the memory 1502 and is loaded and executed by the processor 1501 to implement the methods provided by the foregoing method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface, and may further include other components for implementing device functions, which are not described herein again.
The server 1500 may be used to perform the above-described object tracking method.
The embodiment of the present application further provides a computer device, where the computer device includes a processor and a memory, where the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor, so as to implement the target tracking method of the foregoing embodiment.
The embodiment of the present application further provides a computer-readable storage medium, where at least one instruction is stored in the computer-readable storage medium, and the at least one instruction is loaded and executed by a processor, so as to implement the target tracking method of the foregoing embodiment.
The embodiment of the present application further provides a computer program, where at least one instruction is stored in the computer program, and the at least one instruction is loaded and executed by a processor, so as to implement the target tracking method of the foregoing embodiment.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only an alternative embodiment of the present application and should not be construed as limiting the present application, and any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (13)

1. A method of target tracking, the method comprising:
acquiring a plurality of image frames which are arranged in sequence;
taking every preset number of image frames in the plurality of image frames as an image group to obtain a plurality of image groups which are arranged in sequence, wherein any two adjacent image groups comprise at least one same image frame and at least one different image frame;
acquiring the feature vectors of the preset number of image frames in each image group;
performing target detection on the feature vectors of the preset number of image frames in each image group to obtain at least one candidate region chain and a corresponding first target feature map of each image group, wherein the candidate region chain comprises a plurality of candidate regions located in different image frames of the corresponding image group, and the first target feature map comprises the probability that the plurality of candidate regions contained in each candidate region chain of the corresponding image group belong to the same target;
fusing the feature vectors of the preset number of image frames in each image group, a second target feature map corresponding to a first image frame of each image group and a first target feature map corresponding to each image group to obtain an aggregated feature vector of each image group, wherein the second target feature map comprises the probability that at least one candidate region of the first image frame of the corresponding image group contains a target;
adjusting at least one candidate region chain of each image group according to the aggregated feature vector of each image group to obtain at least one target region chain of each image group, wherein the target region chain comprises a plurality of target regions which are located in different image frames of the corresponding image group and belong to the same target;
and creating a target region combination chain of at least one target according to the target region chains of the image groups and the arrangement order of the image groups.
2. The method of claim 1, wherein said obtaining feature vectors for said preset number of image frames in each image group comprises:
determining the feature vectors of designated image frames in a second image group as the feature vectors of the corresponding designated image frames in a first image group, wherein the second image group is a previous image group of the first image group in the plurality of image groups, and the designated image frames are the image frames shared by the first image group and the second image group;
and performing feature extraction on the image frames in the first image group other than the designated image frames to obtain the feature vectors of those image frames.
3. The method according to claim 1, wherein the performing target detection on the feature vectors of the preset number of image frames in each image group to obtain at least one candidate region chain and a corresponding first target feature map of each image group respectively comprises:
and performing target detection on the feature vectors of the preset number of image frames in each image group to respectively obtain at least one candidate region chain and a corresponding first target feature map of each image group, and a second target feature map corresponding to the first image frame of each image group.
4. The method according to claim 1, wherein the performing target detection on the feature vectors of the preset number of image frames in each image group to obtain at least one candidate region chain of each image group respectively comprises:
carrying out target detection according to the feature vector of a first image frame in any image group to obtain at least one candidate region of the first image frame in the image group;
processing the at least one candidate region according to the feature vectors of the preset number of image frames in the image group to respectively obtain candidate regions of the other image frames in the image group;
and associating the matched candidate regions belonging to different image frames among the preset number of image frames to obtain at least one candidate region chain of the image group.
5. The method of claim 4, wherein the performing object detection based on the feature vector of the first image frame in any image group to obtain at least one candidate region of the first image frame in the image group comprises:
performing target detection according to the feature vector of the first image frame to obtain a plurality of candidate regions of the first image frame and corresponding probabilities of the candidate regions;
and selecting, from the plurality of candidate regions, at least one candidate region whose probability is greater than a preset threshold.
6. The method of claim 1,
the obtaining of the feature vectors of the preset number of image frames in each image group includes:
calling a feature extraction model to obtain feature vectors of the preset number of image frames in each image group;
the performing target detection on the feature vectors of the preset number of image frames in each image group to respectively obtain at least one candidate region chain and a corresponding first target feature map of each image group includes:
and calling a feature detection model, and performing target detection on the feature vectors of the preset number of image frames in each image group to respectively obtain at least one candidate region chain and a corresponding first target feature map of each image group.
7. The method according to claim 1, wherein creating a target region combination chain of at least one target according to the target region chains of the plurality of image groups and the arrangement order of the plurality of image groups comprises:
for any two adjacent image groups, taking the image frame shared by the two image groups as a designated image frame, and taking a target region of the designated image frame as a designated target region;
determining a target region matching relationship between the two image groups according to the designated image frames and the designated target regions in the two image groups, wherein the target region matching relationship comprises a matching relationship between any target region of any designated image frame in a first image group and the target region belonging to the same target in the same designated image frame in a second image group;
determining a target region chain matching relationship between the two image groups according to the target region chains of the two image groups and the target region matching relationship, wherein the target region chain matching relationship comprises a matching relationship between any target region chain of the first image group and the target region chain of the second image group that belongs to the same target;
and combining, according to the target region chain matching relationship, the target regions in any target region chain other than the designated target region with the other target region chain matched with that target region chain, to obtain the target region combination chain.
8. The method according to claim 7, wherein said determining a target region matching relationship between any two image groups according to the designated image frame and the designated target region in any two image groups comprises:
determining a plurality of groups of candidate matching relationships according to the designated image frames and the designated target regions in the two image groups, wherein the candidate matching relationships comprise a matching relationship between each designated target region of any designated image frame in the first image group and any designated target region in the same designated image frame in the second image group, and the plurality of groups of candidate matching relationships are different from one another;
respectively determining the sum of the similarities of every two matched designated target regions in each group of candidate matching relationships as the matching degree of that group of candidate matching relationships;
and selecting, from the plurality of groups of candidate matching relationships, the candidate matching relationship with the maximum matching degree, and determining it as the target region matching relationship.
9. The method of claim 1, wherein a second image frame of a second image group is the same as a first image frame of a first image group, said second image group being a previous image group of said first image group in said plurality of image groups;
the method further comprises:
if a first target region chain of the second image group comprises an interruption target region in the second image frame of the second image group, and the interruption target region does not match any designated target region in the first image frame of the first image group, predicting, in the first image frame of the first image group, a prediction target region belonging to the same target as the interruption target region.
10. The method of claim 9, further comprising:
mapping the prediction target region according to the feature vectors of the preset number of image frames in the first image group to respectively obtain the prediction target regions of the other image frames in the first image group;
and associating the prediction target regions belonging to different image frames among the preset number of image frames to obtain a prediction target region chain of the first image group.
11. A target tracking apparatus, characterized in that the apparatus comprises:
the image frame acquisition module is used for acquiring a plurality of image frames which are arranged in sequence;
the image group acquisition module is used for taking every preset number of image frames in the plurality of image frames as an image group to obtain a plurality of image groups which are arranged in sequence, wherein any two adjacent image groups comprise at least one same image frame and at least one different image frame;
the feature vector acquisition module is used for acquiring feature vectors of the preset number of image frames in each image group;
the target detection module is used for performing target detection on the feature vectors of the preset number of image frames in each image group to respectively obtain at least one candidate region chain of each image group and a corresponding first target feature map, wherein the candidate region chain comprises a plurality of candidate regions located in different image frames of the corresponding image group, and the first target feature map comprises the probability that the plurality of candidate regions contained in each candidate region chain of the corresponding image group belong to the same target;
the adjustment processing module is used for fusing the feature vectors of the preset number of image frames in each image group, the second target feature map corresponding to the first image frame of each image group and the first target feature map corresponding to each image group to obtain an aggregated feature vector of each image group, wherein the second target feature map comprises the probability that at least one candidate region of the first image frame of the corresponding image group contains a target; and adjusting at least one candidate region chain of each image group according to the aggregated feature vector of each image group to obtain at least one target region chain of each image group, wherein the target region chain comprises a plurality of target regions which are located in different image frames of the corresponding image group and belong to the same target;
and the region combination chain creating module is used for creating a target region combination chain of at least one target according to the target region chains of the plurality of image groups and the arrangement order of the plurality of image groups.
12. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction that is loaded and executed by the processor to implement the target tracking method of any of claims 1 to 10.
13. A computer-readable storage medium having stored therein at least one instruction, which is loaded and executed by a processor, to implement the target tracking method of any one of claims 1 to 10.
CN202010382627.3A 2020-05-08 2020-05-08 Target tracking method and device, computer equipment and storage medium Active CN111598923B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010382627.3A CN111598923B (en) 2020-05-08 2020-05-08 Target tracking method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010382627.3A CN111598923B (en) 2020-05-08 2020-05-08 Target tracking method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111598923A CN111598923A (en) 2020-08-28
CN111598923B true CN111598923B (en) 2022-09-20

Family

ID=72191080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010382627.3A Active CN111598923B (en) 2020-05-08 2020-05-08 Target tracking method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111598923B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112819050B (en) * 2021-01-22 2023-10-27 北京市商汤科技开发有限公司 Knowledge distillation and image processing method, apparatus, electronic device and storage medium
CN112985440B (en) * 2021-02-20 2024-02-23 北京嘀嘀无限科技发展有限公司 Method, device, storage medium and program product for detecting travel track deviation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214238A (en) * 2017-06-30 2019-01-15 百度在线网络技术(北京)有限公司 Multi-object tracking method, device, equipment and storage medium
CN110276780A (en) * 2019-06-17 2019-09-24 广州织点智能科技有限公司 A kind of multi-object tracking method, device, electronic equipment and storage medium
WO2020014901A1 (en) * 2018-07-18 2020-01-23 深圳前海达闼云端智能科技有限公司 Target tracking method and apparatus, and electronic device and readable storage medium
CN111598924A (en) * 2020-05-08 2020-08-28 腾讯科技(深圳)有限公司 Target tracking method and device, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110610510B (en) * 2019-08-29 2022-12-16 Oppo广东移动通信有限公司 Target tracking method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111598923A (en) 2020-08-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40028380

Country of ref document: HK

GR01 Patent grant