CN111598924B - Target tracking method and device, computer equipment and storage medium


Info

Publication number: CN111598924B
Authority: CN (China)
Application number: CN202010383524.9A
Other versions: CN111598924A (original language: Chinese)
Prior art keywords: image, target, image group, group, target area
Legal status: Active (granted)
Inventors: 彭瑾龙, 王昌安, 罗泽坤, 李剑, 邰颖, 王亚彪, 汪铖杰, 李季檩, 吴永坚, 黄飞跃
Assignee: Tencent Technology (Shenzhen) Co., Ltd.
Application filed by Tencent Technology (Shenzhen) Co., Ltd.; published as CN111598924A, then granted and published as CN111598924B.


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence

Abstract

The embodiments of the application disclose a target tracking method and device, computer equipment and a storage medium, belonging to the field of computer technology. The method comprises: acquiring a plurality of image frames arranged in sequence; taking every preset number of image frames as an image group to obtain a plurality of image groups arranged in sequence; processing the image groups to obtain at least one target area chain for each image group; and creating a target area combination chain of at least one target according to the target area chains of the image groups and their arrangement sequence. Because the target area chains are obtained with the image group as the unit, the target areas of different image frames within each image group are associated, which improves the accuracy of target detection; because different image groups include the same image frame, the target area chains of the plurality of image groups are associated and can be combined into a target area combination chain, which improves the accuracy of the target area combination chain.

Description

Target tracking method and device, computer equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a target tracking method and device, computer equipment and a storage medium.
Background
With the development of computer technology, target tracking technology is applied more and more widely. Target tracking determines the area where a target is located in a plurality of image frames, thereby tracking the target, and can be applied to fields such as video-based people flow statistics, video surveillance, and suspect tracking.
In the related art, a target tracking method is provided that performs target detection according to the feature vector of each image frame and determines the target area of each image frame, so that the moving track of a target can be determined from the target areas of a plurality of image frames. However, this method takes a single image frame as the unit and determines the target area of each image frame independently, so the determined target areas are not accurate.
Disclosure of Invention
The embodiments of the application provide a target tracking method and device, computer equipment and a storage medium, which can improve the accuracy of the determined target area combination chain. The technical solution is as follows:
in one aspect, a target tracking method is provided, and the method includes:
acquiring a plurality of image frames arranged in sequence;
taking every preset number of image frames in the plurality of image frames as an image group to obtain a plurality of image groups which are arranged in sequence, wherein any two adjacent image groups comprise at least one same image frame and at least one different image frame;
processing the image groups to obtain at least one target area chain of each image group, wherein each target area chain comprises a plurality of target areas belonging to the same target, and the target areas are respectively positioned in different image frames in the corresponding image groups;
and creating a target area combination chain of at least one target according to the target area chains of the plurality of image groups and the arrangement sequence of the plurality of image groups.
In another aspect, a target tracking apparatus is provided, the apparatus including:
the image frame acquisition module is used for acquiring a plurality of image frames which are arranged in sequence;
the image group acquisition module is used for taking every preset number of image frames in the plurality of image frames as an image group to obtain a plurality of image groups which are arranged in sequence, wherein any two adjacent image groups comprise at least one same image frame and at least one different image frame;
the image group processing module is used for processing the image groups to obtain at least one target area chain of each image group, each target area chain comprises a plurality of target areas belonging to the same target, and the target areas are respectively positioned in different image frames in the corresponding image group;
and the area combination chain creating module is used for creating a target area combination chain of at least one target according to the target area chains of the plurality of image groups and the arrangement sequence of the plurality of image groups.
In one possible implementation, the image group processing module includes:
a feature vector acquisition unit configured to acquire feature vectors of the preset number of image frames in a first image group, where the first image group is any one of the plurality of image groups;
and the target area chain acquisition unit is used for processing the feature vectors of the preset number of image frames in the first image group to obtain at least one target area chain of the first image group.
In another possible implementation manner, the feature vector acquiring unit is further configured to determine a feature vector of a specified image frame in a second image group as a feature vector of a corresponding specified image frame in the first image group, where the second image group is a previous image group of the first image group in the plurality of image groups, and the specified image frame is a same image frame in the first image group and the second image group; and extracting the features of other image frames except the appointed image frame in the first image group to obtain the feature vectors of the other image frames.
In another possible implementation manner, the target region chain obtaining unit is further configured to perform target detection according to a feature vector of a first image frame in the first image group, so as to obtain at least one initial region of the first image frame; processing the at least one initial region according to the feature vectors of the preset number of image frames in the first image group to respectively obtain target regions of the preset number of image frames, wherein the at least one initial region corresponds to at least one target region of each image frame in the first image group in a one-to-one manner; and associating target areas belonging to the same target and different image frames in the preset number of image frames to obtain at least one target area chain of the first image group.
In another possible implementation manner, the target region chain obtaining unit is further configured to perform target detection according to a feature vector of a first image frame in the first image group, so as to obtain a plurality of initial regions of the first image frame and probabilities corresponding to the plurality of initial regions; and selecting at least one initial region with the probability greater than a preset threshold value from the plurality of initial regions.
In another possible implementation manner, the target detection model comprises a feature extraction submodel and a feature detection submodel;
the feature vector acquisition unit is further configured to invoke the feature extraction sub-model to acquire feature vectors of the preset number of image frames in the first image group;
the target area chain acquiring unit is further configured to invoke the feature detection submodel, and process the feature vectors of the preset number of image frames in the first image group to obtain at least one target area chain of the first image group.
In another possible implementation manner, the region combination chain creating module includes:
a first determining unit, configured to, for any two adjacent image groups, take the same image frame in the any two image groups as a designated image frame, and take a target area of the designated image frame as a designated target area;
a region matching relationship determining unit, configured to determine a target region matching relationship between any two image groups according to a specified image frame and a specified target region in any two image groups, where the target region matching relationship includes a matching relationship between any target region in any specified image frame in a first image group and a target region belonging to the same target in the same specified image frame in a second image group;
a region chain matching relationship determining unit, configured to determine a target region chain matching relationship between the any two image groups according to the target region chains of the any two image groups and the target region matching relationship, where the target region chain matching relationship includes a matching relationship between any target region chain of the first image group and a target region chain belonging to the same target in the second image group;
and the area combination chain combination unit is used for combining other target areas except the specified target area in any target area chain with another target area chain matched with any target area chain according to the target area chain matching relationship to obtain the target area combination chain.
In another possible implementation manner, the region matching relationship determining unit is further configured to determine multiple sets of candidate matching relationships according to the specified image frames and the specified target regions in the any two image groups, where each candidate matching relationship includes a matching relationship between each specified target region of any specified image frame in the first image group and any specified target region in the same specified image frame in the second image group, and the multiple sets of candidate matching relationships are different from one another; determine, for each set of candidate matching relationships, the sum of the similarities of every two matched specified target regions as the matching degree of that set; and select the candidate matching relationship with the largest matching degree from the multiple sets of candidate matching relationships and determine it as the target region matching relationship.
In another possible implementation, the second image frame of the second image group is the same as the first image frame of the first image group;
the image group processing module further comprises:
and the target region prediction unit is used for, if a first target region chain of the second image group includes an interrupted target region in the second image frame of the second image group, and the interrupted target region does not match any specified target region in the first image frame of the first image group, predicting a predicted target region in the first image frame of the first image group that belongs to the same target as the interrupted target region.
In another possible implementation manner, the apparatus further includes:
the target area prediction module is used for mapping the predicted target region according to the feature vectors of the preset number of image frames in the first image group, to obtain the predicted target regions of the other image frames in the first image group respectively;
and the target area association module is used for associating the predicted target regions belonging to different image frames in the preset number of image frames to obtain a predicted target region chain of the first image group.
In another possible implementation manner, the apparatus further includes:
a matching relationship adding module, configured to add a matching relationship between the interrupted target region and the predicted target region to the target region matching relationship between the second image group and the first image group.
In another possible implementation manner, the image frame acquiring module includes:
and the image frame acquisition unit is used for acquiring video data and performing frame extraction processing on the video data to obtain the plurality of image frames.
In another possible implementation manner, the apparatus further includes:
the video data updating module is used for adding the moving track of the at least one target to the video data according to the target area combination chain of the at least one target, to obtain updated video data;
and the video data playing module is used for playing the updated video data so as to display the picture of the at least one target moving according to the corresponding moving track.
In another aspect, a computer device is provided, which includes a processor and a memory, where at least one instruction is stored, and the at least one instruction is loaded and executed by the processor to implement the target tracking method according to the above aspect.
In another aspect, a computer-readable storage medium is provided, having stored therein at least one instruction, which is loaded and executed by a processor, to implement the target tracking method according to the above aspect.
The technical solutions provided by the embodiments of the present application bring at least the following beneficial effects:
The method, the device, the computer equipment and the storage medium provided by the embodiments of the application acquire a plurality of image frames arranged in sequence, take every preset number of image frames as an image group to obtain a plurality of image groups arranged in sequence, process the plurality of image groups to obtain at least one target area chain of each image group, and create a target area combination chain of at least one target according to the target area chains of the plurality of image groups and the arrangement sequence of the plurality of image groups. A plurality of image groups are created from the plurality of image frames, and target area chains are acquired with the image group as the unit, so that the target areas of different image frames in each image group are associated, which improves the accuracy of target detection; because different image groups include the same image frame, the target area chains of the plurality of image groups are associated and can be combined into a target area combination chain, thereby realizing the tracking of the target and improving the accuracy of the target area combination chain.
In addition, at least one target area chain of each image group is determined from the feature vectors of the preset number of image frames of the image group through the feature extraction submodel and the feature detection submodel, which improves the accuracy of the determined target area chains, and therefore the accuracy of the target area combination chain and of the determined moving track of the target.
When the target area chains of any two image groups are combined, the target area matching relationship between the designated target areas of the same designated image frames in the two image groups is determined first, and the target area chain matching relationship between the two image groups is then determined from it, so that the target area chains of different image groups can be combined according to the determined target area chain matching relationship, which improves the accuracy of the obtained target area combination chain and the accuracy of the determined moving track of the target.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic illustration of an implementation environment provided by an embodiment of the present application;
fig. 2 is a flowchart of a target tracking method provided in an embodiment of the present application;
fig. 3 is a flowchart of a target tracking method provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a target region chain according to an embodiment of the present application;
FIG. 5 is a flowchart of a target tracking method provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of a target area matching relationship provided in an embodiment of the present application;
FIG. 7 is a flowchart of a target tracking method provided in an embodiment of the present application;
FIG. 8 is a flowchart of a target tracking method provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of a target movement track according to an embodiment of the present application;
FIG. 10 is a flowchart of a target tracking method provided by an embodiment of the present application;
fig. 11 is a flowchart of determining a target area matching relationship according to an embodiment of the present application;
FIG. 12 is a flowchart of invoking a target detection model to obtain a target region chain of an image group according to an embodiment of the present application;
FIG. 13 is a flowchart of acquiring a target area chain of multiple image groups according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of a target tracking apparatus according to an embodiment of the present application;
fig. 15 is a schematic structural diagram of a target tracking apparatus according to an embodiment of the present application;
fig. 16 is a schematic structural diagram of a terminal according to an embodiment of the present application;
fig. 17 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application more clear, the embodiments of the present application will be further described in detail with reference to the accompanying drawings.
The terms "first," "second," and the like as used herein may be used herein to describe various concepts that are not limited by these terms unless otherwise specified. These terms are only used to distinguish one concept from another. For example, a first image group may be referred to as a second image group, and similarly, a second image group may be referred to as a first image group, without departing from the scope of the present application.
As used herein, "at least one" includes one, two, or more; "a plurality" includes two or more; "each" refers to every one of a corresponding plurality; and "any" refers to any one of a plurality. For example, if a plurality of elements includes 3 elements, "each" refers to every one of the 3 elements, and "any" refers to any one of the 3 elements, which may be the first, the second, or the third.
Artificial Intelligence (AI) is a theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how a computer simulates or implements human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
According to the scheme provided by the embodiment of the application, the feature extraction submodel and the feature detection submodel can be trained on the basis of the machine learning technology of artificial intelligence, and the target tracking method is realized by utilizing the trained feature extraction submodel and the feature detection submodel.
The target tracking method provided by the embodiment of the application can be used in a computer device, the computer device acquires a plurality of image frames which are sequentially arranged, each preset number of the image frames in the image frames are used as an image group, a plurality of image groups which are sequentially arranged are obtained, the image groups are processed, at least one target area chain of each image group is obtained, and a target area combination chain of at least one target is created according to the target area chains of the image groups and the arrangement sequence of the image groups.
The computer device includes a terminal or a server. The server may be an independent physical server, a server cluster or distributed system composed of a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in this application.
Fig. 1 is a schematic structural diagram of an implementation environment provided in an embodiment of the present application, and as shown in fig. 1, the implementation environment includes a terminal 101 and a server 102. The terminal 101 establishes a communication connection with the server 102, and performs interaction through the established communication connection.
The terminal 101 shoots a target, acquires a plurality of image frames arranged in sequence, and sends the image frames to the server 102. The server 102 acquires the image frames, takes every preset number of image frames as an image group to obtain a plurality of image groups arranged in sequence, processes the plurality of image groups to obtain at least one target region chain of each image group, and creates a target region combination chain of at least one target according to the target region chains of the plurality of image groups and the arrangement sequence of the plurality of image groups. The server 102 then adds the moving track of the at least one target to the video data according to the target region combination chain of the at least one target to obtain updated video data, and sends the updated video data to the terminal 101. The terminal 101 plays the updated video data to display a picture of the at least one target moving according to the corresponding moving track.
The method provided by the embodiment of the application can be used for a target tracking scene.
For example, in a people flow statistics scenario:
A terminal shoots an intersection to obtain video data and performs frame extraction on the video data to obtain a plurality of image frames arranged in sequence. Using the target tracking method provided by the embodiments of the application, a target area combination chain of at least one target is created, and the number of targets appearing in the video data is determined according to the target area combination chain of the at least one target appearing in the image frames, thereby realizing people flow statistics for the intersection.
As another example, in a suspected infected person tracking scenario:
During epidemic prevention and control, a terminal acquires a plurality of image frames of an epidemic prevention and control area. Using the target tracking method provided by the embodiments of the application, a target area combination chain of at least one target is created, and the moving track of the at least one target is determined according to the target area combination chain, so that the moving track of the at least one target in the epidemic prevention and control area can be determined. If the determined moving track shows that the at least one target has been in the epidemic prevention and control area, the target is determined to be a suspected infected person and can subsequently be isolated, which avoids the spread of the virus and improves the effectiveness of epidemic prevention and control.
Fig. 2 is a flowchart of a target tracking method provided in an embodiment of the present application, which is applied to a computer device, and as shown in fig. 2, the method includes:
201. a plurality of image frames in a sequential arrangement is acquired.
The image frame is an image including an object, and the image frame may be obtained by shooting the object or by performing frame extraction processing on video data. The plurality of image frames may be arranged according to the sequence of the shooting time, or may be arranged according to the sequence number of the plurality of image frames, or may be arranged according to other sequences, etc.
202. And taking each preset number of image frames in the plurality of image frames as an image group to obtain a plurality of image groups which are arranged in sequence.
The preset number may be any set number, such as 2, 3, or 5. The arrangement order of the plurality of image groups may be the order of the generation time of the image groups, or may follow the arrangement order of the image frames included in the image groups, for example, the order of the first image frame of each image group.
In the plurality of image groups, image frames included in different image groups are not completely identical, and any two adjacent image groups include at least one identical image frame and at least one different image frame. For example, the plurality of image frames arranged in sequence include an image frame 1, an image frame 2, an image frame 3, and an image frame 4, and the preset number is 2, then the image frame 1 and the image frame 2 may be regarded as an image group a, the image frame 2 and the image frame 3 may be regarded as an image group B, the image frame 3 and the image frame 4 may be regarded as an image group C, and the obtained plurality of image groups are arranged according to the order of the image frames, that is, the order of the plurality of image groups is: image group A, image group B and image group C.
203. And processing the plurality of image groups to obtain at least one target region chain of each image group.
Each target area chain comprises a plurality of target areas belonging to the same target, and the target areas are respectively located in different image frames in the corresponding image group. The target area is an area where a target is located in the image frame, and the target may be a person, a car, or the like.
For any image group, any image frame of the image group may include at least one target, and any target may appear in at least one image frame of the image group. Therefore, the target area of at least one target in each image frame can be determined, and the target area chain of a target is obtained by associating the target areas belonging to that target in the preset number of image frames of the image group, thereby obtaining at least one target area chain of the image group, where each target has one target area chain.
204. And creating a target area combination chain of at least one target according to the target area chains of the plurality of image groups and the arrangement sequence of the plurality of image groups.
Each target area combination chain comprises a plurality of target areas belonging to the same target, and the target areas are respectively positioned in different image frames. For example, the target region combination chain of the target 1 includes 4 target regions, a first target region is located in a fourth image frame of the plurality of image frames, a second target region is located in a fifth image frame of the plurality of image frames, a third target region is located in a sixth image frame of the plurality of image frames, and a fourth target region is located in a seventh image frame of the plurality of image frames.
Since each image group has at least one target area chain, and adjacent image groups include at least one different image frame, the target area chains of any target in the multiple image groups are associated according to the arrangement sequence of the image groups, so that the target areas of the target in different image frames are obtained, and a target area combination chain of the target is created.
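As an illustrative sketch of this stitching (not part of the patent text), suppose each target area chain is represented as a mapping from frame index to target area; chains of the same target from adjacent image groups can then be merged by keeping the shared (designated) frames only once:

```python
def merge_chains(chain_a, chain_b):
    """Stitch two target area chains of the same target from adjacent image
    groups; each chain maps frame index -> target area."""
    merged = dict(chain_a)
    for frame_index, area in chain_b.items():
        merged.setdefault(frame_index, area)  # add only frames not yet covered
    return merged
```

Repeating this merge along the arrangement order of the image groups yields the target area combination chain of the target.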
The method provided by the embodiments of the application acquires a plurality of image frames arranged in sequence, takes every preset number of image frames as an image group to obtain a plurality of image groups arranged in sequence, processes the plurality of image groups to obtain at least one target area chain of each image group, and creates a target area combination chain of at least one target according to the target area chains of the plurality of image groups and their arrangement sequence. A plurality of image groups are created from the plurality of image frames, and target area chains are acquired with the image group as the unit, so that the target areas of different image frames in each image group are associated, which improves the accuracy of target detection; because different image groups include the same image frame, the target area chains of the plurality of image groups are associated and can be combined into a target area combination chain, thereby realizing the tracking of the target and improving the accuracy of the target area combination chain.
Fig. 3 is a flowchart of a target tracking method provided in an embodiment of the present application, where the target tracking method is applied to a computer device, and as shown in fig. 3, the method includes:
301. the computer equipment acquires video data and performs frame extraction processing on the video data to obtain a plurality of image frames.
Wherein the video data is a continuous sequence of images, the video data comprising a plurality of successive image frames. The video data may be shot by a computer device, or may be video data sent by another device.
The video data is acquired for the purpose of target tracking, and one or more image frames in the video data comprise a target to be tracked, wherein the target can be a pedestrian, an automobile and the like. The video data is subjected to frame extraction processing, image frames are extracted from a continuous image sequence, and therefore a plurality of image frames are obtained, and the obtained image frames can be arranged according to time sequence, so that the target can be tracked through the image frames subsequently.
In one possible implementation, this step 301 may include: acquiring video data, and performing frame extraction processing on the video data at a preset time interval to obtain a plurality of image frames.
The preset time interval may be any time duration, such as 0.1 second or 0.2 second. In the plurality of image frames obtained by performing frame extraction at the preset time interval, the interval between any two adjacent image frames is the preset time interval.
It should be noted that in the embodiment of the present application, the plurality of image frames are obtained by performing frame extraction processing on the video data, and in another embodiment, the plurality of image frames in sequence may be obtained in other manners without performing step 301. For example, a computer device acquires video data, acquires each image frame included in the video data, and obtains a plurality of image frames.
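As a minimal illustrative sketch of this frame extraction (not part of the patent; OpenCV and all names here are assumptions):

```python
import cv2

def extract_frames(video_path, interval_seconds=0.2):
    """Decimate video data into image frames at a preset time interval,
    keeping the frames in chronological order."""
    capture = cv2.VideoCapture(video_path)
    fps = capture.get(cv2.CAP_PROP_FPS) or 25  # fall back if fps is unknown
    step = max(1, round(fps * interval_seconds))
    frames = []
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames
```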
302. The computer device takes every preset number of image frames in the plurality of image frames as an image group to obtain a plurality of image groups arranged in sequence.
The preset number may be any set number, such as 2, 3, or 5. The arrangement order of the plurality of image groups may be the order of the generation time of the image groups, or may follow the arrangement order of the image frames included in the image groups, for example, the order of the first image frame of each image group.
In the plurality of image groups, image frames included in different image groups are not completely identical, and any two adjacent image groups include at least one identical image frame and at least one different image frame. For example, the plurality of image frames arranged in sequence include an image frame 1, an image frame 2, an image frame 3, and an image frame 4, and the preset number is 2, then the image frame 1 and the image frame 2 may be regarded as an image group a, the image frame 2 and the image frame 3 may be regarded as an image group B, the image frame 3 and the image frame 4 may be regarded as an image group C, and the obtained plurality of image groups are arranged according to the order of the image frames, that is, the order of the plurality of image groups is: image group A, image group B, and image group C.
In one possible implementation, this step 302 may include: according to the arrangement sequence of the image frames, every preset number of adjacent image frames are used as an image group, and a plurality of image groups arranged in sequence are obtained.
Wherein "a preset number of adjacent image frames" means that the preset number of image frames are adjacent to each other in the arrangement order of the plurality of image frames. For example, the plurality of image frames includes image frames 1, 2, 3, 4, 5 and 6, and the preset number is 3; then the preset number of adjacent image frames may be image frames 1, 2 and 3, image frames 2, 3 and 4, image frames 3, 4 and 5, or image frames 4, 5 and 6.
When the plurality of image groups are obtained, according to the arrangement order of the plurality of image frames, each image frame is taken in turn as a starting image frame from the first image frame onward, and a preset number of image frames starting from that image frame are selected as an image group, so as to obtain a plurality of image groups. The preset number of image frames included in each image group are adjacent to each other in the sequentially arranged plurality of image frames, and the plurality of image groups can be arranged according to the order of the starting image frame of each image group. For example, the plurality of image frames includes image frame 1, image frame 2, image frame 3 and image frame 4, and the preset number is 2; with image frames 1, 2 and 3 each serving as a starting image frame, the obtained image groups are: image group A including image frame 1 and image frame 2, image group B including image frame 2 and image frame 3, and image group C including image frame 3 and image frame 4.
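The sliding-window grouping described above could be sketched as follows (an illustration, not the patent's implementation; names are hypothetical):

```python
def make_image_groups(frames, preset_number=2):
    """Take every `preset_number` adjacent frames as an image group; the
    window slides one frame at a time, so any two adjacent groups share
    preset_number - 1 frames."""
    if len(frames) < preset_number:
        return [frames] if frames else []
    return [frames[i:i + preset_number]
            for i in range(len(frames) - preset_number + 1)]
```

For frames [1, 2, 3, 4] and preset_number = 2, this yields [[1, 2], [2, 3], [3, 4]], matching image groups A, B and C in the example above.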
303. And calling the feature extraction submodel by the computer equipment to acquire feature vectors of a preset number of image frames in the first image group.
The first image group is any one of the plurality of image groups. The feature extraction submodel is a model for extracting the feature vector of an image frame. A feature vector is a vector representing the features of an image frame, and different image frames have different feature vectors. The preset number of image frames of any image group are input into the feature extraction submodel, and the feature extraction submodel outputs the feature vector of each image frame in the image group. The feature extraction submodel may be ResNet50 (a 50-layer residual network), FPN (Feature Pyramid Network), GoogLeNet, ShuffleNet, or a network model composed of ResNet50 and FPN.
In a possible implementation manner, the first image group is a first image group in a plurality of image groups, and then step 303 may include: and calling the feature extraction submodel by the computer equipment, and extracting the features of each image frame in the first image group to obtain the feature vector of each image frame in the first image group.
In one possible implementation manner, the first image group is not the first image group in the plurality of image groups, and the computer device sequentially obtains the feature vectors of the image frames in each image group according to the arrangement order of the plurality of image groups, then the step 303 may include: and determining the feature vector of the appointed image frame in the second image group as the feature vector of the corresponding appointed image frame in the first image group, calling a feature extraction sub-model, and performing feature extraction on other image frames except the appointed image frame in the first image group to obtain the feature vectors of other image frames.
Wherein the second image group is a previous image group of the first image group in the plurality of image groups, and the designated image frame is the same image frame in the first image group and the second image group. Since at least one identical image frame is included in the first image group and the second image group, the identical image frames in the first image group and the second image group can be both regarded as the designated image frame. The feature vector of the specified image frame in the second image group is obtained by calling the feature extraction sub-model, namely the feature vector is obtained after the feature extraction sub-model is called to extract the features of the specified image frame in the second image group. Since the computer device sequentially acquires the feature vectors of the image frames in each image group according to the arrangement order of the plurality of image groups, the feature vector of each image frame in the second image group, that is, the feature vector of the designated image frame in the second image group, is already acquired by calling the feature extraction sub-model before the feature vector of the image frame in the first image group is acquired.
In the embodiment of the application, the computer device invokes the feature extraction submodel according to the arrangement order of the plurality of image groups and sequentially acquires the feature vectors of the image frames in each image group. Because any two adjacent image groups include at least one same image frame, in order to avoid repeatedly extracting feature vectors of the same image frame, when the feature vectors of the image frames in the first image group are acquired, the feature vectors of the designated image frames in the second image group can be reused as the feature vectors of the corresponding designated image frames in the first image group. Thus, when acquiring the feature vectors of the preset number of image frames in the first image group, only the feature vectors of the image frames other than the designated image frames need to be extracted, which saves the time for acquiring feature vectors and realizes a feature vector sharing mechanism.
In a possible implementation, the computer device invokes the feature extraction submodel according to the arrangement order of the plurality of image groups and sequentially acquires the feature vectors of the image frames in each image group. After invoking the feature extraction submodel and obtaining the feature vectors of the preset number of image frames in the current image group, the computer device stores these feature vectors in memory. In response to invoking the feature extraction submodel and obtaining the feature vectors of the preset number of image frames in the next image group, the computer device stores those feature vectors in memory and deletes the feature vectors of the preset number of image frames in the previous image group.
When the computer device sequentially acquires the feature vectors of the preset number of image frames in the plurality of image groups according to their arrangement order, the feature vectors of the image frames in the current image group are stored in memory each time after they are acquired, so that the feature vectors of the designated image frames shared by two image groups are available when the feature vectors of the image frames in the next image group are acquired. This avoids repeatedly acquiring feature vectors of the same image frame and saves the time for acquiring feature vectors; since the feature vectors of the image frames in the previous image group are always kept in memory, a memory sharing mechanism is realized, reducing the time consumed in acquiring the feature vectors of the image frames.
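A minimal sketch of this feature vector sharing mechanism (illustrative only; `extract_features` stands in for the feature extraction submodel, and each image group is assumed to be a list of (frame_index, frame) pairs):

```python
def group_feature_vectors(image_groups, extract_features):
    """Compute per-frame feature vectors group by group, reusing the cached
    vectors of the designated frames shared with the previous image group."""
    cache = {}  # frame index -> feature vector; holds only the previous group
    per_group_features = []
    for group in image_groups:
        features = {}
        for frame_index, frame in group:
            if frame_index in cache:
                # Designated frame shared with the previous group: reuse.
                features[frame_index] = cache[frame_index]
            else:
                # New frame: run the feature extraction submodel.
                features[frame_index] = extract_features(frame)
        cache = features  # drop the previous group's vectors, keep this one's
        per_group_features.append(features)
    return per_group_features
```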
304. And the computer equipment calls the feature detection submodel to process the feature vectors of a preset number of image frames in the first image group to obtain at least one target area chain of the first image group.
The feature detection submodel is used for acquiring a target area chain of at least one target in the image frame according to the feature vector of the image frame. Each target area chain represents the incidence relation of a corresponding target among target areas of a plurality of image frames, each target area chain comprises a plurality of target areas belonging to the same target, and the plurality of target areas are respectively positioned in different image frames in the corresponding image group.
Calling a feature detection sub-model to process feature vectors of a preset number of image frames in a first image group, so as to obtain a target area of at least one target in each image frame of the preset number of image frames, and associating the target areas belonging to the same target in the preset number of image frames to obtain a target area chain, so as to obtain at least one target area chain of the first image group. As shown in fig. 4, the first image group includes an image frame 1 and an image frame 2, the image frame 1 includes two target regions, the image frame 2 includes two target regions, and the two target regions belonging to the same target in the image frame 1 and the image frame 2 are associated to obtain two target region chains.
In one possible implementation, this step 304 may include: combining the feature vectors of the preset number of image frames in the first image group according to the arrangement sequence of the preset number of image frames in the first image group to obtain the feature vectors of the first image group, calling a feature detection sub-model, and processing the feature vectors of the first image group to obtain at least one target region chain of the first image group.
When the feature vectors of a preset number of image frames in the first image group are combined, the feature vectors of each image frame may be sequentially spliced according to the arrangement order of the preset number of image frames, and the spliced feature vectors are used as the feature vectors of the first image group, so that the feature vectors of the first image group include the feature vectors of each image frame in the preset number of image frames.
The feature vectors of a preset number of image frames are combined, and a feature detection sub-model is called to process the feature vectors of the first image group, so that the target area in each obtained image frame is related to the target areas of other image frames, the accuracy of detecting the target areas is improved, and the accuracy of a target area chain is improved.
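For illustration, the splicing described above might look like the following sketch, assuming the per-frame feature vectors are NumPy arrays (a real model would typically concatenate feature maps along a channel or time axis):

```python
import numpy as np

def combine_group_features(frame_vectors):
    """Splice the per-frame feature vectors, in frame order, into a single
    feature vector for the image group."""
    return np.concatenate(list(frame_vectors), axis=0)
```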
In one possible implementation, as shown in fig. 5, step 304 includes the following steps 3041 to 3043:
3041. and calling a feature detection sub-model, and carrying out target detection according to the feature vector of the first image frame in the first image group to obtain at least one initial region of the first image frame.
The initial region represents the region where a target is located in the first image frame. When the feature detection submodel is invoked for target detection, target detection can be performed on the image frame in the form of detection frames, and when a detection frame includes a target, the region where the detection frame is located is determined as an initial region, so as to obtain at least one initial region in the first image frame.
In one possible implementation, this step 3041 may include: calling a feature detection submodel, carrying out target detection according to a feature vector of a first image frame in the first image group to obtain a plurality of initial regions of the first image frame and corresponding probabilities of the plurality of initial regions, and selecting at least one initial region with the probability larger than a preset threshold value from the plurality of initial regions.
The probability corresponding to an initial region represents the probability that the initial region includes the target: the higher the probability, the more likely the initial region includes the target; the lower the probability, the less likely. The preset threshold may be any value, such as 0.5 or 0.7. By setting the preset threshold, it can be determined that the initial region includes the target in response to the probability being greater than the preset threshold, and that the initial region does not include the target in response to the probability being less than the preset threshold.
By selecting at least one initial region with the probability greater than a preset threshold from the plurality of initial regions of the first image frame, the accuracy of the determined at least one initial region is improved, and thus the accuracy of at least one target region chain obtained through the at least one initial region subsequently is improved.
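A sketch of this probability filtering (illustrative; names are hypothetical):

```python
def filter_initial_regions(initial_regions, probabilities, threshold=0.5):
    """Keep only the initial regions whose probability of containing a
    target exceeds the preset threshold."""
    return [region for region, p in zip(initial_regions, probabilities)
            if p > threshold]
```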
3042. And processing at least one initial region according to the feature vectors of the preset number of image frames in the first image group to respectively obtain target regions of the preset number of image frames.
The target area indicates the area where a target is located in an image frame. The target area may be represented in the form of a detection frame or in the form of coordinates; if the target area is a rectangle, it may be represented by the coordinates of its upper left corner and lower right corner, or by the coordinates of its lower left corner and upper right corner.
The at least one initial region corresponds one-to-one to the at least one target region of each image frame in the first image set. For example, if the first image frame includes 3 initial regions, each image frame in the first image group includes 3 target regions, and the 3 target regions respectively correspond to the 3 initial regions one to one.
When at least one initial region is processed according to the feature vectors of a preset number of image frames in a first image group, because the at least one initial region is positioned in the first image frame, at least one initial region in the first image frame is adjusted to obtain at least one target region in the first image frame; and for other image frames except the first image frame in the first image group, mapping processing is carried out according to at least one initial region to obtain a target region corresponding to the at least one initial region in the other image frames.
For the same target, the target regions of the target in different image frames may be the same or different. Since the image frames in the same image group are adjacent, the change in the target's region across different image frames of the same image group is small. Therefore, when the initial region in the first image frame is mapped to the other image frames of the image group, the target region of the target in each of the other image frames can be determined by applying a small translation to the initial region, thereby obtaining the target regions of the preset number of image frames.
3043. And associating target areas which belong to the same target and different image frames in a preset number of image frames to obtain at least one target area chain of the first image group.
Because each image frame comprises at least one target area, and targets to which different target areas belong are different in the same image frame, at least one target area chain is obtained by associating at least one target area in a preset number of image frames according to whether the targets to which the different target areas belong are the same, and the targets to which the different target area chains belong are different.
For example, one image group includes an image frame 1, an image frame 2, and an image frame 3, each image frame includes 3 objects, that is, an object a, an object B, and an object C, then the object areas belonging to the object a in the image frame 1, the image frame 2, and the image frame 3 are associated to obtain an object area chain of the object a, the object areas belonging to the object B in the image frame 1, the image frame 2, and the image frame 3 are associated to obtain an object area chain of the object B, and the object areas belonging to the object C in the image frame 1, the image frame 2, and the image frame 3 are associated to obtain an object area chain of the object C.
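Steps 3042 and 3043 together can be sketched as follows, under the assumption that the regression output is a small per-frame translation for each initial region (all names here are hypothetical, not the patent's implementation):

```python
def build_target_region_chains(initial_regions, per_frame_offsets):
    """Map each initial region of the first frame into every frame of the
    group by a small translation, then associate the regions that share the
    same initial-region index into one target region chain per target."""
    chains = [[] for _ in initial_regions]      # one chain per target
    for frame_offsets in per_frame_offsets:     # one entry per frame in the group
        for target_idx, (x, y, w, h) in enumerate(initial_regions):
            dx, dy = frame_offsets[target_idx]  # regressed shift for this frame
            chains[target_idx].append((x + dx, y + dy, w, h))
    return chains
```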
In addition, for the feature detection submodel, the feature detection submodel includes a classification layer and a regression layer. By the classification layer of the feature detection submodel, the target detection is carried out according to the feature vector of the first image frame in the first image group, and the target or the background in the first image frame can be distinguished, so that the initial region containing the target in the first image frame is obtained. And processing the initial region of the first image frame according to the feature vectors of a preset number of image frames in the first image group through a regression layer of the feature detection sub-model to obtain a target region of each image frame of the first image group. The target area may be represented by coordinates, that is, coordinates of the target area of each image frame of the first image group.
It should be noted that, in the embodiment of the present application, at least one object region chain of the first image group is obtained through the feature extraction sub-model and the feature detection sub-model, but in another embodiment, the step 303 and the step 304 need not be executed, and the plurality of image groups may be processed by invoking the object detection model to obtain at least one object region chain of each image group.
In addition, before the feature detection submodel is invoked, it needs to be trained. The feature detection submodel may be trained by the computer device, or may be trained by another device and then sent to the computer device.
The computer equipment obtains a plurality of sample sets, each sample set comprises feature vectors of a preset number of image frames of a sample image group and a sample target area chain of the sample image group, the feature vectors of the preset number of image frames of the sample image groups are used as the input of the feature detection sub-model, the sample target area chains of the sample image groups are used as the output of the feature detection sub-model, and the feature detection sub-model is trained.
In the training process, the feature detection submodel is called to process the feature vectors of the preset number of image frames of any sample image group, obtaining a predicted target region chain for that sample image group. The difference between the predicted target region chain and the sample target region chain is then determined, and the feature detection submodel is adjusted according to that difference so as to reduce it.
Specifically, the output value of the loss function of the feature detection submodel is determined according to the predicted target region chain and the sample target region chain of the sample image group, and the submodel is adjusted according to this output value. In response to the output value of the loss function falling below a preset loss threshold, training stops and the trained feature detection submodel is obtained. The output value of the loss function is calculated from the difference between the predicted target region chain and the sample target region chain, and the preset loss threshold may be any set value, such as 0.3 or 0.2.
When training the feature detection submodel, Focal Loss may be used for the classification branch and Smooth L1 loss for the regression branch; the loss functions of the two branches are weighted and summed to obtain the loss function of the feature detection submodel. The weights of the two branch loss functions can be set arbitrarily.
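The following PyTorch sketch shows one way the weighted two-branch loss described above could be assembled; the focal-loss parameters, the branch weights, and the 0.2 stopping threshold are illustrative assumptions rather than values taken from the patent.

```python
# Hedged sketch of the weighted two-branch loss: Focal Loss for the
# classification branch, Smooth L1 for the regression branch.
import torch
import torch.nn.functional as F

def focal_loss(cls_logits, labels, alpha=0.25, gamma=2.0):
    # binary focal loss on target/background logits
    p = torch.sigmoid(cls_logits)
    ce = F.binary_cross_entropy_with_logits(cls_logits, labels,
                                            reduction="none")
    p_t = p * labels + (1 - p) * (1 - labels)
    alpha_t = alpha * labels + (1 - alpha) * (1 - labels)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

def detection_loss(cls_logits, labels, pred_boxes, gt_boxes,
                   w_cls=1.0, w_reg=1.0):
    cls = focal_loss(cls_logits, labels)          # classification branch
    reg = F.smooth_l1_loss(pred_boxes, gt_boxes)  # regression branch
    return w_cls * cls + w_reg * reg              # weighted sum of branches

# training stops once the loss falls below a preset threshold, e.g. 0.2
loss = detection_loss(torch.randn(8), torch.randint(0, 2, (8,)).float(),
                      torch.randn(8, 4), torch.randn(8, 4))
print(loss.item() < 0.2)
```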
It should be noted that, in the embodiment of the present application, at least one target region chain of the first image group is obtained through the feature extraction submodel and the feature detection submodel; in another embodiment, when steps 303 and 304 are executed, the corresponding steps may be performed directly by the computer device without invoking the feature extraction submodel and the feature detection submodel.
305. The computer device takes the same image frame in any two adjacent image groups as a designated image frame and takes a target area of the designated image frame as a designated target area.
Since any two adjacent image groups include at least one same image frame and at least one different image frame, the same image frame can be used as a designated image frame, and a target area included in the designated image frame can be used as a designated target area.
306. The computer device determines a target area matching relationship between any two image groups according to the designated image frame and the designated target area in any two image groups.
The target region matching relationship represents the matching relationship between the designated target regions of the same designated image frame in the two image groups; it includes the matching relationship between any target region of any designated image frame in the first image group and the target region belonging to the same target in the same designated image frame in the second image group. When determining the target region matching relationship between two image groups, a KM (Kuhn-Munkres, maximum-weight bipartite matching) algorithm, another bipartite graph matching algorithm, a greedy method, or the like may be used.
For example, if the designated image frame in the first image group includes target area 1, target area 2, and target area 3, and the same designated image frame in the second image group includes target area 4, target area 5, and target area 6, the target region matching relationship between the two image groups is shown in Table 1: target area 1 matches target area 4, target area 2 matches target area 5, and target area 3 matches target area 6.
TABLE 1

First image group    Second image group
Target area 1        Target area 4
Target area 2        Target area 5
Target area 3        Target area 6
In any two adjacent image groups, the same designated image frame includes the same targets; the designated target regions in that designated image frame of the two image groups are matched, and the matching relationship between the designated target regions belonging to the same target is determined.
For example, image group A includes image frame 1 and image frame 2, image group B includes image frame 2 and image frame 3, and each of the two image groups includes two target region chains. Image frame 2 of image group A and image frame 2 of image group B each include two target regions, and the matching relationship between them is determined as shown in fig. 6.
In one possible implementation, as shown in FIG. 7, this step 306 may include the following steps 3061-3063:
3061. A plurality of sets of candidate matching relationships is determined according to the designated image frames and the designated target regions in the two image groups.
The candidate matching relationship comprises a matching relationship between each designated target region of any designated image frame in the first image group and any designated target region in the same designated image frame in the second image group, and the multiple groups of candidate matching relationships are different.
The same designated image frame may include a plurality of designated target regions, so there are multiple ways of matching the designated target regions of the designated image frames of the two image groups; each of the resulting matching relationships is taken as a candidate matching relationship. For example, if a designated image frame in the first image group includes designated target area 1 and designated target area 2, and the same designated image frame in the second image group includes designated target area 3 and designated target area 4, two sets of matching relationships can be determined. The first set is: designated target area 1 matches designated target area 3, and designated target area 2 matches designated target area 4. The second set is: designated target area 1 matches designated target area 4, and designated target area 2 matches designated target area 3.
3062. The sum of the similarities of every two matched designated target regions in each set of candidate matching relationships is determined as the matching degree of that set.
The similarity of two matched designated target regions represents how alike the two regions are: the greater the similarity, the more likely the two regions belong to the same target; the smaller the similarity, the less likely they do. The matching degree of a candidate matching relationship represents how well its paired designated target regions match: the greater the matching degree, the more accurate the pairwise matches in that relationship; the smaller the matching degree, the less accurate they are.
In one possible implementation, determining the similarity of two matched designated target regions may include: determining the ratio of the intersection area of the two designated target regions to their union area as the similarity between the two designated target regions.
The ratio of the intersection area to the union area of two designated target regions is their intersection-over-union. Since the region occupied by a target in a given designated image frame is fixed, two designated target regions belonging to the same target should coincide completely, meaning their intersection area equals their union area; if they do not belong to the same target, the intersection area is small and the union area is large. The similarity of two designated target regions can therefore be determined from their intersection area and union area.
In one possible implementation, determining the intersection area and the union area of the two designated target regions may include: determining the intersection area and the union area according to the coordinates of the two designated target regions.
For example, if both designated target regions are rectangles, the intersection area of the two regions is the overlapping area of the two rectangles, and their union area is the combined area of the two rectangles. The coordinates of the two designated target regions are the coordinates of the four corners of each rectangle; the overlapping area of the two rectangles can be determined from these corner coordinates and taken as the intersection area, while the difference between the sum of the two rectangle areas and the overlapping area is taken as the union area.
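A minimal Python sketch of this intersection-over-union computation for two axis-aligned rectangles, given their corner coordinates, is as follows (for illustration only):

```python
# Intersection-over-union of two rectangular target regions given as
# (x1, y1, x2, y2) corner coordinates.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # overlapping area
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter                # sum of areas minus overlap
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 = 0.1428...
```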
3063. The candidate matching relationship with the largest matching degree is selected from the multiple sets of candidate matching relationships and determined as the target region matching relationship.
Among the multiple candidate matching relationships, a larger matching degree indicates more accurate pairwise matches. Selecting the candidate matching relationship with the largest matching degree and determining it as the target region matching relationship therefore improves the accuracy of the target region matching relationship.
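The following sketch illustrates steps 3061-3063 under the simplifying assumption that the two designated image frames contain the same number of designated target regions; a brute-force search over permutations stands in here for the KM algorithm, and it reuses the iou() function from the sketch above.

```python
# Hedged sketch: enumerate candidate matchings, score each by the summed
# IoU similarity of its pairs (the matching degree), keep the best one.
from itertools import permutations

def best_matching(regions_a, regions_b):
    """Assumes len(regions_a) == len(regions_b); returns the index pairs
    of the candidate matching with the largest matching degree."""
    best_perm, best_score = None, -1.0
    for perm in permutations(range(len(regions_b))):
        # matching degree = sum of pairwise similarities (step 3062)
        score = sum(iou(regions_a[i], regions_b[j])
                    for i, j in enumerate(perm))
        if score > best_score:
            best_perm, best_score = perm, score
    return list(enumerate(best_perm)), best_score

a = [(0, 0, 10, 10), (20, 0, 30, 10)]
b = [(21, 0, 31, 10), (1, 0, 11, 10)]
print(best_matching(a, b))  # a[0] <-> b[1], a[1] <-> b[0]
```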
307. The computer device determines the target region chain matching relationship between any two image groups according to the target region chains of the two image groups and the target region matching relationship.
The target region chain matching relationship represents the matching relationship between the target region chains of the two image groups; it includes the matching relationship between any target region chain of the first image group and the target region chain of the second image group that belongs to the same target, indicating that two matched target region chains belong to the same target. For example, if the first image group includes target region chain 1 and target region chain 2, and the second image group includes target region chain 3 and target region chain 4, the chain matching relationship between the two groups may be: target region chain 1 matches target region chain 3, and target region chain 2 matches target region chain 4.
Since the target region matching relationship represents the matching relationship between the designated target regions of the same designated image frame in the two image groups, and each target region chain of an image group contains exactly one designated target region, the matching relationship between the target region chains of the two image groups can be determined from the matching relationship between those designated target regions: any two target region chains that contain matched designated target regions are matched.
For example, the first image group includes target region chain 1 and target region chain 2, and the second image group includes target region chain 3 and target region chain 4. The designated image frame of the first image group includes designated target region A and designated target region B, and the designated image frame of the second image group includes designated target region C and designated target region D, where chain 1 contains region A, chain 2 contains region B, chain 3 contains region C, and chain 4 contains region D. If the target region matching relationship is that region A matches region C and region B matches region D, the determined target region chain matching relationship is: target region chain 1 matches target region chain 3, and target region chain 2 matches target region chain 4.
In addition, the same target identifier may be set for the target region chains matched in any two image groups, where the target identifier is used to indicate the target to which the target region chain belongs, so that the target region chains belonging to the same target may be determined subsequently according to the target identifier.
308. The computer device combines, according to the target region chain matching relationship, the target regions in any target region chain other than the designated target region with the other target region chain matched with that chain, to obtain a target region combination chain.
Wherein the other target area is a target area included in other image frames except the designated image frame in the image group. The target area combination chain represents a combination chain formed by associating target areas belonging to the same target in a plurality of image frames.
Two adjacent image groups include at least one same designated image frame and at least one different image frame, so the designated target regions in the same designated image frame are shared, and two matched target region chains may contain the same designated target region. When two matched target region chains of the two image groups are combined, only the target regions of one chain other than the designated target region are combined with the other chain, yielding a chain formed by the target regions of the union of the image frames of the two image groups. Combining the target region chains of the plurality of image groups in this manner produces a target region combination chain.
In addition, for any two adjacent image groups among the plurality of image groups, the second image frame of the first image group is the same as the first image frame of the second image group. After the target region chains of the plurality of image groups and the chain matching relationship between every two adjacent image groups are obtained, the matching relationship between the target region chains of the plurality of image groups is determined, and the target regions of the first image frames of all image groups except the last one are combined with the target region chain of the last image group to obtain a target region combination chain.
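A minimal sketch of this combination step is given below, assuming the two matched chains overlap in exactly one shared designated image frame whose region appears at the tail of the first chain and the head of the second; the frame layout is illustrative.

```python
# Combine two matched target region chains, keeping the shared
# (designated) region only once.
def combine_chains(chain_a, chain_b, shared_frames=1):
    """chain_a: regions over group A's frames; chain_b: regions over
    group B's frames, whose first `shared_frames` entries repeat
    chain_a's tail."""
    return chain_a + chain_b[shared_frames:]

# group A covers frames 1-2, group B covers frames 2-3; frame 2 is shared
chain_a = [(10, 10, 50, 80), (12, 10, 52, 80)]
chain_b = [(12, 10, 52, 80), (14, 11, 54, 81)]
print(combine_chains(chain_a, chain_b))  # regions for frames 1, 2, 3
```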
When the target region chains of the plurality of image groups are combined, if, for two adjacent image groups, a target region chain of the second image group contains an interruption target region in the designated image frame, that target region chain is used alone as one target region combination chain; when it is later matched with a target region chain of the image group following the second image group, the combination chain is combined with that matched chain to obtain a new target region combination chain.
It should be noted that, in the embodiment of the present application, the target region combination chain is obtained according to the target region chain matching relationship; in another embodiment, steps 305-308 need not be executed, and the target region combination chain of at least one target can be created directly according to the target region chains of the plurality of image groups and their arrangement order.
It should be noted that the description above covers only designated target regions that can be matched between the designated image frames of two image groups; in another embodiment, the designated image frames of two image groups may also include designated target regions that cannot be matched. As shown in fig. 8, when the second image frame of the second image group is the same as the first image frame of the first image group, determining the target region combination chain may further include the following steps 3081-3084:
3081. If the first target region chain of the second image group includes an interruption target region in the second image frame of the second image group, and the interruption target region does not match any designated target region in the first image frame of the first image group, a prediction target region belonging to the same target as the interruption target region is predicted in the first image frame of the first image group.
The second image group is a previous image group of the first image group in the plurality of image groups.
That is, the interruption target region is detected in the second image frame of the second image group based on the feature vectors of the preset number of image frames of the second image group, and a target region belonging to the same target as the interruption target region is not detected in the first image frame of the first image group based on the feature vectors of the preset number of image frames of the first image group.
In the first image frame of the first image group, the target to which the interruption target region belongs may be blocked by another object so that it cannot be detected, or the target may simply no longer be present. When the target is blocked rather than gone, it may reappear in image frames after the first image frame of the first image group; therefore, to keep the target regions of the same target consistent across image frames, a prediction may be made for the interruption target region to determine the predicted target region of the target in the next image frame.
In one possible implementation, obtaining the prediction target region may include: determining the interval duration, in the video data, between the first image frame of the second image group and the first image frame of the first image group; determining the moving direction of the target to which the interruption target region belongs; calling a speed model to determine the moving distance of the target according to the interval duration; determining, in the first image frame of the second image group, a first target region belonging to the same target as the interruption target region; translating the first target region according to the moving direction and the moving distance to obtain a translated second target region; and, according to the coordinates of the second target region, adding in the first image frame of the first image group a prediction target region with the same coordinates as the second target region.
In one possible implementation, determining the moving direction of the target to which the interruption target region belongs may include: determining, in the first image frame of the image group preceding the second image group, a third target region belonging to the same target as the interruption target region, and taking the direction of the line from the reference point of the third target region to the reference point of the first target region as the moving direction of the target. The reference point may be any point of the target region; for example, if the target region is a rectangle, its upper-left corner or its center point may be used as the reference point.
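The following sketch illustrates one reading of this velocity model; the center-point reference, the constant speed value, and the interval are illustrative assumptions, and the speed model itself is reduced to a speed argument.

```python
# Hedged sketch: predict the interrupted target's region by translating
# its last known region along the direction given by two earlier
# reference points. Boxes are (x1, y1, x2, y2); reference point = center.
def center(box):
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def predict_region(third_region, first_region, speed, interval):
    """third_region: the target's box two groups back; first_region: its
    box in the previous group's first frame; speed * interval gives the
    moving distance."""
    (x0, y0), (x1, y1) = center(third_region), center(first_region)
    dx, dy = x1 - x0, y1 - y0
    norm = (dx * dx + dy * dy) ** 0.5 or 1.0     # direction of motion
    dist = speed * interval                      # moving distance
    tx, ty = dist * dx / norm, dist * dy / norm  # translation vector
    return (first_region[0] + tx, first_region[1] + ty,
            first_region[2] + tx, first_region[3] + ty)

print(predict_region((0, 0, 10, 10), (5, 0, 15, 10), speed=50, interval=0.1))
```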
3082. In the target region matching relationship between the second image group and the first image group, the matching relationship of the interruption target region and the prediction target region is added.
By adding the matching relationship between the interruption target region and the prediction target region to the target region matching relationship between the second image group and the first image group, the matching relationship between the target region chain containing the interruption target region in the second image group and the prediction target region chain in the first image group can subsequently be determined, so that the two chains can later be combined into a target region combination chain.
3083. According to the feature vectors of the preset number of image frames in the first image group, the prediction target region is mapped to obtain the prediction target regions of the other image frames in the first image group, and the prediction target regions in the different image frames are associated to obtain the prediction target region chain of the first image group.
After the prediction target region is determined in the first image frame of the first image group, the prediction target regions of the other image frames of the group may be obtained by the same mapping manner as in step 3042 above.
A prediction target region chain is obtained by associating the prediction target regions that belong to the same target in the different image frames of the first image group, so that the prediction target region chain of the first image group matches the target region chain containing the interruption target region.
After the prediction target region chain is obtained in the first image group, it can be treated as a real target region chain and matched against the target region chains of the next image group; through the matching relationship between the first image group and the next image group, the corresponding chains are combined to obtain a target region combination chain.
3084. If the third image group includes a second target region chain matching the prediction target region chain, the first target region chain, the prediction target region chain, and the second target region chain are combined according to the matching relationship between the interruption target region and the prediction target region, obtaining a target region combination chain.
Wherein the third image group is a subsequent image group of the first image group.
The same designated image frame in the second image group and the first image group is taken as a first designated image frame, and the same designated image frame in the first image group and the third image group is taken as a second designated image frame. When the first target region chain, the prediction target region chain, and the second target region chain are combined, the target regions of the first target region chain in image frames other than the first designated image frame are first combined with the prediction target region chain to obtain a first target region combination chain; then the target regions of the first target region combination chain in image frames other than the second designated image frame are combined with the second target region chain to obtain a second target region combination chain, which is the target region combination chain formed by combining the three chains.
It should be noted that the above only describes the case in which the third image group includes a second target region chain matching the prediction target region chain. In another embodiment, the third image group does not include such a chain; then, following steps 3081-3083, a prediction target region chain is obtained in the third image group by associating prediction target regions across its image frames, and these steps are repeated. If a fourth image group includes a third target region chain matching the prediction target region chain of the preceding image group, the first target region chain, the third target region chain, and the prediction target region chains predicted in the image groups between the first image group and the fourth image group are associated to obtain the target region combination chain. The fourth image group is within a first number of image groups after the second image group, where the first number is not more than a preset number; the preset number can be any set value, such as 10 or 8.
In addition, in response to none of the preset number of image groups after the second image group including a target region chain that matches the prediction target region chain of the preceding image group, prediction for the interruption target region is no longer performed in subsequent image groups.
309. The computer device adds the moving track of at least one target to the video data according to the target region combination chain of the target, obtaining updated video data.
Since the computer device has determined the target region combination chain of at least one target, that is, the position of the target in the plurality of image frames, the moving track of the target can be determined and added to the video data.
In addition, when the moving track of any target is determined, the reference point of each target region in the target's region combination chain is determined, so that each image frame containing the target contains one reference point of the target. In each image frame, a line connecting the reference points of the preceding image frames is added according to their positions, simulating the effect of the target moving along that line across the image frames; the line represents the moving track of the target. The reference point may be any point of the target region; for example, if the target region is a rectangle, the midpoint of its lower side may be used as the reference point.
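For illustration only, the sketch below overlays such a track with OpenCV, using the bottom-midpoint reference described above; the drawing colors and thicknesses are arbitrary choices, not part of the embodiment.

```python
# Connect the reference points of one target's region combination chain
# and overlay the accumulated polyline plus detection box on each frame.
import cv2
import numpy as np

def draw_track(frames, chain, color=(0, 255, 0)):
    """frames: list of BGR images; chain: one box (x1, y1, x2, y2) per frame."""
    points = []
    for img, box in zip(frames, chain):
        x1, y1, x2, y2 = box
        points.append(((x1 + x2) // 2, y2))  # midpoint of the lower side
        pts = np.array(points, dtype=np.int32)
        cv2.polylines(img, [pts], False, color, 2)        # track so far
        cv2.rectangle(img, (x1, y1), (x2, y2), color, 2)  # detection box
    return frames

frames = [np.zeros((200, 200, 3), dtype=np.uint8) for _ in range(3)]
draw_track(frames, [(10, 10, 50, 80), (30, 10, 70, 80), (50, 12, 90, 82)])
```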
310. The computer device plays the updated video data to display a picture of at least one target moving along its corresponding moving track.
When the updated video data is played, the moving track of at least one target is displayed, showing a picture of the target moving along its corresponding moving track.
In addition, a target region may be displayed in the form of a detection box: when the updated video data is played, the target regions of the currently played video frame are displayed as detection boxes, together with the moving tracks of the targets accumulated over the preceding video frames. Fig. 9 shows one video frame of the played video data, displaying the target regions of a plurality of targets in that frame and the moving track of each target.
The method provided by the embodiment of the present application acquires a plurality of sequentially arranged image frames, takes every preset number of image frames as an image group to obtain a plurality of sequentially arranged image groups, processes the image groups to obtain at least one target region chain of each image group, and creates a target region combination chain of at least one target according to the target region chains and the arrangement order of the image groups. Acquiring target region chains in units of image groups gives the target regions of different image frames in each group an association relationship, improving the accuracy of target detection; because different image groups share the same image frame, the target region chains of the plurality of image groups are also associated and can be combined into a target region combination chain, realizing target tracking and improving the accuracy of the combination chain.
Determining at least one target region chain of each image group from the feature vectors of its preset number of image frames, via the feature extraction submodel and the feature detection submodel, improves the accuracy of the determined target region chains, and thus the accuracy of the target region combination chain and of the determined moving track of the target.
When the target region chains of any two image groups are combined, the chain matching relationship is determined from the designated target regions of the same designated image frame in the two groups, so that the chains of different image groups can be combined according to that relationship, improving the accuracy of the obtained target region combination chain and of the determined moving track of the target.
As shown in fig. 10, it is a flowchart of a target tracking method provided in the embodiment of the present application, where the method includes:
1. A plurality of image frames of the video data is acquired in real time.
2. Every preset number of image frames among the plurality of image frames is taken as an image group, obtaining a plurality of sequentially arranged image groups.
3. The target detection model is called to process the plurality of image groups, obtaining at least one target region chain of each image group.
4. The target region chains of the plurality of image groups are combined according to the chain matching relationship between every two adjacent image groups, obtaining at least one target region combination chain.
5. Whether the video data is finished is detected; if not, steps 1-4 are repeated (a schematic sketch of this loop is given below).
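In the following sketch of this loop, detect_group() and combine() are placeholder stand-ins for steps 3 and 4 (assumptions for illustration, not real model calls).

```python
def detect_group(group):
    return [("chain", group[0], group[-1])]  # placeholder target region chains

def combine(existing, chains):
    return existing + chains                 # placeholder combination (step 4)

def track(video_frames, preset_number=2):
    frames, combination = [], []
    for frame in video_frames:               # 1. acquire frames in real time
        frames.append(frame)
        if len(frames) >= preset_number:
            group = frames[-preset_number:]  # 2. latest image group
            chains = detect_group(group)     # 3. call the detection model
            combination = combine(combination, chains)
    return combination                       # 5. loop ends with the video

print(len(track(["f1", "f2", "f3", "f4", "f5"])))  # 4 groups from 5 frames
```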
As shown in fig. 11, for two adjacent image groups A and B, image group A includes image frame 3 and image frame 4, and image group B includes image frame 4 and image frame 5. Each image group is processed by the target detection model to obtain three target region chains per group; the target regions in the shared image frame 4 of groups A and B are matched by intersection-over-union, determining the target region matching relationship between the target regions of image frame 4 in group A and those in group B.
Fig. 12 is a flowchart of invoking the target detection model to obtain the target region chain of an image group. The target detection model includes a feature extraction submodel and a feature detection submodel. For any image group containing image frame 1 and image frame 2, the two frames are input into the feature extraction submodel, which outputs the feature vector of each frame; the two feature vectors are combined into the feature vector combination of the image group, which is input into the feature detection submodel. The feature detection submodel determines the initial regions in image frame 1 through its classification layer, and determines the target region chain of the image group through its regression layer according to the feature vector combination and the initial regions.
As shown in fig. 13, a plurality of image frames is acquired, every two adjacent image frames are taken as an image group, and the last image frame is copied so that it and its copy form the final image group. This yields a plurality of image groups, each containing two image frames, where for any two adjacent groups the second image frame of the first group is the same as the first image frame of the second group.
Because the last image frame and its copy form one image group, each of the plurality of image frames serves as the first image frame of exactly one image group, so the number of image groups obtained equals the number of image frames; the target region combination chain can then be obtained by extracting and combining the target region of the first image frame of each image group.
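A minimal sketch of this grouping, in which copying the last frame makes the number of groups equal the number of frames:

```python
# Every two adjacent frames form a group; the last frame is duplicated.
def make_groups(frames):
    padded = frames + [frames[-1]]  # copy the last image frame
    return [padded[i:i + 2] for i in range(len(frames))]

groups = make_groups(["f1", "f2", "f3", "f4"])
print(groups)  # [['f1','f2'], ['f2','f3'], ['f3','f4'], ['f4','f4']]
```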
When the target region chains of the plurality of image groups are obtained, the feature extraction submodel is called to obtain the feature vectors of the image frames of each image group in sequence, following the arrangement order of the groups. For the first image group, the feature extraction submodel extracts features from both of its image frames, yielding the feature vector of each. For the second image group, the feature vector of the second image frame of the first image group is reused as the feature vector of its first image frame, and the feature extraction submodel is called only for its second image frame. For each subsequent image group, likewise, the feature vector of the second image frame of the previous group is reused as that of the first image frame of the current group, and only the second image frame of the current group is passed through the feature extraction submodel. For the last image group, since its two image frames are the same, the feature extraction submodel is called only once, on its first image frame, and the resulting feature vector serves for both frames.
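This feature reuse can be sketched as follows, with extract() standing in for the feature extraction submodel; the caching behavior, including the single extraction for the duplicated last group, follows the description above.

```python
def extract(frame):
    return f"feat({frame})"  # placeholder feature vector

def group_features(groups):
    cached, result = None, []
    for first, second in groups:
        f1 = cached if cached is not None else extract(first)
        f2 = f1 if second == first else extract(second)  # duplicated last group
        result.append((f1, f2))
        cached = f2  # reuse as the next group's first-frame feature
    return result

print(group_features([["f1", "f2"], ["f2", "f3"], ["f3", "f3"]]))
```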
As the feature extraction submodel sequentially outputs the feature vectors of the two image frames of each image group, the vectors are combined into the feature vector combination of the corresponding group and input into the feature detection submodel to obtain the target region chain of each group. According to the chain matching relationship between every two adjacent image groups, the target region of the first image frame of each group is then extracted and combined to obtain the target region combination chain.
Fig. 14 is a schematic structural diagram of an object tracking apparatus according to an embodiment of the present application, and as shown in fig. 14, the apparatus includes:
an image frame acquisition module 1401 configured to acquire a plurality of image frames arranged in sequence;
an image group acquiring module 1402, configured to take every preset number of image frames in the multiple image frames as an image group to obtain multiple image groups arranged in sequence, where any two adjacent image groups include at least one same image frame and at least one different image frame;
an image group processing module 1403, configured to process the multiple image groups to obtain at least one target region chain of each image group, where each target region chain includes a preset number of target regions belonging to the same target, and the preset number of target regions are located in different image frames in the corresponding image group respectively;
a region combination chain creating module 1404, configured to create a target region combination chain of at least one target according to the target region chains of the plurality of image groups and the arrangement order of the plurality of image groups.
In one possible implementation, as shown in fig. 15, the image group processing module 1403 includes:
a feature vector acquisition unit 1431 configured to acquire feature vectors of a preset number of image frames in a first image group, where the first image group is any one of a plurality of image groups;
the target region chain acquiring unit 1432 is configured to process feature vectors of a preset number of image frames in the first image group to obtain at least one target region chain of the first image group.
In another possible implementation manner, the feature vector acquiring unit 1431 is further configured to determine a feature vector of a specified image frame in a second image group as a feature vector of a corresponding specified image frame in the first image group, the second image group being a previous image group of the first image group in the plurality of image groups, the specified image frame being a same image frame in the first image group and the second image group; and extracting the features of other image frames except the appointed image frame in the first image group to obtain the feature vectors of the other image frames.
In another possible implementation manner, the target region chain obtaining unit 1432 is further configured to perform target detection according to a feature vector of a first image frame in the first image group, so as to obtain at least one initial region of the first image frame; processing at least one initial region according to the feature vectors of a preset number of image frames in the first image group to respectively obtain target regions of the preset number of image frames, wherein the at least one initial region corresponds to at least one target region of each image frame in the first image group one by one; and associating target areas which belong to the same target and different image frames in a preset number of image frames to obtain at least one target area chain of the first image group.
In another possible implementation manner, the target region chain obtaining unit 1432 is further configured to perform target detection according to a feature vector of a first image frame in the first image group, so as to obtain a plurality of initial regions of the first image frame and probabilities corresponding to the initial regions; and selecting at least one initial region with the probability greater than a preset threshold value from the plurality of initial regions.
In another possible implementation manner, the target detection model comprises a feature extraction submodel and a feature detection submodel;
the feature vector obtaining unit 1431 is further configured to invoke a feature extraction sub-model, and obtain feature vectors of a preset number of image frames in the first image group;
the target area chain obtaining unit 1432 is further configured to invoke a feature detection sub-model, and process feature vectors of a preset number of image frames in the first image group to obtain at least one target area chain of the first image group.
In another possible implementation, as shown in fig. 15, the regional combinatorial chain creation module 1404 includes:
a first determining unit 1441 configured to, for any two adjacent image groups, take the same image frame in any two image groups as a specified image frame, and take a target region of the specified image frame as a specified target region;
a region matching relationship determining unit 1442, configured to determine, according to a specified image frame and a specified target region in any two image groups, a target region matching relationship between any two image groups, where the target region matching relationship includes a matching relationship between any target region of any specified image frame in a first image group and a target region belonging to the same target in the same specified image frame in a second image group;
a region chain matching relationship determining unit 1443, configured to determine a target region chain matching relationship between any two image groups according to the target region chains and the target region matching relationship of the two image groups, where the chain matching relationship includes a matching relationship between any target region chain of the first image group and the target region chain of the second image group belonging to the same target;
a region combination chain combining unit 1444, configured to combine, according to the target region chain matching relationship, another target region in any target region chain except the specified target region with another target region chain matched with any target region chain, so as to obtain a target region combination chain.
In another possible implementation manner, the region matching relationship determining unit 1442 is further configured to determine multiple groups of candidate matching relationships according to the specified image frames and the specified target regions in any two image groups, where the candidate matching relationships include a matching relationship between each specified target region in any specified image frame in the first image group and any specified target region in the same specified image frame in the second image group, and the multiple groups of candidate matching relationships are different; respectively determining the sum of the similarity of every two matched specified target areas in each group of candidate matching relations as the matching degree of each group of candidate matching relations; and selecting the candidate matching relationship with the maximum matching degree from the multiple groups of candidate matching relationships, and determining the candidate matching relationship as the target area matching relationship.
In another possible implementation, as shown in fig. 15, the second image frame of the second image group is the same as the first image frame of the first image group; the image group processing module 1403 further includes:
the target region predicting unit 1433 is configured to predict, in the first image frame of the first image group, a prediction target region that belongs to the same target as the interruption target region if the first target region chain of the second image group includes any interruption target region in the second image frame of the second image group, and the interruption target region is not matched with each designated target region in the first image frame of the first image group.
In another possible implementation, as shown in fig. 15, the apparatus further includes:
a target region prediction module 1405, configured to perform mapping processing on a prediction target region according to feature vectors of a preset number of image frames in the first image group, to obtain prediction target regions of other image frames in the first image group respectively;
the target area association module 1406 is configured to associate prediction target areas belonging to different image frames in a preset number of image frames to obtain a prediction target area chain of the first image group.
In another possible implementation manner, as shown in fig. 15, the apparatus further includes:
a matching relationship adding module 1407, configured to add a matching relationship between the interruption target region and the prediction target region in the target region matching relationship between the second image group and the first image group.
In another possible implementation, as shown in fig. 15, the image frame acquiring module 1401 includes:
the image frame acquiring unit 1411 is configured to acquire video data and perform frame extraction processing on the video data to obtain a plurality of image frames.
In another possible implementation manner, as shown in fig. 15, the apparatus further includes:
the video data updating module 1408 is configured to add a moving track of at least one target to the video data according to the target area combination chain of the at least one target, so as to obtain updated video data;
the video data playing module 1409 is configured to play the updated video data to display a picture that the at least one target moves according to the corresponding movement track.
Fig. 16 is a schematic structural diagram of a terminal according to an embodiment of the present application, which can implement the operations performed by the computer device in the foregoing embodiments. The terminal 1600 may be a portable mobile terminal such as a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a notebook computer, a desktop computer, a head-mounted device, a smart television, a smart sound box, a smart remote controller, a smart microphone, or any other smart terminal. Terminal 1600 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
Generally, terminal 1600 includes: a processor 1601, and a memory 1602.
Processor 1601 may include one or more processing cores, such as a 4-core processor or an 8-core processor. Memory 1602 may include one or more computer-readable storage media, which may be non-transitory, for storing at least one instruction to be loaded and executed by processor 1601 so as to implement the target tracking method provided by the method embodiments of the present application.
In some embodiments, the terminal 1600 may also optionally include: peripheral interface 1603 and at least one peripheral. Processor 1601, memory 1602 and peripheral interface 1603 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 1603 via buses, signal lines, or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1604, a display 1605, and audio circuitry 1606.
The Radio Frequency circuit 1604 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1604 communicates with communication networks and other communication devices via electromagnetic signals.
The display screen 1605 is used to display a UI (user interface). The UI may include graphics, text, icons, video, and any combination thereof. The display 1605 may be a touch display and may also be used to provide virtual buttons and/or a virtual keyboard.
Audio circuitry 1606 may include a microphone and a speaker. The microphone is used for collecting audio signals of a user and the environment, converting the audio signals into electric signals, and inputting the electric signals to the processor 1601 for processing or inputting the electric signals to the radio frequency circuit 1604 for realizing voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be respectively disposed at different portions of the terminal 1600. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert the electrical signal from the processor 1601 or the radio frequency circuit 1604 into an audio signal.
Those skilled in the art will appreciate that the configuration shown in fig. 16 is not intended to be limiting of terminal 1600, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be employed.
Fig. 17 is a schematic structural diagram of a server according to an embodiment of the present application, where the server 1700 may have a relatively large difference due to different configurations or performances, and may include one or more processors (CPUs) 1701 and one or more memories 1702, where the memory 1702 stores at least one instruction, and the at least one instruction is loaded and executed by the processors 1701 to implement the methods provided by the foregoing method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface, so as to perform input/output, and the server may also include other components for implementing the functions of the device, which are not described herein again.
The server 1700 may be used to perform the object tracking method described above.
The embodiment of the present application further provides a computer device, where the computer device includes a processor and a memory, where the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor, so as to implement the target tracking method of the foregoing embodiment.
The embodiment of the present application further provides a computer-readable storage medium, where at least one instruction is stored in the computer-readable storage medium, and the at least one instruction is loaded and executed by a processor to implement the target tracking method of the foregoing embodiment.
The embodiment of the present application further provides a computer program, where at least one instruction is stored in the computer program, and the at least one instruction is loaded and executed by a processor, so as to implement the target tracking method of the foregoing embodiment.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only an alternative embodiment of the present application, and is not intended to limit the present application, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (13)

1. A method of target tracking, the method comprising:
acquiring a plurality of image frames arranged in sequence;
taking every preset number of image frames in the plurality of image frames as an image group to obtain a plurality of image groups which are arranged in sequence, wherein any two adjacent image groups comprise at least one same image frame and at least one different image frame;
calling a feature extraction sub-model in a target detection model, inputting the preset number of image frames in a first image group into the feature extraction sub-model, and acquiring feature vectors of the preset number of image frames in the first image group, wherein the first image group is any one of the plurality of image groups;
combining the feature vectors of the preset number of image frames in the first image group to obtain a feature vector combination of the first image group; calling a feature detection submodel in the target detection model, inputting the feature vector combination of the first image group into the feature detection submodel, and processing to obtain at least one target area chain of the first image group, wherein each target area chain comprises a plurality of target areas belonging to the same target, and the target areas are respectively positioned in different image frames in the corresponding image group;
and creating a target area combination chain of at least one target according to the target area chains of the plurality of image groups and the arrangement sequence of the plurality of image groups.
2. The method according to claim 1, wherein said obtaining feature vectors for the preset number of image frames in the first image group comprises:
determining a feature vector of a designated image frame in a second image group as a feature vector of a corresponding designated image frame in the first image group, wherein the second image group is a previous image group of the first image group in the plurality of image groups, and the designated image frame is the same image frame in the first image group and the second image group;
and extracting the features of other image frames except the appointed image frame in the first image group to obtain the feature vectors of the other image frames.
3. The method according to claim 1, wherein the inputting the feature vector combination of the first image group into the feature detection submodel, and processing the feature vector combination to obtain at least one target region chain of the first image group comprises:
performing target detection according to the feature vector of a first image frame in the first image group to obtain at least one initial region of the first image frame;
processing the at least one initial region according to the feature vectors of the preset number of image frames in the first image group to respectively obtain target regions of the preset number of image frames, wherein the at least one initial region corresponds to at least one target region of each image frame in the first image group one to one;
and associating target areas belonging to the same target and different image frames in the preset number of image frames to obtain at least one target area chain of the first image group.
4. The method of claim 3, wherein the performing object detection based on the feature vector of the first image frame in the first image group to obtain at least one initial region of the first image frame comprises:
performing target detection according to the feature vector of a first image frame in the first image group to obtain a plurality of initial regions of the first image frame and corresponding probabilities of the plurality of initial regions;
and selecting at least one initial region with the probability greater than a preset threshold value from the plurality of initial regions.
5. The method of claim 1, wherein the creating a target area combination chain of at least one target according to the target area chains of the plurality of image groups and the arrangement sequence of the plurality of image groups comprises:
for any two adjacent image groups, taking the same image frame in the two image groups as a designated image frame, and taking a target area of the designated image frame as a designated target area;
determining a target area matching relationship between the two image groups according to the designated image frames and the designated target areas in the two image groups, wherein the target area matching relationship comprises a matching relationship between any target area of any designated image frame in a first image group and the target area belonging to the same target in the same designated image frame in a second image group;
determining a target area chain matching relationship between the two image groups according to the target area chains of the two image groups and the target area matching relationship, wherein the target area chain matching relationship comprises a matching relationship between any target area chain of the first image group and the target area chain of the second image group that belongs to the same target;
and combining, according to the target area chain matching relationship, the target areas other than the designated target areas in any target area chain with the other target area chain matched with that target area chain, to obtain the target area combination chain.
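Since matched chains overlap exactly on the shared designated frame(s), claim 5's combination step can be read as appending the later chain minus its shared prefix. A sketch under that assumption:

```python
# Sketch of claim 5's combination: chain_a (earlier group) and chain_b
# (later group) are matched through the shared designated frame(s), whose
# target areas chain_a already holds, so only chain_b's tail is appended.
def combine_chains(chain_a, chain_b, num_shared_frames=1):
    return chain_a + chain_b[num_shared_frames:]
```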
6. The method according to claim 5, wherein the determining the target area matching relationship between the two image groups according to the designated image frames and the designated target areas in the two image groups comprises:
determining multiple groups of candidate matching relationships according to the designated image frames and the designated target areas in the two image groups, wherein each group of candidate matching relationships comprises a matching relationship between each designated target area of any designated image frame in the first image group and a designated target area in the same designated image frame in the second image group, and the multiple groups of candidate matching relationships are different from one another;
determining, for each group of candidate matching relationships, the sum of the similarities of every two matched designated target areas as the matching degree of that group of candidate matching relationships;
and selecting the candidate matching relationship with the maximum matching degree from the multiple groups of candidate matching relationships, and determining it as the target area matching relationship.
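Maximizing the summed similarity over all candidate matching relationships is the linear assignment problem; the Hungarian method is one standard solver (an assumption — the claim does not name an algorithm):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Sketch of claim 6's selection: among all one-to-one matchings of the
# designated target areas of the two groups, pick the one whose summed
# pairwise similarity is maximal (solved here via the Hungarian method).
def target_area_matching(similarity):
    sim = np.asarray(similarity, dtype=float)
    rows, cols = linear_sum_assignment(-sim)   # negate to maximize
    return list(zip(rows.tolist(), cols.tolist()))

# Example: IoU-like similarities between two areas of each group.
matches = target_area_matching([[0.9, 0.1], [0.2, 0.8]])  # [(0, 0), (1, 1)]
```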
7. The method of claim 3, wherein the second image frame of the second image group is the same as the first image frame of the first image group; the method further comprises the following steps:
if a first target area chain of the second image group comprises an interrupted target area in the second image frame of the second image group, and the interrupted target area matches none of the designated target areas in the first image frame of the first image group, predicting, in the first image frame of the first image group, a predicted target area belonging to the same target as the interrupted target area.
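Claim 7 leaves the prediction model open; a simple constant-velocity extrapolation from the target's last two areas is one plausible (assumed) choice:

```python
# Assumed prediction for claim 7: extrapolate the interrupted target's box
# linearly from its last two target areas (boxes given as (x, y, w, h)).
def predict_target_area(prev_box, last_box):
    return tuple(2 * b - a for a, b in zip(prev_box, last_box))

# e.g. a target moving right by 5 px per frame:
predicted = predict_target_area((10, 20, 30, 40), (15, 20, 30, 40))
# -> (20, 20, 30, 40)
```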
8. The method of claim 7, further comprising:
mapping the predicted target area according to the feature vectors of the preset number of image frames in the first image group, to obtain predicted target areas of the other image frames in the first image group respectively;
and associating the predicted target areas located in different image frames among the preset number of image frames, to obtain a predicted target area chain of the first image group.
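A sketch of claim 8's chain building, where `map_area` stands in for the feature-vector-based mapping, which the claim does not spell out:

```python
# Sketch of claim 8: the predicted area in the group's first frame is
# mapped into each remaining frame via the (hypothetical) map_area
# function, and the results are linked into a predicted target area chain.
def predicted_chain(predicted_area, other_frame_vectors, map_area):
    chain = [predicted_area]
    for vec in other_frame_vectors:
        chain.append(map_area(predicted_area, vec))
    return chain
```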
9. The method of claim 7, further comprising:
adding a matching relationship between the interrupted target area and the predicted target area to the target area matching relationship between the second image group and the first image group.
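Claim 9 simply records the interrupted/predicted pair as one more match. An illustrative (hypothetical) dictionary representation:

```python
# Illustrative bookkeeping for claim 9: the target area matching
# relationship kept as a dict from area ids in the second group's shared
# frame to area ids in the first group's shared frame; the interrupted ->
# predicted pair is added like any other match.
area_matches = {0: 0, 1: 2}                 # hypothetical existing matches
interrupted_id, predicted_id = 3, 3         # ids from claims 7-8 (assumed)
area_matches[interrupted_id] = predicted_id
```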
10. The method of any one of claims 1 to 9, wherein the obtaining the plurality of image frames arranged in sequence comprises:
acquiring video data, and performing frame extraction processing on the video data to obtain the plurality of image frames;
after the creating of the target area combination chain of at least one target according to the target area chains of the plurality of image groups and the arrangement sequence of the plurality of image groups, the method further comprises:
adding a moving track of the at least one target to the video data according to the target area combination chain of the at least one target, to obtain updated video data;
and playing the updated video data to display a picture of the at least one target moving along the corresponding moving track.
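A hedged end-to-end sketch of claim 10 using OpenCV (an assumed toolkit): extract frames from the video, draw each target's moving track from its target area combination chain, and write the result back for playback. The track format (lists of (x, y) centers) and the extraction step size are assumptions:

```python
import cv2
import numpy as np

# Sketch of claim 10 (OpenCV assumed): decode video with frame extraction,
# overlay each target's moving track - here the centers of its target area
# combination chain - and write the updated video out for playback.
def render_tracks(in_path, out_path, tracks, step=2):
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    out = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                          fps / step, size)
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % step == 0:                  # frame extraction
            t = frame_idx // step
            for track in tracks:                   # track: list of (x, y)
                pts = np.asarray(track[:t + 1], dtype=np.int32)
                if len(pts) >= 2:
                    cv2.polylines(frame, [pts], False, (0, 255, 0), 2)
            out.write(frame)
        frame_idx += 1
    cap.release()
    out.release()
```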
11. A target tracking apparatus, characterized in that the apparatus comprises:
the image frame acquisition module is used for acquiring a plurality of image frames which are arranged in sequence;
the image group acquisition module is used for taking every preset number of image frames in the plurality of image frames as an image group to obtain a plurality of image groups which are arranged in sequence, wherein any two adjacent image groups comprise at least one same image frame and at least one different image frame;
the image group processing module is used for calling a feature extraction sub-model in the target detection model, inputting the preset number of image frames in a first image group into the feature extraction sub-model, and acquiring feature vectors of the preset number of image frames in the first image group, wherein the first image group is any one of the plurality of image groups; combining the feature vectors of the preset number of image frames in the first image group to obtain a feature vector combination of the first image group; and calling a feature detection sub-model in the target detection model, and inputting the feature vector combination of the first image group into the feature detection sub-model for processing, to obtain at least one target area chain of the first image group, wherein each target area chain comprises a plurality of target areas belonging to the same target, and the target areas are respectively located in different image frames in the corresponding image group;
and the area combination chain creating module is used for creating a target area combination chain of at least one target according to the target area chains of the plurality of image groups and the arrangement sequence of the plurality of image groups.
12. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction, the at least one instruction being loaded and executed by the processor to implement the target tracking method of any one of claims 1 to 10.
13. A computer-readable storage medium having stored therein at least one instruction, which is loaded and executed by a processor, to implement the target tracking method of any one of claims 1 to 10.
CN202010383524.9A 2020-05-08 2020-05-08 Target tracking method and device, computer equipment and storage medium Active CN111598924B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010383524.9A CN111598924B (en) 2020-05-08 2020-05-08 Target tracking method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111598924A CN111598924A (en) 2020-08-28
CN111598924B (en) 2022-09-30

Family

ID=72186887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010383524.9A Active CN111598924B (en) 2020-05-08 2020-05-08 Target tracking method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111598924B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111598923B (en) * 2020-05-08 2022-09-20 腾讯科技(深圳)有限公司 Target tracking method and device, computer equipment and storage medium
CN112019768B (en) * 2020-09-04 2023-03-24 北京奇艺世纪科技有限公司 Video generation method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107784279A (en) * 2017-10-18 2018-03-09 北京小米移动软件有限公司 Method for tracking target and device
CN108280843A (en) * 2018-01-24 2018-07-13 新华智云科技有限公司 A kind of video object detecting and tracking method and apparatus
CN108961315A (en) * 2018-08-01 2018-12-07 腾讯科技(深圳)有限公司 Method for tracking target, device, computer equipment and storage medium
CN110276780A (en) * 2019-06-17 2019-09-24 广州织点智能科技有限公司 A kind of multi-object tracking method, device, electronic equipment and storage medium
CN110610510A (en) * 2019-08-29 2019-12-24 Oppo广东移动通信有限公司 Target tracking method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111598924A (en) 2020-08-28

Similar Documents

Publication Publication Date Title
WO2020224479A1 (en) Method and apparatus for acquiring positions of target, and computer device and storage medium
CN111476306A (en) Object detection method, device, equipment and storage medium based on artificial intelligence
CN113395542B (en) Video generation method and device based on artificial intelligence, computer equipment and medium
CN110738211A (en) object detection method, related device and equipment
CN110188719A (en) Method for tracking target and device
CN112990390B (en) Training method of image recognition model, and image recognition method and device
CN112733970B (en) Image classification model processing method, image classification method and device
CN111185008A (en) Method and apparatus for controlling virtual character in game
CN111598924B (en) Target tracking method and device, computer equipment and storage medium
WO2023168998A1 (en) Video clip identification method and apparatus, device, and storage medium
CN111489378A (en) Video frame feature extraction method and device, computer equipment and storage medium
CN111598923B (en) Target tracking method and device, computer equipment and storage medium
CN113723378B (en) Model training method and device, computer equipment and storage medium
CN110837858A (en) Network model training method and device, computer equipment and storage medium
US20220358662A1 (en) Image generation method and device
CN113569607A (en) Motion recognition method, motion recognition device, motion recognition equipment and storage medium
CN110276404B (en) Model training method, device and storage medium
CN112995757B (en) Video clipping method and device
CN112699832B (en) Target detection method, device, equipment and storage medium
CN113570510A (en) Image processing method, device, equipment and storage medium
CN110135329B (en) Method, device, equipment and storage medium for extracting gestures from video
CN111447379B (en) Method and device for generating information
CN113763931B (en) Waveform feature extraction method, waveform feature extraction device, computer equipment and storage medium
CN114827702B (en) Video pushing method, video playing method, device, equipment and medium
CN111414496B (en) Artificial intelligence-based multimedia file detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40028382)
GR01 Patent grant