CN111882582B

CN111882582B - Image tracking correlation method, system, device and medium

Info

Publication number: CN111882582B
Application number: CN202010727266.1A
Authority: CN
Inventors: 姚志强; 周曦; 周牧
Original assignee: Guangzhou Yuncongboyan Intelligent Technology Co Ltd
Current assignee: Guangzhou Yuncongboyan Intelligent Technology Co Ltd
Priority date: 2020-07-24
Filing date: 2020-07-24
Publication date: 2021-10-08
Anticipated expiration: 2040-07-24
Also published as: CN111882582A

Abstract

According to the image tracking association method, the image tracking association system, the image tracking association equipment and the image tracking association medium, one or more target objects in a multi-frame image are subjected to frame marking, and a corresponding target image frame is obtained; marking one or more target image frames belonging to the same target object with the same reference number; and assigning the same color to one or more target image frames belonging to the same target object; and associating one or more target objects according to the label and the color tracking corresponding to the target image frame, and modifying the corresponding error tracking association when the error tracking association occurs. The method can pre-label the target object in the image, track the labeling condition of each frame of picture in the associated video through the label and the color, judge whether the error tracking association exists between the frame pictures, and modify the corresponding error tracking association. The image detection algorithm is trained based on correct labeling of the target object in the video, so that the accuracy of the image detection algorithm in the fields of security, intelligent transportation and the like can be improved.

Description

Image tracking correlation method, system, device and medium

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to an image tracking correlation method, system, device, and medium.

Background

The application scenarios of the image detection technology are many, and the image detection technology can be applied to the fields of security, intelligent transportation and the like to perform image detection. In order to improve the detection accuracy of the image detection algorithm in the fields of security, intelligent transportation and the like, a large amount of image data labeled by target objects is generally required to be provided as training samples to train the image detection algorithm. The image shooting device shoots continuous frame pictures in the fields of security, intelligent transportation and the like, so for target object labeling of the continuous frame pictures, not only the target object labeling of a single frame picture needs to be considered, but also the associated labeling relation between the frame pictures needs to be considered.

Disclosure of Invention

In view of the above-mentioned shortcomings of the prior art, it is an object of the present invention to provide an image tracking correlation method, system, device and medium, which are used to solve the problems of the shortcomings in the prior art.

To achieve the above and other related objects, the present invention provides an image tracking correlation method, including:

performing frame marking on one or more target objects in a multi-frame picture to obtain one or more corresponding target image frames;

marking one or more target image frames belonging to the same target object with the same reference number;

assigning the same color to one or more target image frames belonging to the same target object;

and tracking and associating one or more target objects in the multi-frame pictures according to the labels and colors corresponding to the one or more target image frames.

Optionally, the method further comprises:

identifying whether the same target image frame in the current frame image and the other frame images is the same in label and color;

and determining whether the error tracking correlation exists according to the identification result, and modifying the corresponding error tracking correlation.

Optionally, if there is an error tracking correlation, the identification result includes at least one of:

the label or color of a certain target image frame in the current frame picture is detected again after one or more frames of pictures are separated, the label of the same target image frame in the current frame picture is different from the label of the adjacent one or more frames of pictures, the color of the same target image frame in the current frame picture is different from the color of the adjacent one or more frames of pictures, and the target image frame with the subordinate relationship has wrong subordinate in the one or more frames of pictures.

Optionally, the method further comprises:

acquiring a plurality of target image frames subjected to frame labeling;

and determining the membership of the target image frame by taking a certain target image frame as a parent image frame and taking another target image frame subordinate to the parent image frame as a child image frame according to the preset membership.

Optionally, if a child image frame belonging to a parent image frame is marked as not belonging to the parent image frame; and/or, marking the sub-level image frame which is not subordinate to a certain parent-level image frame as the image frame which is subordinate to the parent-level image frame; the target image frame with the dependency relationship has an erroneous dependency.

Optionally, assigning the same color to one or more target image frames belonging to the same target object comprises:

when a new label appears in a current frame picture, determining the space-time distance between one or more target image frames corresponding to the new label and all the target image frames;

determining color assignment weights based on the spatiotemporal distances;

calculating the weighted distance between each candidate color and the distributed color according to the color distribution weight and the pixel distance between all the candidate colors in the color space; taking the calculated minimum weighted distance corresponding to each candidate color as the acceptance degree of the candidate color;

and acquiring the adoption degrees of all the candidate colors, and distributing the candidate color with the highest adoption degree to the target object corresponding to the new label, so that one or more target image frames belonging to the target object have the same color.

Optionally, when a new label appears in the current frame picture, the method further includes:

acquiring the space distances between one or more target image frames corresponding to the new label and all the target image frames;

acquiring the time distances between one or more target image frames corresponding to the new label and all the target image frames;

and selecting the maximum value of the space distance and the time distance, and taking the maximum value as the space-time distance between one or more target image frames corresponding to the new label and all target image frames.

Optionally, the method further comprises:

numbering each frame of picture, and acquiring a frame number corresponding to each frame of picture;

cutting a single target image frame in each frame of image into a single target object image, and numbering all the cut target object images;

and displaying all target object pictures according to the picture frame number and the target object picture number, and establishing a target object picture sample library.

Optionally, if the label of the same target image frame in the current frame picture is different from the label of the adjacent one or more frames of pictures in the recognition result, and the label of the target image frame in the current frame picture exists in the adjacent one or more frames of pictures; modifying the corresponding error tracking association based on the display interface of the target object picture sample library;

taking the last target object picture before the label of the target image frame in the target object picture sample library is changed as an end frame, or taking the first target object picture after the label of the target image frame in the target object picture sample library is changed as a start frame;

and cutting off the tracking association of the target object corresponding to the target image frame from the ending frame or the starting frame, and allocating a new label to the target object corresponding to the target image frame in the starting frame.

Optionally, if the label of the same target image frame in the current frame picture is different from the label of the adjacent one or more frames of pictures in the identification result, and the label of the target image frame in the current frame picture does not exist in the adjacent one or more frames of pictures; modifying the corresponding error tracking association based on the display interface of the target object picture sample library;

marking a target object picture containing the target image frame appearing for the first time as a starting frame; taking the target object picture without the tracking association relation as a single frame; screening all initial frames containing the target image frame and all independent frames containing the target image frame from the target object image sample library;

and combining the screened starting frame and the screened independent frame into a new target object picture, supplementing an associated pointer for the new target object picture, and distributing a corresponding label.

Optionally, a label is established based on a display interface of the target object picture sample library or a display interface of the one or more frames of pictures; the label includes at least one of: target object type, picture frame bit, key frame.

Optionally, when performing frame annotation on one or more target objects in a multi-frame picture, adding an associated pointer to the one or more target objects; the association pointer includes at least one of: a pointer to a parent image frame, a pointer to a child image frame, a pointer to a previous tracking associated target object, and a pointer to a next tracking associated target object.

Optionally, the method further comprises:

displaying a trajectory line in a display interface of the one or more frames of pictures, the trajectory line being tracked in association with the one or more target objects;

determining whether a mis-tracking correlation exists according to the trajectory line; and modifying the corresponding error tracking association according to the trajectory line.

Optionally, the method further comprises:

selecting one or more target objects in the display interface of the one or more frames of pictures, and loading all target object pictures on the same trajectory line in the display interface of the target object picture sample library according to the selected one or more target objects;

if the label or color of a certain target image frame in the current frame picture is detected again after one or more frames of pictures are separated, fitting the target image frames corresponding to one or more selected target objects based on the current frame picture, the previous frame picture of the current frame and the next frame picture of the current frame to generate corresponding virtual frames;

and taking the corresponding virtual frame as a real frame of the selected one or more target objects, and adding the real frame to one or more frames of pictures at intervals for supplementing.

Optionally, the method further comprises:

and modifying the frame of the target object picture based on the display interface of the target object picture sample library.

Optionally, the method further comprises:

acquiring one or more target image frames subjected to frame labeling;

modifying the target image frame with error or overlap based on the display interface of the one or more frames of images; wherein the error comprises at least one of: super frame, invade frame.

Optionally, the method further comprises:

and setting a control key in the display interface of the one or more frames of pictures, and controlling the display of the one or more frames of pictures through the control key.

Optionally, the target object comprises at least one of: human body, human head, human face;

the target image frame includes at least one of: a human body frame, a human head frame and a human face frame.

The invention also provides an image tracking correlation system, which comprises:

the pre-labeling module is used for performing frame labeling on one or more target objects in the multi-frame pictures to obtain one or more corresponding target picture frames;

the marking module is used for marking the same label on one or more target image frames belonging to the same target object;

the distribution module is used for distributing the same color to one or more target image frames belonging to the same target object;

and the tracking association module is used for tracking and associating one or more target objects in the multi-frame pictures according to the labels and the colors corresponding to the one or more target image frames.

Optionally, the method further comprises:

acquiring a plurality of target image frames subjected to frame labeling;

determining color assignment weights based on the spatiotemporal distances;

Optionally, the method further comprises:

acquiring one or more target image frames subjected to frame labeling;

Optionally, the method further comprises:

The invention also provides an image tracking association device, which comprises:

The present invention also provides an apparatus comprising:

one or more processors; and

one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform a method as in any one of the above.

The invention also provides one or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform the method as described in any one of the above.

As described above, the image tracking association method, system, device and medium provided by the present invention have the following beneficial effects: performing frame marking on one or more target objects in a multi-frame picture to obtain one or more corresponding target image frames; marking one or more target image frames belonging to the same target object with the same reference number; and assigning the same color to one or more target image frames belonging to the same target object; and tracking and associating one or more target objects in the multi-frame pictures according to the labels and colors corresponding to the one or more target image frames. For a large number of video images obtained from the fields of security, intelligent transportation and the like, the method can firstly mark the target object in the single-frame or multi-frame image by a preset algorithm in advance to obtain a plurality of target image frames. Compared with manual direct labeling, the method has the advantages that the labeling efficiency can be improved and the labeling cost can be reduced by performing pre-labeling through a preset algorithm. Because video images are obtained from the fields of security, intelligent transportation and the like, and frame pre-marking is carried out aiming at the frame images, when all single frame images in the video are pre-marked, the target image frames between the frame images are not related; therefore, the invention finds out the errors (such as frame loss, heel breaking, heel jumping and other errors) occurring when the associated target object is tracked and modifies the corresponding errors by marking the label and distributing the color for the pre-labeled target image frame and tracking the target object of all the frame images in the associated video image based on the corresponding label and color. Therefore, the method can not only pre-label the target object in the image through a preset algorithm, but also track the labeling condition of each frame of picture in the associated video through the label and the color, judge whether the error tracking association exists between the frame pictures, and modify the corresponding error tracking association. Meanwhile, the image detection algorithm is trained based on correct labeling of the target object in the video, so that the accuracy of the image detection algorithm in the fields of security, intelligent transportation and the like can be improved.

Drawings

FIG. 1 is a schematic flow chart diagram illustrating an image tracking correlation method according to an embodiment;

FIG. 2 is a schematic diagram of tracking an associated target object, according to an embodiment;

FIG. 3 is a schematic diagram of tracking an associated target object according to another embodiment;

FIG. 4 is a schematic illustration of a super-block diagram according to an embodiment;

FIG. 5 is a schematic illustration of a super-block diagram according to another embodiment;

FIG. 6 is a schematic view of an exemplary embodiment of an invader frame;

FIG. 7 is a diagram illustrating a hardware configuration of an image tracking correlation system according to an embodiment;

FIG. 8 is a schematic diagram illustrating tracking of an associated target object, according to an embodiment;

FIG. 9 is a schematic diagram of tracking an associated target object according to another embodiment;

fig. 10 is a schematic hardware structure diagram of a terminal device according to an embodiment;

fig. 11 is a schematic diagram of a hardware structure of a terminal device according to another embodiment.

Description of the element reference numerals

M10 pre-labeling module

M20 marking module

M30 distribution module

M40 tracking association module

Head frame of 10 persons X

Human body frame of 20 persons X

Face frame of 30 people X

40 protective film layer

Body frame of 200 persons Y

1100 input device

1101 first processor

1102 output device

1103 first memory

1104 communication bus

1200 processing assembly

1201 second processor

1202 second memory

1203 communication assembly

1204 Power supply Assembly

1205 multimedia assembly

1206 voice assembly

1207 input/output interface

1208 sensor assembly

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.

It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.

Referring to fig. 1 to 6, the present invention provides an image tracking correlation method, including:

s100, performing frame marking on one or more target objects in a multi-frame picture to obtain one or more corresponding target picture frames;

s200, marking the same label on one or more target image frames belonging to the same target object;

s300, distributing the same color to one or more target image frames belonging to the same target object;

and S400, tracking one or more target objects in the associated multi-frame pictures according to the labels and colors corresponding to the one or more target image frames.

According to the method, for a large number of video images obtained from the fields of security, intelligent transportation and the like, frame labeling can be performed on a target object in a single-frame or multi-frame picture in advance through a preset algorithm to obtain a plurality of target image frames. Compared with manual direct labeling, the method has the advantages that the labeling efficiency can be improved and the labeling cost can be reduced by performing pre-labeling through a preset algorithm. Because video images are obtained from the fields of security, intelligent transportation and the like, and frame pre-marking is carried out aiming at the frame images, when all single frame images in the video are pre-marked, the target image frames between the frame images are not related; therefore, the method finds out errors (such as frame loss, heel breaking, heel jumping and other errors) occurring when the associated target object is tracked and modifies the corresponding errors by marking labels and distributing colors for the pre-labeled target image frames and tracking the target objects of all frame images in the associated video image based on the corresponding labels and colors. Meanwhile, the image detection algorithm is trained based on correct labeling of the target object in the video, so that the accuracy of the image detection algorithm in the fields of security, intelligent transportation and the like can be improved.

In an exemplary embodiment, the method further includes identifying whether the same target image frame in the current frame picture and the rest frame pictures is the same label and the same color; and determining whether the error tracking correlation exists according to the identification result, and modifying the corresponding error tracking correlation. If the error tracking association exists, the identification result comprises at least one of the following: the label or color of a certain target image frame in the current frame picture is detected again after one or more frames of pictures are separated, the label of the same target image frame in the current frame picture is different from the label of the adjacent one or more frames of pictures, the color of the same target image frame in the current frame picture is different from the color of the adjacent one or more frames of pictures, and the target image frame with the subordinate relationship has wrong subordinate in the one or more frames of pictures.

In the embodiment of the present application, if the label or color of a certain target image frame in the current frame picture is detected again after one or more frames of pictures are separated, a frame loss may occur. That is, after the label or color of the target image frame is detected in the current frame picture, the label or color of the target image frame is not detected in the next frame or frames, and after one or more frames are separated, the label or color of the target image frame is detected again, and then the frames of pictures in which the label or color of the target image frame is not detected are called lost frames. If the label of the same target image frame in the current frame picture is different from the label of the adjacent one or more frames of pictures, and the label of the target image frame in the current frame picture does not exist in the adjacent one or more frames of pictures, the tracking interruption occurs. For example, the target frame has a label replaced in a certain frame, and the original label does not exist, and the target frame has a new label in the frame, then the tracking-off may occur. If the label of the same target image frame in the current frame picture is different from the label of the adjacent one or more frames of pictures, and the label of the target image frame in the current frame picture exists in the adjacent one or more frames of pictures, the tracking may occur. For example, when a target frame loses an original label in a frame and the original label jumps to another frame, it is called scurrying. Cross-tracking is most commonly inter-tracking, i.e., two image frames have been interchanged in a frame. For multi-level target objects, there may also be complex fleeting. For example, for a multi-level target object (human body, head, face), it is possible that a human body frame of a certain frame a gets caught on a human body of a frame b; in another frame, the head frame of the first person moves to the third person; or the human body frame and the human head frame do not jump to the heel, but the human face frame jumps to the body of the third person.

In an exemplary embodiment, the method further comprises the steps of obtaining a plurality of target image frames subjected to frame labeling; and determining the membership of the target image frame by taking a certain target image frame as a parent image frame and taking another target image frame subordinate to the parent image frame as a child image frame according to the preset membership. The preset membership in the embodiment of the present application is set according to an actual target source, for example, if the target source is a human, the target object includes at least one of the following: human body, human head, human face; the target image frame includes at least one of: a human body frame, a human head frame and a human face frame. As an example, the preset affiliation may be configured to establish an affiliation between the body frame and the head frame by using the body frame as a parent-level image frame and the head frame as a child-level image frame; and taking the human head frame as a parent-level image frame and the human face frame as a child-level image frame, and establishing the membership between the human head frame and the human face frame. In the embodiment of the application, if the human body frame is taken as a parent-level image frame, the human head frame is taken as a child-level image frame, and the human face frame is taken as a child-level image frame of the human head frame, grandchild association is formed among the human body frame, the human head frame and the human face frame; namely, the human body frame and the human head frame are in a parent-child subordination relationship, and the human head frame and the human face frame are in a parent-child subordination relationship. In the embodiment of the application, the human body frame, the human head frame and the human face frame have unique subrelations, wherein the human head frame is a child node of the human body frame, and the human face frame is a child node of the human head frame; the human body frame of the same person does not have two human head frames as child nodes, and the human head frame of the same person does not have two human face frames as child nodes. In the embodiment of the application, in order to indicate uniqueness among the human body frame, the human head frame and the human face frame, the human body frame, the human head frame and the human face frame of the same person can be set to be the same number. The setting of the number is set according to the personnel in the image picture, and the application is not limited to specific numerical values. For example, the number of the human body frame, the human head frame, and the human face frame corresponding to a certain person in a certain frame of image picture may be set to 23.

According to the above description, in an exemplary embodiment, if a child image frame belonging to a parent image frame is marked as not belonging to the parent image frame; and/or, marking the sub-level image frame which is not subordinate to a certain parent-level image frame as the image frame which is subordinate to the parent-level image frame; the target image frame with the dependency relationship has an erroneous dependency. As an example, for a certain person in a certain frame of image picture, if the head frame of the person is marked as a body frame not belonging to the person, the head frame of the person is marked as a child node of the body frame not belonging to the person, so that the head frame of the person is not associated with the body frame of the person; or marking the head frames of other persons as the body frames belonging to the person, namely marking the head frames of other persons as the body frames belonging to the person, so that the head frames of other persons are associated with the body frame of the person; the pre-labeled image frame is considered to have an incorrect dependency. As another example, for a person in a certain frame of image picture, if the face frame of the person is labeled as a head frame not belonging to the person, the face frame of the person is labeled as a child node of the head frame not belonging to the person, so that the face frame of the person is not associated with the head frame of the person; or marking the face frames of other persons as the head frames belonging to the person, namely marking the face frames of other persons as the head frames belonging to the person, so that the face frames of other persons are associated with the head frames of the person; the pre-labeled image frame is considered to have an incorrect dependency. If the sub-level image frame belonging to a certain parent-level image frame is marked as not belonging to the parent-level image frame; drawing a curve or straight line to pass through the corresponding sub-level image frame by taking any position in the parent-level image frame as a starting point, creating the association between the parent-level image frame and the corresponding sub-level image frame, and modifying the corresponding error slave. For example, if a person in a certain frame of image screen is marked as a human body frame not belonging to the person, such error marking can be modified by a line-drawing correlation method. Specifically, a straight line or a curve is drawn to pass through the corresponding human head frame by taking any position in the human body frame as a starting point, and the human body frame and the human head frame of the person are automatically associated through a preset algorithm, so that the error marking is modified. The false labeling is modified by a line drawing association method, new association can be quickly created again, a new subordinate complete system is established, and multi-level target objects such as human bodies, human heads and human faces can be associated at the same time. Wherein the vertical direction in fig. 5 and 6 is the parent-child-slave association.

Due to the fact that a plurality of target sources (people and vehicles) exist in security and intelligent transportation videos, colors and labels of one or more target objects belonging to the same target source are the same. By distributing a label and a color to each target source, target objects (human bodies, human heads, human faces, human key points and the like) with the same label and target image frames (human body frames, human head frames and human face frames) with the same label can all display the same color, so that the target objects can be easily found when matching errors occur. If the colors are randomly distributed by using a random distribution algorithm, for two target objects with similar positions and different labels, a certain probability is distributed to the two similar colors; if the two target objects actually come from the same target source, it is difficult to find this labeling error due to the close colors. In addition, the tracking association between frames is the case, for example, if a person breaks down, the label in this frame is ID1, and the label in the next frame becomes ID2, then this person will be displayed in two colors; so that a mis-tracking correlation occurs when tracking the correlation between frames. The observer cannot find the tracking problem because the random allocation algorithm may allocate two similar colors to the target image frames of the previous and subsequent frames. Therefore, the color is distributed to the target source, the target object and the target image frame by using the space-time minimum maximum difference color distribution algorithm. Specifically, when a new label appears in a current frame picture, acquiring the spatial distances between one or more target image frames corresponding to the new label and all the target image frames; acquiring the time distances between one or more target image frames corresponding to the new label and all the target image frames; and selecting the maximum value of the spatial distance and the temporal distance, and taking the maximum value as the space-time distance between one or more target image frames corresponding to the new label and all the target image frames. Determining color assignment weights based on the spatio-temporal distances; calculating the weighted distance between each candidate color and the distributed color according to the color distribution weight and the pixel distance between all the candidate colors in the color space; taking the calculated minimum weighted distance corresponding to each candidate color as the acceptance degree of the candidate color; and acquiring the adoption degrees of all the candidate colors, and distributing the candidate color with the highest adoption degree to the target source corresponding to the new label, so that one or more target objects belonging to the target source have the same color, and simultaneously, one or more target image frames corresponding to each target object also have the same color.

As an example, a three-dimensional discrete color space is first defined, such as R (0-25-255) G (0-25-255) B (0-25-255). Wherein, the middle 25 is a sampling interval, i.e. 25 is a sampling period; by this definition, the discrete color space can represent 1000 colors. In general, such multiple objects do not appear in the image of a single frame and its adjacent frames in the video, and thus the color is sufficient. Secondly, defining the space-time distance between target image frames; in the embodiment of the application, the space-time distance between image frames which come from the same target object and have the same label is 0; the space-time distance between the target image frames of different labels is Max (space distance, time distance). Wherein, if the two image frames are mutually overlapped, the space distance is 0; if there is no overlap, the maximum value of the shortest distance in the X-axis direction/the maximum width of the two image frames and the shortest distance in the Y-axis direction/the maximum height of the two image frames is taken as the spatial distance. The time distance between the two image frames is set by taking seconds as a unit according to the video frame rate; for example, a video frame rate of 30fps (frame Per second), the temporal distance of one frame apart is 1/30. Color assignment weights are then determined based on spatio-temporal distances, such as coef (a, B) ═ sigmoid (dist (a, B)). A, B are all target objects. When a new tracking associated label appears in a certain frame and needs to be assigned with a new color, all candidate colors are searched in a traversal mode in a discrete color space, and the weighted distance between each candidate color and all colors which are currently assigned is calculated, namely the weighted distance is pixel distance cofe (A, B). And selecting the minimum weighted distance as the acceptance degree of the candidate color, acquiring the acceptance degrees of all the candidate colors at the same time, and distributing the candidate color with the highest acceptance degree to the target source corresponding to the new label, so that one or more target objects belonging to the target source have the same color, and simultaneously one or more target image frames corresponding to each target object also have the same color. According to the embodiment of the application, two dimensions of time and space are considered at the same time, so that both the slave association error and the tracking association error are easier to discover. In addition, the method is dynamic, only colors are allocated to the current active objects, and the colors of the inactive objects can be recycled within a period of time, so that the color sparsity is ensured. When a label disappears in a certain frame (it may be that the target object disappears in the picture, or the label is broken or omitted), the color of the label is not immediately recovered, but it is used as a memory target, and 30 frames are reserved. The frame number/time of the memory object stays at the last frame before disappearance. If the time distance between the newly appeared target frame and the memory target needs to be calculated, the difference value between the current frame time and the memory target time is calculated. When a new labeled target object appears at the same position in a short time after a certain label disappears, the situation may be a broken heel, and the space-time distance between the new target and the memory target is smaller because of the close space position and the short time interval. And a small space-time distance brings about a large color difference, and the large color difference is easier to find corresponding errors.

In an exemplary embodiment, the method further comprises numbering each frame of picture, and acquiring a frame number corresponding to each frame of picture; cutting a single target image frame in each frame of image into a single target object image, and numbering all the cut target object images; and displaying all target object pictures according to the picture frame number and the target object picture number, and establishing a target object picture sample library. The method comprises the following steps of establishing a label based on a display interface of a target object picture sample library or a display interface of one or more frames of pictures; as an example, the tag includes at least one of: target object type, picture frame bit, key frame. The target object type may include at least one of: human body, human head, human face; the picture frame bits may include at least one of: start frame, end frame, independent frame, intermediate frame, start frame. In the embodiment of the application, a first target image with the label of the target image frame changed in the target image sample library is used as a starting frame; taking the last target object picture before the label of the target image frame in the target object picture sample library is changed as an end frame; marking a target object picture containing the target image frame appearing for the first time as a starting frame; taking the target object picture without the tracking association relation as a single frame; the frame in the middle of the start frame and the end frame is taken as a key frame. There is only one key frame for each label, each target object type, and the others are non-key frames.

According to the above description, the attribute bar is adjusted to the frame bit attribute, and the frame bit attribute of each target can be viewed and edited on the display interface of the target object picture sample library. For the target objects belonging to the same target source and having the same label, the display is usually started from the start frame and ended from the end frame, and then the next batch of target objects is displayed. If the target object picture is replaced from a certain position and becomes another target object picture, but the replacement position is not marked as an end frame and a start frame, the situation that the leap-down occurs is shown. Therefore, the jump-tracking error can be modified based on the display interface of the target object picture sample library, namely jump-tracking cutting is performed on the display interface of the target object picture sample library. Specifically, if the label of the same target image frame in the current frame picture is different from the label of the adjacent one or more frames of pictures in the identification result, and the label of the target image frame in the current frame picture exists in the adjacent one or more frames of pictures; modifying the corresponding error tracking association based on the display interface of the target object picture sample library; taking the last target object picture before the label of the target image frame in the target object picture sample library is changed as an end frame, or taking the first target object picture after the label of the target image frame in the target object picture sample library is changed as a start frame; and starting from the ending frame or the starting frame, cutting off the tracking association of the target object corresponding to the target image frame, automatically cutting off an association pointer between the two target image frames, allocating a new label to the target object corresponding to the target image frame in the starting frame, and marking all the subsequent frame pictures of the target object with the new label. If the target objects are multi-level target objects (such as human bodies, human heads and human faces), one-time fleeing heel check and cutting-off can be respectively carried out on the three types of target objects. Each time the node is disconnected, the node is disconnected together with the parent node and the child node. In the embodiment of the application, the lower the resolution is, the more difficult the detection of the jump and the follow is; for example, under the condition of low resolution, the approximation rate of different faces is very high, and it is not easy to judge whether the heel jump occurs. Therefore, the human body frame or the human head frame with higher identification degree can be selected as much as possible to judge whether the heel fleeing occurs. By cutting off the tracking on the display interface of the target object picture sample library, the error tracking association of the next frame or frames of pictures can be avoided in time; meanwhile, new labels are allocated to the following target objects, and the corresponding error tracking association problem is solved.

According to the above description, in an exemplary embodiment, the tracking interruption error may also be modified based on the display interface of the target object picture sample library, that is, the tracking interruption splicing is performed on the display interface of the target object picture sample library. Specifically, a target object picture containing the target picture frame appearing for the first time is marked as a start frame; taking the target object picture without the tracking association relation as a single frame; screening all initial frames containing the target image frame and all independent frames containing the target image frame from a target object image sample library; and combining the screened starting frame and the screened independent frame into a new target object picture, supplementing an associated pointer for the new target object picture, and distributing a corresponding label. According to the method and the device, the specific target object type is screened out by setting different screening conditions, the images of the starting frame and the single frame are screened and displayed, and the images of the ending frame and the intermediate frame are not displayed. Thus, only one target object will be displayed on a complete track segment. However, after the tracking is cut off, many incomplete tracking related segments are added, and when labels are assigned to newly generated tracking related segments, only one number can be selected from the back, which is usually large, so that two tracking related segments with similar space-time distances have large tracking related label differences, are likely to be far away from each other on a display interface of a target object picture sample library, and are not beneficial to finding two identical target objects. Therefore, before the break-and-follow splicing is carried out, the tracking associated labels can be rearranged according to the entering sequence of the target object. In the embodiment of the application, on a display interface of a target object picture sample library, each target object picture represents a tracking association segment, the same target object pictures are selected for combination, are combined into the same reference number, are supplemented with corresponding association pointers, and are automatically associated with parent-level target object pictures and child-level target object pictures through the association pointers.

According to the above description of some exemplary embodiments, when performing frame tagging on one or more target objects in a multi-frame picture, the method further includes adding an associated pointer to the one or more target objects; the association pointer includes at least one of: a pointer to a parent image frame (parent pointer), a pointer to a child image frame, a pointer to a previous tracking related target object (before pointer), and a pointer to a next tracking related target object (after pointer). By adding the association pointers, a tracking association tree data structure can be formed, and meanwhile, the target object can be observed, searched, positioned and edited on the tracking association tree with higher efficiency.

In an exemplary embodiment, the method further comprises displaying a trajectory line tracking the associated one or more target objects in a display interface of the one or more frames of pictures; determining whether a mis-tracking correlation exists according to the trajectory line; and modifying the corresponding error tracking association according to the trajectory line. In the embodiment of the application, although most of fleeing heels and a part of broken heels can be modified through the display interface of the target object picture sample library, the error tracking association can also be modified efficiently; however, it sacrifices scene information, spatial information; therefore, for part missing and new fleeing problems caused by accidental misoperation, the user needs to return to the video playing interface for supplement. For example, by displaying the trajectory of the target, the missing (frame loss), the broken (heel break), and the abnormal route (heel jump) of the trajectory can be found. Specifically, the displayed trajectory lines include a forward trajectory line and a reverse trajectory line; and traversing the target object forwards (or backwards) by taking the upper left vertex of the target image frame of the current frame as a starting point. As the before pointer and the after pointer are added among the target objects with the tracking incidence relation, the target objects with the same label and the same type can be traversed easily according to the time sequence. And drawing the position of the top left vertex of the target in a point form, wherein the color of the top left vertex is the same as that of the target object of the current frame. Drawing a connecting line between the corresponding top left vertex positions in the two frames of pictures, wherein if the two frames of pictures are continuous frames, the color of the connecting line is the same as that of the target object; if not a consecutive frame (there is a missing frame), the color of the link segment is black. Meanwhile, if the trajectory line ends at the edge position of the picture, it may be that the target object naturally exits the picture (the target object really disappears); but if the trajectory line ends in the middle region of the image, it may be that the target has broken its heel; if the trajectory suddenly makes an unreasonable turn at a certain position, it may be that heel-skipping has occurred. Therefore, the embodiment of the application determines whether a tracking correlation error exists or not through the trace line, and the position of the frame loss can be seen. By observing the abnormal trajectory line, the mouse can be moved to the position near a certain point on the trajectory line, the point is selected by a key, and the corresponding position is quickly positioned for viewing and modifying. Wherein, the horizontal direction in fig. 5 and fig. 6 is the tracking relation of the inter-frame pictures.

According to the record, the label information can be edited on a video playing interface or a multi-frame picture playing interface, and errors such as scurrying, breaking, losing frames, inaccurate frame shapes and the like can be modified according to the corresponding label information. As an example, if the tracking interruption is performed on the video playing interface or the multi-frame picture playing interface by using the pairwise association method, there are: firstly, selecting an Nth frame of picture, and pressing a shift key to set a target object A in the Nth frame of picture as a correlation front point; and then switching to the M frame, selecting a target object B in the M frame picture by long-pressing a left mouse button, and setting the target object B as a related rear point. If A and B are target objects of the same type and are not associated, automatically establishing a tracking association relation; if A, B already has a trace association, it will be automatically switched off. When the interval between two frames is far, the key functions of forward tracing, backward tracing, bookmark and the like can be used for cutting off. For example, to cut off the tracking association between A and B; where A is at frame 100 and B is at frame 500 (with an occlusion or no annotation in the middle); then select a in frame 100, press shift key to set a as the point before association, press the "backtracking" key, jump directly to frame 500B, set B as the point after association, because A, B has already tracked the association relation, will cut it at this moment. The embodiment of the application adopts four keys on the keyboard to realize the functions of ' up-tracking ', ' down-tracking ', ' forward-tracking ' and ' backward-tracking ', and uses ' less ' and ' to realize the functions of ' forward break point ' and ' backward break point ' respectively. Wherein, the 'backtracking': and switching to a parent node target object corresponding to the target object. "go down": and switching to a child node target object corresponding to the target object. "front tracing": and switching to a tracking related target object in front of the target object. "backtracking": and switching to a tracking associated target object behind the target object. "front break point": and searching the target object forwards to locate the position where the interruption occurs. "rear break point": and searching backwards for the target object, and positioning to the position where the interruption occurs. "first target object": the target object is searched forward and positioned to the position where the target object appears for the first time. "last target object": the target object is searched backwards and positioned to the position which appears last before disappearance. Due to the fact that the bookmark function is designed, when the bookmark is added, if the bookmark is added to a certain frame of picture; clicking on the bookmark can return to the frame of picture.

As another example, the consecutive heel problem is handled at a video playing interface or a multi-frame picture playing interface by using the frame-folding technique. Specifically, when the target object has continuous fragmentation and tracking, for example, the ID number of the first frame marker is 1, the ID number of the second frame marker is 2, and the ID number of the third frame marker is 3; if the correlation is performed by the pairwise correlation method, the speed is relatively slow. If no other target object is interfered nearby the target object at the moment, continuous multiple frames can be connected in series at one time. That is, from the first frame, the forward key is pressed in sequence while the Ctrl key is held. Since the pressed Ctrl key has a reservation function, the old object does not disappear and the new object is superimposed every frame. After a plurality of frames are superposed, the mouse is moved to the inside of the target image frame of the first frame, and then a curve or a straight line is dragged along the target motion direction until the inside area of the target frame of the last frame; the fragmented target objects are automatically spliced into a unified tracking association ID number.

In an exemplary embodiment, the method further comprises the steps of obtaining one or more target image frames subjected to frame labeling; modifying the target image frame with errors or overlapping based on the display interface of one or more frames of images; wherein the error comprises at least one of: super frame, invade frame. If the two image frames have parent-child subordination, wherein the frame-shaped area of the child image frame is not completely positioned in the frame-shaped area of the parent image frame, and a dotted line is connected between the key point of the child image frame and the key point of the parent image frame; the pre-labeled image frame has a super frame. As an example, as shown in fig. 4, if the head frame 10 of the person X and the corresponding body frame 20 are correctly labeled, but the frame-shaped area of the head frame 10 is not completely located within the frame-shaped area 20 of the body frame, and a dotted line is connected between the vertex of the head frame 10 and the corresponding body frame 20, it is considered that the image frame after the pre-labeling has a super frame. As another example, as shown in fig. 5, if the face frame 30 of the person X and the corresponding head frame 10 are correctly labeled, but the frame-shaped area of the face frame 30 is not completely located within the frame-shaped area 10 of the head frame, and a dotted line is connected between the vertex of the face frame 30 and the corresponding head frame 10, it is determined that the pre-labeled image frame has a super-frame. The method and the device for modifying the super-frame can modify the super-frame through a vertex dragging method, specifically, a moving cursor is moved to the position near a key point of a parent-level image frame or a child-level image frame through moving a mouse, a left button of the mouse is pressed to directly drag the key point of the parent-level image frame or the key point of the child-level image frame, and a frame-shaped area of the parent-level image frame or the child-level image frame is changed, so that the frame-shaped area of the child-level image frame is completely located in the frame-shaped area of the parent-level image frame. In the conventional method, the corresponding parent-level image frame or the child-level image frame is selected first, and then the vertex of the selected parent-level image frame is dragged, so that one extra step is time-consuming. Compared with the traditional method, the method can reduce one operation step and save corresponding time. If a super-box occurs, the super-box may be modified by anchor point modification. Specifically, when the anchor point is located outside the image frame, the left mouse button is pressed, a button instruction is sent, a corresponding button instruction is responded, and the edge corresponding to the image frame is directly expanded to the point. When the anchor point is located inside the image frame, only one of the edges can be selected to shrink to the anchor point, specifically, the edge with the smallest area loss is selected. If one of the four shortcut keys, up, down, left and right, is pressed at the same time, one of the edges, up, down, left and right, is designated to shrink to the anchor point. In general, the frame shape modification by the anchor point method is more efficient than the frame shape modification by the vertex dragging method; and in the operation-intensive frame marking/modifying task, the anchor point method has more advantages than the top-supporting point method.

If the two image frames do not have a parent-child relationship, the frame-shaped area of one image frame is completely located in the frame-shaped area of the other image frame, and the contact surface of the two image frames is the existing protective film layer, the marked image frame has an intrusion frame. As an example, as shown in fig. 6, the frame-shaped area of the human head frame 10 of the person X is completely located within the frame-shaped area of the human body frame 200 of the person Y, and the contact surface of the two image frames is provided with the protective film layer 40. In the embodiment of the application, when the human head frame which does not belong to a certain human body frame enters the frame-shaped area of the human body frame, a layer of obvious contact layer is attached to the periphery of the contact edge between the human body frame and the human head frame. This design mimics the immune response process in which cells invaded by microorganisms, forming a protective outer membrane of protein on the contact surface with the microorganisms. It can visually reflect that the head frame which is wrapped does not belong to the body frame below, and is a foreign object or an invader.

The embodiment of the application also sets a control key on the video playing interface (namely the display interface of one or more frames of pictures), and controls the display of the one or more frames of pictures through the control key. The control keys comprise a forward key, a backward key and a pause/play key, and the three basic keys are used for assisting the operation based on the video play interface. Wherein the pause/play key: when the video is played, the video is converted into pause by pressing; when the video is paused, the video is converted into a playing state by pressing. A forward key: when the video is paused, a forward frame is pressed, if the video is continuously pressed and not placed, the video is switched to a temporary playing state after waiting for a certain time (400ms), and when the key is released, the video is paused again. A back key: when the video is paused, a frame is backed up after being pressed, if the video is continuously pressed and not put, the video is switched to a temporary reverse playing state after waiting for a certain time (400ms), and the pause is resumed after the key is released. The three keys can be operated by the ring finger, middle finger and index finger of the left hand in the order of the back key, pause/play key and forward key. On a computer keyboard, three functions are realized by using three keys, namely a key A, a key S and a key D, so that a shift key and a ctrl key can be pressed by a small thumb, and a space key can be pressed by a big thumb.

When the target image frame to be modified is not limited to a plurality of frames before and after, but is in batch and large segment, the target image frame can be called back to a batch small image interface (namely a target object image sample library display interface) for processing. The batch thumbnail interface at this point is no longer for all target objects, but rather for small-scale batches of the selected target objects. In particular, the amount of the solvent to be used,

for single-track jump cutting, when certain target image frame jump is found, a corresponding target object can be selected in a video playing interface, and all target object images on the same track line are loaded in a batch small image interface according to the selected target object; and then finding out the fleeing position, and setting the frame bit attribute corresponding to the corresponding target object as an 'end frame'.

For the single-target object tracking splicing, when a target object is found to have a plurality of fragmented tracking video segments, many situations to be spliced are inevitably missed due to too many displayed segments. At this time, the batch minigraph interface of the selected target object can be entered for the selected tracking video clip containing the target object. If all target object tracking video segments are taken as an integral object, the tracking distances between all target object tracking segments and the selected target object tracking segment can be calculated, and then the target object tracking segments are displayed in the batch small graph interface from small to large. Wherein the time distance is the closest time distance of two tracking segments. The spatial distance is the relative distance between the two target frames in the closest frames (i.e. the distance between the centers of the target frames divided by the length of the diagonal of the largest frame). If both tracking segments have the target object in the same frame, their tracking distance is infinite, otherwise, the tracking distance is MAX (time distance, space distance). When two tracking segments are far apart, the tracking distance calculated directly will be larger, but the tracking distance can be calculated by using other tracking segments as an intermediary, which is similar to two strip magnets with a small attraction force, but the attraction force of the two magnets increases the same after the other strip magnet is placed between the two magnets: the tracking distance between the tracking segment R and the tracking segment S is MIN (the tracking distance between the tracking segment R and the tracking segment S, the tracking distance between the tracking segment R and the tracking segment T + the tracking distance between the tracking segment T and the tracking segment S), and the tracking segment T may be any tracking segment. Through the method, the tracking distances from all the tracking segments to the selected tracking segment are finally calculated, the tracking segments are placed in a batch small image interface after being sorted from small to large, target image frames from the same object are selected, a combination key is clicked, the target image frames are spliced and fused into a tracking label. The distance algorithm for tracking segments proposed here is to make the distance between tracking segments from the same object as small as possible, and the tracking segments are ranked in front when sorted, so that they are easier to find.

For single-track batch frame supplementing, selecting one or more target objects in a video playing interface, and loading all target object pictures on the same track line in a batch small picture interface according to the selected one or more target objects; for the frame loss condition, namely the label or color of a certain target image frame in the current frame picture is detected again after one or more frames of pictures are separated, the target image frames corresponding to the selected one or more target objects are fitted based on the current frame picture, the previous frame picture of the current frame and the next frame picture of the current frame, and corresponding virtual frames are generated; and taking the corresponding virtual frame as a real frame of the selected one or more target objects, and adding the virtual frame to one or more frames of pictures at intervals for supplementing. By way of example, a target object is selected in a video playing interface, and a batch thumbnail interface of the selected target object is entered. At this time, not only all target objects on the track are loaded, but also a virtual frame is generated and loaded according to the image frames marked in the previous and next frames of the target objects when the frames are lost. The interface will default to screening to show only those virtual boxes that are fit. Clicking the virtual box target by the mouse can confirm the virtual box target as a real box target, and adding a real target at a corresponding position. Pressing the 'R' key in the input keyboard and clicking the target restores it to the original virtual frame state.

For single-track batch frame repairing, selecting one or more target objects in a video playing interface, and loading all target object pictures on the same track line in a batch small picture interface according to the selected one or more target objects; and modifying the frame of the target object picture based on the batch small picture interface. By way of example, a continuous piece of target object pictures is activated in batch on a batch thumbnail interface, and then the upper, lower, left and right frames of the target object pictures are uniformly adjusted, or the target object pictures are translated and zoomed integrally. In addition, there may be a slight difference in the amplitude of each target object picture that needs to be adjusted; thus, the present application supports pruning the active target object picture queues from the left and right. When the leftmost target box is trimmed in place, the leftmost target is removed, then batch adjustment is continued, the next left target is adjusted in place, and then removed, and so on. Because many deviations of the target image frame are continuous, fine adjustment made by the method can be accumulated, when the left head of the activation queue reaches a certain target, the error is not large, and only some fine adjustment is needed.

Based on the above description, the coverage areas of the embodiments of the present application when modifying missing frames, jumping heels, short interval broken heels, and long interval broken heels are shown in table 1.

TABLE 1 modification of coverage during frame loss, heel jump, short interval broken heel, and long interval broken heel

Capability of coverage	Frame loss	Flee heel	Short-interval broken heel	Long-interval broken heel
					Video playing interface	100％	90％	90％	50％
Batch minigraph interface for all target objects	0％	99％	50％	50％
					Batch thumbnail interface for selected target objects	80％	100％	95％	95％

According to the above description, the operation efficiency of the embodiment of the present application in modifying the missing frame, the jumping heel, the short interval broken heel and the long interval broken heel is shown in table 2.

TABLE 2 efficiency of operations in modifying missing frames, fleeing heels, short interval broken heels, long interval broken heels

Capability of coverage	Frame loss	Flee heel	Short-interval broken heel	Long-interval broken heel
					Video playing interface	0.5	0.5	0.9	0.3
Batch minigraph interface for all target objects	0	1.0	0.9	0.9
					Batch thumbnail interface for selected target objects	0.7	0.8	0.5	0.5

According to the above table, the batch small image interface (i.e. the sample library display interface of the target object picture) has high operation efficiency, but has a limited coverage; the video playing interface (i.e. single-frame or multi-frame picture playing interface) has low operation efficiency but wide coverage.

From the time span: when the time span between different tracking associated segments of the same target is smaller, the operation on a video playing interface is more convenient, and the safety is higher. When the time span is large, splicing is not easy to carry out on the video playing interface at the moment, and higher guarantee is provided for batch small-image interface operation.

From the appearance characteristics of the target: the video playing interface retains more spatiotemporal information, while the bulk thumbnail interface relies more on the apparent characteristics of the target object. When the degree of distinction of the apparent features of the target object is not strong, for example, the difference between local face images of different people and a background image when the light is not good may not be so large, and the method is more suitable for operation in a video playing interface.

The batch small graph interface aiming at the specific target is customized, so that the advantage of solving the problem of the interruption of the tracking of the single target is obvious. Compared with the batch small-image interface of the whole target, the method is easier to find the lost frame; compared with a video playing interface, the method has stronger capability of processing the long-interval heel breaking problem.

The invention provides an image tracking correlation method, which comprises the steps of carrying out frame marking on one or more target objects in a multi-frame picture to obtain one or more corresponding target image frames; marking one or more target image frames belonging to the same target object with the same reference number; and assigning the same color to one or more target image frames belonging to the same target object; and tracking and associating one or more target objects in the multi-frame pictures according to the labels and colors corresponding to the one or more target image frames. For a large number of video images obtained from the fields of security protection, intelligent transportation and the like, frame labeling can be performed on a target object in a single-frame or multi-frame picture in advance through a preset algorithm to obtain a plurality of target image frames. Compared with manual direct labeling, the method has the advantages that the labeling efficiency can be improved and the labeling cost can be reduced by performing pre-labeling through a preset algorithm. Because video images are obtained from the fields of security, intelligent transportation and the like, and frame pre-marking is carried out aiming at the frame images, when all single frame images in the video are pre-marked, the target image frames between the frame images are not related; therefore, the method finds out errors (such as frame loss, heel breaking, heel jumping and other errors) occurring when the associated target object is tracked and modifies the corresponding errors by marking labels and distributing colors for the pre-labeled target image frames and tracking the target objects of all frame images in the associated video image based on the corresponding labels and colors. Meanwhile, the image detection algorithm is trained based on correct labeling of the target object in the video, so that the accuracy of the image detection algorithm in the fields of security, intelligent transportation and the like can be improved.

As shown in fig. 4 to 9, an image tracking correlation system includes:

the pre-labeling module M10 is configured to perform frame labeling on one or more target objects in a multi-frame picture, and acquire one or more corresponding target image frames;

a marking module M20, configured to mark one or more target image frames belonging to the same target object with the same reference number;

an assigning module M30, configured to assign the same color to one or more target image frames belonging to the same target object;

and the tracking association module M40 is used for tracking and associating one or more target objects in the multi-frame pictures according to the labels and colors corresponding to the one or more target image frames.

The system can perform frame marking on a target object in a single-frame or multi-frame picture in advance through a preset algorithm to obtain a plurality of target image frames for a large number of video images obtained from the fields of security protection, intelligent transportation and the like. Compared with manual direct labeling, the method has the advantages that the labeling efficiency can be improved and the labeling cost can be reduced by performing pre-labeling through a preset algorithm. Because video images are obtained from the fields of security, intelligent transportation and the like, and frame pre-marking is carried out aiming at the frame images, when all single frame images in the video are pre-marked, the target image frames between the frame images are not related; therefore, the system finds out errors (such as frame loss, heel breaking, heel jumping and other errors) occurring when the associated target object is tracked and modifies the corresponding errors by marking labels for the pre-labeled target image frames, distributing colors and tracking the target objects of all frame images in the associated video images based on the corresponding labels and colors. Meanwhile, the image detection algorithm is trained based on correct labeling of the target object in the video, so that the accuracy of the image detection algorithm in the fields of security, intelligent transportation and the like can be improved.

In an exemplary embodiment, the method further comprises the steps of obtaining a plurality of target image frames subjected to frame labeling; and determining the membership of the target image frame by taking a certain target image frame as a parent image frame and taking another target image frame subordinate to the parent image frame as a child image frame according to the preset membership. The preset membership in the embodiment of the present application is set according to an actual target source, for example, if the target source is a human, the target object includes at least one of the following: human body, human head, human face; the target image frame includes at least one of: a human body frame, a human head frame and a human face frame. As an example, the preset affiliation may be configured to establish an affiliation between the body frame and the head frame by using the body frame as a parent-level image frame and the head frame as a child-level image frame; and taking the human head frame as a parent-level image frame and the human face frame as a child-level image frame, and establishing the membership between the human head frame and the human face frame. In the embodiment of the application, if the human body frame is taken as a parent-level image frame, the human head frame is taken as a child-level image frame, and the human face frame is taken as a child-level image frame of the human head frame, grandchild association is formed among the human body frame, the human head frame and the human face frame; namely, the human body frame and the human head frame are in a parent-child subordination relationship, and the human head frame and the human face frame are in a parent-child subordination relationship. In the embodiment of the application, the human body frame, the human head frame and the human face frame have unique subrelations, wherein the human head frame is a child node of the human body frame, and the human face frame is a child node of the human head frame; the human body frame of the same person does not have two human head frames as child nodes, and the human head frame of the same person does not have two human face frames as child nodes. In the embodiment of the application, in order to indicate uniqueness among the human body frame, the human head frame and the human face frame, the human body frame, the human head frame and the human face frame of the same person can be set to be the same number. The setting of the number is set according to the personnel in the image picture, and the application is not limited to specific numerical values. For example, the number of the human body frame, the human head frame, and the human face frame corresponding to a certain person in a certain frame of image picture may be set to 23. The vertical direction in fig. 8 and 9 is the parent-child association.

According to the above description, in an exemplary embodiment, if a child image frame belonging to a parent image frame is marked as not belonging to the parent image frame; and/or, marking the sub-level image frame which is not subordinate to a certain parent-level image frame as the image frame which is subordinate to the parent-level image frame; the target image frame with the dependency relationship has an erroneous dependency. As an example, for a certain person in a certain frame of image picture, if the head frame of the person is marked as a body frame not belonging to the person, the head frame of the person is marked as a child node of the body frame not belonging to the person, so that the head frame of the person is not associated with the body frame of the person; or marking the head frames of other persons as the body frames belonging to the person, namely marking the head frames of other persons as the body frames belonging to the person, so that the head frames of other persons are associated with the body frame of the person; the pre-labeled image frame is considered to have an incorrect dependency. As another example, for a person in a certain frame of image picture, if the face frame of the person is labeled as a head frame not belonging to the person, the face frame of the person is labeled as a child node of the head frame not belonging to the person, so that the face frame of the person is not associated with the head frame of the person; or marking the face frames of other persons as the head frames belonging to the person, namely marking the face frames of other persons as the head frames belonging to the person, so that the face frames of other persons are associated with the head frames of the person; the pre-labeled image frame is considered to have an incorrect dependency. If the sub-level image frame belonging to a certain parent-level image frame is marked as not belonging to the parent-level image frame; drawing a curve or straight line to pass through the corresponding sub-level image frame by taking any position in the parent-level image frame as a starting point, creating the association between the parent-level image frame and the corresponding sub-level image frame, and modifying the corresponding error slave. For example, if a person in a certain frame of image screen is marked as a human body frame not belonging to the person, such error marking can be modified by a line-drawing correlation method. Specifically, a straight line or a curve is drawn to pass through the corresponding human head frame by taking any position in the human body frame as a starting point, and the human body frame and the human head frame of the person are automatically associated through a preset algorithm, so that the error marking is modified. The false labeling is modified by a line drawing association method, new association can be quickly created again, a new subordinate complete system is established, and multi-level target objects such as human bodies, human heads and human faces can be associated at the same time.

Due to the fact that a plurality of target sources (people and vehicles) exist in security and intelligent transportation videos, colors and labels of one or more target objects belonging to the same target source are the same. By distributing a label and a color to each target source, target objects (human bodies, human heads, human faces, human key points and the like) with the same label and target image frames (human body frames, human head frames and human face frames) with the same label can all display the same color, so that the target objects can be easily found when matching errors occur. If the colors are randomly distributed by using a random distribution algorithm, for two target objects with similar positions and different labels, a certain probability is distributed to the two similar colors; if the two target objects actually come from the same target source, it is difficult to find this labeling error due to the close colors. In addition, the tracking association between frames is the case, for example, if a person breaks down, the label in this frame is ID1, and the label in the next frame becomes ID2, then this person will be displayed in two colors; so that a mis-tracking correlation occurs when tracking the correlation between frames. The observer cannot find the tracking problem because the random allocation algorithm may allocate two similar colors to the target image frames of the previous and subsequent frames. Therefore, the color is distributed to the target source, the target object and the target image frame by using the space-time minimum maximum difference color distribution algorithm. Specifically, when a new label appears in a current frame picture, acquiring the spatial distances between one or more target image frames corresponding to the new label and all the target image frames; acquiring the time distances between one or more target image frames corresponding to the new label and all the target image frames; and selecting the maximum value of the spatial distance and the temporal distance, and taking the maximum value as the space-time distance between one or more target image frames corresponding to the new label and all the target image frames. Determining color assignment weights based on the spatio-temporal distances; calculating the weighted distance between each candidate color and the distributed color according to the color distribution weight and the pixel distance between all the candidate colors in the color space; taking the calculated minimum weighted distance corresponding to each candidate color as the acceptance degree of the candidate color; and acquiring the adoption degrees of all the candidate colors, and distributing the candidate color with the highest adoption degree to the target source corresponding to the new label, so that one or more target objects belonging to the target source have the same color, and simultaneously, one or more target image frames corresponding to each target object have the same color.

As an example, a three-dimensional discrete color space is first defined, such as R (0-25-255) G (0-25-255) B (0-25-255). Wherein, the middle 25 is a sampling interval, i.e. 25 is a sampling period; by this definition, the discrete color space can represent 1000 colors. In general, such multiple objects do not appear in the image of a single frame and its adjacent frames in the video, and thus the color is sufficient. Secondly, defining the space-time distance between target image frames; in the embodiment of the application, the space-time distance between image frames which come from the same target object and have the same label is 0; the space-time distance between the target image frames of different labels is Max (space distance, time distance). Wherein, if the two image frames are mutually overlapped, the space distance is 0; if there is no overlap, the maximum value of the shortest distance in the X-axis direction/the maximum width of the two image frames and the shortest distance in the Y-axis direction/the maximum height of the two image frames is taken as the spatial distance. The time distance between the two image frames is set by taking seconds as a unit according to the video frame rate; for example, a video frame rate of 30fps (frame Per second), the temporal distance of one frame apart is 1/30. Color assignment weights are then determined based on spatio-temporal distances, such as coef (a, B) ═ sigmoid (dist (a, B)). A, B are all target objects. When a new tracking associated label appears in a certain frame and needs to be assigned with a new color, all candidate colors are searched in a traversal mode in a discrete color space, and the weighted distance between each candidate color and all colors which are currently assigned is calculated, namely the weighted distance is pixel distance cofe (A, B). And selecting the minimum weighted distance as the acceptance degree of the candidate color, acquiring the acceptance degrees of all the candidate colors at the same time, and distributing the candidate color with the highest acceptance degree to the target source corresponding to the new label, so that one or more target objects belonging to the target source have the same color, and simultaneously one or more target image frames corresponding to each target object have the same color. According to the embodiment of the application, two dimensions of time and space are considered at the same time, so that both the slave association error and the tracking association error are easier to discover. In addition, the method is dynamic, only colors are allocated to the current active objects, and the colors of the inactive objects can be recycled within a period of time, so that the color sparsity is ensured. When a label disappears in a certain frame (it may be that the target object disappears in the picture, or the label is broken or omitted), the color of the label is not immediately recovered, but it is used as a memory target, and 30 frames are reserved. The frame number/time of the memory object stays at the last frame before disappearance. If the time distance between the newly appeared target frame and the memory target needs to be calculated, the difference value between the current frame time and the memory target time is calculated. When a new labeled target object appears at the same position in a short time after a certain label disappears, the situation may be a broken heel, and the space-time distance between the new target and the memory target is smaller because of the close space position and the short time interval. And a small space-time distance brings about a large color difference, and the large color difference is easier to find corresponding errors.

In an exemplary embodiment, the method further comprises displaying a trajectory line tracking the associated one or more target objects in a display interface of the one or more frames of pictures; determining whether a mis-tracking correlation exists according to the trajectory line; and modifying the corresponding error tracking association according to the trajectory line. In the embodiment of the application, although most of fleeing heels and a part of broken heels can be modified through the display interface of the target object picture sample library, the error tracking association can also be modified efficiently; however, it sacrifices scene information, spatial information; therefore, for part missing and new fleeing problems caused by accidental misoperation, the user needs to return to the video playing interface for supplement. For example, by displaying the trajectory of the target, the missing (frame loss), the broken (heel break), and the abnormal route (heel jump) of the trajectory can be found. Specifically, the displayed trajectory lines include a forward trajectory line and a reverse trajectory line; and traversing the target object forwards (or backwards) by taking the upper left vertex of the target image frame of the current frame as a starting point. As the before pointer and the after pointer are added among the target objects with the tracking incidence relation, the target objects with the same label and the same type can be traversed easily according to the time sequence. And drawing the position of the top left vertex of the target in a point form, wherein the color of the top left vertex is the same as that of the target object of the current frame. Drawing a connecting line between the corresponding top left vertex positions in the two frames of pictures, wherein if the two frames of pictures are continuous frames, the color of the connecting line is the same as that of the target object; if not a consecutive frame (there is a missing frame), the color of the link segment is black. Meanwhile, if the trajectory line ends at the edge position of the picture, it may be that the target object naturally exits the picture (the target object really disappears); but if the trajectory line ends in the middle region of the image, it may be that the target has broken its heel; if the trajectory suddenly makes an unreasonable turn at a certain position, it may be that heel-skipping has occurred. Therefore, the embodiment of the application determines whether a tracking correlation error exists or not through the trace line, and the position of the frame loss can be seen. By observing the abnormal trajectory line, the mouse can be moved to the position near a certain point on the trajectory line, the point is selected by a key, and the corresponding position is quickly positioned for viewing and modifying. Among them, the horizontal direction in fig. 8 and 9 is the tracking association of the inter picture.

In an exemplary embodiment, the method further comprises the steps of obtaining one or more target image frames subjected to frame labeling; modifying the target image frame with errors or overlapping based on the display interface of one or more frames of images; wherein the error comprises at least one of: super frame, invade frame. If the two image frames have parent-child subordination, wherein the frame-shaped area of the child image frame is not completely positioned in the frame-shaped area of the parent image frame, and a dotted line is connected between the key point of the child image frame and the key point of the parent image frame; the pre-labeled image frame has a super frame. As an example, as shown in fig. 4, if the head frame 10 of the person X and the corresponding body frame 20 are correctly labeled, but the frame-shaped area of the head frame 10 is not completely located within the frame-shaped area 20 of the body frame, and a dotted line is connected between the vertex of the head frame 10 and the corresponding body frame 20, it is considered that the image frame after the pre-labeling has a super frame. As another example, as shown in fig. 5, if the face frame 30 of the person X and the corresponding head frame 10 are correctly labeled, but the frame-shaped area of the face frame 30 is not completely located within the frame-shaped area 10 of the head frame, and a dotted line is connected between the vertex of the face frame 30 and the corresponding head frame 10, it is determined that the pre-labeled image frame has a super-frame. The method and the device for modifying the super-frame can modify the super-frame through a vertex dragging method, specifically, a moving cursor is moved to the position near a key point of a parent-level image frame or a child-level image frame through moving a mouse, a left button of the mouse is pressed to directly drag the key point of the parent-level image frame or the key point of the child-level image frame, and a frame-shaped area of the parent-level image frame or the child-level image frame is changed, so that the frame-shaped area of the child-level image frame is completely located in the frame-shaped area of the parent-level image frame. However, the conventional system generally needs to select the corresponding parent-level image frame or child-level image frame first, and then drag the vertex of the selected parent-level image frame, which is a more time-consuming step. Compared with the traditional system, the method and the system can reduce one operation step and save corresponding time. If a super-box occurs, the super-box may be modified by anchor point modification. Specifically, when the anchor point is located outside the image frame, the left mouse button is pressed, a button instruction is sent, a corresponding button instruction is responded, and the edge corresponding to the image frame is directly expanded to the point. When the anchor point is located inside the image frame, only one of the edges can be selected to shrink to the anchor point, specifically, the edge with the smallest area loss is selected. If one of the four shortcut keys, up, down, left and right, is pressed at the same time, one of the edges, up, down, left and right, is designated to shrink to the anchor point. In general, the frame shape modification by the anchor point method is more efficient than the frame shape modification by the vertex dragging method; and in the operation-intensive frame marking/modifying task, the anchor point method has more advantages than the top-supporting point method.

Based on the above description, the coverage areas of the embodiments of the present application when modifying missing frames, jumping heels, short interval broken heels, and long interval broken heels are shown in table 3.

TABLE 3 modification of coverage during missing frames, fleeing heels, short interval broken heels, long interval broken heels

Based on the above description, the operation efficiency of the embodiment of the present application in modifying the missing frame, the jumping heel, the short interval broken heel and the long interval broken heel is shown in table 4.

TABLE 4 efficiency of operations in the case of lost frames, fleeing heels, short interval broken heels, long interval broken heels

The invention provides an image tracking correlation system, which is characterized in that one or more target objects in a multi-frame picture are subjected to frame marking to obtain one or more corresponding target image frames; marking one or more target image frames belonging to the same target object with the same reference number; and assigning the same color to one or more target image frames belonging to the same target object; and tracking and associating one or more target objects in the multi-frame pictures according to the labels and colors corresponding to the one or more target image frames. For a large number of video images obtained from the fields of security protection, intelligent transportation and the like, frame labeling can be performed on a target object in a single-frame or multi-frame picture in advance through a preset algorithm to obtain a plurality of target image frames. Compared with manual direct labeling, the method has the advantages that the labeling efficiency can be improved and the labeling cost can be reduced by performing pre-labeling through a preset algorithm. Because video images are obtained from the fields of security, intelligent transportation and the like, and frame pre-marking is carried out aiming at the frame images, when all single frame images in the video are pre-marked, the target image frames between the frame images are not related; therefore, the system finds out errors (such as frame loss, heel breaking, heel jumping and other errors) occurring when the associated target object is tracked and modifies the corresponding errors by marking labels for the pre-labeled target image frames, distributing colors and tracking the target objects of all frame images in the associated video images based on the corresponding labels and colors. Meanwhile, the image detection algorithm is trained based on correct labeling of the target object in the video, so that the accuracy of the image detection algorithm in the fields of security, intelligent transportation and the like can be improved.

The embodiment of the present application further provides an image tracking related device, which includes:

In this embodiment, the image tracking related device executes the system or the method, and specific functions and technical effects may refer to the above embodiments, which are not described herein again.

An embodiment of the present application further provides an apparatus, which may include: one or more processors; and one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method of fig. 1. In practical applications, the device may be used as a terminal device, and may also be used as a server, where examples of the terminal device may include: the mobile terminal includes a smart phone, a tablet computer, an electronic book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, a vehicle-mounted computer, a desktop computer, a set-top box, an intelligent television, a wearable device, and the like.

Embodiments of the present application also provide a non-transitory readable storage medium, where one or more modules (programs) are stored in the storage medium, and when the one or more modules are applied to a device, the device may execute instructions (instructions) included in the method in fig. 1 according to the embodiments of the present application.

Fig. 10 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present application. As shown, the terminal device may include: an input device 1100, a first processor 1101, an output device 1102, a first memory 1103, and at least one communication bus 1104. The communication bus 1104 is used to implement communication connections between the elements. The first memory 1103 may include a high-speed RAM memory, and may also include a non-volatile storage NVM, such as at least one disk memory, and the first memory 1103 may store various programs for performing various processing functions and implementing the method steps of the present embodiment.

Alternatively, the first processor 1101 may be, for example, a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and the first processor 1101 is coupled to the input device 1100 and the output device 1102 through a wired or wireless connection.

Optionally, the input device 1100 may include a variety of input devices, such as at least one of a user-oriented user interface, a device-oriented device interface, a software programmable interface, a camera, and a sensor. Optionally, the device interface facing the device may be a wired interface for data transmission between devices, or may be a hardware plug-in interface (e.g., a USB interface, a serial port, etc.) for data transmission between devices; optionally, the user-facing user interface may be, for example, a user-facing control key, a voice input device for receiving voice input, and a touch sensing device (e.g., a touch screen with a touch sensing function, a touch pad, etc.) for receiving user touch input; optionally, the programmable interface of the software may be, for example, an entry for a user to edit or modify a program, such as an input pin interface or an input interface of a chip; the output devices 1102 may include output devices such as a display, audio, and the like.

In this embodiment, the processor of the terminal device includes a function for executing each module of the speech recognition apparatus in each device, and specific functions and technical effects may refer to the above embodiments, which are not described herein again.

Fig. 11 is a schematic hardware structure diagram of a terminal device according to an embodiment of the present application. FIG. 11 is a specific embodiment of the implementation of FIG. 10. As shown, the terminal device of the present embodiment may include a second processor 1201 and a second memory 1202.

The second processor 1201 executes the computer program code stored in the second memory 1202 to implement the method described in fig. 1 in the above embodiment.

The second memory 1202 is configured to store various types of data to support operations at the terminal device. Examples of such data include instructions for any application or method operating on the terminal device, such as messages, pictures, videos, and so forth. The second memory 1202 may include a Random Access Memory (RAM) and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory.

Optionally, a second processor 1201 is provided in the processing assembly 1200. The terminal device may further include: communication component 1203, power component 1204, multimedia component 1205, speech component 1206, input/output interfaces 1207, and/or sensor component 1208. The specific components included in the terminal device are set according to actual requirements, which is not limited in this embodiment.

The processing component 1200 generally controls the overall operation of the terminal device. The processing assembly 1200 may include one or more second processors 1201 to execute instructions to perform all or part of the steps of the data processing method described above. Further, the processing component 1200 can include one or more modules that facilitate interaction between the processing component 1200 and other components. For example, the processing component 1200 can include a multimedia module to facilitate interaction between the multimedia component 1205 and the processing component 1200.

The power supply component 1204 provides power to the various components of the terminal device. The power components 1204 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal device.

The multimedia components 1205 include a display screen that provides an output interface between the terminal device and the user. In some embodiments, the display screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the display screen includes a touch panel, the display screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.

The voice component 1206 is configured to output and/or input voice signals. For example, the voice component 1206 includes a Microphone (MIC) configured to receive external voice signals when the terminal device is in an operational mode, such as a voice recognition mode. The received speech signal may further be stored in the second memory 1202 or transmitted via the communication component 1203. In some embodiments, the speech component 1206 further comprises a speaker for outputting speech signals.

The input/output interface 1207 provides an interface between the processing component 1200 and peripheral interface modules, which may be click wheels, buttons, etc. These buttons may include, but are not limited to: a volume button, a start button, and a lock button.

The sensor component 1208 includes one or more sensors for providing various aspects of status assessment for the terminal device. For example, the sensor component 1208 may detect an open/closed state of the terminal device, relative positioning of the components, presence or absence of user contact with the terminal device. The sensor assembly 1208 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, including detecting the distance between the user and the terminal device. In some embodiments, the sensor assembly 1208 may also include a camera or the like.

The communication component 1203 is configured to facilitate communications between the terminal device and other devices in a wired or wireless manner. The terminal device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one embodiment, the terminal device may include a SIM card slot therein for inserting a SIM card therein, so that the terminal device may log onto a GPRS network to establish communication with the server via the internet.

From the above, the communication component 1203, the voice component 1206, the input/output interface 1207 and the sensor component 1208 involved in the embodiment of fig. 11 can be implemented as the input device in the embodiment of fig. 10.

The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims

1. An image tracking correlation method is characterized by comprising the following steps:

tracking and associating one or more target objects in the multi-frame pictures according to the labels and colors corresponding to the one or more target image frames;

assigning the same color to one or more target image frames belonging to the same target object, comprising:

determining color assignment weights based on the spatiotemporal distances;

2. The image tracking correlation method of claim 1, further comprising:

3. The image tracking correlation method according to claim 2, wherein if there is a wrong tracking correlation, the recognition result comprises at least one of:

4. The image tracking correlation method of claim 3, further comprising:

acquiring a plurality of target image frames subjected to frame labeling;

5. The image tracking correlation method according to claim 4, wherein if a child image frame belonging to a parent image frame is marked as not belonging to the parent image frame; and/or, marking the sub-level image frame which is not subordinate to a certain parent-level image frame as the image frame which is subordinate to the parent-level image frame; the target image frame with the dependency relationship has an erroneous dependency.

6. The image tracking correlation method according to claim 1, further comprising, when a new label appears in the current frame picture:

7. The image tracking correlation method of claim 3, further comprising:

8. The method according to claim 7, wherein if the same target frame in the current frame picture has a different label from the adjacent one or more frames of pictures, and the label of the target frame in the current frame picture exists in the adjacent one or more frames of pictures; modifying the corresponding error tracking association based on the display interface of the target object picture sample library;

9. The method according to claim 7, wherein if the same target frame in the current frame picture has a different label from the adjacent one or more frames of pictures, and the label of the target frame in the current frame picture does not exist in the adjacent one or more frames of pictures; modifying the corresponding error tracking association based on the display interface of the target object picture sample library;

10. The image tracking correlation method according to claim 7, wherein a label is established based on a display interface of the target object picture sample library or a display interface of the one or more frames of pictures; the label includes at least one of: target object type, picture frame bit, key frame.

11. The image tracking correlation method according to claim 1, wherein when performing frame labeling on one or more target objects in a multi-frame picture, further comprising adding correlation pointers to the one or more target objects; the association pointer includes at least one of: a pointer to a parent image frame, a pointer to a child image frame, a pointer to a previous tracking associated target object, and a pointer to a next tracking associated target object.

12. The image tracking correlation method of claim 7, further comprising:

13. The image tracking correlation method of claim 12, further comprising:

14. The image tracking correlation method of claim 12, further comprising:

15. The image tracking correlation method according to any one of claims 1 to 14, further comprising:

acquiring one or more target image frames subjected to frame labeling;

modifying the target image frame with errors or overlapping based on the display interface of one or more frames of images; wherein the error comprises at least one of: super frame, invade frame.

16. The image tracking correlation method of claim 15, further comprising:

17. The image tracking correlation method of any one of claims 1 to 14, wherein the target object comprises at least one of: human body, human head, human face;

18. An image tracking correlation system, comprising:

the tracking association module is used for tracking and associating one or more target objects in the multi-frame pictures according to the labels and colors corresponding to the one or more target image frames;

determining color assignment weights based on the spatiotemporal distances;

19. The image tracking correlation system of claim 18, further comprising:

20. The image tracking correlation system of claim 19, wherein if there is a wrong tracking correlation, the recognition result comprises at least one of:

21. The image tracking correlation system of claim 20, further comprising:

acquiring a plurality of target image frames subjected to frame labeling;

22. The image tracking correlation system of claim 21, wherein if a child image frame belonging to a parent image frame is marked as not belonging to the parent image frame; and/or, marking the sub-level image frame which is not subordinate to a certain parent-level image frame as the image frame which is subordinate to the parent-level image frame; the target image frame with the dependency relationship has an erroneous dependency.

23. The image tracking correlation system of claim 19, further comprising, when a new label appears in the current frame picture:

24. The image tracking correlation system of claim 20, further comprising:

25. The system according to claim 23, wherein if the identification result indicates that the label of the same target frame in the current frame picture is different from the label of the adjacent one or more frames of pictures, the label of the target frame in the current frame picture exists in the adjacent one or more frames of pictures; modifying the corresponding error tracking association based on the display interface of the target object picture sample library;

26. The system according to claim 24, wherein if the identification result indicates that the label of the same target frame in the current frame picture is different from the label of the adjacent one or more frames of pictures, the label of the target frame in the current frame picture is not present in the adjacent one or more frames of pictures; modifying the corresponding error tracking association based on the display interface of the target object picture sample library;

27. The image tracking correlation system of claim 24, wherein a label is further established based on a display interface of the sample library of target object pictures or a display interface of the one or more frames of pictures; the label includes at least one of: target object type, picture frame bit, key frame.

28. The image tracking correlation system according to claim 18, further comprising adding correlation pointers to one or more target objects in the multi-frame pictures when performing frame labeling on the one or more target objects; the association pointer includes at least one of: a pointer to a parent image frame, a pointer to a child image frame, a pointer to a previous tracking associated target object, and a pointer to a next tracking associated target object.

29. The image tracking correlation system of claim 18, further comprising:

displaying a trajectory line tracking and associated with the one or more target objects in a display interface of one or more frames of pictures;

30. The image tracking correlation system of claim 29, further comprising:

31. The image tracking correlation system of claim 29, further comprising:

32. The image tracking correlation system of any of claims 18 to 31, further comprising:

acquiring one or more target image frames subjected to frame labeling;

33. The image tracking correlation system of claim 32, further comprising:

34. The image tracking correlation system of any of claims 18 to 31, wherein the target object comprises at least one of: human body, human head, human face;

35. An image tracking correlation device, comprising:

one or more processors; and

one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method of any of claims 1-17.

36. One or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform the method of any of claims 1-17.