CN113643324B

CN113643324B - Target association method and device

Info

Publication number: CN113643324B
Application number: CN202010369446.7A
Authority: CN
Inventors: 叶佩; 甘茂霖; 董维山
Original assignee: Momenta Suzhou Technology Co Ltd
Current assignee: Momenta Suzhou Technology Co Ltd
Priority date: 2020-04-27
Filing date: 2020-04-27
Publication date: 2022-12-23
Anticipated expiration: 2040-04-27
Also published as: WO2021217777A1; CN113643324A; DE112020003171T5

Abstract

An embodiment of the invention discloses a target association method and a target association device. The method comprises: obtaining a current frame to be annotated and a first video frame corresponding to the current frame to be annotated; obtaining recommendation frames that have a suspected association relation with the current frame to be annotated, and a second video frame corresponding to each recommendation frame; displaying the first video frame containing the current frame to be annotated and, for each recommendation frame, the second video frame containing that recommendation frame; and determining, based on first association relation operation information of a user, whether a target recommendation frame having an association relation with the current frame to be annotated exists among the recommendation frames, and, if so, establishing the association relation between the current frame to be annotated and the target recommendation frame, so that the same target can be associated conveniently and effectively across the video frames of a video.

Description

Target association method and device

Technical Field

The invention relates to the technical field of target tracking, in particular to a target association method and a target association device.

Background

At present, the target objects contained in each video frame of a video to be detected can be detected automatically by a tracking algorithm based on target detection, and each target object can be associated with the corresponding object contained in the previous video frame.

However, when a tracking algorithm based on target detection tracks and detects the target objects in a video to be detected, missed detections or false detections often occur. For example, a target object that disappears and later reappears in the video may be detected as different objects, that is, the target object is identified as one object before it disappears and as another after it reappears; or the same target object appearing in successive video frames may be recognized as different target objects.

In such situations, the detection result obtained by the tracking algorithm based on target detection must be checked and calibrated manually. How to provide a convenient and effective method for checking and correcting missed or false detections in the detection result of a video to be detected has therefore become an urgent problem.

Disclosure of Invention

The invention provides a target association method and a target association device, so that the same target appearing in the video frames of a video can be associated conveniently and effectively. The specific technical solution is as follows:

In a first aspect, an embodiment of the present invention provides a target association method, where the method includes:

obtaining a current frame to be annotated and a first video frame corresponding to the current frame to be annotated, wherein the first video frame corresponding to the current frame to be annotated is: a video frame, in the video to be annotated, that is detected to contain the target corresponding to the current frame to be annotated and that satisfies a first screening condition;

obtaining recommendation frames having a suspected association relation with the current frame to be annotated and a second video frame corresponding to each recommendation frame, wherein the second video frame corresponding to each recommendation frame is: a video frame, in the video to be annotated, that is detected to contain the target corresponding to the recommendation frame and that satisfies a second screening condition;

displaying the first video frame containing the current frame to be annotated and the second video frame containing the recommendation frame corresponding to each recommendation frame, so that a user can determine a target recommendation frame having an association relation with the current frame to be annotated from the displayed recommendation frames based on the second video frame corresponding to each recommendation frame;

determining whether a target recommendation frame having an association relation with the current frame to be labeled exists in all recommendation frames based on first association relation operation information of a user, and establishing the association relation between the current frame to be labeled and the target recommendation frame under the condition that the target recommendation frame having the association relation with the current frame to be labeled is determined to exist.
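The method does not specify how an established association relation is stored. As an illustrative sketch only, the following assumes that each detection frame carries a string identifier and that associations are recorded in a union-find structure, so that all frames associated with one another resolve to the same target. The names AssociationStore and apply_user_choice are hypothetical and do not come from the source.

```python
class AssociationStore:
    """Hypothetical store: groups box IDs that the user has associated."""

    def __init__(self):
        self._parent = {}

    def _find(self, box_id):
        # Register unseen IDs as their own group, then walk to the root.
        self._parent.setdefault(box_id, box_id)
        while self._parent[box_id] != box_id:
            # Path halving keeps later lookups short.
            self._parent[box_id] = self._parent[self._parent[box_id]]
            box_id = self._parent[box_id]
        return box_id

    def associate(self, current_id, target_id):
        root_current = self._find(current_id)
        root_target = self._find(target_id)
        if root_current != root_target:
            self._parent[root_target] = root_current

    def same_target(self, a, b):
        return self._find(a) == self._find(b)


def apply_user_choice(store, current_id, recommended_ids, chosen_id):
    """Return True if an association was established.

    chosen_id is None when the user finds no matching recommendation.
    """
    if chosen_id is None or chosen_id not in recommended_ids:
        return False
    store.associate(current_id, chosen_id)
    return True


store = AssociationStore()
established = apply_user_choice(store, "a", ["b", "c"], "b")
skipped = apply_user_choice(store, "b", ["c"], None)
```

A union-find structure is only one possible choice; it is used here because chained associations (a with b, then b with d) should end up in the same group without rewriting earlier records.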

Optionally, the video frame, in the video to be annotated, that is detected to contain the target corresponding to the current frame to be annotated and that satisfies the first screening condition is: the last video frame, in the video to be annotated, that is detected to contain the target corresponding to the current frame to be annotated;

the video frame, in the video to be annotated, that is detected to contain the target corresponding to the recommendation frame and that satisfies the second screening condition is: the first video frame, in the video to be annotated, that is detected to contain the target corresponding to the recommendation frame.
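The two screening conditions can be illustrated with a minimal sketch. It assumes the pre-annotation result is available as a mapping from frame number to the identifiers of the detection frames found in that video frame; this data layout and the function name are assumptions made for illustration, not part of the source.

```python
def first_and_last_video_frames(detections):
    """Map each box ID to (first_frame_no, last_frame_no).

    `detections` is an assumed structure: {frame_no: [box_id, ...]}.
    """
    spans = {}
    for frame_no in sorted(detections):
        for box_id in detections[frame_no]:
            first, _ = spans.get(box_id, (frame_no, frame_no))
            spans[box_id] = (first, frame_no)
    return spans


# Hypothetical pre-annotation result: box "a" is lost after frame 1.
detections = {0: ["a"], 1: ["a", "b"], 2: ["b"], 3: ["c"]}
spans = first_and_last_video_frames(detections)

# First screening condition: last video frame containing the current box.
first_video_frame_of_a = spans["a"][1]
# Second screening condition: first video frame containing a recommended box.
second_video_frame_of_c = spans["c"][0]
```

Showing the last appearance of the current frame next to the first appearance of each candidate places the two moments closest in time side by side, which is the comparison the user needs to make.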

Optionally, the step of obtaining the current frame to be labeled in the video to be labeled is implemented by any one of the following two implementation manners:

the first implementation mode comprises the following steps:

determining, from the unannotated frames corresponding to the video to be annotated, the unannotated frame for which the timestamp information of the corresponding last video frame represents the earliest acquisition time, as the current frame to be annotated;

the second implementation mode comprises the following steps:

and determining the recommendation frame selected by the user as the current frame to be labeled under the condition that the user selects the recommendation frame having the association relation with the displayed history frame to be labeled from the corresponding displayed recommendation frames.
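The two implementation modes can be sketched as a single selection function. The sketch assumes that each unannotated frame has a known last video frame and that each video frame has a numeric timestamp; the parameter names (last_frame_of, timestamp_of, user_selected) are hypothetical.

```python
def pick_current_box(unannotated_ids, last_frame_of, timestamp_of,
                     user_selected=None):
    """Choose the current frame (box) to be annotated.

    Second mode: a recommendation selected by the user takes priority.
    First mode: otherwise take the unannotated box whose last video
    frame has the earliest timestamp.
    """
    if user_selected is not None:
        return user_selected
    if not unannotated_ids:
        return None
    return min(unannotated_ids,
               key=lambda box_id: timestamp_of[last_frame_of[box_id]])


# Hypothetical data: box ID -> last frame number, frame number -> timestamp.
last_frame_of = {"a": 1, "b": 2, "c": 3}
timestamp_of = {1: 10.0, 2: 20.0, 3: 30.0}

auto_choice = pick_current_box({"b", "c"}, last_frame_of, timestamp_of)
manual_choice = pick_current_box({"b", "c"}, last_frame_of, timestamp_of,
                                 user_selected="c")
```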

Optionally, the step of obtaining the recommended frames that have a suspected association relationship with the current frame to be labeled and the second video frame corresponding to each recommended frame includes:

traversing the pre-annotation results corresponding to the N video frames after the last video frame corresponding to the current frame to be annotated, and determining, from the detection frames corresponding to those N video frames, each detection frame whose frame identifier satisfies a preset recommendation condition as a recommendation frame having a suspected association relation with the current frame to be annotated, wherein a detection frame whose frame identifier satisfies the preset recommendation condition is one whose frame identification information does not appear in the last video frame corresponding to the current frame to be annotated or in any video frame before it;

for each recommendation frame having a suspected association relation with the current frame to be annotated, determining, from the N video frames after the last video frame corresponding to the current frame to be annotated, the video frame that contains the target corresponding to the recommendation frame and satisfies the second screening condition as the second video frame corresponding to the recommendation frame, wherein the video frame that contains the target corresponding to the recommendation frame and satisfies the second screening condition is: the first video frame, in the video to be annotated, that is detected to contain the target corresponding to the recommendation frame.
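The recommendation step described above can be sketched as follows. The sketch assumes consecutive integer frame numbers and a pre-annotation result shaped as a mapping from frame number to detection frame identifiers; the function name and data layout are assumptions for illustration.

```python
def recommend_boxes(detections, current_last_frame, n):
    """Return {box_id: second_video_frame_no} for recommended boxes.

    A box is recommended if it appears in the N video frames after the
    current box's last video frame and its ID was never seen in that
    last video frame or any earlier one.
    """
    seen_before = set()
    for frame_no, box_ids in detections.items():
        if frame_no <= current_last_frame:
            seen_before.update(box_ids)

    recommendations = {}
    window = range(current_last_frame + 1, current_last_frame + n + 1)
    for frame_no in window:
        for box_id in detections.get(frame_no, ()):
            if box_id not in seen_before and box_id not in recommendations:
                # First appearance in the window is the second video frame.
                recommendations[box_id] = frame_no
    return recommendations


# Hypothetical data: box "a" is last seen in frame 1; "x" already existed.
detections = {
    0: ["a", "x"],
    1: ["a", "x"],
    2: ["x", "b"],
    3: ["b", "c"],
    4: ["c"],
    9: ["d"],
}
recs = recommend_boxes(detections, current_last_frame=1, n=3)
```

In this example "x" is excluded because it coexisted with "a", and "d" is excluded because it lies outside the N-frame window.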

Optionally, before the step of obtaining the recommended frames that have the suspected association relationship with the current frame to be labeled and the second video frame corresponding to each recommended frame, the method further includes:

judging whether a display function of displaying a recommendation frame corresponding to a current frame to be marked is started or not;

if the display function is enabled, executing the step of obtaining the recommendation frames that have a suspected association relation with the current frame to be annotated and the second video frame corresponding to each recommendation frame;

if the display function is not enabled, obtaining the M video frames after the last video frame corresponding to the current frame to be annotated in the video to be annotated as the video frames to be played corresponding to the current frame to be annotated, wherein the last video frame corresponding to the current frame to be annotated is: the last video frame, in the video to be annotated, that is detected to contain the target corresponding to the current frame to be annotated;

displaying the first video frame containing the current frame to be marked, and playing the video frame to be played corresponding to the current frame to be marked so as to enable a user to determine a target detection frame having an association relation with the current frame to be marked from detection frames included in the video frame to be played corresponding to the played current frame to be marked;

determining whether a target detection frame having an association relation with the current frame to be labeled exists in a labeling frame included in a video frame to be played corresponding to the current frame to be labeled based on second association relation operation information of a user, and establishing an association relation between the current frame to be labeled and the target detection frame under the condition that the target detection frame having the association relation with the current frame to be labeled is determined to exist.
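The branch between the recommendation display and the playback fallback can be sketched as below. The function name, the returned dictionary, and the flag recommend_enabled are hypothetical; the sketch only shows how the M video frames to be played would be selected when the display function is not enabled.

```python
def prepare_view(frame_numbers, current_last_frame, recommend_enabled, m):
    """Decide what to show next to the first video frame.

    If recommendations are enabled, the caller computes and shows them.
    Otherwise the M video frames after the current box's last video
    frame are returned for playback.
    """
    if recommend_enabled:
        return {"mode": "recommend", "frames_to_play": []}
    later = [f for f in sorted(frame_numbers) if f > current_last_frame]
    return {"mode": "playback", "frames_to_play": later[:m]}


# Hypothetical video with frames 0..9; the current box is last seen in frame 4.
view = prepare_view(range(10), current_last_frame=4,
                    recommend_enabled=False, m=3)
```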

Optionally, before the step of determining, based on the first association relationship operation information of the user, whether a target recommendation frame having an association relationship with the current frame to be labeled exists in all recommendation frames, and establishing an association relationship between the current frame to be labeled and the target recommendation frame when it is determined that the target recommendation frame having an association relationship with the current frame to be labeled exists, the method further includes:

under the condition that a first amplification instruction triggered by a user aiming at a first recommended frame having a suspected association relation with the current frame to be annotated is detected, amplifying and displaying the first recommended frame and a second video frame corresponding to the first recommended frame, wherein the triggering operation of the first amplification instruction comprises the following steps: right clicking a second video frame corresponding to the first recommendation frame; and/or

under the condition that a second amplification instruction triggered by the user for the first video frame corresponding to the current frame to be annotated is detected, amplifying and displaying the first video frame corresponding to the current frame to be annotated.

Optionally, after the step of determining, based on the first association relationship operation information of the user, whether a target recommendation frame having an association relationship with the current frame to be annotated exists in all recommendation frames, and establishing an association relationship between the current frame to be annotated and the target recommendation frame under the condition that it is determined that the target recommendation frame having an association relationship with the current frame to be annotated exists, the method further includes:

determining the target recommendation frame as a new current frame to be marked;

determining a detected video frame which contains a target corresponding to the new current frame to be labeled and meets a first screening condition from the video to be labeled as a first video frame corresponding to the new current frame to be labeled;

acquiring new recommendation frames that have a suspected association relation with the new current frame to be annotated and a second video frame corresponding to each new recommendation frame, wherein the second video frame corresponding to each new recommendation frame is: a video frame, in the video to be annotated, that is detected to contain the target corresponding to the new recommendation frame and that satisfies a second screening condition;

displaying the first video frame containing the new current frame to be annotated and a second video frame corresponding to each new recommended frame and containing the new recommended frame, so that a user can determine a new target recommended frame which has an association relation with the new current frame to be annotated from the displayed new recommended frames based on the second video frame corresponding to each new recommended frame;

and, if it is detected that the user triggers a jump instruction indicating a jump to the previous segment, displaying the first video frame containing the current frame to be annotated and, for each recommendation frame, the second video frame containing that recommendation frame.
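The chaining behaviour above, in which a confirmed target recommendation frame becomes the new current frame and a jump to the previous segment restores the earlier view, can be sketched with a simple history stack. The class AnnotationSession and its methods are hypothetical names; the source does not describe a particular data structure.

```python
class AnnotationSession:
    """Hypothetical session state for chained associations."""

    def __init__(self, first_box_id):
        self.current = first_box_id
        self._history = []

    def confirm(self, target_box_id):
        # The confirmed recommendation becomes the new current box.
        self._history.append(self.current)
        self.current = target_box_id
        return self.current

    def jump_previous(self):
        # Restore the box that was current before the last confirmation.
        if self._history:
            self.current = self._history.pop()
        return self.current


session = AnnotationSession("a")
session.confirm("b")
session.confirm("c")
back_once = session.jump_previous()
```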

Optionally, before the step of displaying the first video frame including the current frame to be annotated and the second video frame including the recommended frame corresponding to each recommended frame, the method further includes:

acquiring frame number information of a first video frame corresponding to the current frame to be marked and frame identification information corresponding to the current frame to be marked;

acquiring frame number information corresponding to a second video frame corresponding to each recommended frame having a suspected association relation with the current frame to be marked and frame identification information corresponding to each recommended frame;

the step of displaying the first video frame containing the current frame to be annotated and the second video frame containing the recommended frame corresponding to each recommended frame comprises the following steps:

displaying, in a first display area of a first preset display interface, the first video frame containing the current frame to be annotated, the frame number information of that first video frame, and the frame identification information corresponding to the current frame to be annotated;

and displaying, in a second display area of the first preset display interface, for each recommendation frame having a suspected association relation with the current frame to be annotated, the second video frame containing the recommendation frame, the frame number information of that second video frame, and the frame identification information corresponding to the recommendation frame.

Optionally, a third display area of the first preset display interface further includes a first trigger area for triggering a jump instruction that indicates a jump to the previous segment, and a second trigger area for triggering a jump instruction that indicates a jump to the next segment. The jump instruction indicating a jump to the previous segment is an instruction to display the first video frame corresponding to the frame to be annotated that precedes the current frame to be annotated, where that first video frame is: the video frame, in the video to be annotated, that is detected to contain the target corresponding to the preceding frame to be annotated and that satisfies the first screening condition. The jump instruction indicating a jump to the next segment is an instruction to display the first video frame corresponding to the frame to be annotated that follows the current frame to be annotated, where the following frame to be annotated is: among the unannotated frames corresponding to the video to be annotated whose last video frames have timestamp information representing an acquisition time later than that of the current frame to be annotated, the unannotated frame with the earliest such acquisition time.
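The rule for choosing the next frame to be annotated can be sketched as follows. The sketch assumes a mapping from each frame identifier to the timestamp of its last video frame; the function name and parameter names are hypothetical.

```python
def next_box_to_annotate(unannotated_ids, last_timestamp_of, current_id):
    """Pick the box shown when the user jumps to the next segment.

    Among unannotated boxes whose last video frame was acquired later
    than the current box's, return the one with the earliest time, or
    None if there is no such box.
    """
    current_time = last_timestamp_of[current_id]
    later = [box_id for box_id in unannotated_ids
             if box_id != current_id
             and last_timestamp_of[box_id] > current_time]
    return min(later, key=lambda box_id: last_timestamp_of[box_id],
               default=None)


# Hypothetical timestamps of each box's last video frame.
last_timestamp_of = {"a": 10.0, "b": 20.0, "c": 30.0}
after_a = next_box_to_annotate({"a", "b", "c"}, last_timestamp_of, "a")
after_c = next_box_to_annotate({"a", "b", "c"}, last_timestamp_of, "c")
```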

In a second aspect, an embodiment of the present invention provides a target association apparatus, where the apparatus includes:

a first obtaining module, configured to obtain a current frame to be annotated and a first video frame corresponding to the current frame to be annotated, where the first video frame corresponding to the current frame to be annotated is: a video frame, in the video to be annotated, that is detected to contain the target corresponding to the current frame to be annotated and that satisfies a first screening condition;

a second obtaining module, configured to obtain recommendation frames having a suspected association relation with the current frame to be annotated and a second video frame corresponding to each recommendation frame, where the second video frame corresponding to each recommendation frame is: a video frame, in the video to be annotated, that is detected to contain the target corresponding to the recommendation frame and that satisfies a second screening condition;

the first display module is configured to display the first video frame containing the current frame to be annotated and a second video frame corresponding to each recommendation frame and containing the recommendation frame, so that a user can determine a target recommendation frame having an association relation with the current frame to be annotated from the displayed recommendation frames based on the second video frame corresponding to each recommendation frame;

the first determination and establishment module is configured to determine whether a target recommendation frame having an association relationship with the current frame to be annotated exists in all recommendation frames based on first association relationship operation information of a user, and establish the association relationship between the current frame to be annotated and the target recommendation frame under the condition that the target recommendation frame having the association relationship with the current frame to be annotated is determined to exist.

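Optionally, the video frame, in the video to be annotated, that is detected to contain the target corresponding to the current frame to be annotated and that satisfies the first screening condition is: the last video frame, in the video to be annotated, that is detected to contain the target corresponding to the current frame to be annotated;

the video frame, in the video to be annotated, that is detected to contain the target corresponding to the recommendation frame and that satisfies the second screening condition is: the first video frame, in the video to be annotated, that is detected to contain the target corresponding to the recommendation frame.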

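Optionally, the first obtaining module is specifically configured to determine, from the unannotated frames corresponding to the video to be annotated, the unannotated frame for which the timestamp information of the corresponding last video frame represents the earliest acquisition time, as the current frame to be annotated;

or is specifically configured to determine the recommendation frame selected by the user as the current frame to be annotated, under the condition that the user selects, from the displayed recommendation frames, the recommendation frame having an association relation with the displayed history frame to be annotated.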

Optionally, the second obtaining module is specifically configured to traverse the pre-annotation results corresponding to the N video frames after the last video frame corresponding to the current frame to be annotated, and determine, from the detection frames corresponding to those N video frames, each detection frame whose frame identifier satisfies a preset recommendation condition as a recommendation frame having a suspected association relation with the current frame to be annotated, where a detection frame whose frame identifier satisfies the preset recommendation condition is one whose frame identification information does not appear in the last video frame corresponding to the current frame to be annotated or in any video frame before it;

and, for each recommendation frame having a suspected association relation with the current frame to be annotated, determine, from the N video frames after the last video frame corresponding to the current frame to be annotated, the video frame that contains the target corresponding to the recommendation frame and satisfies the second screening condition as the second video frame corresponding to the recommendation frame, where the video frame that contains the target corresponding to the recommendation frame and satisfies the second screening condition is: the first video frame, in the video to be annotated, that is detected to contain the target corresponding to the recommendation frame.

Optionally, the apparatus further comprises:

the judging module is configured to judge whether a display function of displaying the recommending frame corresponding to the current frame to be marked is started or not before the recommending frame which is suspected to be associated with the current frame to be marked and the second video frame corresponding to each recommending frame are obtained;

if the display function is enabled, the second obtaining module is triggered;

a third obtaining module, configured to, if the display function is not enabled, obtain the M video frames after the last video frame corresponding to the current frame to be annotated in the video to be annotated as the video frames to be played corresponding to the current frame to be annotated, where the last video frame corresponding to the current frame to be annotated is: the last video frame, in the video to be annotated, that is detected to contain the target corresponding to the current frame to be annotated;

the second display module is configured to display the first video frame containing the current frame to be marked, and play the video frame to be played corresponding to the current frame to be marked, so that a user can determine a target detection frame having an association relation with the current frame to be marked from detection frames included in the video frame to be played corresponding to the played current frame to be marked;

a second determining and establishing module, configured to determine whether a target detection frame having an association relationship with the current frame to be annotated exists in annotation frames included in the video frame to be played corresponding to the played current frame to be annotated based on second association relationship operation information of the user, and establish an association relationship between the current frame to be annotated and the target detection frame under the condition that the target detection frame having the association relationship with the current frame to be annotated is determined to exist.

Optionally, the apparatus further comprises:

the magnifying display module is configured to, before it is determined based on the first association relation operation information of the user whether a target recommendation frame having an association relation with the current frame to be annotated exists among the recommendation frames and the association relation between the current frame to be annotated and the target recommendation frame is established, magnify and display a first recommendation frame and the second video frame corresponding to the first recommendation frame under the condition that a first magnifying instruction triggered by the user for the first recommendation frame, which has a suspected association relation with the current frame to be annotated, is detected, wherein the triggering operation of the first magnifying instruction includes: right clicking the second video frame corresponding to the first recommendation frame; and/or, under the condition that a second magnifying instruction triggered by the user for the first video frame corresponding to the current frame to be annotated is detected, magnify and display the first video frame corresponding to the current frame to be annotated.

Optionally, the apparatus further comprises:

a first determining module, configured to determine the target recommendation frame as a new current frame to be annotated, after it has been determined based on the first association relation operation information of the user that a target recommendation frame having an association relation with the current frame to be annotated exists among the recommendation frames and the association relation between the current frame to be annotated and the target recommendation frame has been established;

a second determining module, configured to determine, from the video to be annotated, a detected video frame that includes a target corresponding to the new current frame to be annotated and that satisfies a first screening condition, as a first video frame corresponding to the new current frame to be annotated;

a fourth obtaining module, configured to obtain new recommendation frames having a suspected association relation with the new current frame to be annotated and a second video frame corresponding to each new recommendation frame, where the second video frame corresponding to each new recommendation frame is: a video frame, in the video to be annotated, that is detected to contain the target corresponding to the new recommendation frame and that satisfies a second screening condition;

the third display module is configured to display the first video frame containing the new current frame to be annotated and a second video frame corresponding to each new recommendation frame and containing the new recommendation frame, so that a user can determine a new target recommendation frame which has an association relation with the new current frame to be annotated from the displayed new recommendation frames based on the second video frame corresponding to each new recommendation frame;

and the fourth display module is configured to display the first video frames containing the current frames to be marked and the second video frames containing the recommended frames corresponding to each recommended frame if the user is detected to trigger a jump instruction indicating to jump to the previous section.

Optionally, the apparatus further comprises:

a fifth obtaining module, configured to obtain frame number information of a first video frame corresponding to the current frame to be marked and frame identification information corresponding to the current frame to be marked before displaying the first video frame including the current frame to be marked and a second video frame including the recommended frame corresponding to each recommended frame;

a sixth obtaining module, configured to obtain frame number information corresponding to a second video frame corresponding to each recommended frame having a suspected association relationship with the current frame to be labeled and frame identification information corresponding to each recommended frame;

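the first display module is specifically configured to display, in a first display area of a first preset display interface, the first video frame containing the current frame to be annotated, the frame number information of that first video frame, and the frame identification information corresponding to the current frame to be annotated; and to display, in a second display area of the first preset display interface, for each recommendation frame having a suspected association relation with the current frame to be annotated, the second video frame containing the recommendation frame, the frame number information of that second video frame, and the frame identification information corresponding to the recommendation frame.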

Optionally, a third display area of the first preset display interface further includes a first trigger area for triggering a jump instruction that indicates a jump to the previous segment, and a second trigger area for triggering a jump instruction that indicates a jump to the next segment. The jump instruction indicating a jump to the previous segment is an instruction to display the first video frame corresponding to the frame to be annotated that precedes the current frame to be annotated, where that first video frame is: the video frame, in the video to be annotated, that is detected to contain the target corresponding to the preceding frame to be annotated and that satisfies the first screening condition. The jump instruction indicating a jump to the next segment is an instruction to display the first video frame corresponding to the frame to be annotated that follows the current frame to be annotated, where the following frame to be annotated is: among the unannotated frames corresponding to the video to be annotated whose last video frames have timestamp information representing an acquisition time later than that of the current frame to be annotated, the unannotated frame with the earliest such acquisition time.

As can be seen from the above, the target association method and apparatus provided in the embodiments of the present invention obtain a current frame to be labeled and a first video frame corresponding to the current frame to be labeled, where the first video frame corresponding to the current frame to be labeled is: the detected video frames which contain the target corresponding to the current frame to be labeled and meet the first screening condition in the video to be labeled; obtaining recommendation frames which have a suspected association relation with a current frame to be annotated and a second video frame corresponding to each recommendation frame, wherein the second video frame corresponding to each recommendation frame is as follows: the detected video to be annotated comprises video frames of the target corresponding to the recommendation frame and meeting a second screening condition; displaying a first video frame containing a current frame to be marked and a second video frame corresponding to each recommendation frame and containing the recommendation frame, so that a user can determine a target recommendation frame which has an association relation with the current frame to be marked from the displayed recommendation frames based on the second video frame corresponding to each recommendation frame; and determining whether a target recommendation frame with an association relation with the current frame to be marked exists in all recommendation frames based on the first association relation operation information of the user, and establishing the association relation between the current frame to be marked and the target recommendation frame under the condition that the target recommendation frame with the association relation with the current frame to be marked exists.

By applying the embodiments of the invention, the current frame to be annotated, its corresponding first video frame, the recommendation frames having a suspected association relation with the current frame to be annotated, and the second video frame corresponding to each recommendation frame can be obtained and displayed, so that frames suspected of corresponding to the same target in the video to be annotated are associated at the image level. Because the recommendation frames and their second video frames are obtained and displayed directly, the user saves the time otherwise spent playing and watching the video to be annotated. Computing the recommendation frames greatly streamlines the user's search for associated frames and improves the efficiency of annotating frames that have an association relation, that is, the efficiency of target association, so that the same target appearing in the video frames of a video can be associated conveniently and effectively. Of course, a product or method practicing the invention need not achieve all of the above advantages at the same time.

The innovation points of the embodiment of the invention comprise:

1. The method obtains and displays a current frame to be annotated, the first video frame corresponding to it, the recommendation frames that have a suspected association relation with it, and the second video frame corresponding to each recommendation frame. Frames suspected of belonging to the same target in the video to be annotated are thereby associated at the image level. Because the recommendation frames and their second video frames are obtained and displayed directly, the user is spared the time of playing and watching the video to be annotated, and computing the recommendation frames greatly streamlines the user's search for associated frames. This improves the annotation efficiency for annotation frames that have an association relation, that is, the target association efficiency, and achieves convenient and effective association of the same target across the video frames of a video.

2. The first video frame corresponding to the current frame to be labeled is the last video frame containing the target corresponding to the current frame to be labeled in the current detected video to be labeled, and the second video frame corresponding to each recommendation frame is the first video frame containing the target corresponding to the recommendation frame in the current detected video to be labeled, so that the labeling association time of a user on the detection frames corresponding to the targets of the same physical object is better saved to a certain extent, and the labeling association efficiency of the user on the detection frames corresponding to the targets of the same physical object is improved.

3. When the display instruction for displaying the recommendation frames corresponding to the current frame to be annotated is enabled, the recommendation frames that have a suspected association relation with the current frame to be annotated, and the second video frame corresponding to each recommendation frame, are obtained and displayed directly. When that display instruction is not enabled, the current frame to be annotated, the first video frame in which it is located, and the video frames to be played corresponding to it are obtained, and the video frames to be played are played, so that the user can determine, from the played video frames, a target recommendation frame that has an association relation with the current frame to be annotated. When such a target recommendation frame is determined to exist, the association relation between the current frame to be annotated and the target recommendation frame is established. Displaying the current frame to be annotated and its first video frame alongside the played video frames gives the user a reference for comparison when determining the target recommendation frame, which improves the user's annotation efficiency to a certain extent.

4. The N video frames after the end frame video frame corresponding to the current frame to be annotated are traversed, and each detection frame whose frame identification meets the preset recommendation condition, namely a detection frame whose frame identification does not appear in the end frame video frame corresponding to the current frame to be annotated or in the previous frame, is taken as a recommendation frame that has a suspected association relation with the current frame to be annotated. In this way, the detection frames in those N video frames that are most likely to be associated with the current frame to be annotated are listed and recommended to the user.

5. A variety of auxiliary functions are provided. For example, the first recommendation frame that has a suspected association relation with the current frame to be annotated, and its corresponding second video frame, can be displayed enlarged, so that the user can compare them carefully before performing the association operation; this is friendlier to the user and lowers the professional skill required of the user. A jump-to-previous instruction provides an error-correction function: after a mistaken association operation, the user can return to the interface that preceded it, which reduces the cost of the mistake. It also makes it easier for an auditor to review and check the user's annotation results, which improves the accuracy of the annotation data. In addition, different areas of the same interface respectively display the first video frame containing the current frame to be annotated together with the frame identification information of the current frame to be annotated, and, for each recommendation frame that has a suspected association relation with the current frame to be annotated, the second video frame containing that recommendation frame, the frame number information of that second video frame, and the frame identification information of the recommendation frame. The user can thus consult the frame identification information of each recommendation frame, which improves the annotation experience.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is to be understood that the drawings in the following description are merely exemplary of some embodiments of the invention. For a person skilled in the art, without inventive effort, other figures can also be derived from these figures.

Fig. 1 is a schematic flow chart of a target association method according to an embodiment of the present invention;

Fig. 2 is another schematic flow chart of a target association method according to an embodiment of the present invention;

Fig. 3 is a diagram of an exemplary first preset display interface;

Fig. 4 is a schematic structural diagram of a target association apparatus according to an embodiment of the present invention.

Detailed Description

The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely a few embodiments of the invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.

It is to be noted that the terms "comprises" and "comprising" and any variations thereof in the embodiments and drawings of the present invention are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.

The invention provides a target association method and a target association device, which are used for realizing convenient and effective association of the same target appearing in each frame of video frame of a video. The following provides a detailed description of embodiments of the present invention.

Fig. 1 is a schematic flow chart of a target association method according to an embodiment of the present invention. The method may comprise the steps of:

S101: Obtain a current frame to be annotated and a first video frame corresponding to the current frame to be annotated.

The first video frame corresponding to the current frame to be annotated is: a detected video frame in the video to be annotated that contains the target corresponding to the current frame to be annotated and satisfies a first screening condition.

The target association method provided by the embodiment of the invention can be applied to any type of electronic equipment, and the electronic equipment can be a server or a terminal. In an implementation manner, the electronic device may be equipped with a preset labeling tool, so as to implement the target association method provided by the embodiment of the present invention through the preset labeling tool.

The electronic device can first obtain a video to be annotated and a pre-annotation result corresponding to the video to be annotated. The video to be annotated can be any type of video. For example, it can be a road video of the surrounding environment of a test vehicle, collected by an image collecting device while the test vehicle is driving in a target scene.

In one case, the electronic device may display an interface on which the user selects the video to be annotated. The interface displays an icon for each candidate video, and the user may select one with a mouse, a stylus, or a finger, whereupon the electronic device obtains the selected video to be annotated and its corresponding pre-annotation result.

The pre-annotation result corresponding to the video to be annotated can be a result obtained by tracking and detecting each target in the video to be annotated with a tracking algorithm based on target detection, or a result of the user's earlier manual annotation of each target in the video to be annotated. The pre-annotation result may include: position information of the detection frame of each target contained in each video frame of the video to be annotated, the association relation between targets that are detected in different video frames and are the same physical object, label information corresponding to the detection frame of each target, and semantic information corresponding to the point cloud data of each target.

The association relation between targets that are detected in different video frames and are the same physical object can be represented by the frame identification information of the annotation frames corresponding to those targets. For example, targets that are detected in different video frames and are the same physical object have annotation frames with the same frame identification information. That is, the detection frames corresponding to targets of the same physical object share the same frame identification information, and the detection frames corresponding to targets of different physical objects have different frame identification information.
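The representation described above can be sketched in Python as follows. This is only an illustrative sketch: the record type `DetectionBox`, its field names, and the helper `group_by_box_id` are hypothetical and do not come from the source, and a "detection frame" is modeled here as a simple box record.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DetectionBox:
    """Hypothetical record for one detection frame (box) in one video frame."""
    box_id: int            # frame identification information
    frame_index: int       # index of the video frame the box was detected in
    labeled: bool = False  # label information: has the user annotated it?


def group_by_box_id(boxes):
    """Group boxes by identifier; each group is one tracked physical object."""
    tracks = defaultdict(list)
    for box in boxes:
        tracks[box.box_id].append(box)
    return dict(tracks)


# Boxes with the same identifier in different video frames are the same object.
boxes = [DetectionBox(7, 0), DetectionBox(7, 1), DetectionBox(9, 1)]
tracks = group_by_box_id(boxes)
```

Under this sketch, two detections belong to the same physical object exactly when their `box_id` values are equal.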

The label information corresponding to the detection frame is information indicating whether the detection frame is labeled.

In the embodiment of the present invention, the detection frame of a target may be referred to as an annotation frame after it has been annotated by the user, and as an unmarked frame before then.

The tracking algorithm based on target detection may be a deep-learning-based target tracking model or a traditional target tracking algorithm; the embodiment of the invention does not limit its specific type.

In one implementation, to ensure the accuracy of the pre-annotation result corresponding to the video to be annotated, the pre-annotation result needs to be manually reviewed. After the electronic device obtains the video to be annotated and its pre-annotation result, in one case it may first traverse the pre-annotation result, that is, traverse the detection frames corresponding to the video to be annotated, determine the current frame to be annotated from the unmarked detection frames (the unmarked frames), and determine the first video frame corresponding to the current frame to be annotated from the video to be annotated. In another case, the electronic device may obtain a selection operation triggered by the user on an unmarked frame corresponding to the video to be annotated, and determine the unmarked frame selected by the user as the current frame to be annotated. In another implementation, another electronic device may determine the current frame to be annotated from the pre-annotation result, determine the corresponding first video frame from the video to be annotated, and send them to the electronic device, which thereby obtains the current frame to be annotated and its first video frame. The first video frame corresponding to the current frame to be annotated contains the target corresponding to the current frame to be annotated.

Given the characteristics of target tracking results, the same target often appears in multiple video frames, and a physical target appears only once in any one video frame. To reduce the user's annotation workload to a certain extent, the detected video frame in the video to be annotated that contains the target corresponding to the current frame to be annotated and satisfies the first screening condition may be the last video frame in the detected video to be annotated that contains that target, namely the end frame video frame. In another case, so that the user can better judge whether a target that is the same physical object as the target corresponding to the current frame to be annotated exists in other video frames, the video frame satisfying the first screening condition may be a detected video frame in the video to be annotated that contains the target corresponding to the current frame to be annotated and in which the sharpness of the target exceeds a preset sharpness threshold.
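The end-frame variant of the first screening condition can be illustrated with a short Python sketch. It assumes, purely for illustration, that the pre-annotation result is available as a mapping from video frame index to the set of frame identifiers detected in that video frame; the function name `end_frame_index` is hypothetical.

```python
def end_frame_index(detections, box_id):
    """Return the index of the last video frame that contains box_id.

    detections: dict mapping video frame index -> set of frame identifiers
    detected in that video frame (an assumed layout, not from the source).
    """
    frames = [idx for idx, ids in detections.items() if box_id in ids]
    if not frames:
        raise ValueError(f"identifier {box_id} does not appear in any video frame")
    return max(frames)


# Identifier 1 appears in video frames 0 and 1, so its end frame is frame 1.
example = {0: {1}, 1: {1, 2}, 2: {2}}
first_video_frame = end_frame_index(example, 1)
```

The sharpness-threshold variant is not sketched here, because the source does not say how sharpness is measured.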

S102: Obtain the recommendation frames that have a suspected association relation with the current frame to be annotated, and a second video frame corresponding to each recommendation frame.

The second video frame corresponding to each recommendation frame is: a detected video frame in the video to be annotated that contains the target corresponding to the recommendation frame and satisfies a second screening condition. Given the characteristics of target tracking results, the same target often appears in multiple video frames. To reduce the user's annotation workload to a certain extent, the video frame satisfying the second screening condition may be the first video frame in the currently detected video to be annotated that contains the target corresponding to the recommendation frame. In another case, so that the user can better judge whether a target that is the same physical object as the target corresponding to the current frame to be annotated exists in other video frames, the video frame satisfying the second screening condition may be a detected video frame in the video to be annotated that contains the target corresponding to the recommendation frame and in which the sharpness of the target exceeds a preset sharpness threshold.

In this step, the electronic device may determine, for the current frame to be labeled, a detection frame having a suspected association relationship with the current frame to be labeled from detection frames corresponding to multiple frames of video frames subsequent to the last frame of video frame corresponding to the current frame to be labeled in the video to be labeled, as a recommendation frame corresponding to the current frame to be labeled, and determine, from the video to be labeled, a second video frame corresponding to each recommendation frame.

S103: Display the first video frame containing the current frame to be annotated and, for each recommendation frame, the corresponding second video frame containing that recommendation frame, so that the user can determine, from the displayed recommendation frames and based on the second video frame corresponding to each of them, a target recommendation frame that has an association relation with the current frame to be annotated.

In this step, the electronic device may display, in different areas of a first preset display interface, the first video frame containing the current frame to be annotated and, for each recommendation frame corresponding to the current frame to be annotated, the second video frame containing that recommendation frame. The user can then visually compare the target corresponding to the current frame to be annotated in the first video frame with the target corresponding to each recommendation frame in its second video frame, and determine whether the recommendation frame and the current frame to be annotated have an association relation, namely whether the target corresponding to the recommendation frame and the target corresponding to the current frame to be annotated are the same physical object.

In one case, the electronic device may display, in the first preset display interface, the second video frames corresponding to the recommendation frames of the current frame to be annotated as an image list, which the user can drag to browse. The first preset display interface may also display a non-association-frame selection area, which the user can operate on determining that none of the recommendation frames corresponding to the current frame to be annotated has an association relation with it.

S104: Determine, based on first association relation operation information of the user, whether a target recommendation frame having an association relation with the current frame to be annotated exists among all the recommendation frames, and, when such a target recommendation frame is determined to exist, establish the association relation between the current frame to be annotated and the target recommendation frame.

In this step, the electronic device obtains the first association relation operation information triggered by the user on the basis of the displayed first video frame containing the current frame to be annotated and the displayed second video frames containing the recommendation frames corresponding to the current frame to be annotated. Based on the first association relation operation information, it determines whether a target recommendation frame having an association relation with the current frame to be annotated exists among all the recommendation frames, and, when such a target recommendation frame is determined to exist, establishes the association relation between the current frame to be annotated and the target recommendation frame. The association relation may be established by modifying the frame identification information corresponding to the current frame to be annotated and the frame identification information corresponding to the target recommendation frame into the same information.
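Establishing the association therefore amounts to giving the two detection frames the same frame identification information. A minimal Python sketch follows, under two assumptions that are not stated in the source: the pre-annotation result is held as a mapping from video frame index to the set of frame identifiers detected in that video frame, and the identifier of the current frame to be annotated is the one that is kept. The function name `associate` is hypothetical.

```python
def associate(detections, current_id, target_id):
    """Return a copy of detections in which target_id is replaced by current_id.

    detections: dict mapping video frame index -> set of frame identifiers.
    After the call, every detection that carried target_id carries current_id,
    so both are treated as the same physical object.
    """
    return {
        idx: {current_id if box_id == target_id else box_id for box_id in ids}
        for idx, ids in detections.items()
    }


# The user confirms that identifier 4 is the same object as identifier 1.
before = {0: {1}, 3: {4, 5}}
after = associate(before, current_id=1, target_id=4)
```

The sketch returns a new mapping instead of editing in place, which leaves the original result available if the user later undoes a mistaken association.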

In one case, when the electronic device determines, based on the first association relation operation information, that the user has selected at least one of the recommendation frames corresponding to the current frame to be annotated, it determines that a target recommendation frame having an association relation with the current frame to be annotated exists among those recommendation frames, takes the recommendation frame selected by the user as the target recommendation frame, and establishes the association relation between the current frame to be annotated and the target recommendation frame. That is, the target corresponding to the current frame to be annotated and the target corresponding to the target recommendation frame are marked as the same physical object, which associates the two targets.

Subsequently, to keep the user's annotation process orderly, and to save the user's time and improve the user's efficiency in associating the detection frames of targets that are the same physical object, the target recommendation frame selected by the user may be taken as the new current frame to be annotated, and the next annotation pass is executed for it.

In another case, when the electronic device determines that the user does not select any one of the recommendation frames corresponding to the current frame to be marked based on the first association relationship operation information, that is, when the user triggers the non-association-frame selection area, the electronic device determines that no target recommendation frame having an association relationship with the current frame to be marked exists in all the recommendation frames corresponding to the current frame to be marked.

Subsequently, in order to ensure the ordering of the user labeling process and improve the user labeling efficiency, the electronic device may determine a new current frame to be labeled from the unmarked frame corresponding to the video to be labeled, and then execute the next labeling process for the new current frame to be labeled.

By applying the embodiment of the invention, the current frame to be marked and the corresponding first video frame thereof, the recommendation frame which has a suspected association relationship with the current frame to be marked and the second video frame corresponding to each recommendation frame can be obtained and displayed, so that the association of the frames which are suspected to be the same target in the video frames to be marked on the image level is realized, the recommendation frame which has a suspected association relationship with the current frame to be marked and the second video frame corresponding to each recommendation frame are directly obtained and displayed, the time for a user to play and watch the video to be marked is saved, the process for the user to search the association frame is greatly optimized by calculating the recommendation frame, the marking efficiency of the marking frame having the association relationship is improved, namely the target association efficiency is improved, and the convenient and effective association of the same target appearing in each frame of the video is realized.

In another embodiment of the present invention, S101 may be implemented in either of the following two ways.

First implementation:

From the unmarked frames corresponding to the video to be annotated, determine as the current frame to be annotated the unmarked frame whose end frame video frame has timestamp information representing the earliest acquisition time.

In one case, the user may be triggering the annotation operation on the video to be annotated for the first time in the annotation task. Accordingly, after obtaining the video to be annotated and its pre-annotation result, the electronic device may first traverse the pre-annotation result, that is, traverse the detection frames corresponding to the video to be annotated, determine the current frame to be annotated from the unmarked frames, and determine the first video frame corresponding to the current frame to be annotated from the video to be annotated. Given the characteristics of target tracking results, the same target often appears in multiple consecutive video frames. To reduce the user's annotation workload to a certain extent and keep the annotation process orderly, the electronic device may determine as the current frame to be annotated the unmarked frame whose end frame video frame has timestamp information representing the earliest acquisition time.

The end frame video frame corresponding to an unmarked frame may be: the last of the video frames in the video to be annotated that contain the target corresponding to that unmarked frame.

In another case, the user is not triggering the annotation operation on the video to be annotated for the first time in the annotation task. If, in the user's previous annotation operation, no recommendation frame having an association relation with the displayed historical frame to be annotated was selected from the recommendation frames displayed for it, the electronic device may determine, from the unmarked frames corresponding to the video to be annotated, the unmarked frame whose end frame video frame has timestamp information representing the earliest acquisition time as the current frame to be annotated.

Second implementation:

When it is detected that the user has selected, from the recommendation frames displayed for a displayed historical frame to be annotated, a recommendation frame having an association relation with that historical frame, determine the recommendation frame selected by the user as the current frame to be annotated.

In this implementation, the user is not triggering the annotation operation on the video to be annotated for the first time in the annotation task, and in the previous annotation operation the user selected, from the recommendation frames displayed for the displayed historical frame to be annotated, a recommendation frame having an association relation with that historical frame. To keep the annotation process convenient and efficient, the detection frames of the same target can be annotated in succession. When such a selection is detected, the recommendation frame selected by the user is determined as the current frame to be annotated, and the subsequent target association process is executed.
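The two implementations of S101 above can be combined into one selection rule, sketched below in Python. The function name `next_current_box` and its parameters are hypothetical: `selected_recommendation` stands for the recommendation frame the user chose in the previous pass (or `None` if none was chosen), `unlabeled_ids` for the identifiers of the unmarked frames, and `end_frame_timestamp` for a lookup from identifier to the acquisition time of its end frame video frame.

```python
def next_current_box(selected_recommendation, unlabeled_ids, end_frame_timestamp):
    """Pick the identifier of the next current frame to be annotated.

    Second implementation: if the user selected a recommendation frame in the
    previous pass, continue with that frame.
    First implementation: otherwise take the unmarked frame whose end frame
    video frame has the earliest acquisition time.
    Returns None when nothing is left to annotate.
    """
    if selected_recommendation is not None:
        return selected_recommendation
    if not unlabeled_ids:
        return None
    return min(unlabeled_ids, key=lambda box_id: end_frame_timestamp[box_id])


timestamps = {1: 3.0, 2: 1.0}
```

For example, with the timestamps above and no prior selection, identifier 2 is chosen because its end frame was acquired first.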

In another embodiment of the present invention, the step S102 may include the following steps 011-012:

011: Traverse the pre-annotation results corresponding to the N video frames after the end frame video frame corresponding to the current frame to be annotated, and, from the detection frames of those N video frames, determine each detection frame whose frame identification meets the preset recommendation condition as a recommendation frame that has a suspected association relation with the current frame to be annotated.

A detection frame whose frame identification meets the preset recommendation condition is one whose frame identification information does not exist in the end frame video frame corresponding to the current frame to be annotated or in the previous frame.

012: For each recommendation frame that has a suspected association relation with the current frame to be annotated, determine, among the N video frames after the end frame video frame corresponding to the current frame to be annotated, the video frame that contains the target corresponding to the recommendation frame and satisfies the second screening condition as the second video frame corresponding to the recommendation frame.

The video frame that contains the target corresponding to the recommendation frame and satisfies the second screening condition is: the first video frame in the detected video to be annotated that contains the target corresponding to the recommendation frame.

In the embodiment of the invention, the condition that the missed association occurs is considered that the targets which are the same physical object are detected as different targets, the frame identification information of the detection frames which are detected as corresponding to the different targets is different, and only one same physical target is arranged in one frame of video frame at the same time; and allows for a reduction in the computational load of the electronic device. In the process of obtaining the recommended frame corresponding to the current frame to be labeled, the electronic device may: traversing the pre-labeling result corresponding to the N frames of video frames after the last frame of video frame corresponding to the current frame to be labeled, and determining a new detection frame from the detection frames corresponding to the N frames of video frames after the last frame of video frame corresponding to the current frame to be labeled, wherein the detection frame corresponding to the frame identifier meets the preset recommendation condition and serves as the recommendation frame having the suspected association relation with the current frame to be labeled. The detection frame that the corresponding frame identification satisfies the preset recommendation condition comprises the following steps: the corresponding frame identification information does not exist in the last frame video frame and the previous frame corresponding to the current frame to be marked.

For each recommendation frame having a suspected association relation with the current frame to be annotated, the first video frame, among the N video frames following the last video frame corresponding to the current frame to be annotated, in which the target corresponding to the recommendation frame is detected is determined as the second video frame corresponding to that recommendation frame. For example, if the (N-P)th, (N-P+1)th, (N-P+2)th, (N-P+3)th and (N-P+4)th video frames after the last video frame corresponding to the current frame to be annotated all contain the target corresponding to recommendation frame A, then the first of these, namely the (N-P)th video frame, is determined as the second video frame corresponding to recommendation frame A.
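The selection of recommendation frames and of their second video frames can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment. It assumes that the pre-labeling result is available as a hypothetical mapping from frame number to the set of frame IDs detected in that video frame, and that "the previous frame" covers every video frame up to and including the last video frame of the current frame to be labeled. The names recommend_boxes, prelabels, last_frame and n are illustrative only.

```python
from typing import Dict, Set


def recommend_boxes(prelabels: Dict[int, Set[int]], last_frame: int, n: int) -> Dict[int, int]:
    """Return {frame_id: number of the first video frame containing it}.

    prelabels  -- hypothetical pre-labeling result: frame number -> detected frame IDs
    last_frame -- number of the last video frame containing the current target
    n          -- size of the look-ahead window (the N of the description)
    """
    # IDs already present in the last video frame or in any earlier video frame.
    seen: Set[int] = set()
    for frame_no, ids in prelabels.items():
        if frame_no <= last_frame:
            seen |= ids

    recommended: Dict[int, int] = {}
    # Traverse the N video frames after the last video frame in playback order,
    # so the first hit for an ID is the first video frame containing that target.
    for frame_no in range(last_frame + 1, last_frame + n + 1):
        for frame_id in sorted(prelabels.get(frame_no, set())):
            if frame_id not in seen and frame_id not in recommended:
                recommended[frame_id] = frame_no
    return recommended


# Toy data: the current frame to be labeled has ID 7 and is last seen in frame 2.
example = {1: {3, 7}, 2: {3, 7}, 3: {3}, 4: {3, 8}, 5: {8, 9}, 40: {10}}
result = recommend_boxes(example, last_frame=2, n=3)
```

In this toy data, ID 3 is skipped because it already appears alongside ID 7, and ID 10 is skipped because it first appears outside the window.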

N is a positive integer preset empirically. N should be neither too small nor too large, for three reasons: associations between detection frames should not be missed; the user's workload and working efficiency must be considered; and detection frames associated across too large a gap of video frames are of little significance to a subsequent deep-learning-based target tracking model. In one case, the value range of N may be [20, 30].

In the embodiment of the invention, the recommendation frames most likely to have a suspected association relation with the current frame to be labeled are picked out of the N video frames following the last video frame corresponding to the current frame to be labeled and recommended to the user, so that the user only needs to attend to the targets corresponding to the recommendation frames. This reduces the amount of video the user must watch and the user's workload, and improves labeling efficiency.

In another embodiment of the present invention, before the S102, the method may further include the following steps 021-024:

021: judging whether a display function of displaying a recommendation frame corresponding to a current frame to be marked is started or not; if yes, executing S102; if not, executing 022;

022: obtaining the M video frames following the last video frame corresponding to the current frame to be marked in the video to be marked, as the video frames to be played corresponding to the current frame to be marked.

The last video frame corresponding to the current frame to be marked is: the last video frame, in the video to be marked, in which the target corresponding to the current frame to be marked is detected.

023: displaying the first video frame containing the current frame to be marked, and playing the video frame to be played corresponding to the current frame to be marked so that a user can determine a target detection frame having an association relation with the current frame to be marked from detection frames included in the video frame to be played corresponding to the played current frame to be marked;

024: determining, based on the second association relation operation information of the user, whether a target detection frame having an association relation with the current frame to be marked exists among the detection frames included in the played video frames to be played corresponding to the current frame to be marked, and establishing the association relation between the current frame to be marked and the target detection frame when such a target detection frame exists.

In the embodiment of the present invention, when the electronic device determines that the display function for displaying the recommendation frames corresponding to the current frame to be marked is started, it provides that function to the user, that is, it performs the step of obtaining the recommendation frames having a suspected association relation with the current frame to be marked and the second video frame corresponding to each recommendation frame, together with the subsequent steps. When it determines that the display function is not started, it may directly determine the M video frames following the last video frame corresponding to the current frame to be marked in the video to be marked as the video frames to be played corresponding to the current frame to be marked, display the first video frame containing the current frame to be marked in a first display area of a second preset display interface, and play the video frames to be played frame by frame in a second display area of the second preset display interface. The user can inspect the targets in each of the video frames to be played, check whether any of them is the same physical object as the target corresponding to the current frame to be marked, that is, whether a detection frame having an association relation with the current frame to be marked exists among the detection frames, and trigger the corresponding operation.
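The branch between the recommendation mode and the playback mode (steps 021 and 022) can be sketched as follows. This is only an illustration under stated assumptions: frame numbers are assumed to be zero-based and contiguous, and plan_review, recommend_enabled, total_frames and m are hypothetical names that do not come from the embodiment.

```python
from typing import List, Tuple


def plan_review(last_frame: int, total_frames: int,
                recommend_enabled: bool, m: int) -> Tuple[str, List[int]]:
    """Choose the review mode and, for playback, the video frames to be played."""
    if recommend_enabled:
        # Step 021, "yes" branch: continue with S102 and show recommendation frames.
        return "recommend", []
    # Step 022: the M video frames after the last video frame containing the
    # current target, clipped to the end of the video (zero-based frame numbers).
    first = last_frame + 1
    last = min(last_frame + m, total_frames - 1)
    return "playback", list(range(first, last + 1))


mode, frames = plan_review(last_frame=5, total_frames=100, recommend_enabled=False, m=3)
```

Clipping at the end of the video is a design choice of this sketch; the description does not state what happens when fewer than M video frames remain.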

The electronic device obtains the second association relation operation information from the corresponding operation triggered by the user. When it determines, based on the second association relation operation information, that the user has selected one of the detection frames contained in the video frames to be played corresponding to the current frame to be marked, it determines that a target detection frame having an association relation with the current frame to be marked exists among those detection frames, takes the detection frame selected by the user as the target detection frame, and establishes the association relation between the current frame to be marked and the target detection frame. In other words, it marks the target corresponding to the current frame to be marked and the target corresponding to the target detection frame as the same physical object, thereby associating the two targets.

Subsequently, to keep the user's labeling process orderly, and to save time and improve efficiency when the user associates the detection frames corresponding to targets of the same physical object, the target detection frame selected by the user as having an association relation with the current frame to be labeled may be taken as the new current frame to be labeled, and the next labeling pass is then executed for the new current frame to be labeled.

In another case, when the electronic device determines, based on the second association relation operation information, that the user has not selected any of the detection frames contained in the video frames to be played corresponding to the current frame to be marked, it determines that no target detection frame having an association relation with the current frame to be marked exists among those detection frames.

In another embodiment of the present invention, in order to improve the user experience and the accuracy of the user's annotation result, before S104, the method may further include:

when a first amplification instruction triggered by the user for a first recommendation frame having a suspected association relation with the current frame to be marked is detected, enlarging and displaying the first recommendation frame and the second video frame corresponding to the first recommendation frame.

The triggering operation of the first amplification instruction includes: right-clicking the second video frame corresponding to the first recommendation frame.

And/or under the condition that a second amplification instruction triggered by the user aiming at the first video frame corresponding to the current frame to be marked is detected, amplifying and displaying the first video frame corresponding to the current frame to be marked.

The embodiment of the invention provides an enlarged display function. When it is detected that the user triggers the first amplification instruction for a first recommendation frame having a suspected association relation with the current frame to be annotated, that is, the user right-clicks the second video frame corresponding to the first recommendation frame, the first recommendation frame and its second video frame are enlarged and displayed, so that the user can carefully compare the target corresponding to the recommendation frame with the target corresponding to the current frame to be annotated and determine whether the two are the same physical object. When a second amplification instruction triggered by the user for the first video frame corresponding to the current frame to be marked is detected, the first video frame corresponding to the current frame to be marked is enlarged and displayed. The triggering operation of the second amplification instruction may be: pressing the left or right mouse button and scrolling the mouse within the display area of the first video frame corresponding to the current frame to be marked; or sliding two fingers across that display area.

Subsequently, the user can drag the picture, namely the first video frame corresponding to the current frame to be marked or the second video frame corresponding to the first recommendation frame, by holding the Shift key together with the left mouse button, or by holding the middle mouse button.

In one case, the designated display position may be a first display area of the first preset display interface.

In another embodiment of the present invention, after the S104, as shown in fig. 2, the method may further include:

S201: determining the target recommendation frame as a new current frame to be marked.

S202: determining the detected video frame, in the video to be marked, that contains the target corresponding to the new current frame to be marked and meets the first screening condition, as the first video frame corresponding to the new current frame to be marked.

S203: obtaining new recommendation frames having a suspected association relation with the new current frame to be annotated and the second video frame corresponding to each new recommendation frame.

The second video frame corresponding to each new recommendation frame is: the detected video frame, in the video to be annotated, that contains the target corresponding to the new recommendation frame and meets the second screening condition.

S204: displaying the first video frame containing the new current frame to be annotated and, for each new recommendation frame, the second video frame containing that new recommendation frame, so that the user can determine, from the displayed new recommendation frames and based on their second video frames, a new target recommendation frame having an association relation with the new current frame to be annotated.

S205: if it is detected that the user triggers a jump instruction indicating a jump to the previous section, displaying the first video frame containing the current frame to be marked and, for each recommendation frame, the second video frame containing that recommendation frame.

In this embodiment, after the electronic device establishes the association relation between the current frame to be labeled and the target recommendation frame, it may directly determine the target recommendation frame as the new current frame to be labeled and execute S202-S204. In S202, the process of determining the first video frame corresponding to the new current frame to be annotated may refer to the process of determining the first video frame corresponding to the current frame to be annotated. In S203, the process of obtaining the new recommendation frames having a suspected association relation with the new current frame to be annotated and the second video frame corresponding to each new recommendation frame may refer to the corresponding process described above for the current frame to be annotated. In S204, the process of displaying the first video frame containing the new current frame to be annotated and the second video frames containing the new recommendation frames may refer to the corresponding display process described above for the current frame to be annotated. These processes are not repeated here.

In order to reduce the cost of an erroneous association operation, the embodiment of the invention provides an error correction function. When the user finds that the target recommendation frame selected for the current frame to be labeled was wrong, the user can trigger a jump instruction indicating a jump to the previous section. If the electronic device detects that the user has triggered this jump instruction, it displays the first video frame containing the current frame to be labeled and, for each recommendation frame, the second video frame containing that recommendation frame.

In one implementation, in the process of establishing the association relation between the current frame to be marked and the target recommendation frame, the electronic device may modify the frame identification information corresponding to the target recommendation frame to be the same as the frame identification information corresponding to the current frame to be marked. In order to provide error correction information for the user, after establishing the association relation, the electronic device still stores the information that existed before the association relation was established, for example the frame identification information that the current frame to be marked and the target recommendation frame each had beforehand. Correspondingly, if the electronic device detects that the user triggers a jump instruction indicating a jump to the previous section, it can redisplay the first video frame containing the current frame to be marked and, for each recommendation frame, the second video frame containing that recommendation frame.
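The ID rewrite with retained pre-association information can be illustrated by the following minimal sketch. It is hedged throughout: the class AssociationLog, its methods and the string keys are hypothetical, and restoring the old ID when the user jumps back is an assumption of this sketch. The description only states that the earlier information is kept and that the earlier view is redisplayed.

```python
from typing import Dict, List, Tuple


class AssociationLog:
    """Keeps the current frame IDs plus the IDs that were overwritten."""

    def __init__(self, frame_ids: Dict[str, int]) -> None:
        # frame_ids maps a hypothetical detection-frame key to its current ID.
        self.frame_ids = dict(frame_ids)
        self._history: List[Tuple[str, int]] = []

    def associate(self, current_key: str, target_key: str) -> None:
        # Save the target's old ID first, so the association can be corrected later.
        self._history.append((target_key, self.frame_ids[target_key]))
        # Establish the association: the target takes over the current frame's ID.
        self.frame_ids[target_key] = self.frame_ids[current_key]

    def undo_last(self) -> bool:
        # Assumed behaviour for "jump to previous section": restore the saved ID.
        if not self._history:
            return False
        key, old_id = self._history.pop()
        self.frame_ids[key] = old_id
        return True


log = AssociationLog({"current": 7, "candidate": 8})
log.associate("current", "candidate")
after_associate = dict(log.frame_ids)
undone = log.undo_last()
after_undo = dict(log.frame_ids)
```

A stack is used so that several consecutive associations can be walked back one at a time, which matches the chained labeling of S201-S205.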

In another embodiment of the present invention, before S103, the method may further include the following steps 031-032:

031: frame number information of a first video frame corresponding to a current frame to be marked and frame identification information corresponding to the current frame to be marked are obtained.

032: frame number information corresponding to a second video frame corresponding to each recommended frame having a suspected association relation with the current frame to be marked and frame identification information corresponding to each recommended frame are obtained.

S103, comprising the following steps 041-042:

041: displaying, in a first display area of a first preset display interface, the first video frame containing the current frame to be marked, the frame number information of the first video frame corresponding to the current frame to be marked, and the frame identification information corresponding to the current frame to be marked.

042: displaying, in a second display area of the first preset display interface and for each recommendation frame having a suspected association relation with the current frame to be marked, the second video frame containing that recommendation frame, the frame number information of that second video frame, and the frame identification information corresponding to that recommendation frame.

In the embodiment of the present invention, in order to better assist the labeling work of the user, before displaying the first video frame containing the current frame to be labeled and the second video frames containing the recommendation frames, the electronic device may further obtain the frame number information of the first video frame corresponding to the current frame to be labeled and the frame identification information corresponding to the current frame to be labeled. The frame identification information may be any information that uniquely identifies the detection frame of a detected target, such as a frame ID. The frame number information of the first video frame corresponding to the current frame to be labeled identifies the position of that first video frame in the video to be labeled, namely its frame number. The electronic device also obtains the frame number information of the second video frame corresponding to each recommendation frame having a suspected association relation with the current frame to be labeled, and the frame identification information corresponding to each such recommendation frame. Each recommendation frame having a suspected association relation with the current frame to be labeled may be called a recommendation frame corresponding to the current frame to be labeled. The frame number information of the second video frame corresponding to a recommendation frame identifies the position of that second video frame in the video to be labeled, namely its frame number.

The electronic device displays, in the first display area of the first preset display interface, the first video frame containing the current frame to be marked, the frame number information of that first video frame, and the frame identification information corresponding to the current frame to be marked. In the second display area of the first preset display interface it displays, for each recommendation frame having a suspected association relation with the current frame to be marked, the second video frame containing that recommendation frame, the frame number information of that second video frame, and the frame identification information corresponding to that recommendation frame. The second video frames corresponding to the recommendation frames may be displayed in the order in which they occur in the video to be marked, as given by their frame number information.

Fig. 3 is a schematic diagram of an exemplary structure of the first preset display interface. The first display area of the first preset display interface may be referred to as the image display main panel. It is located on the left side of the first preset display interface and displays the first video frame containing the current frame to be marked, the frame number information of that first video frame, and the frame identification information corresponding to the current frame to be marked. The second display area of the first preset display interface may be referred to as the recommendation frame selection panel. It displays the second video frame corresponding to each recommendation frame corresponding to the current frame to be marked, the frame identification information corresponding to each recommendation frame, and the frame number information of each such second video frame. The second video frames corresponding to the recommendation frames are displayed in preview form, namely as thumbnails.

As shown in fig. 3, a "no matching frame" entry, that is, the above-mentioned no-associated-frame selection area, is also displayed in the second display area after the last of the second video frames corresponding to the recommendation frames. When the user selects the area corresponding to "no matching frame", the electronic device determines that none of the recommendation frames corresponding to the current frame to be labeled has an association relation with the current frame to be labeled.
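The ordering of the recommendation frame selection panel can be sketched as follows. This is an illustration only: the Thumbnail type, the label text and the tie-break on frame ID are assumptions of this sketch, while the ordering by frame number and the trailing "no matching frame" entry follow the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Thumbnail:
    frame_id: int   # frame identification information of the recommendation frame
    frame_no: int   # frame number of its second video frame


def build_panel(entries: List[Thumbnail]) -> List[str]:
    """Order the thumbnails by frame number and append the no-match entry."""
    # Tie-break on frame_id (an assumption) so the order is deterministic
    # when two recommendation frames first appear in the same video frame.
    ordered = sorted(entries, key=lambda e: (e.frame_no, e.frame_id))
    labels = [f"ID {e.frame_id} / frame {e.frame_no}" for e in ordered]
    labels.append("no matching frame")
    return labels


panel = build_panel([Thumbnail(9, 15), Thumbnail(8, 12), Thumbnail(11, 12)])
```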

Taking fig. 3 as an example, the frame identification information, that is, the "ID", of the current frame to be labeled is 7. When the user determines that the recommendation frame whose "ID" is 8 has an association relation with the current frame to be labeled, the user may click the area where the second video frame corresponding to the recommendation frame with "ID" 8 is located, so as to select that recommendation frame. The electronic device then establishes the association relation between the current frame to be labeled and the selected recommendation frame, for example by modifying the "ID" of that recommendation frame from 8 to 7.

In another embodiment of the present invention, a third display area of the first preset display interface may further include a first trigger area for triggering a jump instruction indicating a jump to the previous section, and a second trigger area for triggering a jump instruction indicating a jump to the next section. The jump instruction indicating a jump to the previous section is an instruction to display the first video frame corresponding to the frame to be marked that precedes the current frame to be marked, that is, the detected video frame that contains the target corresponding to that preceding frame to be marked and meets the first screening condition. The jump instruction indicating a jump to the next section is an instruction to display the first video frame corresponding to the frame to be marked that follows the current frame to be marked. The frame to be marked that follows the current frame to be marked is: among the unmarked frames of the video to be marked, the one whose last video frame has, according to its timestamp information, the earliest acquisition time that is still later than the acquisition time of the current frame to be marked.
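The choice of the next frame to be marked can be sketched as follows. The sketch assumes that each detection frame is summarised by the timestamp of its last video frame, and that the acquisition time of the current frame to be marked is likewise the timestamp of its last video frame. The names next_to_mark, last_ts and unmarked are hypothetical.

```python
from typing import Dict, Optional, Set


def next_to_mark(last_ts: Dict[int, float], unmarked: Set[int],
                 current_id: int) -> Optional[int]:
    """Return the ID of the next frame to be marked, or None if there is none."""
    current_ts = last_ts[current_id]
    # Candidates: unmarked frames whose last video frame was acquired later
    # than that of the current frame to be marked.
    later = [i for i in unmarked if i != current_id and last_ts[i] > current_ts]
    if not later:
        return None
    # Earliest acquisition time wins; ties are broken by ID (an assumption).
    return min(later, key=lambda i: (last_ts[i], i))


timestamps = {3: 1.0, 7: 2.0, 9: 4.5, 8: 6.0}
chosen = next_to_mark(timestamps, unmarked={3, 8, 9}, current_id=7)
```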

In one case, the third display area may be located below the second display area. As shown in fig. 3, "jump to previous stage" represents a first trigger area for instructing a jump instruction to jump to the previous stage, and "jump to next stage" represents a second trigger area for instructing a jump instruction to jump to the next stage.

Corresponding to the foregoing method embodiment, an embodiment of the present invention provides a target association apparatus, as shown in fig. 4, where the apparatus includes:

a first obtaining module 410, configured to obtain a current frame to be labeled and a first video frame corresponding to the current frame to be labeled, where the first video frame corresponding to the current frame to be labeled is: the detected video frames which contain the target corresponding to the current frame to be marked and meet the first screening condition in the video to be marked;

a second obtaining module 420, configured to obtain recommendation frames having a suspected association relationship with the current frame to be annotated and a second video frame corresponding to each recommendation frame, where the second video frame corresponding to each recommendation frame is: the detected video frame, in the video to be annotated, that contains the target corresponding to the recommendation frame and meets a second screening condition;

the first display module 430 is configured to display the first video frame including the current frame to be annotated and a second video frame corresponding to each recommendation frame and including the recommendation frame, so that a user determines a target recommendation frame having an association relationship with the current frame to be annotated from the displayed recommendation frames based on the second video frame corresponding to each recommendation frame;

the first determining and establishing module 440 is configured to determine whether a target recommending frame having an association relationship with the current frame to be annotated exists in all recommending frames based on first association relationship operation information of a user, and establish an association relationship between the current frame to be annotated and the target recommending frame when it is determined that the target recommending frame having an association relationship with the current frame to be annotated exists.

By applying the embodiment of the invention, the current frame to be marked, its corresponding first video frame, the recommendation frames having a suspected association relation with the current frame to be marked, and the second video frame corresponding to each recommendation frame can be obtained and displayed, so that frames suspected of corresponding to the same target in the video to be marked are associated at the image level. Because the recommendation frames and their second video frames are obtained and displayed directly, the user does not need to spend time playing and watching the video to be marked, and computing the recommendation frames greatly streamlines the user's search for associated frames. This improves the efficiency of labeling frames that have an association relation, that is, the target association efficiency, and achieves convenient and effective association of the same target appearing in the video frames of the video.

In another embodiment of the present invention, the detected video frame that contains the target corresponding to the current frame to be annotated and meets the first screening condition in the video to be annotated is: the last video frame, in the video to be annotated, in which the target corresponding to the current frame to be annotated is detected;

the detected video frame that contains the target corresponding to the recommendation frame and meets the second screening condition in the video to be annotated is: the first video frame, in the video to be annotated, in which the target corresponding to the recommendation frame is detected.

In another embodiment of the present invention, the first obtaining module 410 is specifically configured to:

or, is specifically configured to, when it is detected that the user selects, from the recommendation frames displayed for a displayed historical frame to be labeled, a recommendation frame having an association relation with that historical frame to be labeled, determine the recommendation frame selected by the user as the current frame to be labeled.

In another embodiment of the present invention, the second obtaining module 420 is specifically configured to traverse the pre-labeling results corresponding to the N video frames following the last video frame corresponding to the current frame to be labeled, and determine, from the detection frames in those N video frames, the detection frames whose frame identification meets the preset recommendation condition as the recommendation frames having a suspected association relationship with the current frame to be labeled, where a detection frame whose frame identification meets the preset recommendation condition is one whose frame identification information does not appear in the last video frame corresponding to the current frame to be labeled or in the video frames before it;

and, for each recommendation frame having a suspected association relationship with the current frame to be annotated, to determine, among the N video frames following the last video frame corresponding to the current frame to be annotated, the video frame that contains the target corresponding to the recommendation frame and meets the second screening condition as the second video frame corresponding to the recommendation frame, where the video frame that contains the target corresponding to the recommendation frame and meets the second screening condition is: the first video frame, in the video to be annotated, in which the target corresponding to the recommendation frame is detected.

In another embodiment of the present invention, the apparatus further comprises: a determining module (not shown in the figure), configured to determine whether a display function of displaying a recommended frame corresponding to the current frame to be marked is started before the recommended frame having a suspected association relationship with the current frame to be marked and the second video frame corresponding to each recommended frame are obtained;

if yes, triggering the second obtaining module 420;

a third obtaining module (not shown in the figure), configured to, if not started, obtain, as the to-be-played video frame corresponding to the current to-be-labeled frame, an M-frame video frame after the last-frame video frame corresponding to the current to-be-labeled frame in the to-be-labeled video, where the last-frame video frame corresponding to the current to-be-labeled frame is: detecting a tail frame video frame containing a target corresponding to the current frame to be marked in the video to be marked;

a second display module (not shown in the figure), configured to display the first video frame including the current frame to be annotated, and play the video frame to be played corresponding to the current frame to be annotated, so that a user determines, from detection frames included in the video frame to be played corresponding to the played current frame to be annotated, a target detection frame having an association relationship with the current frame to be annotated;

a second determining and establishing module (not shown in the figure), configured to determine, based on second association relationship operation information of the user, whether a target detection frame having an association relationship with the current frame to be annotated exists among the detection frames included in the played video frames to be played corresponding to the current frame to be annotated, and to establish an association relationship between the current frame to be annotated and the target detection frame when such a target detection frame exists.

In another embodiment of the present invention, the apparatus further comprises: an enlargement display module (not shown in the figure), configured to operate before the determining, based on the first association relationship operation information of the user, of whether a target recommendation frame having an association relationship with the current frame to be annotated exists among all recommendation frames, and before the establishing of the association relationship between the current frame to be annotated and the target recommendation frame in a case that such a target recommendation frame is determined to exist. The module is configured to, in a case that a first enlargement instruction triggered by the user for a first recommendation frame having a suspected association relationship with the current frame to be annotated is detected, enlarge and display the first recommendation frame and the second video frame corresponding to the first recommendation frame, where a triggering operation of the first enlargement instruction includes: right-clicking the second video frame corresponding to the first recommendation frame;

and/or, in a case that a second enlargement instruction triggered by the user for the first video frame corresponding to the current frame to be annotated is detected, to enlarge and display the first video frame corresponding to the current frame to be annotated.

In another embodiment of the present invention, the apparatus further comprises: a first determining module (not shown in the figure), configured to, after it is determined, based on the first association relationship operation information of the user, that a target recommendation frame having an association relationship with the current frame to be labeled exists among all recommendation frames, and after the association relationship between the current frame to be labeled and the target recommendation frame is established, determine the target recommendation frame as a new current frame to be labeled;

a second determining module (not shown in the figure), configured to determine, from the video to be annotated, a detected video frame that contains the target corresponding to the new current frame to be annotated and satisfies the first screening condition, as the first video frame corresponding to the new current frame to be annotated;

a fourth obtaining module (not shown in the figure), configured to obtain new recommended frames that have a suspected association relationship with the new current frame to be annotated, and a second video frame corresponding to each new recommended frame, where the second video frame corresponding to each new recommended frame is: a detected video frame in the video to be annotated that contains the target corresponding to the new recommended frame and satisfies the second screening condition;

a third display module (not shown in the figure) configured to display the first video frame containing the new current frame to be annotated and a second video frame corresponding to each new recommended frame and containing the new recommended frame, so that a user determines a new target recommended frame having an association relationship with the new current frame to be annotated from the displayed new recommended frames based on the second video frame corresponding to each new recommended frame;

a fourth display module (not shown in the figure), configured to, if a jump instruction triggered by the user and indicating a jump to the previous segment is detected, display the first video frame containing the current frame to be marked and the second video frame, corresponding to each recommended frame, that contains that recommended frame.
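The chained flow of this embodiment (associate, promote the target recommendation frame to the new current frame to be labeled, then obtain new recommended frames) can be illustrated with a short sketch. The recommendation rule used here follows the condition stated in the claims: a detection frame is recommended when its identifier does not appear in the tail video frame or in earlier frames, and its second video frame is the first frame that contains it. The data layout (a dictionary from frame index to a list of frame identifiers) and all function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

PreLabels = Dict[int, List[str]]  # frame index -> identifiers of detection frames in it


def tail_frame_of(pre_labels: PreLabels, box_id: str) -> int:
    """Index of the last video frame whose pre-labeling result contains box_id."""
    return max(idx for idx, ids in pre_labels.items() if box_id in ids)


def recommend(pre_labels: PreLabels, tail_index: int, n: int) -> Dict[str, int]:
    """Map each recommended identifier to the first frame in the window containing it."""
    seen: Set[str] = set()
    for idx, ids in pre_labels.items():
        if idx <= tail_index:
            seen.update(ids)  # identifiers already present up to the tail frame
    result: Dict[str, int] = {}
    for idx in range(tail_index + 1, tail_index + n + 1):
        for box_id in pre_labels.get(idx, []):
            if box_id not in seen and box_id not in result:
                result[box_id] = idx  # first appearance serves as the second video frame
    return result


def associate_and_advance(pre_labels: PreLabels, links: List[Tuple[str, str]],
                          current_id: str, target_id: str,
                          n: int) -> Tuple[str, Dict[str, int]]:
    """Record the association, then treat the target as the new current frame."""
    links.append((current_id, target_id))
    new_tail = tail_frame_of(pre_labels, target_id)
    return target_id, recommend(pre_labels, new_tail, n)


if __name__ == "__main__":
    labels = {0: ["a"], 1: ["a"], 2: ["b"], 3: ["b", "c"], 4: ["c"], 5: ["d"]}
    links: List[Tuple[str, str]] = []
    print(recommend(labels, tail_index=1, n=3))
    print(associate_and_advance(labels, links, "a", "b", n=3))
```

Keeping the list of established links separate from the pre-labeling results is one possible design; it leaves the detector output untouched, so that jumping back to a previous segment can redisplay the earlier recommendations.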

In another embodiment of the present invention, the apparatus further comprises:

a fifth obtaining module (not shown in the figure), configured to obtain frame number information of a first video frame corresponding to the current frame to be marked and frame identification information corresponding to the current frame to be marked before displaying the first video frame containing the current frame to be marked and a second video frame corresponding to each recommended frame and containing the recommended frame;

a sixth obtaining module (not shown in the figure), configured to obtain frame number information corresponding to a second video frame corresponding to each recommended frame having a suspected association relationship with the current frame to be labeled and frame identification information corresponding to each recommended frame;

the first display module 430 is specifically configured to display, in a first display area of a first preset display interface, the first video frame containing the current frame to be marked, the frame number information of that first video frame, and the frame identification information corresponding to the current frame to be marked;

and to display, in a second display area of the first preset display interface, the second video frame containing each recommended frame having the suspected association relationship with the current frame to be marked, the frame number information corresponding to that second video frame, and the frame identification information corresponding to each recommended frame.

In another embodiment of the present invention, a third display area of the first preset display interface further includes a first trigger area for triggering a jump instruction indicating a jump to the previous segment, and a second trigger area for triggering a jump instruction indicating a jump to the next segment. The jump instruction indicating a jump to the previous segment is: an instruction to display the first video frame corresponding to the frame to be marked immediately preceding the current frame to be marked, where that first video frame is: the detected video frame in the video to be marked that contains the target corresponding to the preceding frame to be marked and satisfies the first screening condition. The jump instruction indicating a jump to the next segment is: an instruction to display the first video frame corresponding to the frame to be marked next after the current frame to be marked, where the next frame to be marked is: among the unmarked frames corresponding to the video to be marked, the unmarked frame whose tail video frame has timestamp information representing the earliest acquisition time that is later than the acquisition time of the current frame to be marked.
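The selection rule for the "next segment" jump can be sketched as below. This is an illustrative reading only: the mapping from each unmarked frame identifier to the timestamp of its tail video frame, the use of floating-point timestamps, and the function name `next_frame_to_mark` are all assumptions, and the acquisition time of the current frame to be marked is taken here to be a single timestamp supplied by the caller.

```python
from typing import Dict, Optional


def next_frame_to_mark(unmarked_tail_ts: Dict[str, float],
                       current_ts: float) -> Optional[str]:
    """Return the unmarked frame whose tail-frame timestamp is the earliest
    among those strictly later than the current frame's acquisition time."""
    later = {box_id: ts for box_id, ts in unmarked_tail_ts.items() if ts > current_ts}
    if not later:
        return None  # no later unmarked frame: nothing to jump to
    return min(later, key=later.get)


if __name__ == "__main__":
    candidates = {"x": 5.0, "y": 2.0, "z": 3.5}
    print(next_frame_to_mark(candidates, current_ts=3.0))
```

Returning `None` when no later unmarked frame exists is a design choice of this sketch; the text does not specify the behavior at the end of the video.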

The device and system embodiments correspond to the method embodiments and have the same technical effects; for specific descriptions, refer to the method embodiments, which are not repeated here. Those of ordinary skill in the art will understand that the figures are merely schematic representations of one embodiment, and that the blocks or flows in the figures are not necessarily required to practice the present invention.

Those of ordinary skill in the art will understand that: modules in the devices in the embodiments may be distributed in the devices in the embodiments according to the description of the embodiments, or may be located in one or more devices different from the embodiments with corresponding changes. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method for target association, the method comprising:

obtaining a current frame to be marked and a first video frame corresponding to the current frame to be marked, wherein the first video frame corresponding to the current frame to be marked is: a detected video frame in the video to be marked that contains the target corresponding to the current frame to be marked and satisfies a first screening condition;

obtaining recommendation frames which have a suspected association relationship with the current frame to be marked and a second video frame corresponding to each recommendation frame, wherein the second video frame corresponding to each recommendation frame is: a detected video frame in the video to be marked that contains the target corresponding to the recommendation frame and satisfies a second screening condition;

determining whether a target recommendation frame having an association relation with the current frame to be labeled exists in all recommendation frames based on first association relation operation information of a user, and establishing an association relation between the current frame to be labeled and the target recommendation frame under the condition that the target recommendation frame having the association relation with the current frame to be labeled is determined to exist;

the step of obtaining the recommendation frames having the suspected association relationship with the current frame to be annotated and the second video frame corresponding to each recommendation frame includes:

traversing the pre-labeling results corresponding to the N video frames following the tail video frame corresponding to the current frame to be labeled, and determining, from the detection frames corresponding to those N video frames, each detection frame whose frame identifier satisfies a preset recommendation condition as a recommendation frame having a suspected association relationship with the current frame to be labeled, wherein a detection frame whose frame identifier satisfies the preset recommendation condition comprises: a detection frame whose frame identification information does not appear in the tail video frame corresponding to the current frame to be labeled or in a frame preceding it; and wherein the first video frame corresponding to the current frame to be labeled is the tail video frame, namely the last detected video frame in the video to be labeled that contains the target corresponding to the current frame to be labeled;

for each recommendation frame having a suspected association relationship with the current frame to be annotated, determining, among the N video frames following the tail video frame corresponding to the current frame to be annotated, a video frame that contains the target corresponding to the recommendation frame and satisfies the second screening condition as the second video frame corresponding to the recommendation frame, wherein the video frame that contains the target corresponding to the recommendation frame and satisfies the second screening condition comprises: the first detected video frame in the video to be annotated that contains the target corresponding to the recommendation frame.

2. The method of claim 1, wherein the step of obtaining the current frame to be labeled in the video to be labeled is implemented by any one of the following two implementations:

the first implementation mode comprises the following steps:

the second implementation mode comprises the following steps:

3. The method of claim 1, wherein before the step of obtaining the recommendation frames suspected to have an association relationship with the current frame to be labeled and the second video frame corresponding to each recommendation frame, the method further comprises:

4. The method according to any one of claims 1 to 3, wherein, before the step of determining whether there is a target recommended box in association with the current frame to be annotated from all recommended boxes based on the first association operation information of the user, and in the case that it is determined that there is a target recommended box in association with the current frame to be annotated, establishing an association between the current frame to be annotated and the target recommended box, the method further comprises:

under the condition that a first amplification instruction triggered by a user for a first recommended frame having a suspected association relation with the current frame to be annotated is detected, amplifying and displaying the first recommended frame and a second video frame corresponding to the first recommended frame, wherein the triggering operation of the first amplification instruction comprises: right clicking a second video frame corresponding to the first recommendation frame; and/or

5. The method according to any one of claims 1 to 3, wherein after the steps of determining whether there is a target recommended box having an association relationship with the current frame to be labeled from all recommended boxes based on the first association relationship operation information of the user, and establishing an association relationship between the current frame to be labeled and the target recommended box in the case that it is determined that there is a target recommended box having an association relationship with the current frame to be labeled, the method further comprises:

displaying the first video frame containing the new current frame to be annotated and a second video frame corresponding to each new recommendation frame and containing the new recommendation frame, so that a user can determine a new target recommendation frame which has an association relation with the new current frame to be annotated from the displayed new recommendation frames based on the second video frame corresponding to each new recommendation frame;

6. The method according to any one of claims 1-3, wherein before the step of displaying the first video frame containing the current frame to be annotated and the second video frame, corresponding to each recommendation frame, that contains that recommendation frame, the method further comprises:

displaying, in a first display area of a first preset display interface, the first video frame containing the current frame to be marked, the frame number information of that first video frame, and the frame identification information corresponding to the current frame to be marked;

7. The method as claimed in claim 6, wherein a third display area of the first preset display interface further comprises a first trigger area for triggering a jump instruction indicating a jump to the previous segment, and a second trigger area for triggering a jump instruction indicating a jump to the next segment; wherein the jump instruction indicating a jump to the previous segment is: an instruction to display the first video frame corresponding to the frame to be marked immediately preceding the current frame to be marked, that first video frame being: the detected video frame in the video to be marked that contains the target corresponding to the preceding frame to be marked and satisfies the first screening condition; and the jump instruction indicating a jump to the next segment is: an instruction to display the first video frame corresponding to the frame to be marked next after the current frame to be marked, the next frame to be marked being: among the unmarked frames corresponding to the video to be marked, the unmarked frame whose tail video frame has timestamp information representing the earliest acquisition time that is later than the acquisition time of the current frame to be marked.

8. An apparatus for target association, the apparatus comprising:

a first obtaining module, configured to obtain a current frame to be labeled and a first video frame corresponding to the current frame to be labeled, where the first video frame corresponding to the current frame to be labeled is: a detected video frame in the video to be labeled that contains the target corresponding to the current frame to be labeled and satisfies a first screening condition;

the first determination and establishment module is configured to determine whether a target recommendation frame having an association relationship with the current frame to be annotated exists in all recommendation frames based on first association relationship operation information of a user, and establish the association relationship between the current frame to be annotated and the target recommendation frame under the condition that the target recommendation frame having the association relationship with the current frame to be annotated exists;

the second obtaining module is specifically configured to traverse the pre-labeling results corresponding to the N video frames following the tail video frame corresponding to the current frame to be labeled, and to determine, from the detection frames corresponding to those N video frames, each detection frame whose frame identifier satisfies a preset recommendation condition as a recommendation frame having a suspected association relationship with the current frame to be labeled, where a detection frame whose frame identifier satisfies the preset recommendation condition includes: a detection frame whose frame identification information does not appear in the tail video frame corresponding to the current frame to be labeled or in a frame preceding it; and where the first video frame corresponding to the current frame to be labeled is the tail video frame, namely the last detected video frame in the video to be labeled that contains the target corresponding to the current frame to be labeled;