CN108491774B - Method and device for tracking and labeling multiple targets in video - Google Patents


Info

Publication number: CN108491774B
Authority: CN (China)
Prior art keywords: annotation, current, frame, change, identifier
Legal status: Active
Application number: CN201810198882.5A
Other languages: Chinese (zh)
Other versions: CN108491774A
Inventor: 耿益锋 (Geng Yifeng)
Current Assignee: Beijing Horizon Robotics Technology Research and Development Co Ltd
Original Assignee: Beijing Horizon Robotics Technology Research and Development Co Ltd
Events:
    • Application filed by Beijing Horizon Robotics Technology Research and Development Co Ltd
    • Priority to CN201810198882.5A
    • Publication of CN108491774A
    • Application granted
    • Publication of CN108491774B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42: Higher-level, semantic clustering, classification or understanding of video scenes, of sport video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

A method and a device for tracking and labeling multiple targets in a video are disclosed. The method includes obtaining one or more portions of annotation frames by initially annotating a plurality of temporally consecutive frames extracted from the video. For the current annotation frame in the current portion, the method further comprises modifying and adjusting the identifier of each annotation object in the current annotation frame according to the identifier change table, the identifier deletion table, and the annotation target reference set of the current portion. The method improves both the efficiency and the accuracy of the labeling task.

Description

Method and device for tracking and labeling multiple targets in video
Technical Field
The present disclosure relates generally to data annotation, and in particular, to a method and apparatus for tracking annotation of multiple targets in a video.
Background
In a video labeling task with multi-target tracking, each frame extracted from the video can first be labeled, after which adjacent frames are compared manually and a unique identifier is assigned to each labeled object in each frame; alternatively, unique identifiers can be assigned entirely manually while the targets in each frame are being labeled.
Both approaches must account for the correlation between different frames, so the labeling task is highly complex and often requires many rounds of iterative modification. When the video is long, or many frames are extracted from it, the time and labor costs of the annotation task become very high.
It is therefore desirable to label multiple targets in a video, and to assign each labeled target a unique identifier, with high efficiency and high accuracy, such that the same target carries the same unique identifier across different frames of the video.
Disclosure of Invention
According to an aspect of the present disclosure, a method for tracking and annotating a plurality of targets in a video is provided. The method may include obtaining one or more portions of annotation frames by initially annotating a plurality of temporally consecutive frames extracted from the video, wherein each portion may include one or more annotation frames and each annotation object in each annotation frame may be assigned a respective identifier. For the current annotation frame in the current portion, the method may further comprise: modifying the identifier of each annotation object to be modified in the current annotation frame according to the identifier change table and the identifier deletion table of the current portion, after which the current annotation frame becomes the current modification frame of the current portion; adjusting the identifier of each annotation object to be adjusted in the current modification frame according to the plurality of targets to be annotated and the annotation target reference set of the current portion, after which the current modification frame becomes the current adjustment frame of the current portion; and, after the adjustment, updating the identifier change table, the identifier deletion table, and the annotation target reference set.
According to another aspect of the present disclosure, there is also provided a non-transitory storage medium having stored thereon program instructions, wherein the program instructions, when executed, may instruct one or more processors to perform the above-described method.
According to another aspect of the present disclosure, an apparatus for performing tracking annotation on multiple targets in a video is also provided. The apparatus may include one or more processors that may be configured to perform the above-described method.
According to another aspect of the present disclosure, an apparatus for tracking and annotating multiple targets in a video is also provided. The apparatus may include an annotator and an adjuster. The annotator can be configured to obtain one or more portions of annotation frames by initially annotating a plurality of temporally consecutive frames extracted from the video, wherein each portion can include one or more annotation frames, and each annotation object in each annotation frame can be assigned a respective identifier. For the current annotation frame in the current portion, the adjuster may be configured to: modify the identifier of each annotation object to be modified in the current annotation frame according to the identifier change table and the identifier deletion table of the current portion, after which the current annotation frame becomes the current modification frame of the current portion; adjust the identifier of each annotation object to be adjusted in the current modification frame according to the plurality of targets to be annotated and the annotation target reference set of the current portion, after which the current modification frame becomes the current adjustment frame of the current portion; and, after the adjustment, update the identifier change table, the identifier deletion table, and the annotation target reference set.
The method and device according to embodiments of the present disclosure can significantly reduce the complexity of the labeling task, significantly reduce the number of manual interventions, support parallelization of the tracking annotation task, and significantly improve the execution efficiency and accuracy of the labeling task.
Drawings
Fig. 1 shows a flowchart of a method for tracking and annotating multiple targets in a video according to an embodiment of the present disclosure.
FIG. 2 illustrates an example of tracking annotation of multiple targets in a video according to an embodiment of the present disclosure.
FIG. 3 illustrates an example of tracking annotation of multiple targets in a video according to an embodiment of the present disclosure.
FIG. 4 illustrates an example of tracking annotation of multiple targets in a video according to an embodiment of the present disclosure.
FIG. 5 illustrates an example of tracking annotation of multiple targets in a video according to an embodiment of the present disclosure.
FIG. 6 illustrates an example of tracking annotation of multiple targets in a video according to an embodiment of the present disclosure.
Fig. 7 illustrates an example of merging annotation results for multiple parts according to an embodiment of the present disclosure.
Fig. 8 illustrates an example of an apparatus for tracking and annotating multiple targets in a video according to an embodiment of the present disclosure.
Fig. 9 illustrates another example of an apparatus for tracking and annotating multiple targets in a video according to an embodiment of the present disclosure.
Detailed Description
Fig. 1 shows a flowchart of a method for tracking and annotating multiple targets in a video according to an embodiment of the present disclosure.
As shown in fig. 1, a method according to an embodiment of the present disclosure may include step S110. The method according to an embodiment of the present disclosure may further include, for a current annotation frame in each annotation frame in a current part in each part obtained in step S110, step S210, step S220, and step S230.
In step S110, one or more portions of annotation frames can be obtained by initially annotating a plurality of temporally consecutive frames extracted from the video, wherein each portion can include one or more annotation frames, and each annotation object in each annotation frame can be assigned a corresponding identifier.
In step S210, the identifier of each to-be-modified annotation object in the current annotation frame may be modified according to the identifier change table and the identifier deletion table of the current portion. The current annotation frame may become the current modification frame of the current portion after the processing of step S210.
In step S220, the identifier of each annotation object to be adjusted in the current modification frame can be adjusted according to the multiple targets to be annotated and the annotation target reference set of the current part. The current modification frame may become the current adjustment frame of the current portion after the process of step S220.
In step S230, the identifier change table, the identifier deletion table, and the annotation target reference set of the current portion may be updated.
The individual steps of the above-described method are described in detail below.
In one embodiment, in step S110, a plurality of temporally successive frames (or images) may first be extracted from the video.
Depending on requirements such as video length, cost, efficiency, and accuracy, temporally consecutive frames may be extracted from the video according to suitable framing rules or parameters, using any appropriate framing tool (e.g., a non-linear editing or video-clipping tool), framing method (e.g., a frame-rate-adaptive image extraction method), or framing device (e.g., any computing device or component, such as one including a graphics processor or a general-purpose processor, that can run such a tool or perform such a method).
Each extracted frame may have a globally unique identifier or number. For example, each frame may be assigned an identifier according to its position on the time axis, or the frames may be numbered in their order along the time axis.
The present disclosure is not limited to a particular video framing method, nor to a particular video length, frame rate, or number of extracted frames. For example, for a video segment 10 seconds long at 24 frames/second, all 240 frames may be extracted, or 80 frames may be extracted by selecting 8 of the 24 frames in each second.
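For illustration only, the following is a minimal Python sketch of such rate-adaptive frame extraction, assuming OpenCV as the framing tool; the function name and the keep_per_second parameter are our own, not part of the disclosure:

```python
import cv2  # OpenCV, one of many possible framing tools


def extract_frames(video_path, keep_per_second=8):
    """Keep `keep_per_second` evenly spaced frames per second of video.

    For a 10-second, 24 frames/second clip and keep_per_second=8, this
    yields the 80 frames mentioned above. The running index also serves
    as each frame's globally unique number.
    """
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 24.0
    step = max(int(round(fps / keep_per_second)), 1)
    frames, index = [], 0
    while True:
        ok, image = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append((index, image))
        index += 1
    cap.release()
    return frames
```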
In a further embodiment, a plurality of temporally successive frames that have been extracted from the video may be provided as input to step S110.
Then, in one embodiment, in step S110, various automatic image annotation methods may be adopted to automatically detect and label, in each extracted frame, the objects that may belong to the multiple targets to be labeled, for example automatic image annotation based on deep learning, on multi-scale context, or on a deep Boltzmann machine with canonical correlation analysis. The present disclosure is not limited to a particular automatic image annotation process.
Each frame, after being annotated, may be referred to as an annotation frame. Each annotation frame may include one or more annotation boxes or detection boxes, rectangular or of other shapes (e.g., polygons or closed curves). Each annotation box in an annotation frame corresponds to, or frames, a partial region of the image in that frame, which is referred to herein as an annotation object.
Automatic image annotation methods generally cannot guarantee that the results of target detection and annotation are completely correct. Even manual detection and labeling can produce erroneous results. Therefore, in each annotation frame obtained above, the partial image in a given region may or may not correspond to one of the targets to be labeled, and one or more annotation boxes may be set inappropriately (for example, having an unsuitable size, or not completely framing the target).
Then, in step S110, an identifier may be assigned to each annotation object in each annotation frame.
For example, all annotation objects in all annotation frames can be assigned different identifications from each other, regardless of whether two annotation objects from two annotation frames respectively actually correspond to the same target to be annotated. For example, a global counter may be set, and each annotation object may be assigned an identifier according to the count value of the global counter.
In another example, during the above labeling process, all obtained annotation objects may be recorded, and each newly obtained annotation object may be image-matched against all previously obtained annotation objects to decide whether to assign it a new unique identifier or to reuse the identifier of an existing annotation object. The processing in this example may come at the expense of hardware cost and processing efficiency: a large storage space may be required to hold the images of all previously obtained annotation objects, and the more objects are held, the greater the amount of processing. Processing efficiency may therefore suffer when the video is long or many frames are extracted from it.
Then, in step S110, all the annotation frames can be divided into one or more non-overlapping sections in time order, so that each section can include one or more annotation frames.
In this document, the term "divided" means that each part may include one or more annotation frames that are temporally successive, any two parts (including any two parts that are temporally successive) do not include the same frame, and the union of all annotation frames in all parts is all annotation frames.
The present disclosure is not limited to the number of divided parts. For example, all annotation frames can be divided into one part, which is effectively equivalent to no division. All annotation frames can also be divided into multiple parts, for example into 2 parts, 3 parts, or any other number of mutually non-overlapping parts, where any two temporally consecutive parts do not include the same annotation frame, depending on the length of the video, the number of annotation frames, labor costs, efficiency requirements, accuracy requirements, and so on.
In addition, the present disclosure is not limited to the number of annotation frames in each section. For example, one portion may include only one annotation frame, while another portion may include multiple annotation frames. In addition, each portion may include the same number or a different number of one or more annotation frames. In addition, where a portion includes multiple annotation frames, all of the annotation frames in the portion can be consecutive in time.
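A minimal sketch of one valid division, assuming fixed-size parts (the disclosure equally permits parts of differing sizes):

```python
def divide_into_parts(annotation_frames, part_size):
    """Split time-ordered annotation frames into non-overlapping,
    temporally contiguous parts of at most `part_size` frames each.
    The union of all parts is exactly the input sequence.
    """
    return [annotation_frames[i:i + part_size]
            for i in range(0, len(annotation_frames), part_size)]
```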
It should be appreciated that step S110 of the method according to an embodiment of the present disclosure is not limited to the particular order of labeling, partitioning, and assigning the identifications in the initial labeling described above.
As described above, in one embodiment, in step S110, each frame extracted from the video may be annotated, then an identifier may be assigned to each annotation object in each annotation frame, and then all annotation frames may be divided into one or more parts.
In another embodiment, in step S110, each frame extracted from the video may be labeled, all the labeled frames are divided into one or more parts, and then each label object in each labeled frame in each part is assigned with an identifier.
In another embodiment, in step S110, a plurality of frames extracted from the video may be divided into one or more non-overlapping portions, all frames in each portion are labeled respectively, and then each label object in the label frame in each portion is assigned with an identifier.
In another embodiment, in step S110, while labeling each object in each frame, a corresponding identifier may be assigned to the labeled object.
In the case of allocating identifiers after division, it is also possible to allocate different numbers to each part, and then allocate different identifiers to each annotation object in each part in combination with the number of each part, so that all annotation objects in all parts have different identifiers. For example, in the case of division into two parts, the numbers of the two parts may be set to "a" and "B", respectively, and for each annotation object in the part numbered "a", an identification in the form of "AXXXX" may be allocated, and for each annotation object in the part numbered "B", an identification in the form of "BXXXX" may be allocated, so that the identification of any annotation object in part a and the identification of any annotation object in part B are different from each other.
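A minimal sketch of this per-part allocation scheme, assuming one counter per part (the helper names are ours):

```python
from itertools import count


def make_id_allocator(part_number):
    """Return a function yielding identifiers such as "A0001", "A0002";
    prefixing the counter value with the part's number guarantees that
    identifiers from different parts can never collide.
    """
    counter = count(1)
    return lambda: f"{part_number}{next(counter):04d}"


alloc_a = make_id_allocator("A")
alloc_b = make_id_allocator("B")
print(alloc_a(), alloc_a(), alloc_b())  # A0001 A0002 B0001
```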
Assigning mutually different identifiers to all annotation objects in all annotation frames means that subsequent steps of the method (e.g., the merging process described below) only need to handle the case where the same annotation object has different identifiers, and never the case where different annotation objects share the same identifier; this reduces the complexity of subsequent processing and helps ensure correct results. Of course, in other embodiments it may suffice to ensure that each annotation object's identifier is unique only within its own portion.
After the annotation frames of one or more parts are obtained through the initial annotation of step S110, steps S210, S220, and S230 can be performed in turn, as shown in FIG. 1, for the current annotation frame in each annotation frame of the current part in each part.
In different embodiments, any not-yet-processed part may be selected as the current part; the parts may be selected one by one as the current part in chronological order; or each part may be treated as a current part, with all current parts processed in parallel.
For each current section, each annotation frame in the current section may be selected one by one in chronological order as a current annotation frame, and steps S210, S220, and S230 are sequentially performed with respect to the selected current annotation frame.
As shown in FIG. 1, the first annotation frame in the current portion may be selected as the current annotation frame, which may be earlier in time than any other annotation frame in the current portion, and steps S210, S220, and S230 are performed in sequence for the selected current annotation frame. In each subsequent cycle, selecting a next annotation frame of the current annotation frame in the previous cycle as the current annotation frame in the current cycle, and sequentially performing steps S210, S220, and S230 with respect to the current annotation frame in the current cycle.
After the processing of steps S210, S220, and S230 is completed for each annotation frame in the current portion, the processing for the current portion ends. In the case where each part is processed one by one, another not-yet-processed part may be selected and execution of steps S210, S220, and S230 may be started for each annotation frame in the selected another part.
As described earlier, an annotation frame in the current part becomes a modification frame after step S210 and further becomes an adjustment frame after step S220. Herein, "annotation frame", "modification frame", and "adjustment frame" denote the states of the same frame after, or while undergoing, different steps of the method. For example, an annotation frame becomes a modification frame after step S210 even if no modification is actually made in that step, and the same frame is referred to as a modification frame while it is the object of step S220.
In addition, for convenience of description, each annotation object in the current annotation frame may be referred to as an annotation object to be modified in step S210, regardless of whether it is actually modified there, and as an annotation object to be adjusted in step S220, regardless of whether it is actually adjusted there.
In addition, since steps S210, S220, and S230 are performed in sequence for each annotation frame in the current portion, if the current frame (annotation, modification, or adjustment frame) is not the first frame in the current portion (i.e., it is at an intermediate position in the portion's frame sequence), then all frames in the current portion that precede it are already adjustment frames; relative to the current frame, they are referred to herein as previous adjustment frames or processed annotation frames.
The identifier change table, the identifier deletion table, and the annotation target reference set of the current part are used in steps S210, S220, and S230.
The identifier change table of the current portion records information about the identifier changes performed on all previous adjustment frames in the current portion before the current annotation frame (or modification frame) is adjusted (step S220).
The identifier change table may not include any information when the current annotation frame is the first annotation frame in the current portion. For example, in that case, the identifier change table of the current portion may be initialized to an empty list before step S210 is performed for the current annotation frame.
When the identifier change table is not an empty list, it may include one or more change entries, each indicating a change from one pre-change identifier to one post-change identifier.
The present disclosure is not limited to a particular form of the identifier change table, nor to a particular form of the information in it. For example, the identifier change table may be implemented as a hash table, a file, a database, etc., rather than as a list. In various embodiments, a change entry may take any form that clearly indicates a change from a pre-change identifier to a post-change identifier, such as a character string, a data structure, or a binary field or file, e.g., "S->E" or "(S, E)", where S represents the pre-change identifier and E the post-change identifier. For example, "4->6" or "(4, 6)" indicates that the pre-change identifier "4" is changed to the post-change identifier "6".
The identifier deletion table of the current portion records all identifiers deleted from all previous adjustment frames in the current portion before the current annotation frame (or modification frame) is adjusted (step S220).
As with the identifier change table of the current part, the identifier deletion table may not include any information when the current annotation frame is the first annotation frame in the current portion. For example, in that case, the identifier deletion table of the current portion may be initialized to an empty list before step S210 is performed for the current annotation frame.
When the identifier deletion table is not an empty list, it may include one or more deletion entries, each indicating the deletion of an identifier, i.e., recording an identifier that has been deleted.
The present disclosure is not limited to a particular form of the identifier deletion table, nor to a particular form of the information in it. For example, the identifier deletion table may be implemented as a hash table, a file, a database, etc., rather than as a list. In various embodiments, a deletion entry may take any form that clearly indicates the deletion of an identifier. For example, if the identifier deletion table of the current portion includes the deletion entry "7", identifier "7" has been deleted from all previous adjustment frames in the current portion.
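For concreteness, a minimal sketch of both bookkeeping structures using plain Python containers (the disclosure equally permits hash tables, files, or databases):

```python
# Identifier change table: maps pre-change id -> post-change id.
change_table = {}
change_table[4] = 6   # records the change entry "4->6"

# Identifier deletion table: the set of identifiers deleted from
# previous adjustment frames of the current part.
delete_table = set()
delete_table.add(7)   # records the deletion entry "7"
```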
The annotation target reference set of the current portion records all annotation objects and corresponding identifiers in one or more previous adjustment frames of the current portion.
The previous adjustment frames recorded in the annotation target reference set of the current portion, together with the current annotation frame, are consecutive in time, and all of them temporally precede the current annotation frame. The number of previous adjustment frames in the set may be limited to a predetermined number, or the set may record all previous adjustment frames within a predetermined time period.
When the current annotation frame is the first annotation frame in the current portion, the annotation target reference set of the current portion may not include any information.
The annotated target reference set of the current portion may organize the internal information in a variety of different ways.
For example, each item in the annotation target reference set can correspond to a previous adjustment frame, and each item can include an image and an identification for each annotation object in the previous adjustment frame. In further examples, each item in the annotation target reference set can correspond to an annotation object, and each item can include information regarding one or more partial images corresponding to the annotation object, an identification of the annotation object, and a number of a previous adjustment frame that includes the annotation object.
Different organizations involve different degrees of information redundancy and different query and update efficiencies. For example, where each item in the annotation target reference set corresponds to one previous adjustment frame, the set may contain redundant information, but frame-wise update operations (e.g., adding or deleting an item) are relatively efficient. Where each item corresponds to one annotation object, the set contains less redundancy and queries for particular annotation targets are relatively efficient, but frame-wise update operations are relatively inefficient.
the present disclosure is not limited to a particular organization of the items in the annotation target reference set. The annotation target reference set according to embodiments of the present disclosure may also combine the recorded information in other ways, for example, each item in the annotation target reference set also corresponds to one of a plurality of targets to be annotated, and each item may include zero, one, or multiple sub-items, each sub-item may correspond to one annotation object corresponding to the target and may include an image, an identification of the annotation object, a number of a previous adjustment frame in which it is located, and so on.
In addition, the annotation target reference set can adopt different implementation modes. For example, the annotation target reference set can be implemented by a data pool or a database or a file system, or some or all of the items in the annotation target reference set can be displayed on a Graphical User Interface (GUI) or a certain area of the GUI.
In addition, interfaces for queries and updates may be provided for the annotated target reference set.
The input parameter of the query interface may be an image, for example, a partial image cut from an image of the current labeling frame, a partial image corresponding to a labeling target to be queried in the current labeling frame, or a partial image of an area framed by one labeling frame or a detection frame in the current labeling frame. For images received from the query interface, current information in the annotation target reference set can be queried and images and corresponding identifications of one or more possible or candidate annotation objects matching the received image are returned. In further examples, the input parameters of the query interface may also include the location of the partial image in the entire image (e.g., a coordinate or set of coordinates that can indicate the location) to further improve the efficiency and accuracy of the query.
In different embodiments, the query against the annotation target reference set and/or the image matching may be accomplished using a variety of methods, such as the mean absolute difference algorithm, the sum of absolute differences algorithm, the sum of squared errors algorithm, the mean squared error algorithm, the normalized cross-correlation algorithm, the sequential similarity detection algorithm, the Hadamard transform algorithm, scale-invariant feature transform matching, finite element methods, or shape feature extraction and matching based on wavelets and relative moments.
The input to the update interface may be the number of the adjustment frame and the update operation may include adding and deleting items. In the case of adding an item, all the annotation objects in the adjustment frame of the specified number may be added to the annotation target reference set. For example, an item corresponding to the adjustment frame with the specified number may be added to the annotation target reference set, and the images and corresponding identifications of all the annotation objects in the adjustment frame are recorded in the item. In the case of deleting an item, all annotation objects in the adjustment frame of the designated number may be deleted from the annotation target reference set. For example, an item corresponding to a specified number of adjustment frames may be deleted from the annotation target reference set.
In addition, depending on factors such as the capacity of the memory used to store the annotation target reference set, the size of the GUI area used to display some or all of its items, and the methods used to query and update its items, the set can be configured to hold the relevant information of all previous adjustment frames within a predetermined time period, or of at most a predetermined number of previous adjustment frames. When the capacity of the annotation target reference set has reached its upper limit, the earliest stored item may be deleted before a new item is added.
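The sketch below illustrates one possible organization, assuming one item per previous adjustment frame, a mean-absolute-difference match, and a fixed capacity in frames; the class and method names and the threshold parameter are our own assumptions:

```python
from collections import deque

import numpy as np


class AnnotationTargetReferenceSet:
    def __init__(self, max_frames=10):
        # One item per previous adjustment frame; the deque's maxlen
        # auto-evicts the earliest item once capacity is reached.
        self.frames = deque(maxlen=max_frames)

    def query(self, patch, threshold=50.0):
        """Return (identifier, image) of the best-matching recorded
        annotation object, or None if nothing matches well enough."""
        best = None
        for frame_no, objects in self.frames:
            for identifier, image in objects.items():
                if image.shape != patch.shape:
                    continue  # a real system would resize/normalize first
                # Mean absolute difference between the two patches.
                score = np.abs(image.astype(np.int32)
                               - patch.astype(np.int32)).mean()
                if best is None or score < best[0]:
                    best = (score, identifier, image)
        if best is not None and best[0] <= threshold:
            return best[1], best[2]
        return None

    def add(self, frame_no, objects):
        """Record all annotation objects (id -> image) of an adjustment frame."""
        self.frames.append((frame_no, dict(objects)))

    def remove(self, frame_no):
        """Delete the item corresponding to the given adjustment frame."""
        kept = [(n, o) for n, o in self.frames if n != frame_no]
        self.frames = deque(kept, maxlen=self.frames.maxlen)
```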
As described above, for the current annotation frame in the current portion, the method according to the embodiment of the present disclosure may first perform step S210 to modify the identifier of each annotation object to be modified in the current annotation frame according to the identifier change table and the identifier deletion table of the current portion.
In step S210, for each annotation object to be modified in the current annotation frame, if the identifier change table of the current portion includes a change entry whose pre-change identifier is the same as the identifier of that annotation object, the identifier of the annotation object may be changed to the post-change identifier indicated by that entry. Likewise, if the identifier deletion table includes a deletion entry whose identifier is the same as the identifier of the annotation object, the annotation box and the identifier of that annotation object may be deleted from the current annotation frame.
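A minimal sketch of this modification step, assuming each frame stores its annotation objects as a mapping from identifier to the annotation object's image patch (the representation is our assumption):

```python
def modify_frame(objects, change_table, delete_table):
    """Step S210: apply the current part's identifier change table and
    identifier deletion table to a frame's objects (id -> image)."""
    for obj_id in list(objects):
        if obj_id in delete_table:
            del objects[obj_id]  # drop the annotation box and identifier
        elif obj_id in change_table:
            objects[change_table[obj_id]] = objects.pop(obj_id)
    return objects  # the frame is now the current "modification frame"
```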
After each annotation object to be modified in the current annotation frame has been processed in step S210 according to the identifier change table and identifier deletion table of the current portion, the current annotation frame becomes (or is referred to as) the current modification frame, and the method may continue with step S220 for the current modification frame, adjusting the identifier of each annotation object to be adjusted according to the plurality of targets to be annotated and the annotation target reference set of the current portion.
For each annotation object to be adjusted, in step S220, if the annotation target reference set of the current portion includes an annotation object that matches the current annotation object to be adjusted, and the identifier of the matched object differs from that of the current object, the identifier of the current annotation object may be changed to the identifier of the matched object. A change record item may then be generated, with the identifier of the current annotation object as the pre-change identifier and the identifier of the matched object as the post-change identifier, and added to the change record of the current modification frame.
If the current annotation object to be adjusted does not belong to the plurality of targets to be annotated, its annotation box and identifier may be deleted in step S220, and its identifier may be added as a deletion record item to the deletion record of the current modification frame.
In addition, in step S220, in the case where it is found that there is an unlabeled object in the current modification frame that actually belongs to the plurality of targets to be labeled but is not labeled in the previous step S110, the unlabeled object may be labeled, so as to obtain a new labeled object of the current modification frame.
For a new annotation object, the annotation target reference set can be queried to determine whether the new annotation object is annotated for the first time. In the event that it is determined that one or more annotation objects in the current portion of the annotation target reference set correspond to the same one of the plurality of targets to be annotated and match the new annotation object, the identity of the new annotation object may be assigned in accordance with the identity of any of the determined one or more annotation objects. In the case that it is determined that all the annotation objects in the annotation target reference set of the current portion do not match the new annotation object, a new identifier may be assigned to the new annotation object, and the assigned new identifier may be made different from the identifiers of all the annotation objects in the current portion.
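A condensed sketch of this adjustment step, reusing the structures sketched above; belongs_to_targets is a hypothetical helper standing in for the checks described in the text:

```python
def adjust_frame(objects, ref_set, belongs_to_targets):
    """Step S220: adjust identifiers against the annotation target
    reference set; returns the frame's change and deletion records."""
    change_record, delete_record = [], []
    for obj_id, image in list(objects.items()):
        if not belongs_to_targets(image):
            del objects[obj_id]           # not a target to be labeled
            delete_record.append(obj_id)
            continue
        match = ref_set.query(image)
        if match is not None and match[0] != obj_id:
            objects[match[0]] = objects.pop(obj_id)   # reuse matched id
            change_record.append((obj_id, match[0]))
    # A newly labeled object is handled the same way: reuse the identifier
    # returned by the reference set if there is a match, otherwise assign
    # a fresh identifier unique within the part.
    return change_record, delete_record
```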
After the adjustment of step S220, the current modified frame may become or be referred to as a current adjustment frame, and the method according to the embodiment of the present disclosure may continue to step S230 for the current adjustment frame to update the identity change table, the identity deletion table, and the annotation target reference set of the current portion for use by a next annotation frame of the current portion.
In step S230, if the change record from step S220 includes one or more change record items, then for each change record item (the current change record item), the current information in the identifier change table of the current portion may be checked.
If a change entry is found in the identifier change table of the current part whose post-change identifier is the same as the pre-change identifier indicated by the current change record item, a new change entry can be generated using the pre-change identifier of the found entry as its pre-change identifier and the post-change identifier of the current change record item as its post-change identifier. If the pre-change and post-change identifiers of the new entry are the same, the found entry is deleted from the identifier change table of the current part; if they differ, the found entry is replaced by the new entry.
If no change entry in the identifier change table of the current portion indicates a post-change identifier that is the same as the pre-change identifier of the current change record item, the current change record item may simply be added to the identifier change table of the current portion.
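A sketch of this chaining update, assuming change_table is the dict from the earlier sketch and change_record is the list of (pre-change, post-change) pairs produced by step S220:

```python
def update_change_table(change_table, change_record):
    """Step S230: fold a frame's change record into the part's table,
    chaining entries so each maps an original id to its latest id."""
    for s, e in change_record:
        # Find existing entries (a -> b) with b == s and chain them to e.
        chained = [a for a, b in change_table.items() if b == s]
        for a in chained:
            if a == e:
                del change_table[a]  # (a -> s) then (s -> a) cancels out
            else:
                change_table[a] = e  # replace (a -> s) with (a -> e)
        if not chained:
            change_table[s] = e      # no chain: add the record as-is
```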
In step S230, if the deletion record from step S220 includes one or more deletion record items, the identifier deletion table of the current part may be updated by adding all deletion record items in the deletion record to it.
In addition, in step S230, the annotation target reference set of the current portion can also be updated by adding all the annotation objects and corresponding identifiers in the current adjustment frame to the annotation target reference set of the current portion.
As described above, the number of previous adjustment frames in the annotation target reference set of the current portion may be limited to a predetermined number, or the set may record all previous adjustment frames within a predetermined time period. In such an embodiment, before all annotation objects and corresponding identifiers in the current adjustment frame are added to the annotation target reference set, it can be checked whether the set already includes the annotation objects and identifiers of the predetermined number of previous adjustment frames. If it does, the information related to the earliest previous adjustment frame in the current portion may be deleted from the set before the annotation objects and identifiers of the current adjustment frame are added; otherwise, they can be added directly.
After step S230 is performed for the current adjustment frame, the method according to the embodiment of the present disclosure may detect whether there is still an unprocessed annotation frame after the current adjustment frame in the current portion. If so, the next annotation frame in the current portion can be retrieved, and the process returns to step S210 to continue the modification, adjustment, and update processes for the next annotation frame.
If all annotation frames in the current portion have been processed, and the portions (all, or one or more, of the portions divided in step S110) are being processed one by one within a single processing procedure, the first annotation frame of another not-yet-processed portion may be taken, and the method returns to step S210 to continue executing steps S210 to S230 starting from that frame.
As described above, in step S110, all the annotation frames can be divided into one part, which may be actually equivalent to no division. In this embodiment, after steps S210 to S230 have been performed for the last annotation frame, the method may end.
By maintaining and using state information throughout the labeling process, the method according to embodiments of the present disclosure can perform automatic and intelligent modification, greatly reducing the number of manual interventions. In addition, by using and maintaining the annotation target reference set, target-matching time can be significantly reduced, labeling efficiency significantly improved, and the accuracy of the labeling task preserved.
In the case of division into a plurality of parts in step S110, for example, in the case of division into A, B and C in step S110, steps S210 to S230 may be performed on each annotation frame in part a, then steps S210 to S230 may be performed on each annotation frame in part B, and then steps S210 to S230 may be performed on each annotation frame in part C in one processing procedure (for example, one process or thread); portions A, B and C may also be processed in parallel in three different processes (e.g., three processes or three threads); two processes in parallel (e.g., two processes or two threads) may also be used, one processing portions a and B one by one, and the other processing portion C.
In the case of division into multiple parts in step S110, after the modification (step S210), adjustment (step S220), and update (step S230) have been performed on each annotation frame in each part, the adjustment results of all parts may be merged through one or more pairwise merges, so as to obtain the final annotation result.
For the former part and the latter part of any two temporally consecutive parts, the annotation target reference set of the latter part can be initialized from all annotation objects and corresponding identifiers in a predetermined number of adjustment frames of the former part, where that predetermined number of adjustment frames of the former part and the predetermined number of adjustment frames of the latter part are consecutive in time. In addition, the identifier change table of the latter part may be initialized to an empty list.
Then, each adjustment frame in the latter part may be treated in turn as a current frame to be corrected, and the identifiers of the annotation objects in the current frame to be corrected may be corrected according to the annotation target reference set and the identifier change table of the latter part.
If the current frame to be corrected belongs to the predetermined number of adjustment frames in the latter part, the identifier of each annotation object to be further modified in it is first further modified according to the identifier change table of the latter part, after which the frame becomes the current further modification frame. The identifier of each annotation object to be further adjusted in the current further modification frame is then further adjusted according to the annotation target reference set of the latter part, after which the frame becomes the current further adjustment frame. After this further adjustment, the identifier change table of the latter part may be updated according to the change operations performed, and all annotation objects and corresponding identifiers in the current further adjustment frame are added to the annotation target reference set of the latter part.
If the current frame to be corrected does not belong to the predetermined number of adjustment frames in the latter part, the identifier of each annotation object to be further modified in it may be further modified according to the reference identifier change table of the latter part, i.e., the identifier change table of the latter part as it stands after all of the predetermined number of adjustment frames have been processed.
For example, for two parts a and B that are consecutive in time (part a being earlier in time than part B), the annotation target reference set for part B may be initialized with the last N adjustment frames in part a, and the identity change table for part B may be initialized as an empty list.
Then, the aforementioned steps S210, S220, and S230 may be performed in sequence for each of the first N adjustment frames in part B, from the first adjustment frame F1 to the Nth adjustment frame FN.
Steps S220 and S230 in the merge phase do not modify or delete annotation boxes, nor delete any annotation objects or their identifiers, and therefore do not involve the use or maintenance of identifier deletion tables and deletion records.
After the adjustment frame FN in part B has been processed, the identifier change table of part B at that point may be referred to as the reference identifier change table, and step S210 may be performed on all remaining adjustment frames in part B (i.e., from the adjustment frame FN+1 to the last adjustment frame in part B) according to this reference identifier change table.
Thus, the annotation results of parts A and B are merged. A similar merging process can then be performed repeatedly to merge all parts together.
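Putting the merge phase together, a sketch for two consecutive parts A and B; the helpers reuse the sketches above, frames are assumed to be (number, objects) pairs, and the details are illustrative assumptions rather than the disclosure's literal procedure:

```python
def merge_two_parts(frames_a, frames_b, n, ref_set_factory):
    """Merge the adjusted frames of parts A and B (A earlier in time);
    `n` is the predetermined number of overlap-reference frames."""
    ref_set = ref_set_factory()
    for number, objects in frames_a[-n:]:   # initialize B's reference
        ref_set.add(number, objects)        # set from A's last N frames
    change_table = {}                       # B's table starts empty
    for i, (number, objects) in enumerate(frames_b):
        if i < n:
            modify_frame(objects, change_table, set())    # step S210
            record, _ = adjust_frame(objects, ref_set,
                                     lambda img: True)    # step S220
            update_change_table(change_table, record)     # step S230
            ref_set.add(number, objects)
        else:
            # Remaining frames only apply the frozen "reference" table.
            modify_frame(objects, change_table, set())
    return frames_a + frames_b
```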
The method according to this embodiment can thus support segmented annotation of video. By increasing the parallelism of the labeling tasks, their efficiency can be improved and the time they consume reduced. Moreover, in segmented labeling, the number of frames in each part is small relative to the total number of frames. Compared with processing all frames at once, segmented labeling needs to consider fewer correlations among frames and among objects within frames, which reduces the complexity of the labeling task, ensures its accuracy, and improves its efficiency.
An example of performing processing for one section using the method according to the present disclosure is described below in conjunction with fig. 2 to 6.
FIG. 2 shows an example of the annotation frames in one part, F210, F220, and F230, which are sequential in time, in accordance with an embodiment of the disclosure. The annotation frame F210 includes annotation objects LO1 through LO6, identified by "1" through "6", respectively. For example, "LO1=1" in the annotation frame F210 indicates that the identifier assigned to the annotation object LO1 in step S110 is "1". The annotation frame F220 likewise includes annotation objects LO1 through LO6. However, unlike in the annotation frame F210, the identifiers of the annotation objects LO4 and LO6 in F220 are "6" and "4", respectively, which indicates at least that LO4 and LO6 may have been assigned inappropriate identifiers in F210 and F220. The annotation frame F230 likewise comprises the 6 annotation objects LO1 through LO6, as well as annotation objects LO7 and LO8, identified by "7" and "8", respectively.
As shown in fig. 3, the method according to the embodiment of the present disclosure first selects the annotation frame F210 that is earliest in time from among the annotation frames F210, F220, and F230 in the section to perform steps S210 to S230.
As described above, the identifier change table, the identifier deletion table, and the reference annotation target set of the current portion may be initialized before step S210 is performed for the first frame F210. For example, the identifier change table and identifier deletion table associated with the current part may be initialized to empty lists, represented in FIG. 3 as "CHG:<>" and "DEL:<>", respectively.
As shown in fig. 3, the identifier change table CHG and the identifier deletion table DEL associated with the current part are both empty lists at this point, so the modification of the annotation frame F210 in step S210 may not include any actual modification operation.
The current annotation frame F210 may then be referred to as the current modified frame F210, and the method may continue to step S220.
In step S220, it may be checked whether each annotation or detection box in the current modification frame F210 frames a desired object; whether its size is appropriate; whether several of the annotation objects LO1 through LO6 in F210 actually correspond to the same target to be annotated; whether each of the annotation objects LO1 through LO6 belongs to the targets to be annotated; whether there are one or more additional objects in F210 besides LO1 through LO6; whether the identifier assigned to each annotation object in F210 is correct or appropriate; and so on.
In the example of fig. 3, it is determined in step S220 that the identity of the annotation object in the current modified frame F210 does not need to be adjusted, so the method can continue to step S230.
In step S230, since no actual adjustment was made to the identifiers of the annotation objects in F210 in step S220, the identifier change table CHG and the identifier deletion table DEL associated with the current part may remain empty lists. In addition, all annotation objects LO1 through LO6 in F210 may be added (e.g., as one item) to the reference annotation target set associated with the current section.
Then, the next annotation frame F220 of F210 can be retrieved, and the process returns to step S210 to continue the processing for the annotation frame F220.
As shown in fig. 4, the identifier change table CHG and the identifier deletion table DEL associated with the current part are still empty lists, so no modification of the identifiers of the annotation objects LO1 through LO6 in the current annotation frame F220 is required in step S210.
Then, for the modified frame F220, the method may continue to step S220.
In the example of fig. 4, the reference annotation target set currently includes all annotation objects LO1 through LO6, and their corresponding identifiers, from the previously added adjustment frame F210. For the annotation object LO4 in the current modification frame F220, the query result can include, for example, the annotation object LO4 and the corresponding identifier "4" from the adjustment frame F210. Thus it can be determined that the identifier of LO4 in F220 needs to be adjusted from "6" to "4", and it is adjusted accordingly in step S220 (see annotation object LO4 in the adjustment frame F220' in the example of fig. 4). Meanwhile, the change operation "6->4" may be added to the change record C_CHG associated with the current modification frame F220 (or current adjustment frame F220') of the current part, e.g., "C_CHG: 6->4" in FIG. 4.
Similarly, in step S220, it is determined that the identifier of the annotation object LO6 in the current modification frame F220 needs to be adjusted from "4" to "6". Accordingly, the identifier of LO6 is adjusted from "4" to "6" in step S220 (see annotation object LO6 in adjustment frame F220' in the example of fig. 4), and the change operation "4->6" is added to the change record C_CHG, e.g., "C_CHG: 6->4, 4->6" in FIG. 4.
In addition, in the example of fig. 4, it is also checked or detected in step S220 that an object LO7 actually belongs to the targets to be annotated but was not annotated in the preceding step S110. The object LO7 may then be framed with an annotation box or detection box, and the partial image corresponding to the annotation object LO7 (e.g., the partial image framed by the annotation box or detection box) is provided to the query interface of the reference annotation target set associated with the current part, in order to determine whether the annotation object LO7 is a new annotation object, i.e., whether it already appears in a previous adjustment frame of the current part.
In the example of fig. 4, the current reference annotation target set includes all annotation objects LO1 through LO6 (see fig. 3) in the previously added adjustment frame F210. Accordingly, with respect to the annotation object LO7, it can be determined that no image in the current reference annotation target set matches the image of the annotation object LO7. The query result for the current reference annotation target set is therefore empty, i.e., it includes no candidate annotation object images and corresponding identifiers. Thus, the annotation object LO7 can be determined to be a new annotation object and can be assigned the unique identifier "1001" (see LO7 in the current adjustment frame F220' in fig. 4). In a further embodiment, where the annotation object LO7 is determined to be a new annotation object, the query result for the current reference annotation target set may instead directly be the unique identifier "1001" assigned to the new annotation object.
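By way of illustration only (the patent does not prescribe an implementation), the decision of step S220 for a newly framed object can be sketched in Python as follows. The matcher find_matches and the counter-based allocator are assumptions introduced here; the starting value 1001 merely mirrors the identifier used in the example of fig. 4.

```python
import itertools

# Hypothetical allocator of part-unique identifiers; starting at 1001
# mirrors the identifier assigned to LO7 in the example of fig. 4.
_new_ids = itertools.count(1001)

def assign_identifier(object_image, reference_set, find_matches):
    """Decide the identifier for a newly framed object in step S220.

    find_matches(object_image, reference_set) stands in for the visual
    matching behind the query interface (e.g. CNN-based appearance
    matching); it returns a list of (candidate_image, identifier) pairs.
    """
    candidates = find_matches(object_image, reference_set)
    if candidates:
        # The object already appears in a previous adjustment frame of
        # the current part: reuse the identifier of a matching candidate.
        return candidates[0][1]
    # Empty query result: the object is new, allocate a fresh identifier.
    return next(_new_ids)
```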
In addition, in the example of fig. 4, no annotation objects that need to be deleted are found or detected in step S220; that is, the annotation objects LO1 through LO6 in F220 all actually belong to the targets to be annotated. Thus, the adjustment of F220 includes no deletion, and the deletion record associated with F220 or F220' may be empty (shown as "C_DEL: <>" in fig. 4).
Through the processing of step S220, the current adjustment frame F220' of the frame F220 is obtained (see fig. 4), which includes annotation objects LO1 through LO7; the information in the associated change record C_CHG and deletion record C_DEL is "6->4, 4->6" and empty, respectively.
The method may then continue to step S230 to update the identity change table, the identity deletion table, and the reference annotation target set associated with the current portion.
For example, in the example of fig. 4, since the current identification change table CHG is empty, all items "6->4" and "4->6" in the change record C_CHG may be added to it, thereby obtaining the updated identification change table "CHG: 6->4, 4->6". Since both the current identification deletion table DEL and the deletion record C_DEL are empty, the updated identification deletion table DEL remains empty. In addition, all annotation objects LO1 through LO7 and their corresponding identifiers in the adjustment frame F220' may be added to the reference annotation target set PL, so that the updated reference annotation target set includes all annotation objects LO1 through LO6 in the previously added adjustment frame F210 and all annotation objects LO1 through LO7 in the adjustment frame F220' added this time.
In a further example, if the reference annotation target set is configured to accommodate information of all annotation objects in only 1 adjustment frame, then in step S230 of the example of fig. 4, updating the reference annotation target set PL may include deleting from PL the items corresponding to the previously added adjustment frame F210, and then adding the relevant information of all annotation objects LO1 through LO7 in the adjustment frame F220' to PL. In contrast, if the reference annotation target set is configured to accommodate information of all annotation objects in up to 10 adjustment frames, the information on all annotation objects LO1 through LO7 in the adjustment frame F220' may be added to PL directly.
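A reference annotation target set bounded to a fixed number of adjustment frames behaves like a fixed-length queue: adding the newest frame's entries evicts the oldest frame's entries once the capacity is reached. The following Python sketch makes that assumption concrete; the dict-per-frame layout and the match_fn callback are illustrative choices, not the patent's data model.

```python
from collections import deque

class ReferenceTargetSet:
    """Reference annotation target set holding the annotation objects of
    at most max_frames adjustment frames (oldest frame evicted first)."""

    def __init__(self, max_frames=10):
        self._frames = deque(maxlen=max_frames)

    def add_frame(self, frame_entries):
        # frame_entries: {identifier: partial_image} for one adjustment
        # frame. When the deque is full, the entries of the earliest
        # frame (e.g. F210 in the capacity-1 example above) are dropped.
        self._frames.append(dict(frame_entries))

    def query(self, object_image, match_fn):
        # Return (identifier, stored_image) pairs whose stored partial
        # image matches object_image according to the assumed matcher.
        return [(ident, img)
                for frame in self._frames
                for ident, img in frame.items()
                if match_fn(object_image, img)]
```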
The method can then return to step S210 to continue with the next unprocessed annotation frame F230 in the current part.
At the beginning of the processing of the annotation frame F230, the information in the identification change table CHG and the identification deletion table DEL associated with the current part is "6->4, 4->6" and empty, respectively. Therefore, in step S210, the identifiers of the annotation objects in the current annotation frame F230 can be modified according to the identification change table CHG and the identification deletion table DEL associated with the current part.
As shown in fig. 5, according to the change items "6->4" and "4->6" in the identification change table CHG, the identifier of the annotation object LO6 in the current annotation frame F230 is modified from 6 to 4, and the identifier of the annotation object LO4 is modified from 4 to 6. Meanwhile, since the identification deletion table DEL is empty, no annotation objects in the current annotation frame F230 are deleted. Thus, the current modification frame F230' of the current part is obtained.
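Step S210 itself is mechanical: rewrite every identifier that has a change item and drop every annotation whose identifier is listed in the deletion table. A minimal Python sketch, assuming an annotation frame is represented simply as a dict from object key to identifier (the patent leaves the data layout open):

```python
def modify_frame(annotations, change_table, deletion_table):
    """S210: apply the identification change table and deletion table.

    annotations:    {object_key: identifier} for one annotation frame
    change_table:   {pre_change_id: post_change_id}
    deletion_table: set of identifiers whose annotations are removed
    """
    modified = {}
    for obj, ident in annotations.items():
        if ident in deletion_table:
            continue  # e.g. "DEL: 8" would drop the annotation of LO8
        modified[obj] = change_table.get(ident, ident)
    return modified

# The F230 -> F230' modification of fig. 5, with CHG = {6->4, 4->6}
# (only the objects whose identifiers are of interest are shown):
f230 = {"LO4": 4, "LO6": 6, "LO7": 7, "LO8": 8}
print(modify_frame(f230, {6: 4, 4: 6}, set()))
# {'LO4': 6, 'LO6': 4, 'LO7': 7, 'LO8': 8}
```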
Then, for the current modification frame F230', the method may continue to step S220. In step S220, the corresponding partial image of each annotation object (each of LO1 through LO8) in the current modification frame F230' can be provided, respectively, to the query interface of the reference annotation target set.
In the example of fig. 5, it is assumed that the reference annotation target set currently includes the previously added information on all annotation objects LO1 through LO6 in the adjustment frame F210 of fig. 3 and the information on all annotation objects LO1 through LO7 in the adjustment frame F220' of fig. 4.
With respect to the annotation object LO4 in the modification frame F230', the query result for the reference annotation target set may include a partial image and the corresponding identifier "4" of the annotation object LO4 in the adjustment frame F210, and a partial image and the corresponding identifier "4" of the annotation object LO4 in the adjustment frame F220'. Thus, it may be determined that the identifier of the annotation object LO4 in the current modification frame F230' needs to be adjusted from 6 to 4. Accordingly, in step S220, the identifier of the annotation object LO4 in the current modification frame F230' may be adjusted from 6 to 4 (see annotation object LO4 in the adjustment frame F230'' in the example of fig. 5), and this change operation "6->4" may be added to the change record C_CHG associated with the current modification frame F230' or the current adjustment frame F230'' of the current part, for example, "C_CHG: 6->4" in fig. 5.
With respect to the annotation object LO6 in the modification frame F230', the query result for the reference annotation target set may include a partial image and the corresponding identifier "6" of the annotation object LO6 in the adjustment frame F210, and a partial image and the corresponding identifier "6" of the annotation object LO6 in the adjustment frame F220'. Accordingly, in step S220, the identifier of the annotation object LO6 in the current modification frame F230' can be adjusted from 4 to 6 (see annotation object LO6 in the adjustment frame F230'' in the example of fig. 5), and this change operation "4->6" can be added to the change record C_CHG, yielding, for example, "C_CHG: 4->6, 6->4" in fig. 5.
With respect to the annotation object LO7 in the modification frame F230', the query result for the reference annotation target set may include a partial image of the annotation object LO7 in the adjustment frame F220' and the corresponding identifier "1001". Accordingly, in step S220, the identifier of the annotation object LO7 in the current modification frame F230' can be adjusted from 7 to 1001 (see annotation object LO7 in the adjustment frame F230'' in the example of fig. 5), and this change operation "7->1001" can be added to the change record C_CHG, yielding, for example, "C_CHG: 4->6, 6->4, 7->1001".
In addition, in the example of fig. 5, it is also found or detected in step S220 that the annotation object LO8 (whose identifier is 8) does not actually belong to the targets to be annotated. Thus, the annotation box or detection box and the associated annotation information (including the previously assigned identifier) of the annotation object LO8 may be deleted from the current modification frame F230', and the identifier "8" may be added to the deletion record C_DEL associated with the current modification frame F230' or the current adjustment frame F230'', thereby obtaining "C_DEL: 8".
Through the processing of step S220, the current adjustment frame F230'' of the frame F230' is obtained (see fig. 5), which includes annotation objects LO1 through LO7; the contents of the associated change record C_CHG and deletion record C_DEL are "4->6, 6->4, 7->1001" and "8", respectively.
The method may then continue to step S230 to update the identity change table, the identity deletion table, and the reference annotation target set associated with the current portion.
In the example of fig. 5, for the item "4->6" in the change record C_CHG, since the change item "6->4" is included in the current identification change table CHG (its post-change identifier equals the pre-change identifier of the record item, so the two compose into the identity change "6->6"), the change item "6->4" can be deleted from the identification change table CHG. Likewise, for the item "6->4" in the change record C_CHG, since the change item "4->6" is included in the current identification change table CHG, the change item "4->6" can be deleted from the identification change table CHG. The item "7->1001" in the change record C_CHG has no such counterpart and may simply be added to the identification change table CHG. Thus, the updated identification change table "CHG: 7->1001" is obtained. For the deletion record C_DEL, the item "8" in C_DEL can be added to the current identification deletion table DEL to obtain the updated identification deletion table "DEL: 8".
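The table update of step S230 is a composition of identifier mappings, which can be captured in a few lines of Python. This is a sketch of the rule just described (compose, cancel, or append), using a dict keyed by pre-change identifier; it is one possible reading of the update logic, not the patent's own code.

```python
def update_change_table(change_table, change_record):
    """S230: fold a frame's change record into the per-part change table.

    change_table:  {pre_change_id: post_change_id}
    change_record: list of (pre_change_id, post_change_id) operations
    """
    for pre, post in change_record:
        composed = False
        for x, y in list(change_table.items()):
            if y == pre:              # existing x->pre composes with pre->post
                del change_table[x]
                if x != post:         # x->post replaces x->pre ...
                    change_table[x] = post
                composed = True       # ... unless x == post, which cancels out
                break
        if not composed:              # no entry ends in `pre`: append as-is
            change_table[pre] = post
    return change_table

# Fig. 5: CHG = {6->4, 4->6} folded with C_CHG = "4->6, 6->4, 7->1001"
print(update_change_table({6: 4, 4: 6}, [(4, 6), (6, 4), (7, 1001)]))
# {7: 1001} -- the swap entries cancel out; only 7->1001 remains
```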
In the example of fig. 5, the relevant information of all annotation objects LO1 through LO7 in the adjustment frame F230'' can be added to the reference annotation target set PL. In a further example, if the reference annotation target set is configured to accommodate information of all annotation objects in only 2 adjustment frames, the items corresponding to the previously added adjustment frame F210 may first be deleted from PL, and then the relevant information of all annotation objects LO1 through LO7 in the adjustment frame F230'' may be added, so that the updated reference annotation target set includes the items corresponding to the previously added adjustment frame F220' and the items corresponding to the adjustment frame F230'' added this time.
As shown in fig. 6, after steps S210 (modification), S220 (adjustment), and S230 (update) have been performed in turn for each of the annotation frames F210, F220, and F230 in the current part, the adjusted frame sequence F210, F220', F230'' of the current part is obtained, and the information in the identification change table CHG and the identification deletion table DEL associated with the current part is "7->1001" and "8", respectively.
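Putting the three steps together, the per-part loop might look like the following sketch, reusing modify_frame, update_change_table, and ReferenceTargetSet from the sketches above. The adjust_frame callback stands in for the checking and correction of step S220 and is assumed to return the adjusted annotations together with the frame's change and deletion records; for brevity the adjusted annotations double as the frame's reference-set entry, whereas a real system would also store the partial images.

```python
def process_part(annotation_frames, adjust_frame, reference_set):
    """Run S210 (modify), S220 (adjust), S230 (update) over one part."""
    change_table, deletion_table = {}, set()
    adjusted_frames = []
    for frame in annotation_frames:
        # S210: apply the tables accumulated so far within the part.
        modified = modify_frame(frame, change_table, deletion_table)
        # S220: match against the reference set, correct identifiers,
        # delete spurious annotations, label missed objects (assumed).
        adjusted, c_chg, c_del = adjust_frame(modified, reference_set)
        # S230: fold the per-frame records into the per-part tables and
        # add the adjusted frame's objects to the reference set.
        update_change_table(change_table, c_chg)
        deletion_table.update(c_del)
        reference_set.add_frame(adjusted)
        adjusted_frames.append(adjusted)
    return adjusted_frames, change_table, deletion_table
```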
Fig. 7 shows an example of merging the labeling results of two temporally consecutive parts A and B according to an embodiment of the present disclosure. In the example of fig. 7, part A is earlier than part B; part A includes the adjustment frames FA_1 to FA_{M+N}, and part B includes the adjustment frames FB_1 to FB_{N+K}.
As shown in fig. 7, the reference annotation target set PL_B of part B may first be initialized using the last N adjustment frames FA_{M+1} to FA_{M+N} in part A. Then, steps S210, S220, and S230 are performed in sequence for the first N adjustment frames FB_1 to FB_N in part B. As previously described, annotation boxes are not modified and identifiers are not deleted during the merging process; accordingly, only the identification change table CHG is used during merging, and no identification deletion table is needed. After the last of these N adjustment frames, FB_N, has been processed, the identification change table CHG(FB_N) obtained at that point can be used to perform step S210 on each of the remaining adjustment frames FB_{N+1} to FB_{N+K} in part B. Thus, the merging of the labeling results of parts A and B is achieved.
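Under the same assumptions as the sketches above, the merge of fig. 7 can be outlined as follows: seed part B's reference set with the last N adjustment frames of part A, run the full cycle (without a deletion table) on the first N frames of part B, then apply the resulting change table to the remaining frames.

```python
def merge_parts(part_a_frames, part_b_frames, n, adjust_frame):
    """Merge the labeling results of consecutive parts A and B (fig. 7)."""
    # Initialize PL_B from the last N adjustment frames of part A
    # (FA_{M+1} .. FA_{M+N}).
    ref = ReferenceTargetSet(max_frames=n)
    for frame in part_a_frames[-n:]:
        ref.add_frame(frame)

    change_table = {}  # only CHG is used; no deletion table when merging
    merged = []
    # Full S210/S220/S230 cycle for the first N frames FB_1 .. FB_N.
    for frame in part_b_frames[:n]:
        modified = modify_frame(frame, change_table, set())
        adjusted, c_chg, _ = adjust_frame(modified, ref)
        update_change_table(change_table, c_chg)
        ref.add_frame(adjusted)
        merged.append(adjusted)
    # Remaining frames FB_{N+1} .. FB_{N+K}: step S210 only, using the
    # change table CHG(FB_N) fixed after the first N frames.
    for frame in part_b_frames[n:]:
        merged.append(modify_frame(frame, change_table, set()))
    return merged, change_table
```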
Fig. 8 and 9 illustrate example apparatuses that may be used to implement methods in accordance with embodiments of the disclosure.
In the example of fig. 8, the apparatus may comprise one or more processors PROC. The processor PROC may be any form of processing unit having data processing capabilities and/or instruction execution capabilities, such as a general purpose Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a dedicated processor or accelerator, etc. For example, the processor PROC may be configured to perform a method of tracking annotation of multiple targets in a video according to an embodiment of the present disclosure. The processor PROC may furthermore be connected to the memory MEM and to the input/output interface I/O via a bus system and/or other forms of connection means (not shown) and may control other components in the apparatus to perform the desired functions.
The memory MEM may include various forms of computer-readable and writable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. Non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. The readable and writable storage media may include, for example, but are not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. In an embodiment, the memory MEM may contain program instructions that, when executed, instruct the apparatus or a processor PROC in the apparatus to perform a method according to the disclosed embodiments.
The I/O interface may be used to provide instructions, parameters or data to the processor PROC and to output result data processed by the processor PROC.
In the example shown in fig. 9, the apparatus may include an annotator ANT and an adjuster TUN.
The annotator can be configured to obtain annotation frames of one or more parts by initially annotating a plurality of temporally successive frames extracted from the video, wherein each part comprises one or more annotation frames and each annotation object in each annotation frame is assigned a respective identification.
The annotator ANT may comprise one or more processors, such as a general-purpose processor, a graphics processor, or a special-purpose processor. For example, where the detection and labeling of images is implemented with an algorithm based on a convolutional neural network, the annotator ANT may be implemented as a dedicated artificial-intelligence processing chip, which may include a plurality of multipliers and adders and/or an on-chip memory.
The adjuster TUN may be configured to receive the one or more parts from the annotator ANT and, for a current annotation frame among the annotation frames in a current part, modify the identifier of each annotation object to be modified in the current annotation frame according to the identification change table and the identification deletion table of the current part, the current annotation frame becoming the current modification frame of the current part after the modification. The adjuster TUN may be further configured to adjust, according to the plurality of targets and the annotation target reference set of the current part, the identifier of each annotation object to be adjusted in the current modification frame, which becomes the current adjustment frame of the current part after the adjustment. The adjuster TUN may be further configured to update the identification change table, the identification deletion table, and the annotation target reference set after the adjustment.
In the case where the adjuster TUN receives a plurality of parts from the annotator ANT, the adjuster TUN may be further configured to, after the modifying, adjusting, and updating have been performed on each annotation frame in each part, and for the previous part and the subsequent part of any two temporally consecutive parts, initialize the annotation target reference set of the subsequent part according to all annotation objects and corresponding identifiers in a predetermined number of adjustment frames in the previous part, the predetermined number of adjustment frames in the previous part and the predetermined number of adjustment frames in the subsequent part being consecutive in time. The adjuster TUN may also be configured to initialize the identification change table of the subsequent part as an empty list. The adjuster TUN may be further configured to take each adjustment frame in the subsequent part in turn as a current frame to be corrected, and to correct the identifiers of the annotation objects in the current frame to be corrected according to the annotation target reference set and the identification change table of the subsequent part.
In one embodiment, the adjuster TUN may include one or more processors, such as a general purpose processor or a graphics processor or a special purpose processor. For example, in case of a convolutional neural network based implementation of the matching of the images, the adjuster TUN may comprise a dedicated artificial intelligence processing chip, which may comprise a plurality of multipliers and adders and/or on-chip memories.
Additionally, in the example apparatus of FIG. 9, input/output component I/O may also be included. Input/output component I/O in the example of fig. 9 may include output devices such as a display, printer, and may also include input devices such as a keyboard, mouse, joystick, buttons, and the like.
The devices or systems shown in fig. 8 and 9 are exemplary only, and not limiting. An apparatus or system according to embodiments of the present disclosure may have other components and/or structures and/or implementations as desired.
Unless the context clearly requires otherwise, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense, that is, in a sense of "including but not limited to". Additionally, the words "herein," "above," "below," and words of similar import, when used in this application, shall refer to this application as a whole and not to any particular portions of this application. Where the context permits, words in the above description using the singular or plural number may also include the plural or singular number respectively. With respect to the word "or" when referring to a list of two or more items, the word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list, and any combination of the items in the list.
The above detailed description of embodiments of the disclosure is not intended to be exhaustive or to limit the disclosure to the precise form disclosed above. For example, while processes or blocks are presented in a given order, alternative embodiments may perform processes having the steps or employ systems having the blocks in a different order, and some processes or blocks may be deleted, moved, added, subdivided, combined, and/or modified. Each of these processes or blocks may be implemented in a variety of different ways. In addition, while processes or blocks are sometimes shown as being performed in series, these processes or blocks may instead be performed in parallel, or may be performed at different times.

Claims (20)

1. A method for tracking and labeling a plurality of targets in a video comprises the following steps:
obtaining one or more parts of annotation frames by initially annotating a plurality of frames extracted from the video which are successive in time, each part comprising one or more annotation frames, each annotation object in each annotation frame being assigned a corresponding identifier; and
for a current annotation frame among the annotation frames in a current part of the one or more parts,
modifying an identifier of each annotation object to be modified in the current annotation frame according to an identification change table and an identification deletion table of the current part, the current annotation frame becoming a current modified frame of the current part after the modification,
adjusting an identifier of each annotation object to be adjusted in the current modified frame according to the plurality of targets and an annotation target reference set of the current part, the current modified frame becoming a current adjustment frame of the current part after the adjustment, and
updating the identification change table, the identification deletion table, and the annotation target reference set after the adjustment,
wherein, in a case where the current annotation frame is not the first annotation frame in the current part, the annotation target reference set records all annotation objects and corresponding identifiers in one or more previous adjustment frames in the current part.
2. The method of claim 1, wherein,
in a case where the current annotation frame is the first annotation frame in the current part, the identification change table and the identification deletion table are empty lists,
in a case where the identification change table is not an empty list, the identification change table includes one or more change items, each change item indicating a change from a pre-change identifier to a post-change identifier,
in a case where the identification deletion table is not an empty list, the identification deletion table includes one or more deletion items, each deletion item indicating deletion of an identifier, and
in a case where the current annotation frame is not the first annotation frame, the one or more previous adjustment frames and the current annotation frame are consecutive in time, the one or more previous adjustment frames precede the current annotation frame, and the number of the one or more previous adjustment frames is less than or equal to a predetermined number.
3. The method of claim 2, wherein modifying the identifier of each annotation object to be modified in the current annotation frame comprises: for a current annotation object to be modified among the annotation objects to be modified, in a case where the identification change table includes a first change item and the pre-change identifier indicated by the first change item is the same as the identifier of the current annotation object to be modified,
changing the identifier of the current annotation object to be modified into the post-change identifier indicated by the first change item.
4. The method of claim 2, wherein modifying the identifier of each annotation object to be modified in the current annotation frame comprises: for a current annotation object to be modified among the annotation objects to be modified, in a case where the identification deletion table includes a first deletion item and the identifier indicated by the first deletion item is the same as the identifier of the current annotation object to be modified,
deleting the annotation box and the identifier of the current annotation object to be modified.
5. The method of claim 2, wherein adjusting the identifier of each annotation object to be adjusted in the current modified frame comprises: for a current annotation object to be adjusted among the annotation objects to be adjusted, in a case where the annotation target reference set includes a first annotation object matching the current annotation object to be adjusted and the identifier of the current annotation object to be adjusted differs from the identifier of the first annotation object,
changing the identifier of the current annotation object to be adjusted into the identifier of the first annotation object,
generating a change record item, with the identifier of the current annotation object to be adjusted as its pre-change identifier and the identifier of the first annotation object as its post-change identifier, and
adding the generated change record item to the change record of the current modified frame.
6. The method of claim 5, wherein updating the identification change table comprises: for each current change record item in the change record,
generating a second change item in a case where the identification change table includes a first change item and the post-change identifier indicated by the first change item is the same as the pre-change identifier indicated by the current change record item, the pre-change identifier and the post-change identifier indicated by the second change item being, respectively, the pre-change identifier indicated by the first change item and the post-change identifier indicated by the current change record item,
deleting the first change item from the identification change table in a case where the pre-change identifier and the post-change identifier indicated by the second change item are the same,
replacing the first change item with the second change item in a case where the pre-change identifier and the post-change identifier indicated by the second change item are different, and
adding the current change record item into the identification change table in a case where the post-change identifier indicated by every change item in the identification change table is different from the pre-change identifier indicated by the current change record item.
7. The method of claim 2, wherein adjusting the identifier of each annotation object to be adjusted in the current modified frame comprises: for a current annotation object to be adjusted among the annotation objects to be adjusted, in a case where the current annotation object to be adjusted does not belong to the plurality of targets,
deleting the annotation box and the identifier of the current annotation object to be adjusted, and
adding the identifier of the current annotation object to be adjusted into a deletion record of the current modified frame as a deletion record item.
8. The method of claim 7, wherein updating the identification deletion table comprises:
adding all deletion record items in the deletion record into the identification deletion table.
9. The method of claim 2, wherein adjusting the identifier of each annotation object to be adjusted in the current modified frame comprises:
labeling an unlabeled object in the current modified frame to obtain a new annotation object of the current modified frame, the unlabeled object belonging to the plurality of targets but not having been labeled during the initial labeling;
in a case where the annotation target reference set includes one or more annotation objects that correspond to the same one of the plurality of targets and match the new annotation object, assigning the identifier of the new annotation object in accordance with the identifier of any one of the one or more annotation objects; and
in a case where none of the annotation objects in the annotation target reference set matches the new annotation object, assigning a new identifier to the new annotation object, the new identifier being different from the identifiers of all annotation objects in the current part.
10. The method of claim 2, wherein updating the annotation target reference set comprises:
adding all annotation objects and corresponding identifiers in the current adjustment frame into the annotation target reference set.
11. The method of claim 10, wherein updating the annotation target reference set further comprises: in a case where, before all annotation objects and corresponding identifiers in the current adjustment frame are added into the annotation target reference set, the annotation target reference set already includes all annotation objects and corresponding identifiers in the predetermined number of previous adjustment frames,
deleting, from the annotation target reference set, the information related to the earliest previous adjustment frame in the current part.
12. The method of claim 2, further comprising: in a case where the one or more parts are a plurality of parts, after the modifying, the adjusting, and the updating have been performed for each annotation frame in each part, for the previous part and the subsequent part of any two temporally consecutive parts,
initializing the annotation target reference set of the subsequent part in accordance with all annotation objects and corresponding identifiers in a predetermined number of adjustment frames in the previous part, the predetermined number of adjustment frames in the previous part and the predetermined number of adjustment frames in the subsequent part being consecutive in time,
initializing the identification change table of the subsequent part as an empty list, and
for each adjustment frame in the subsequent part, taking that adjustment frame as a current frame to be corrected, and correcting the identifiers of the annotation objects in the current frame to be corrected according to the annotation target reference set and the identification change table of the subsequent part.
13. The method of claim 12, wherein correcting the identifiers of the annotation objects in the current frame to be corrected comprises: in a case where the current frame to be corrected belongs to the predetermined number of adjustment frames in the subsequent part,
further modifying the identifier of each annotation object to be further modified in the current frame to be corrected according to the identification change table of the subsequent part, the current frame to be corrected becoming a current further-modified frame after the further modification,
further adjusting the identifier of each annotation object to be further adjusted in the current further-modified frame according to the annotation target reference set of the subsequent part, the current further-modified frame becoming a current further-adjusted frame after the further adjustment, and
after the further adjustment, updating the identification change table of the subsequent part according to the change operations in the further adjustment, and adding all annotation objects and corresponding identifiers in the current further-adjusted frame to the annotation target reference set of the subsequent part.
14. The method of claim 13, wherein correcting the identifiers of the annotation objects in the current frame to be corrected comprises: in a case where the current frame to be corrected does not belong to the predetermined number of adjustment frames in the subsequent part,
further modifying the identifier of each annotation object to be further modified in the current frame to be corrected according to a reference identification change table of the subsequent part, the reference identification change table being the identification change table of the subsequent part after all of the predetermined number of adjustment frames in the subsequent part have been corrected.
15. The method of any one of claims 1 to 14, wherein, in a case where the one or more parts are a plurality of parts, the identifiers of any two annotation objects from annotation frames of any two different parts are different from each other.
16. An apparatus for tracking and annotating a plurality of targets in a video, comprising:
an annotator configured to obtain annotation frames of one or more parts by initially annotating a plurality of temporally consecutive frames extracted from the video, each part comprising one or more annotation frames, each annotation object in each annotation frame being assigned a corresponding identifier; and
an adjuster configured to,
for a current part of the one or more parts and for a current annotation frame among the annotation frames in the current part, modify an identifier of each annotation object to be modified in the current annotation frame according to an identification change table and an identification deletion table of the current part, the current annotation frame becoming a current modified frame of the current part after the modification,
adjust an identifier of each annotation object to be adjusted in the current modified frame, which becomes a current adjustment frame of the current part after the adjustment, according to the plurality of targets and an annotation target reference set of the current part, and
update the identification change table, the identification deletion table, and the annotation target reference set after the adjustment,
wherein, in a case where the current annotation frame is not the first annotation frame in the current part, the annotation target reference set records all annotation objects and corresponding identifiers in one or more previous adjustment frames in the current part.
17. The apparatus of claim 16, wherein the adjuster is further configured to
label an unlabeled object in the current modified frame to obtain a new annotation object of the current modified frame, the unlabeled object belonging to the plurality of targets but not having been labeled at the time of the initial labeling, and
assign an identifier to the new annotation object according to the annotation target reference set.
18. The apparatus according to claim 16 or 17, wherein the adjuster is further configured to, in a case where the one or more parts are a plurality of parts, after performing the modifying, the adjusting, and the updating for each annotation frame in each part, for the previous part and the subsequent part of any two temporally consecutive parts,
initialize the annotation target reference set of the subsequent part in accordance with all annotation objects and corresponding identifiers in a predetermined number of adjustment frames in the previous part, the predetermined number of adjustment frames in the previous part and the predetermined number of adjustment frames in the subsequent part being consecutive in time,
initialize the identification change table of the subsequent part as an empty list, and
for each adjustment frame in the subsequent part, take that adjustment frame as a current frame to be corrected, and correct the identifiers of the annotation objects in the current frame to be corrected according to the annotation target reference set and the identification change table of the subsequent part.
19. An apparatus for tracking and annotating a plurality of targets in a video, comprising:
one or more processors configured to perform the method of any one of claims 1 to 15.
20. A non-transitory storage medium having stored thereon program instructions that, when executed, instruct one or more processors to perform the method of any one of claims 1-15.
CN201810198882.5A 2018-03-12 2018-03-12 Method and device for tracking and labeling multiple targets in video Active CN108491774B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810198882.5A CN108491774B (en) 2018-03-12 2018-03-12 Method and device for tracking and labeling multiple targets in video

Publications (2)

Publication Number Publication Date
CN108491774A CN108491774A (en) 2018-09-04
CN108491774B (en) 2020-06-26

Family

ID=63338340

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810198882.5A Active CN108491774B (en) 2018-03-12 2018-03-12 Method and device for tracking and labeling multiple targets in video

Country Status (1)

Country Link
CN (1) CN108491774B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325463A (en) * 2018-10-16 2019-02-12 浙江中正智能科技有限公司 A kind of real time face tracking method
CN112507760A (en) * 2019-09-16 2021-03-16 杭州海康威视数字技术股份有限公司 Method, device and equipment for detecting violent sorting behavior
CN113127666B (en) * 2020-01-15 2022-06-24 魔门塔(苏州)科技有限公司 Continuous frame data labeling system, method and device
CN111479119A (en) * 2020-04-01 2020-07-31 腾讯科技(成都)有限公司 Method, device and system for collecting feedback information in live broadcast and storage medium
CN111860302B (en) * 2020-07-17 2024-03-01 北京百度网讯科技有限公司 Image labeling method and device, electronic equipment and storage medium
CN111860305B (en) * 2020-07-17 2023-08-01 北京百度网讯科技有限公司 Image labeling method and device, electronic equipment and storage medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1159272A (en) * 1994-07-25 1997-09-10 麦克罗维西恩公司 Apparatus and method for comprehensive copy protection for video platofmrs and unprotected source material
CN1295690A (en) * 1999-01-28 2001-05-16 皇家菲利浦电子有限公司 System and method for analyzing video content using detected text in video frames
US6285775B1 (en) * 1998-10-01 2001-09-04 The Trustees Of The University Of Princeton Watermarking scheme for image authentication
CN1692609A (en) * 2002-08-22 2005-11-02 日本电气株式会社 Frame transfer method and node in ethernet(R)
CN101479742A (en) * 2006-05-17 2009-07-08 美国唯美安视国际有限公司 Efficient application of video marking technologies
CN101894374A (en) * 2009-03-31 2010-11-24 索尼株式会社 The method and apparatus that is used for target following
CN101930779A (en) * 2010-07-29 2010-12-29 华为终端有限公司 Video commenting method and video player
CN102982076A (en) * 2012-10-30 2013-03-20 新华通讯社 Multi-dimensionality content labeling method based on semanteme label database
CN103688272A (en) * 2011-03-03 2014-03-26 赛弗有限责任公司 System for autononous detection and separation of common elements within data, and methods and devices associated therewith
CN106650705A (en) * 2017-01-17 2017-05-10 深圳地平线机器人科技有限公司 Region labeling method and device, as well as electronic equipment
CN107564004A (en) * 2017-09-21 2018-01-09 杭州电子科技大学 It is a kind of that video labeling method is distorted based on computer auxiliary tracking
CN107615766A (en) * 2015-04-16 2018-01-19 维斯克体育科技有限公司 System and method for creating and distributing content of multimedia

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6014137A (en) * 1996-02-27 2000-01-11 Multimedia Adventures Electronic kiosk authoring system
US8537219B2 (en) * 2009-03-19 2013-09-17 International Business Machines Corporation Identifying spatial locations of events within video image data


Also Published As

Publication number Publication date
CN108491774A (en) 2018-09-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant