CN112418335B - Model training method based on continuous image frame tracking annotation and electronic equipment - Google Patents

Model training method based on continuous image frame tracking annotation and electronic equipment

Info

Publication number
CN112418335B
CN112418335B
Authority
CN
China
Prior art keywords
image frames, data packet, data packets, labels, frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011355146.XA
Other languages
Chinese (zh)
Other versions
CN112418335A (en)
Inventor
Li Hongmin (李虹敏)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yunce Data Technology Co ltd
Original Assignee
Beijing Yunju Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yunju Intelligent Technology Co ltd filed Critical Beijing Yunju Intelligent Technology Co ltd
Priority to CN202011355146.XA priority Critical patent/CN112418335B/en
Publication of CN112418335A publication Critical patent/CN112418335A/en
Application granted granted Critical
Publication of CN112418335B publication Critical patent/CN112418335B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Abstract

The application discloses a model training method based on continuous image frame tracking annotation, an electronic device and a computer readable storage medium, comprising the following steps: acquiring a batch of continuous image frames; generating a plurality of data packets based on the batch of continuous image frames, wherein each data packet comprises M continuous image frames, the last N image frames of the preceding data packet in any two adjacent data packets are the same, in sequence, as the first N image frames of the following data packet, and M and N are positive integers with N smaller than M; tracking and annotating the image frames in each of the data packets; unifying the labels of the corresponding image frames in the last N image frames of the preceding data packet and the first N image frames of the following data packet in adjacent data packets after tracking annotation; and performing model training based on the image frames in the plurality of data packets after label unification. The method and the device can improve the efficiency of model training based on continuous image frame tracking annotation.

Description

Model training method based on continuous image frame tracking annotation and electronic equipment
Technical Field
The present disclosure relates to the field of computer data, and in particular, to a model training method based on continuous image frame tracking and labeling, an electronic device, and a computer readable storage medium.
Background
In training a recognition model for road vehicles or other objects, a predetermined number of training samples usually needs to be provided. Typically, the training samples are labeled by acquiring continuous image frames from a video captured at the scene and tracking the same sample object across the multiple image frames in which it appears, so as to determine the labels of the training samples.
The number of image frames that need tracking annotation is typically large, often thousands of frames or more. However, existing tracking annotation imposes a limit on the number of image frames, and various problems arise when tracking annotation is performed on continuous sequences exceeding a certain length, for example 200 frames. For instance, to annotate 2000 continuous frames, the whole 2000-frame sequence must be completed in one continuous operation; once an error occurs partway through, all subsequent labels are affected and correction is extremely difficult. Moreover, since the tracking label of a given object must be continuous and unique, the work cannot be parallelized. As a result, the accuracy and efficiency of tracking annotation are greatly reduced, which in turn makes subsequent model training difficult.
How to improve the model training efficiency based on continuous image frame tracking annotation is a technical problem to be solved.
Disclosure of Invention
An object of the embodiments of the present application is to provide a model training method based on continuous image frame tracking annotation, an electronic device, and a computer readable storage medium, so as to solve the problem of low model training efficiency.
In order to solve the technical problems, the present specification is implemented as follows:
in a first aspect, a model training method based on continuous image frame tracking annotation is provided, including: acquiring a batch of continuous image frames; generating a plurality of data packets based on the batch of continuous image frames, wherein each data packet comprises M continuous image frames, the last N image frames of the preceding data packet in any two adjacent data packets are the same, in sequence, as the first N image frames of the following data packet, and M and N are positive integers with N smaller than M; tracking and annotating the image frames in each of the data packets; unifying the labels of the corresponding image frames in the last N image frames of the preceding data packet and the first N image frames of the following data packet in adjacent data packets after tracking annotation; and performing model training based on the image frames in the plurality of data packets after label unification.
Optionally, tracking and annotating the image frames in the plurality of data packets respectively includes: tracking and annotating the data packets in parallel, wherein the same object appearing in the frame images of a single data packet receives the same label.
Optionally, before unifying the labels of the corresponding image frames in the last N image frames of the preceding data packet and the first N image frames of the following data packet in adjacent data packets after tracking annotation, the method further includes: acquiring the file names and/or image frame identifiers of the image frames in the plurality of data packets after tracking annotation, wherein the file names and image frame identifiers of the image frames are generated sequentially, following the order of the continuous image frames, and are unique; and ordering the plurality of data packets sequentially according to the file names and/or image frame identifiers of their image frames.
Optionally, unifying the labels of the corresponding image frames in the last N image frames of the preceding data packet and the first N image frames of the following data packet in adjacent data packets after tracking annotation includes: unifying the labels of the same object in the last N image frames of the preceding data packet and the first N image frames of the following data packet in adjacent data packets after tracking annotation; and unifying the labels of the same object included in the image frames of the preceding data packet and the image frames of the following data packet in adjacent data packets according to the sequential ordering of the plurality of data packets and the plurality of data packets whose overlap labels have been unified.
Optionally, unifying the labels of the same object in the last N image frames of the preceding data packet and the first N image frames of the following data packet in adjacent data packets after tracking annotation includes: modifying all labels of the same object included in the first N image frames of the following data packet to the labels of the same object included in the last N image frames of the preceding data packet; or modifying all labels of the same object included in the last N image frames of the preceding data packet to the labels of the same object included in the first N image frames of the following data packet.
Optionally, unifying the labels of the same object included in the image frames of the preceding data packet and the image frames of the following data packet in adjacent data packets, according to the sequential ordering of the plurality of data packets and the plurality of data packets whose overlap labels have been unified, includes:
acquiring the labels of the same object in the last N image frames of the preceding data packet and the labels of the same object in the first N image frames of the following data packet in the plurality of data packets whose overlap labels have been unified;
in the case that all labels of the same object included in the first N image frames of the following data packet have been modified to the labels of the same object included in the last N image frames of the preceding data packet, sequentially modifying all labels of the same object included in the image frames of the following data packet to the labels of the same object included in the image frames of the preceding data packet, in the front-to-back order of the plurality of data packets; or
in the case that all labels of the same object included in the last N image frames of the preceding data packet have been modified to the labels of the same object included in the first N image frames of the following data packet, sequentially modifying all labels of the same object included in the image frames of the preceding data packet to the labels of the same object included in the image frames of the following data packet, in the back-to-front order of the plurality of data packets.
Optionally, the continuous image frames include a plurality of different objects, each object being present in at least part of the continuous image frames.
Optionally, after tracking and annotating the image frames in the plurality of data packets, the method further includes:
performing accuracy verification on the labels of the image frames in the plurality of data packets, so that the step of unifying the labels of the corresponding image frames in the last N image frames of the preceding data packet and the first N image frames of the following data packet in adjacent data packets after tracking annotation is performed only when the labels pass the verification.
In a second aspect, an electronic device is provided, comprising a memory and a processor electrically connected to the memory, the memory storing a computer program executable by the processor, the computer program, when executed, performing the steps of the method according to the first aspect.
In a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method according to the first aspect.
In the embodiments of the present application, a batch of continuous image frames is acquired; a plurality of data packets is generated based on the batch of continuous image frames, wherein each data packet comprises M continuous image frames, the last N image frames of the preceding data packet in any two adjacent data packets are the same, in sequence, as the first N image frames of the following data packet, and M and N are positive integers with N smaller than M; the image frames in the data packets are tracked and annotated separately; the labels of the corresponding image frames in the last N image frames of the preceding data packet and the first N image frames of the following data packet in adjacent data packets are unified after tracking annotation; and model training is performed based on the image frames in the plurality of data packets after label unification. In this way, errors in the tracking annotation process can be largely reduced or avoided, and the accuracy and efficiency of tracking annotation are improved. Using the tracked and annotated image frames as samples for model training therefore reduces the difficulty of model training and improves the efficiency of model training based on continuous image frame tracking annotation.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
fig. 1 is a flow chart of a model training method based on continuous image frame tracking annotation according to an embodiment of the application.
Fig. 2 is a block diagram of the structure of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application is made clearly and completely with reference to the accompanying drawings; it is evident that the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments herein without inventive effort fall within the scope of the present application. The reference numerals in the present application are only used to distinguish the steps of the scheme and are not used to limit the execution order of the steps; the specific execution order is subject to the description in the specification.
In order to solve the problems in the prior art, an embodiment of the present application provides a model training method based on continuous image frame tracking and labeling, and fig. 1 is a flow chart of the model training method based on continuous image frame tracking and labeling in the embodiment of the present application. As shown in fig. 1, the method comprises the following steps:
step 102, a batch of consecutive image frames is acquired.
In the embodiments of the present application, the continuous image frames are, for example, an ultra-long continuous sequence of more than 200 frames. The batch of continuous image frames may be obtained by sampling a captured video, for example video shot by a camera acquisition device, at a preset time interval. Optionally, the acquired continuous image frames include a plurality of different objects, each object being present in at least part of the continuous image frames. Of course, the continuous image frames may also contain only one and the same object. Where there are a plurality of different objects, some objects may appear in only a single image frame. The embodiments of the present application are not limited in this respect.
The video may be, for example, a monitoring video shot by a camera set at a certain intersection, and is mainly used for monitoring vehicles running on a road. Multiple vehicles may appear in a single image frame and during the period of time that the intersection is being traversed, one or more vehicles may appear simultaneously in the corresponding captured successive image frames.
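As a minimal sketch of this acquisition step (the use of OpenCV, the video path, the default sampling interval and all names below are illustrative assumptions, not part of the patent):

```python
import cv2  # OpenCV, assumed available for video decoding

def sample_frames(video_path: str, interval_s: float = 0.1):
    """Sample a captured video at a preset time interval and return
    the resulting batch of continuous image frames."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0   # fall back if FPS metadata is missing
    step = max(1, round(fps * interval_s))    # source frames per sampled frame
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    cap.release()
    return frames
```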
Step 104, generating a plurality of data packets based on the batch of continuous image frames, wherein the data packets comprise M continuous image frames, the last N image frames of the previous data packet in any two adjacent data packets are identical to the first N image frames of the next data packet in sequence, and M, N is a positive integer and N is smaller than M.
Generating a plurality of data packets based on a batch of continuous image frames means dividing an ultra-long sequence with a large number of image frames into a plurality of data packets each containing a smaller number of frames, for example 100-200 frames per packet; that is, M takes a value of 100-200. This makes it convenient to annotate each data packet independently and, if quality inspection finds errors in a data packet, to locate and modify that packet alone without affecting the other data packets.
When the acquired continuous image frames are generated, the file name and/or image frame identifier of each image frame is generated correspondingly, wherein the file names and image frame identifiers are generated sequentially, following the order of the continuous image frames, and are unique. For example, for a total of 1000 frames, the file names of the image frames may be 1001, 1002, ..., 2000, and the corresponding image frame identifiers 1, 2, ..., 1000, indicating the 1st, 2nd, ..., 1000th frames.
For two adjacent data packets, the last N image frames of the preceding data packet being the same, in sequence, as the first N image frames of the following data packet means that N image frames of the two adjacent data packets coincide: these N image frames exist in both data packets at the same time. N may take a value of 1-10.
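As a minimal sketch of the packet-generation step (the function name and the default values of M and N are illustrative assumptions chosen within the ranges given above):

```python
def make_packets(frames, m: int = 150, n: int = 5):
    """Split a batch of continuous image frames into data packets of M
    frames, where the last N frames of each packet are repeated as the
    first N frames of the next packet (0 < N < M)."""
    assert 0 < n < m
    packets, start = [], 0
    while start < len(frames):
        packets.append(frames[start:start + m])
        if start + m >= len(frames):
            break
        start += m - n  # step back N frames so adjacent packets share N frames
    return packets
```

For 1000 frames with M = 150 and N = 5, the packets start at frames 0, 145, 290, ..., and frames 145-149 appear both at the end of the first packet and at the start of the second.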
And 106, respectively carrying out tracking and labeling on the image frames in the data packets.
Optionally, in step 106, tracking and annotating the image frames in the plurality of data packets respectively includes: tracking and annotating the data packets in parallel, wherein the same object appearing in the frame images of a single data packet receives the same label.
That is, each data packet is tracked and annotated independently, and the same object may be given different, mutually unrelated IDs in the frame images of different data packets. For example, vehicle A and vehicle B appearing in the frame images of one data packet a may be labeled "1" and "2" respectively, while the same vehicle A and vehicle B appearing in the frame images of another data packet b may be labeled "5" and "6" respectively. Within data packet a, however, vehicle A and vehicle B must be labeled "1" and "2" in all frame images in which they appear; likewise, within data packet b they must be labeled "5" and "6" in all frame images in which they appear.
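A sketch of this parallel, per-packet annotation; `track_objects` stands in for whatever single-packet tracking/annotation tool is actually used and is purely hypothetical:

```python
from concurrent.futures import ProcessPoolExecutor

def track_objects(packet):
    """Hypothetical single-packet tracker: returns one dict per frame,
    mapping a local object ID to that object's bounding box. IDs are
    consistent within the packet but carry no meaning across packets."""
    raise NotImplementedError  # supplied by the annotation tool in practice

def annotate_packets(packets):
    # Packets are labeled independently, so they can run in parallel;
    # the same vehicle may be ID "1" in one packet and "5" in another.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(track_objects, packets))
```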
Optionally, after the image frames in the plurality of data packets are tracked and annotated in step 106, the method further includes: performing accuracy verification on the labels of the image frames in the plurality of data packets, so that the step of unifying the labels of the corresponding image frames in the last N image frames of the preceding data packet and the first N image frames of the following data packet in adjacent data packets after tracking annotation is performed only when the labels pass the verification.
In this way, when the labels of the same object are unified in the subsequent steps, the labels being modified are known to be error-free, which guarantees the accuracy of the object labels.
Step 108: unify the labels of the corresponding image frames in the last N image frames of the preceding data packet and the first N image frames of the following data packet in adjacent data packets after tracking annotation.
Optionally, before performing step 108, the method further includes: acquiring the file names and/or image frame identifiers of the image frames in the plurality of data packets after tracking annotation, wherein the file names and image frame identifiers of the image frames are generated sequentially, following the order of the continuous image frames, and are unique; and ordering the plurality of data packets sequentially according to the file names and/or image frame identifiers of their image frames.
Although the same object may be labeled with different IDs in the image frames of different data packets, with no association or ordering between them, the sequential order of the data packets can be recovered from the file names and image frame identifiers of the image frames in each data packet.
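A sketch of recovering the packet order, assuming each packet carries the sequentially generated identifier of its first frame (the tuple layout is an assumption for illustration):

```python
def order_packets(packets):
    """Sort data packets by the identifier of their first image frame.
    Each packet is assumed to be a list of (frame_id, image) tuples,
    where frame_id was generated sequentially and is unique."""
    return sorted(packets, key=lambda packet: packet[0][0])
```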
In step 108, the labels of the same object in the last N image frames of the preceding data packet and the first N image frames of the following data packet in adjacent data packets are first unified. Then, according to the sequential ordering of the data packets and the data packets whose overlap labels have been unified, the labels of the same object included in the image frames of the preceding data packet and the image frames of the following data packet in adjacent data packets are unified.
Unifying the labels of the same object in the last N image frames of the preceding data packet and the first N image frames of the following data packet in adjacent data packets after tracking annotation includes:
modifying all labels of the same object included in the first N image frames of the following data packet to the labels of the same object included in the last N image frames of the preceding data packet; or
modifying all labels of the same object included in the last N image frames of the preceding data packet to the labels of the same object included in the first N image frames of the following data packet.
In this way, the corresponding identical objects in adjacent data packets can be associated sequentially, in either front-to-back or back-to-front order. For example, the label of a certain vehicle C in the last N image frames of the 1st data packet keeps its original value "3", while the labels of vehicle C in the first N image frames of the 2nd data packet are all modified from the original "8" to "3". The label of vehicle C in the last N image frames of the 2nd data packet, however, keeps the original "8". Accordingly, the label of vehicle C in the first N image frames of the 3rd data packet is modified from, for example, the original "7" to "8", while the label of vehicle C in the last N image frames of the 3rd data packet keeps the original "7". As a result, the label of vehicle C in the first N image frames of the 2nd data packet is now "3", consistent with the label of vehicle C in the 1st data packet; and the label of vehicle C in the first N image frames of the 3rd data packet is now "8", which coincides with the label "8" of vehicle C in the last N image frames of the 2nd data packet. Through these successive modifications, the labels of the same vehicle C in all the data packets are associated, and the order in which vehicle C appears in the original continuous image frames can be recovered from the association.
After the labels of the same object in the N overlapping image frames of the plurality of data packets have been associated, the labels of the same object included in the image frames of the preceding data packet and the image frames of the following data packet in adjacent data packets are unified according to the sequential ordering of the data packets and the data packets whose overlap labels have been unified. Modifying the label IDs of only a small number of image frames to establish the association, and then batch-modifying the label IDs of all remaining image frames, improves both the accuracy and the efficiency of tracking annotation.
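A sketch of building the ID correspondence from the N shared frames. The patent only requires recognizing the same object in the shared frames; matching boxes by intersection-over-union is one assumed way to do that, since the shared frames are the same images and correctly drawn boxes nearly coincide:

```python
def box_iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def overlap_id_map(prev_ann, next_ann, n, iou_thr=0.7):
    """Map local IDs in the first N frames of the following packet to the
    IDs of the same objects in the last N frames of the preceding packet.
    prev_ann / next_ann: lists of per-frame dicts {local_id: box}."""
    id_map = {}
    for prev_frame, next_frame in zip(prev_ann[-n:], next_ann[:n]):
        for next_id, next_box in next_frame.items():
            for prev_id, prev_box in prev_frame.items():
                if box_iou(next_box, prev_box) >= iou_thr:
                    id_map[next_id] = prev_id  # e.g. maps "8" -> "3" for vehicle C
                    break
    return id_map
```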
Optionally, unifying the labels of the same object included in the image frames of the preceding data packet and the image frames of the following data packet in adjacent data packets, according to the sequential ordering of the plurality of data packets and the plurality of data packets whose overlap labels have been unified, includes:
acquiring the labels of the same object in the last N image frames of the preceding data packet and the labels of the same object in the first N image frames of the following data packet in the plurality of data packets whose overlap labels have been unified;
in the case that all labels of the same object included in the first N image frames of the following data packet have been modified to the labels of the same object included in the last N image frames of the preceding data packet, sequentially modifying all labels of the same object included in the image frames of the following data packet to the labels of the same object included in the image frames of the preceding data packet, in the front-to-back order of the plurality of data packets; or
in the case that all labels of the same object included in the last N image frames of the preceding data packet have been modified to the labels of the same object included in the first N image frames of the following data packet, sequentially modifying all labels of the same object included in the image frames of the preceding data packet to the labels of the same object included in the image frames of the following data packet, in the back-to-front order of the plurality of data packets.
The manner in which the labels of the same object are batch-modified across whole data packets also differs according to the order in which the labels of the same object in the N overlapping image frames of adjacent data packets were associated.
For example, suppose the label association for vehicle C starts from the 1st data packet and modifies, in order, the labels of the same object appearing in the associated N image frames of the 2nd data packet, the 3rd data packet, ..., up to the last data packet. At the time of batch modification, the labels of vehicle C in the image frames of the 1st data packet are already consistent, i.e., all "3". The association label "3" can therefore be propagated to the 2nd data packet, and the labels of vehicle C in the 2nd data packet are all modified to "3", including the label "8" in its last N image frames. The 3rd data packet is then associated through the association label "8" of the 2nd data packet, and the labels of vehicle C in the 3rd data packet are all modified to "3", the unified label of vehicle C in the 2nd data packet. In this way, all labels of vehicle C in all data packets can be modified to "3", packet by packet, from front to back.
If, instead, the label association for vehicle C was established starting from the last data packet, the labels of the same object in whole data packets are batch-modified from back to front: the label of vehicle C in the image frames of the last data packet is taken as its unified label, and the labels of vehicle C in the image frames of the preceding data packets are modified in turn.
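A sketch of the front-to-back batch modification, composing the per-overlap ID maps so that every packet's local IDs are rewritten to the unified IDs established by the preceding packets (helper names follow the sketches above and are assumptions):

```python
def unify_labels(annotations, n):
    """annotations: one entry per data packet, ordered front to back,
    each a list of per-frame dicts {local_id: box}. Rewrites the IDs of
    every packet in place so the same object has one unique label."""
    for i in range(1, len(annotations)):
        # Packet i-1 already carries unified IDs, so this map sends
        # packet i's local IDs directly to the unified IDs.
        id_map = overlap_id_map(annotations[i - 1], annotations[i], n)
        annotations[i] = [
            {id_map.get(local_id, local_id): box for local_id, box in frame.items()}
            for frame in annotations[i]
        ]
    return annotations
```

Objects that never appear in an overlap keep their local IDs, matching the case discussed below in which an object disappears and the association is interrupted.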
In this way, the IDs in each data packet are batch-modified according to the association result, which ensures that the same object carries an identical and unique label, so that tracking annotation of ultra-long continuous image frames can be completed.
It should be noted that the same object may be present in only some of the data packets, or in only some image frames of a certain data packet. Therefore, when associating the labels of the same object across adjacent data packets, if the object is absent from a data packet, the association is interrupted until the object reappears in a subsequent data packet and is associated again. If the object does not reappear in any subsequent data packet, the association ends at the data packet in which the object last appeared. If an object exists in only one image frame, it is labeled separately.
For example, a vehicle D monitored at the intersection typically disappears after passing through the monitored road section and rarely comes back. Vehicle D may therefore be present only in the continuous image frames at the front of a data packet, or only in those at the rear of a data packet. If vehicle D disappears and then reappears, it may appear in different, and possibly non-adjacent, data packets.
And step 110, performing model training based on the image frames in the plurality of data packets after unified labeling.
According to the embodiments of the present application, the continuous image frames are divided into a plurality of data packets such that the last N image frames of the preceding data packet in any two adjacent data packets correspond in sequence to the first N image frames of the following data packet; each data packet is tracked and annotated separately, and the labels of the data packets are then unified after tracking annotation. Errors in the tracking annotation process can thus be greatly reduced or avoided, and the accuracy and efficiency of tracking annotation are improved. Using the tracked and annotated image frames as samples for model training therefore reduces the difficulty of model training and improves the efficiency of model training based on continuous image frame tracking annotation.
In addition, dividing the continuous image frames into a plurality of data packets allows the data packets to be tracked and annotated at the same time, which increases the processing speed of tracking annotation and improves the efficiency of auditing and quality inspection of the annotated data packets.
Optionally, the embodiment of the present application further provides an electronic device, and fig. 2 is a block diagram of the structure of the electronic device in the embodiment of the present application.
As shown in fig. 2, the electronic device 2000 includes a memory 2200 and a processor 2400 electrically connected to the memory 2200. The memory 2200 stores a computer program executable by the processor 2400; when executed, the computer program implements the processes of any of the foregoing embodiments of the model training method based on continuous image frame tracking annotation, with the same technical effects, which are not repeated here.
The embodiments of the present application further provide a computer readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of any of the foregoing embodiments of the model training method based on continuous image frame tracking annotation, with the same technical effects, which are not repeated here. The computer readable storage medium may be, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the method of the above embodiments may be implemented by means of software plus a necessary general hardware platform, or by means of hardware, though in many cases the former is preferred. Based on this understanding, the technical solution of the present application, or the part of it that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disk), including several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those of ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are also within the protection of the present application.

Claims (9)

1. A model training method based on continuous image frame tracking annotation is characterized by comprising the following steps:
acquiring a batch of continuous image frames;
generating a plurality of data packets based on the batch of continuous image frames, wherein the data packets comprise M continuous image frames, the last N image frames of the previous data packet in any two adjacent data packets are the same as the first N image frames of the next data packet in sequence, and M and N are positive integers, N being smaller than M;
tracking and marking the image frames in the data packets respectively;
acquiring file names and/or image frame identifications of image frames in the plurality of data packets after tracking and labeling, wherein the file names and the image frame identifications of each image frame are sequentially generated and unique according to the corresponding continuous image frame sequence;
sequentially ordering the plurality of frame data packets according to the file name and/or the image frame identifier of the image frame;
unifying labels of corresponding image frames in the last N image frames of the previous data packet and the first N image frames of the next data packet in adjacent data packets of the plurality of data packets after tracking labels;
and performing model training based on the image frames in the plurality of data packets after unified labeling.
2. The method of claim 1, wherein tracking and labeling the image frames in the plurality of data packets, respectively, comprises:
and carrying out tracking and labeling on each data packet in parallel, wherein labels of the same objects included in the frame image of the same data packet are the same.
3. The method of claim 1, wherein unifying labels of the last N image frames of the preceding data packet and corresponding image frames of the first N image frames of the following data packet in adjacent data packets of the plurality of data packets after tracking labels, comprises:
unifying the labels of the same object in the last N image frames of the previous data packet and the first N image frames of the next data packet in the adjacent data packets of the plurality of data packets after tracking labels;
and unifying the labels of the same object included in the image frames of the preceding data packet and the image frames of the following data packet in the adjacent data packets of the plurality of data packets according to the sequence ordering of the plurality of frame data packets and the plurality of frame data packets after unifying the labels of the same object.
4. The method of claim 3, wherein unifying the labeling of the same object in the last N image frames of the preceding data packet and the first N image frames of the following data packet in adjacent data packets of the plurality of data packets after tracking labeling comprises:
modifying labels of the same objects included in the first N image frames of the subsequent data packet to labels of the same objects included in the last N image frames of the preceding data packet; or alternatively
And modifying all labels of the same object included in the last N image frames of the previous data packet into labels of the same object included in the first N image frames of the subsequent data packet.
5. The method of claim 4, wherein unifying the labeling of the same object included in the image frame of the preceding data packet and the image frame of the following data packet in the adjacent data packets of the plurality of data packets according to the sequential ordering of the plurality of frame data packets and the plurality of frame data packets after unifying the labeling of the same object, comprising:
the method comprises the steps of obtaining labels of the same object in the last N image frames of a previous data packet and labels of the same object in the first N image frames of the following data packet in the plurality of frame data packets after the same object is uniformly labeled;
in the case that the labels of the same objects included in the first N image frames of the subsequent data packet are all modified to the labels of the same objects included in the last N image frames of the preceding data packet, sequentially modifying the labels of the same objects included in the image frames of the subsequent data packet to the labels of the same objects included in the image frames of the preceding data packet according to the front-to-back ordering of the plurality of frame data packets; or alternatively
And under the condition that all labels of the same object included in the last N image frames of the prior data packet are modified to be labels of the same object included in the first N image frames of the subsequent data packet, sequentially modifying all labels of the same object included in the image frames of the prior data packet to be labels of the same object included in the image frames of the subsequent data packet according to the sequence of the plurality of frame data packets from back to front.
6. The method of claim 1, wherein the successive image frames include a plurality of different objects, each object being present at least in part in the successive image frames.
7. The method of claim 1, further comprising, after tracking and labeling the image frames in the plurality of data packets, respectively:
and performing accuracy correction on the labels of the image frames in the data packets so as to perform the step of unifying the labels of the last N image frames of the preceding data packet and the corresponding image frames in the first N image frames of the following data packet in the adjacent data packets of the data packets after tracking and labeling when the labels pass the accuracy correction.
8. An electronic device, comprising: a memory and a processor electrically connected to the memory, the memory storing a computer program executable by the processor, the computer program implementing the steps of the method of any one of claims 1 to 7 when executed by the processor.
9. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN202011355146.XA 2020-11-27 2020-11-27 Model training method based on continuous image frame tracking annotation and electronic equipment Active CN112418335B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011355146.XA CN112418335B (en) 2020-11-27 2020-11-27 Model training method based on continuous image frame tracking annotation and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011355146.XA CN112418335B (en) 2020-11-27 2020-11-27 Model training method based on continuous image frame tracking annotation and electronic equipment

Publications (2)

Publication Number Publication Date
CN112418335A CN112418335A (en) 2021-02-26
CN112418335B (en) 2024-04-05

Family

ID=74842662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011355146.XA Active CN112418335B (en) 2020-11-27 2020-11-27 Model training method based on continuous image frame tracking annotation and electronic equipment

Country Status (1)

Country Link
CN (1) CN112418335B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975980A (en) * 2016-04-27 2016-09-28 百度在线网络技术(北京)有限公司 Method of monitoring image mark quality and apparatus thereof
CN109584295A (en) * 2017-09-29 2019-04-05 阿里巴巴集团控股有限公司 The method, apparatus and system of automatic marking are carried out to target object in image
CN109934931A (en) * 2017-12-19 2019-06-25 阿里巴巴集团控股有限公司 Acquisition image, the method and device for establishing target object identification model
CN110189333A (en) * 2019-05-22 2019-08-30 湖北亿咖通科技有限公司 A kind of picture semantic divides semi-automatic mask method and device
CN110991283A (en) * 2019-11-21 2020-04-10 北京格灵深瞳信息技术有限公司 Re-recognition and training data acquisition method and device, electronic equipment and storage medium
WO2020093694A1 (en) * 2018-11-07 2020-05-14 华为技术有限公司 Method for generating video analysis model, and video analysis system
CN111680793A (en) * 2020-04-21 2020-09-18 广州中科易德科技有限公司 Block chain consensus method and system based on deep learning model training
CN111695484A (en) * 2020-06-08 2020-09-22 深兰人工智能芯片研究院(江苏)有限公司 Method for classifying gesture postures

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Motion coherent tracking using multi-label MRF optimization; Tsai D et al.; International Journal of Computer Vision, vol. 100, pp. 190-202 *
A small-target recognition method in gray-scale images; Jiang Xiaoyu et al.; Journal of the Academy of Armored Force Engineering, no. 2, pp. 38-41 *
A probability model of the optimal number of image frames for moving small target detection; Zheng Hong et al.; Systems Engineering and Electronics, vol. 33, no. 1, pp. 8-12 *

Also Published As

Publication number Publication date
CN112418335A (en) 2021-02-26

Similar Documents

Publication Publication Date Title
CN104346267B (en) Generate the method and device of bugs location information
CN110751224A (en) Training method of video classification model, video classification method, device and equipment
CN110781960B (en) Training method, classification method, device and equipment of video classification model
CN110677718B (en) Video identification method and device
CN112256318B (en) Construction method and equipment for dependent product
CN111460232A (en) Functional module searching method, device, terminal and computer readable storage medium
CN107704604A (en) A kind of information persistence method, server and computer-readable recording medium
CN113435328A (en) Video clip processing method and device, electronic equipment and readable storage medium
CN115578585A (en) Industrial image anomaly detection method, system, computer device and storage medium
CN112418335B (en) Model training method based on continuous image frame tracking annotation and electronic equipment
CN112533060B (en) Video processing method and device
JP2007226780A (en) Method and system for derivation of missing data objects from test data
CN112270319B (en) Event labeling method and device and electronic equipment
CN103593429B (en) Commodity template failure detection method and device
CN110446117B (en) Video playing method, device and system
CN110929792A (en) Image annotation method and device, electronic equipment and storage medium
CN112132931B (en) Processing method, device and system for templated video synthesis
CN115424253A (en) License plate recognition method and device, electronic equipment and storage medium
CN115358410A (en) Method, device and equipment for enhancing field of pre-training model and storage medium
CN111753026B (en) User portrait generation system, method, device, equipment and medium
CN112199229A (en) Data processing method, device, equipment and storage medium
CN113886745B (en) Page picture testing method and device and electronic equipment
CN113642619B (en) Training method, training device, training equipment and training readable storage medium for character recognition model
CN116188821B (en) Copyright detection method, system, electronic device and storage medium
CN117351240B (en) Positive sample sampling method, system, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240417

Address after: No. 4302, Courtyard 52, Jiuxianqiao, Chaoyang District, Beijing, 100016

Patentee after: Beijing Yunce Data Technology Co.,Ltd.

Country or region after: China

Address before: 102425 building 31, 69 Yanfu Road, Fangshan District, Beijing

Patentee before: Beijing Yunju Intelligent Technology Co.,Ltd.

Country or region before: China