CN111832549A - Data labeling method and device - Google Patents

Data labeling method and device

Info

Publication number
CN111832549A
Authority
CN
China
Prior art keywords
region
frame image
labeled
tracking model
labeling
Prior art date
Legal status
Granted
Application number
CN202010604308.2A
Other languages
Chinese (zh)
Other versions
CN111832549B (en)
Inventor
胡淑萍
程骏
张惊涛
郭渺辰
王东
顾在旺
庞建新
熊友军
Current Assignee
Ubtech Robotics Corp
Original Assignee
Ubtech Robotics Corp
Priority date
Filing date
Publication date
Application filed by Ubtech Robotics Corp filed Critical Ubtech Robotics Corp
Priority to CN202010604308.2A
Publication of CN111832549A
Application granted
Publication of CN111832549B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/225 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on a marking or identifier characterising the area
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application is applicable to the technical field of image processing, and provides a data annotation method and a device, wherein the data annotation method comprises the following steps: for each non-first frame image in the image data, updating a historical tracking model corresponding to each marked area according to the area characteristics corresponding to each marked area in the last frame image of the non-first frame image to obtain each tracking model corresponding to the non-first frame image; performing region tracking on the non-first frame image through each tracking model to obtain a region to be marked corresponding to each tracking model; and for each region to be marked, marking the region to be marked according to the similarity between the region to be marked and the corresponding marked region. The tracking model obtained based on the region feature update can identify and label each region to be labeled in the non-first frame image, so that the workload of labeling the image data is reduced, the time spent on labeling the image data is reduced, and the efficiency of labeling the image data is improved.

Description

Data labeling method and device
Technical Field
The application belongs to the technical field of image processing, and particularly relates to a data annotation method and device.
Background
With the continuous development of image processing technology, each image frame in the image data can be labeled through the terminal device, so that a supervised learning mode can be adopted, and training is performed according to the labeled image data to obtain a model for detection.
In the related art, the terminal device may label a part of image data according to a labeling requirement according to an operation triggered by a user, train according to the labeled image data to obtain a labeling model for labeling the image data, then input the remaining image data into the labeling model, and label the remaining image data through the labeling model to obtain the labeled image data.
However, in the above data annotation method, in each process of annotating data, a large number of frame images need to be manually marked to train to obtain an annotation model, which causes a problem of large workload of annotating image data.
Disclosure of Invention
The embodiment of the application provides a data annotation method and device, which can solve the problem of large workload of annotating image data.
In a first aspect, an embodiment of the present application provides a data annotation method, including:
for each non-first frame image in image data, updating a historical tracking model corresponding to each labeled region according to the region characteristics corresponding to each labeled region in a last frame image of the non-first frame image to obtain each tracking model corresponding to the non-first frame image, wherein the historical tracking models correspond to the labeled regions in the last frame image one to one;
performing region tracking on the non-first frame image through each tracking model to obtain a region to be marked corresponding to each tracking model;
and for each region to be marked, marking the region to be marked according to the similarity between the region to be marked and the corresponding marked region.
Optionally,
the region tracking of the non-first frame image through each tracking model to obtain the region to be marked corresponding to each tracking model comprises:
performing region expansion in the non-first frame image based on the labeling region corresponding to each historical tracking model to obtain a plurality of expansion regions;
for each expansion region, determining the similarity between each sub-region in the expansion region and the labeled region through a tracking model corresponding to the expansion region, wherein the sub-regions are obtained by dividing in the expansion region according to a preset identification mode;
and determining the region to be marked corresponding to the expansion region from the plurality of sub-regions according to the plurality of similarity.
Optionally, the determining, by the tracking model corresponding to the extended region, a similarity between each sub-region in the extended region and the labeled region includes:
extracting the characteristics of each sub-region through a tracking model corresponding to the extended region to obtain the characteristics of each sub-region;
for each sub-region, comparing the sub-region characteristics of the sub-region with the pre-stored region characteristics of the labeled region through the tracking model corresponding to the expanded region, and obtaining the similarity between the sub-region and the labeled region.
Optionally, the determining, according to the multiple similarities, a region to be labeled corresponding to the extended region from the multiple sub-regions includes:
and selecting a sub-region corresponding to the maximum similarity from the plurality of similarities as a region to be marked corresponding to the expansion region.
Optionally, after determining, according to the multiple similarities, a region to be labeled corresponding to the extended region from the multiple sub-regions, the method further includes:
and storing the sub-region characteristics corresponding to the region to be marked.
Optionally, performing region expansion in the non-first-frame image based on the labeled region corresponding to each historical tracking model to obtain a plurality of expanded regions, where the method includes:
for each marking area, expanding the boundary of the marking area according to a preset expansion coefficient to obtain an expanded boundary;
and generating the extended area in the non-first frame image according to the extended boundary by taking the center of the marked area as a reference.
Optionally, the labeling the to-be-labeled region according to the similarity between the to-be-labeled region and the corresponding labeling region includes:
if the similarity is larger than a preset similarity threshold, marking the area to be marked;
and if the similarity is smaller than or equal to the similarity threshold, deleting the tracking model corresponding to the marked area.
Optionally, before the historical tracking models corresponding to the respective labeled regions are respectively updated according to the region features corresponding to each labeled region in the previous frame image of the non-first frame image to obtain the respective tracking models corresponding to the non-first frame image, the method further includes:
acquiring a first frame image in the image data;
and labeling the first frame image according to labeling operation triggered by a user to obtain at least one labeled area of the first frame image, wherein a historical tracking model corresponding to each labeled area of the first frame image is a preset initial tracking model.
Optionally, the method further includes:
in the process of labeling the image data, if a pause operation triggered by a user is detected, stopping labeling the image data;
and labeling the current frame image according to labeling operation triggered again by the user to obtain a newly added labeling area, wherein the historical tracking model corresponding to the newly added labeling area is a preset initial tracking model.
In a second aspect, an embodiment of the present application provides a data annotation device, including:
the updating module is used for updating a historical tracking model corresponding to each marked area according to the area characteristic corresponding to each marked area in a previous frame image of the non-first frame image for each non-first frame image in the image data to obtain each tracking model corresponding to the non-first frame image, and the historical tracking models are in one-to-one correspondence with the marked areas in the previous frame image;
the tracking module is used for carrying out region tracking on the non-first frame image through each tracking model to obtain a region to be marked corresponding to each tracking model;
and the first labeling module is used for labeling each region to be labeled according to the similarity between the region to be labeled and the corresponding labeling region.
Optionally, the tracking module is specifically configured to perform region expansion in the non-first-frame image based on the labeled region corresponding to each historical tracking model to obtain a plurality of expanded regions; for each expansion region, determining the similarity between each sub-region in the expansion region and the labeled region through a tracking model corresponding to the expansion region, wherein the sub-regions are obtained by dividing in the expansion region according to a preset identification mode; and determining the region to be marked corresponding to the expansion region from the plurality of sub-regions according to the plurality of similarity.
Optionally, the tracking module is further specifically configured to perform feature extraction on each sub-region through a tracking model corresponding to the extended region, so as to obtain sub-region features of each sub-region; for each sub-region, comparing the sub-region characteristics of the sub-region with the pre-stored region characteristics of the labeled region through the tracking model corresponding to the expanded region, and obtaining the similarity between the sub-region and the labeled region.
Optionally, the tracking module is further specifically configured to select a sub-region corresponding to the maximum similarity from the multiple similarities as the to-be-labeled region corresponding to the extended region.
Optionally, the apparatus further comprises:
and the storage module is used for storing the sub-region characteristics corresponding to the region to be marked.
Optionally, the tracking module is further specifically configured to, for each labeled region, expand the boundary of the labeled region according to a preset expansion coefficient to obtain an expanded boundary; and generating the extended area in the non-first frame image according to the extended boundary by taking the center of the marked area as a reference.
Optionally, the first labeling module is further configured to label the area to be labeled if the similarity is greater than a preset similarity threshold; and if the similarity is smaller than or equal to the similarity threshold, deleting the tracking model corresponding to the marked area.
Optionally, the apparatus further comprises:
the acquisition module is used for acquiring a first frame image in the image data;
and the second labeling module is used for labeling the first frame image according to labeling operation triggered by a user to obtain at least one labeling area of the first frame image, and the historical tracking model corresponding to each labeling area of the first frame image is a preset initial tracking model.
Optionally, the apparatus further comprises:
the stopping module is used for stopping the annotation of the image data if the pause operation triggered by a user is detected in the process of the annotation of the image data;
and the third labeling module is used for labeling the current frame image according to the labeling operation triggered again by the user to obtain a newly added labeling area, and the historical tracking model corresponding to the newly added labeling area is a preset initial tracking model.
In a third aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the data annotation method according to any one of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, where the computer program is executed by a processor to implement the data annotation method according to any one of the above first aspects.
In a fifth aspect, an embodiment of the present application provides a computer program product, which, when run on a terminal device, causes the terminal device to execute the data annotation method described in any one of the above first aspects.
Compared with the prior art, the embodiment of the application has the advantages that:
in the embodiment of the application, for each non-first frame image in the image data, the terminal device can update the historical tracking model corresponding to each labeled region according to the region characteristics corresponding to each labeled region in the last frame image of the non-first frame image to obtain each tracking model corresponding to the non-first frame image, then perform region tracking on the non-first frame image through each tracking model to obtain the region to be labeled corresponding to each tracking model, for each region to be labeled, the terminal device can label the region to be labeled according to the similarity between the region to be labeled and the corresponding labeled region without training the labeling model matching with the current labeling requirement or manually labeling a large number of frame images, and can identify and label each region to be labeled in the non-first frame image through the tracking model obtained based on region characteristic update, the workload of labeling the image data is reduced, the time spent on labeling the image data is reduced, and the efficiency of labeling the image data is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a schematic system architecture diagram of a data annotation system according to a data annotation method provided in an embodiment of the present application;
fig. 2 is a schematic structural diagram of a terminal device according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a data annotation method provided in an embodiment of the present application;
fig. 4 is a schematic flowchart of region tracking for a non-first frame image according to an embodiment of the present application;
FIG. 5a is a schematic diagram of a labeling area provided in an embodiment of the present application;
FIG. 5b is a schematic diagram of an extended area provided by an embodiment of the present application;
fig. 6 is a block diagram illustrating a data annotation device according to an embodiment of the present application;
FIG. 7 is a block diagram of another data annotation device provided in the embodiments of the present application;
FIG. 8 is a block diagram illustrating a structure of another data annotation device provided in the embodiments of the present application;
fig. 9 is a block diagram of a structure of another data annotation device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
The terminology used in the following examples is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, such as "one or more", unless the context clearly indicates otherwise. It should also be understood that in the embodiments of the present application, "one or more" means one, two, or more than two; "and/or" describes the association relationship of the associated objects, indicating that three relationships may exist; for example, A and/or B may represent: A alone, both A and B, and B alone, where A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The data annotation method provided by the embodiment of the application can be applied to terminal devices such as a mobile phone, a tablet computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA), and the like, and the embodiment of the application does not limit the specific type of the terminal device.
For example, the terminal device may be a STATION (ST) in a WLAN, a Personal Digital Assistant (PDA) device, a computing device or other processing device connected to a wireless modem, a computer, a laptop computer, a handheld communication device, a handheld computing device, a satellite radio, a wireless modem card, and so forth.
Fig. 1 is a schematic system architecture diagram of a data annotation system according to a data annotation method provided in an embodiment of the present application, and referring to fig. 1, the data annotation system may include: terminal device 110 and server 120, terminal device 110 is connected with server 120.
The server 120 may obtain the acquired image data and forward the image data to the terminal device 110. The terminal device 110 may receive the image data sent by the server 120, and label each frame image in the image data to obtain labeled image data for model training.
In particular, the image data may be continuous video data, for example, the image data may be video data of traffic flows collected by cameras distributed on different streets.
In a possible implementation manner, the server 120 may receive image data acquired by each image acquisition device, and then forward the acquired image data to the terminal device 110, so that the terminal device 110 may receive the image data, label a first frame image in the image data based on a labeling operation triggered by a user, generate a tracking model corresponding to a labeled area of each first frame image, and label a non-first frame image in the image data according to the tracking model, so as to obtain the labeled image data.
It should be noted that, in the process that the terminal device labels each frame image in the image data according to the time sequence, if the terminal device detects a pause operation triggered by the user, the terminal device may stop labeling the image data, form a new added labeling area according to the labeling operation triggered again by the user, and generate a tracking model corresponding to the new added labeling area.
In addition, in practical application, the terminal device 110 may also receive the image data acquired by the image acquisition device, and the server 120 does not forward the image data any more, that is, the terminal device 110 is connected to the image acquisition device, and the image acquisition device may send the image data to the terminal device 110 after acquiring the image data.
Fig. 2 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 2, the terminal device 2 of this embodiment includes: at least one processor 21 (only one shown in fig. 2), a memory 22, and a computer program stored in the memory 22 and executable on the at least one processor 21; the processor 21 implements the steps in any of the data annotation method embodiments described below when executing the computer program.
The terminal device 2 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal device may include, but is not limited to, a processor 21 and a memory 22. Those skilled in the art will appreciate that fig. 2 is only an example of the terminal device 2 and does not constitute a limitation to the terminal device 2, which may include more or fewer components than those shown, combine some components, or have different components; for example, it may also include input/output devices, network access devices, and the like.
The processor 21 may be a Central Processing Unit (CPU); the processor 21 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The storage 22 may in some embodiments be an internal storage unit of the terminal device 2, such as a hard disk or a memory of the terminal device 2. The memory 22 may also be an external storage device of the terminal device 2 in other embodiments, such as a plug-in hard disk provided on the terminal device 2, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and so on. Further, the memory 22 may also include both an internal storage unit of the terminal device 2 and an external storage device. The memory 22 is used for storing an operating system, an application program, a BootLoader (BootLoader), data, and other programs, such as program codes of a computer program. The memory 22 may also be used to temporarily store data that has been output or is to be output.
Fig. 3 is a schematic flowchart of a data annotation method provided in an embodiment of the present application, and by way of example and not limitation, the method may be applied to the terminal device described above, and referring to fig. 3, the method includes:
and 301, labeling the first frame image.
The image data may include a first frame image and a non-first frame image, the first frame image is a frame image arranged at a first position in the image data according to a time sequence, and the non-first frame image is another frame image except the first frame image in the image data. For example, the image data is composed of frame images acquired every 1 second within 10 seconds, and the image data may include 10 frame images, where the frame image acquired in the 1st second is a first frame image, and the frame images acquired in the 2nd to 10th seconds are non-first frame images.
In the process of labeling the image data, the terminal device may first obtain a first frame image in the image data, and label the first frame image according to a labeling operation triggered by a user to obtain at least one labeled region of the first frame image, so that in subsequent steps, a corresponding tracking model may be generated according to a historical tracking model corresponding to each labeled region of the first frame image, and the historical tracking model corresponding to each labeled region of the first frame image may be a preset initial tracking model.
In a possible implementation manner, the terminal device may first obtain and display the first frame image to the user according to the time sequence corresponding to each frame image in the image data, then detect the tagging operation triggered by the user, and determine an object corresponding to the tagging operation in the first frame image and an area where the object is located, that is, a tagging area.
In addition, during the process of labeling the first frame image, the user can also input attribute information corresponding to each labeled object. For example, the attribute information corresponding to the nth object is (x_n^1, y_n^1), where x_n^1 represents the labeled region corresponding to the nth labeled object in the 1st frame image, such as the center coordinates, length and width of the region, and y_n^1 represents the label of the nth labeled object in the 1st frame image, such as its name, number, occlusion state and rotation state.
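As a concrete illustration (not part of the patent text), one such attribute record could be represented as a small data structure; all field names below are hypothetical, the patent only requires that the position information x_n^1 and the label information y_n^1 be stored:

```python
# Hypothetical attribute record for the n-th labeled object in frame 1;
# the field names are illustrative only.
annotation = {
    "object_id": 3,                   # n: number of the labeled object
    "frame_index": 1,                 # first frame image
    "region": {                       # x_n^1: labeled region geometry
        "center": (412.0, 237.5),     # center coordinates (pixels)
        "width": 96.0,
        "height": 64.0,
    },
    "label": {                        # y_n^1: semantic information
        "name": "pedestrian",
        "number": 3,
        "occluded": False,
        "rotated": False,
    },
}
```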
It should be noted that, in practical applications, each frame image may include at least one labeled region, for convenience of description, the embodiment of the present application is described by taking only one labeled region as an example, and the number of labeled regions included in each frame image is not limited.
In addition, the terminal device may also label the first frame image in other manners, for example, the first frame image may be labeled according to a pre-trained model, and the manner of labeling the first frame image is not limited in this application embodiment.
Step 302, for each non-first frame image in the image data, updating the historical tracking model corresponding to each labeled region according to the region characteristics corresponding to each labeled region in the last frame image of the non-first frame image, so as to obtain each tracking model corresponding to the non-first frame image.
And the historical tracking model corresponds to the labeled areas in the previous frame of image one by one.
The last frame image of the non-first frame image refers to the frame image that is adjacent to the non-first frame image and earlier than it in time sequence; it may be the first frame image or another non-first frame image. For example, if the image data includes N frame images, where N is a positive integer, and x is greater than 2 and less than or equal to N, the previous frame image of the x-th frame image is the (x-1)-th frame image; if x is equal to 2, the previous frame image of the x-th frame image is the first frame image, i.e. the 1st frame image.
After the terminal device finishes labeling the first frame image according to the operation triggered by the user, the terminal device can update the preset initial tracking model according to the region characteristics corresponding to each labeled region in the first frame image to obtain the tracking model corresponding to the next frame image, so that the non-first frame image in the image data is continuously labeled through the tracking model to label each frame image in the image data.
In the process of labeling the non-first frame image, the terminal device may first acquire a region feature corresponding to any labeling region in a previous frame image of the pre-stored non-first frame image, and then update the historical tracking model corresponding to the labeling region according to the region feature to obtain a tracking model for performing region identification on the non-first frame image, so that in subsequent steps, the non-first frame image may be labeled through each tracking model to obtain the labeling region of the non-first frame image.
For example, corresponding to the above example, the non-first frame image acquired by the terminal device may be the x-th frame image. If x is greater than 2 and x is less than or equal to N, the historical tracking model corresponding to the (x-1)-th frame image may be acquired, and the tracking model corresponding to the x-th frame image is generated; if x is equal to 2, the historical tracking model corresponding to the 1st frame image, that is, the preset initial tracking model, may be obtained, and the tracking model corresponding to the 2nd frame image may be generated.
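The embodiment of the application does not prescribe a specific update rule for the tracking model. As a minimal sketch, assuming each tracking model keeps a feature template and that a simple running average over the previous frame's region features is an acceptable update (both are assumptions made for illustration only), the per-region update of step 302 could look like this:

```python
import numpy as np

class TrackingModel:
    """Minimal per-region tracking model holding a feature template.
    The running-average update rule (rate `alpha`) is an assumption; the patent
    only requires that the historical model be updated from the region features
    of the previous frame image."""

    def __init__(self, region_feature: np.ndarray, alpha: float = 0.2):
        self.template = region_feature.astype(np.float32)
        self.alpha = alpha

    def update(self, region_feature: np.ndarray) -> "TrackingModel":
        # Blend the previous template with the feature of the labeled region
        # in the previous frame image to obtain the model for the current frame.
        self.template = (1.0 - self.alpha) * self.template \
            + self.alpha * region_feature.astype(np.float32)
        return self


def update_models(historical_models: dict, prev_region_features: dict) -> dict:
    """One model per labeled region (keyed by object number), each updated from
    the region features stored for the previous frame image."""
    return {obj_id: model.update(prev_region_features[obj_id])
            for obj_id, model in historical_models.items()}
```

In practice the update would be whatever the chosen tracker (for example, a correlation filter, see step 303b) defines; the sketch only shows where the previous frame's region features enter.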
Step 303, performing region tracking on the non-first frame image through each tracking model to obtain a region to be marked corresponding to each tracking model.
The terminal device can perform region tracking on the non-first frame image through the acquired tracking model, and determine the regions to be labeled similar to each labeled region in the previous frame image, so that in the subsequent steps, the non-first frame image can be labeled according to each region to be labeled, and the labeled region of the non-first frame image is obtained.
In the process of performing region tracking on the non-first frame image, the terminal device may first expand the labeled region based on the position of the labeled region in the previous frame image, and then search for a region to be labeled similar to the labeled region from each sub-region of the expanded region. Referring to fig. 4, this step 303 may include: step 303a, step 303b, step 303c and step 303d.
Step 303a, performing region expansion in the non-first frame image based on the labeled region corresponding to each historical tracking model to obtain a plurality of expanded regions.
After obtaining the tracking model of the previous frame image, the terminal device may continue to perform region tracking on the non-first frame image according to the tracking model. And as time changes, the position of the object in each labeled region in the image data in the non-first frame image may also change, and before determining the region to be labeled, the labeled region corresponding to the previous frame image may be expanded to obtain an expanded region, so that in the subsequent step, the region to be labeled may be determined from the expanded region.
In a possible implementation manner, the terminal device may first obtain the position (such as the center coordinate, the length, the width, and the like of each labeled region) of each labeled region corresponding to the previous frame of image, and then, in the non-first frame of image, expand outward along the boundary of each labeled region according to the position of each labeled region, so as to obtain a plurality of expanded regions.
Optionally, in the process of expanding the labeled region, for each labeled region, the terminal device may expand the boundary of the labeled region according to a preset expansion coefficient to obtain an expanded boundary, and then generate an expanded region in the non-first-frame image according to the expanded boundary with the center of the labeled region as a reference.
Specifically, the terminal device may use the product of the expansion coefficient and the boundary length of each labeled region as each expanded boundary. Then, aiming at each expanded boundary of the same labeling area, each expanded boundary can be combined into a regular graph, and the position of the regular graph is adjusted, so that the center of the regular graph is overlapped with the position indicated by the center coordinate of the labeling area, and the expanded area in the non-first frame image is obtained.
It should be noted that, if the corresponding labeled region in the previous frame image is close to the boundary of the previous frame image, and after the labeled region is expanded, a part of the expanded region may already extend beyond the boundary of the previous frame image, the part where the expanded region intersects with the previous frame image may be used as the expanded region for determining the region to be labeled.
For example, referring to fig. 5a and 5b, fig. 5a shows the previous frame image with 4 rectangular labeling areas A, B, C and D. If the pre-stored expansion coefficient is 2.5, the length and width of the boundary of each labeling area can each be expanded by 2.5 times, obtaining the expanded areas A', B', C' and D' in the non-first frame image as shown in fig. 5b.
Since the labeled region D is close to the boundary of the previous frame image, when it is expanded, as shown in fig. 5b, the expanded region D' is smaller than the area that the expansion coefficient 2.5 alone would produce: only the part intersecting the previous frame image after the expansion is taken as the expanded region D'.
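A minimal sketch of this expansion step, assuming axis-aligned rectangular labeled regions described by center, width and height, the expansion coefficient 2.5 from the example above, and clipping to the frame image bounds (the function name and parameters are illustrative):

```python
def expand_region(center_x, center_y, width, height,
                  image_w, image_h, coefficient=2.5):
    """Expand a labeled region around its own center by `coefficient` and clip
    the result to the frame image, mirroring regions A'-D' in fig. 5b."""
    new_w = width * coefficient
    new_h = height * coefficient
    # Rectangle centered on the original labeled region.
    left = center_x - new_w / 2.0
    top = center_y - new_h / 2.0
    right = center_x + new_w / 2.0
    bottom = center_y + new_h / 2.0
    # Keep only the part that intersects the image (the case of region D / D').
    left, top = max(0.0, left), max(0.0, top)
    right, bottom = min(float(image_w), right), min(float(image_h), bottom)
    return left, top, right, bottom
```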
In addition, in practical application, the terminal device may expand the plurality of labeled regions in the previous frame of image according to the above manner to obtain a plurality of expanded regions, or after a certain labeled region is expanded, execute step 303b and step 303c to determine a region to be labeled, return to step 303a to expand another labeled region in the previous frame of image, and execute the above process cyclically until the labeled region in the previous frame of image is expanded. Of course, the terminal device may also expand the labeled region in the previous frame image in other manners, which is not limited in this embodiment of the application.
Step 303b, for each expansion region, determining the similarity between each sub-region in the expansion region and the labeled region through the tracking model corresponding to the expansion region.
The sub-regions are obtained by dividing the expansion regions according to a preset identification mode.
After the terminal device expands to obtain the expanded areas, for each expanded area, the expanded area can be compared with the corresponding labeled area in the previous frame of image through the tracking model corresponding to the expanded area, so that the similarity between each sub-area in the expanded area and the labeled area is obtained.
In a possible implementation manner, for each extended area, the terminal device may input the extended area into a tracking model, traverse the extended area according to a preset size of the sub-area and a sliding step length through the tracking model to obtain a plurality of sub-areas, and compare each sub-area with a labeled area corresponding to the extended area to obtain a similarity between each sub-area and the labeled area.
The similarity is used for representing the degree of similarity between the sub-region and the labeled region, and the terminal device can determine whether the object in the labeled region of the previous frame image is displaced to the position corresponding to the sub-region of the non-first frame image according to the degree of similarity. For example, if the tracking model is a correlation filter, the similarity may be a correlation coefficient output from the correlation filter; if the tracking model is a twin network, the confidence of the twin network may be used as the similarity, and the tracking model and the similarity are not limited in the embodiment of the present application.
Optionally, in the process of comparing the sub-regions with the labeled region, feature extraction may be performed on each sub-region through the tracking model corresponding to the expanded region to obtain sub-region features of each sub-region, and then for each sub-region, the similarity between the sub-region and the labeled region is obtained through the sub-region features of the tracking model comparison sub-region corresponding to the expanded region and the pre-stored region features of the labeled region.
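Putting step 303b and the feature comparison together, a sketch could traverse the expanded region with a window of the labeled region's size and score every sub-region against the stored region feature. The `extract_features` stub and the cosine-similarity score below are assumptions made for illustration; as noted above, the actual tracking model may instead output a correlation-filter response or a twin-network confidence:

```python
import numpy as np

def extract_features(patch: np.ndarray) -> np.ndarray:
    # Placeholder feature extractor: a normalized flattened patch. In practice
    # this would be whatever feature the tracking model actually uses.
    v = patch.astype(np.float32).ravel()
    return v / (np.linalg.norm(v) + 1e-8)

def score_sub_regions(expanded: np.ndarray, template: np.ndarray,
                      win_h: int, win_w: int, stride: int = 4):
    """Traverse the expanded region with a window of the labeled region's size
    and return (similarity, top, left) for every sub-region.
    `template` is the stored (normalized, flattened) feature of the labeled
    region from the previous frame image."""
    scores = []
    for top in range(0, expanded.shape[0] - win_h + 1, stride):
        for left in range(0, expanded.shape[1] - win_w + 1, stride):
            sub = expanded[top:top + win_h, left:left + win_w]
            feat = extract_features(sub)
            similarity = float(feat @ template)   # cosine similarity (assumed)
            scores.append((similarity, top, left))
    return scores
```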
Step 303c, determining the region to be marked corresponding to the expansion region from the plurality of sub-regions according to the plurality of similarities.
After obtaining the plurality of similarities output by the tracking model, the terminal device may use a sub-region corresponding to the maximum similarity among the plurality of similarities as a region to be labeled of the non-first frame image.
In a possible implementation manner, for a plurality of sub-regions of the same extension region, the terminal device may obtain the similarity corresponding to each sub-region, sequence the similarities in descending order, then determine the similarity sequenced at the first position, and then take the sub-region corresponding to the similarity sequenced at the first position as the region to be labeled, that is, select the sub-region corresponding to the largest similarity from the plurality of similarities as the region to be labeled corresponding to the extension region. For example, information such as the center coordinate, the length, the width, and the like of the sub-region corresponding to the maximum similarity may be recorded, so that the sub-region is used as the region to be labeled.
It should be noted that, in practical applications, the size and the position of the sub-region corresponding to the maximum similarity may also be adjusted. For example, after determining the sub-region corresponding to the maximum similarity, the terminal device may respectively expand or reduce the sub-region in four directions, i.e., up, down, left, and right, according to a preset sliding step length, compare the labeled region corresponding to the previous frame of image with the adjusted sub-region through the tracking model, and obtain the similarity between the labeled region and the adjusted sub-region again, and when the number of times of adjustment on the sub-region reaches a preset adjustment threshold, may select the adjusted sub-region corresponding to the maximum similarity from the adjusted sub-regions as the region to be labeled according to the similarities corresponding to the adjusted sub-regions.
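A sketch of step 303c together with the optional adjustment described above: the highest-scoring sub-region is selected, then shifted by the sliding step in the four directions and re-scored, keeping the best candidate, for at most a preset number of adjustments. It reuses `extract_features` and the score list from the previous sketch; only shifting is shown here, although the patent also allows enlarging or shrinking the sub-region, and the step and adjustment-threshold values are illustrative:

```python
def select_and_refine(scores, expanded, template, win_h, win_w,
                      stride=4, max_adjustments=3):
    """`scores` is the (similarity, top, left) list from score_sub_regions()."""
    best_sim, best_top, best_left = max(scores)    # max-similarity sub-region
    for _ in range(max_adjustments):
        candidates = []
        for dt, dl in ((-stride, 0), (stride, 0), (0, -stride), (0, stride)):
            top, left = best_top + dt, best_left + dl
            if (0 <= top <= expanded.shape[0] - win_h
                    and 0 <= left <= expanded.shape[1] - win_w):
                sub = expanded[top:top + win_h, left:left + win_w]
                sim = float(extract_features(sub) @ template)
                candidates.append((sim, top, left))
        if not candidates:
            break
        cand_sim, cand_top, cand_left = max(candidates)
        if cand_sim <= best_sim:        # no improvement: stop adjusting
            break
        best_sim, best_top, best_left = cand_sim, cand_top, cand_left
    return best_sim, best_top, best_left
```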
Step 303d, storing the sub-region characteristics corresponding to the region to be marked.
The terminal device may store the sub-region features corresponding to the region to be labeled, and may compare the stored sub-region features with the sub-region features corresponding to each sub-region in the expansion region in the next frame image during the process of labeling the next frame image, so as to obtain the similarity of the updated tracking model output.
For example, the terminal device may establish an initial annotation file (e.g., an empty annotation file) when reading the non-first frame image, may store the attribute information corresponding to the region to be labeled in the initial annotation file after identifying that region, and, after identifying each region to be labeled in the non-first frame image, may obtain the annotation file including the attribute information corresponding to each region to be labeled for the non-first frame image.
In addition, the terminal device may update the updated tracking model again according to the stored characteristics of the sub-regions in a similar manner to step 301, so that the updated tracking model can be distinguished and identified from the next frame of image, and the region to be labeled is obtained.
Step 304, labeling each region to be labeled according to the similarity between the region to be labeled and the corresponding labeled region.
After the terminal device identifies and obtains each to-be-labeled area of the non-first frame image, for each to-be-labeled area, whether the to-be-labeled area includes an object included in the corresponding labeled area or not can be determined according to the corresponding similarity of each to-be-labeled area, and whether the to-be-labeled area is labeled or not is determined according to the judgment result.
In a possible implementation manner, for each to-be-labeled region in the non-first-frame image, the terminal device may obtain a similarity corresponding to the to-be-labeled region, compare the similarity with a preset similarity threshold, and determine a size relationship between the similarity and the similarity threshold.
Correspondingly, if the similarity is greater than the preset similarity threshold, it indicates that the to-be-labeled region may include an object corresponding to the labeled region corresponding to the previous frame of image, and the terminal device may label the to-be-labeled region, so that the to-be-labeled region becomes the labeled region of the non-first frame of image. If the similarity is smaller than or equal to the similarity threshold, it is indicated that the object marked by the corresponding marked region in the previous frame of image may not be in the identified region to be marked, the terminal device may not mark the region to be marked, and delete the tracking model corresponding to the marked region in the previous frame of image, so as to avoid the terminal device storing a redundant tracking model.
Further, if the similarity corresponding to the region to be labeled is smaller than the similarity threshold, it may be determined whether the labeled region corresponding to the previous frame image is close to the boundary of the previous frame image, and if the labeled region is close to the boundary of the previous frame image, the object labeled by the labeled region in the previous frame image may not appear in the current non-first frame image, and the region to be labeled may not be labeled. However, if the labeling area corresponding to the previous frame image is not close to the boundary of the previous frame image, and the terminal device may lose the object in the tracking process, the labeling may be suspended and the user may be reminded that the terminal device loses the object, so that the user may re-label the current non-first frame image.
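A sketch of this labeling decision, with an assumed similarity threshold and a placeholder `pause_and_ask_user` hook standing in for the re-labeling prompt described above:

```python
SIMILARITY_THRESHOLD = 0.6   # illustrative value; the patent only calls it preset

def pause_and_ask_user(obj_id):
    # Placeholder: a real tool would pause labeling and prompt the user here.
    print(f"Tracking lost object {obj_id}; please re-label the current frame.")

def decide_label(obj_id, similarity, region_to_label, prev_region_near_border,
                 annotations, trackers):
    if similarity > SIMILARITY_THRESHOLD:
        # Object is considered present: the region becomes a labeled region.
        annotations[obj_id] = region_to_label
    else:
        # Object is considered absent: drop the now-redundant tracking model.
        trackers.pop(obj_id, None)
        if not prev_region_near_border:
            # Object was lost mid-frame rather than leaving the image:
            # pause and ask the user to re-label the current frame image.
            pause_and_ask_user(obj_id)
```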
In addition, after the terminal device determines the lost object, the number corresponding to each labeled area in the non-first frame image can be decreased progressively. If the nth object in the non-first frame image is lost, the number of each labeling area and the number of the corresponding tracking model need to be decreased starting from the (n+1)-th labeling area. For example, if the non-first frame image includes 10 labeled objects and the 7th object is lost, the numbers of the labeling areas and the numbers of the tracking models corresponding to the 8th, 9th and 10th objects need to be reduced by 1.
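A sketch of this renumbering rule, assuming the labeled regions and tracking models are kept in dictionaries keyed by object number:

```python
def renumber_after_loss(lost_id: int, trackers: dict, annotations: dict):
    """Decrement the numbers of all objects that come after the lost one,
    e.g. losing object 7 renumbers objects 8, 9, 10 to 7, 8, 9."""
    def shift(table: dict) -> dict:
        shifted = {}
        for obj_id, value in table.items():
            if obj_id == lost_id:
                continue                             # drop the lost object
            shifted[obj_id - 1 if obj_id > lost_id else obj_id] = value
        return shifted
    return shift(trackers), shift(annotations)
```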
It should be noted that, in the process of labeling the image data, the user may observe the labeled image data, determine whether the image data includes an object that is not labeled in the first frame image, and if the non-first frame image includes an object that is not labeled, the user may pause labeling and manually label the non-first frame image.
Correspondingly, if the pause operation triggered by the user is detected, the terminal equipment can stop marking the image data, and mark the current frame image according to the marking operation triggered again by the user to obtain a newly added marking area.
The current frame image may be a non-first frame image that is currently displayed to a user and used for labeling according to a labeling operation triggered by the user. Similarly, the newly added labeled region may be a region corresponding to a labeling operation triggered by a user, that is, a region corresponding to a target labeled by the user in a non-first frame image.
Moreover, similar to the first frame image, the history tracking model corresponding to the newly added labeled region may also be a preset initial tracking model.
In addition, if a newly added labeling area and a corresponding tracking model are added to the non-first frame image, the newly added labeling area and the corresponding tracking model can be numbered in an increasing manner on the basis of the number of the labeled object. For example, if the non-first frame image includes 10 objects and the user adds and labels 2 objects, the number of the additional labeling area corresponding to the additional 2 objects and the number of the corresponding tracking model may be 11 and 12, respectively.
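Conversely, a sketch of how newly added labeling areas could be appended with increasing numbers after the highest existing number, under the same assumed dictionary representation (the `make_initial_model` callback stands in for the preset initial tracking model):

```python
def add_new_objects(new_regions, trackers, annotations, make_initial_model):
    """Append user-added regions with increasing numbers, e.g. 2 new objects
    on top of 10 existing ones are numbered 11 and 12."""
    next_id = max(trackers, default=0) + 1
    for region in new_regions:
        trackers[next_id] = make_initial_model(region)  # preset initial model
        annotations[next_id] = region
        next_id += 1
    return trackers, annotations
```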
The process of generating the new labeling area and the corresponding tracking model by the terminal device according to the labeling operation is similar to step 301, and is not described herein again.
In addition, in practical application, the terminal device may acquire and label each frame image in the image data according to a time sequence, and if the terminal device cannot acquire a next frame image after a certain frame image is labeled, it indicates that labeling of each frame image is completed, and may stop labeling the image data, and remind the user of the completion of labeling.
In summary, in the data annotation method provided in the embodiment of the present application, for each non-first frame image in the image data, the terminal device may update the historical tracking model corresponding to each labeled region according to the region feature corresponding to each labeled region in the last frame image of the non-first frame image to obtain each tracking model corresponding to the non-first frame image, and then perform region tracking on the non-first frame image through each tracking model to obtain the region to be labeled corresponding to each tracking model. For each region to be labeled, the terminal device may label the region to be labeled according to the similarity between the region to be labeled and the corresponding labeled region, without training a labeling model matching the current labeling requirement or manually labeling a large number of frame images. Each region to be labeled in the non-first frame image can be identified and labeled through the tracking model obtained based on the region feature update, which reduces the workload of labeling the image data, reduces the time spent on labeling the image data, and improves the efficiency of labeling the image data.
In addition, in the process of labeling the image data by the terminal equipment, the terminal equipment can stop labeling according to pause operation triggered by the user, and label the currently displayed non-first frame image again according to labeling operation triggered by the user, so that objects which are not labeled in the image data are avoided being omitted, and the flexibility and the accuracy of labeling the image data are improved.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Corresponding to the data annotation method described in the foregoing embodiment, fig. 6 is a block diagram of a data annotation device provided in the embodiment of the present application, and for convenience of description, only the parts related to the embodiment of the present application are shown.
Referring to fig. 6, the apparatus includes:
an updating module 601, configured to update, for each non-first-frame image in the image data, a historical tracking model corresponding to each labeled region according to a region feature corresponding to each labeled region in a previous-frame image of the non-first-frame image, to obtain each tracking model corresponding to the non-first-frame image, where the historical tracking models correspond to the labeled regions in the previous-frame image one to one;
a tracking module 602, configured to perform region tracking on the non-first frame image through each tracking model to obtain a region to be labeled corresponding to each tracking model;
the first labeling module 603 is configured to label, for each to-be-labeled region, the to-be-labeled region according to a similarity between the to-be-labeled region and the corresponding labeling region.
Optionally, the tracking module 602 is specifically configured to perform region expansion in the non-first frame image based on the labeled region corresponding to each historical tracking model to obtain a plurality of expanded regions; for each expansion region, determining the similarity between each sub-region in the expansion region and the labeled region through a tracking model corresponding to the expansion region, wherein the sub-regions are obtained by dividing in the expansion region according to a preset identification mode; and determining the region to be marked corresponding to the expansion region from the plurality of sub-regions according to the plurality of similarity.
Optionally, the tracking module 602 is further specifically configured to perform feature extraction on each sub-region through a tracking model corresponding to the extended region, so as to obtain sub-region features of each sub-region; for each sub-region, comparing the sub-region characteristics of the sub-region with the pre-stored region characteristics of the labeled region through the tracking model corresponding to the expanded region, and obtaining the similarity between the sub-region and the labeled region.
Optionally, the tracking module 602 is further specifically configured to select a sub-region corresponding to the maximum similarity from the multiple similarities as the to-be-labeled region corresponding to the extended region.
Optionally, referring to fig. 7, the apparatus further includes:
the storage module 604 is configured to store a sub-region feature corresponding to the to-be-labeled region.
Optionally, the tracking module 602 is further specifically configured to, for each labeled region, expand the boundary of the labeled region according to a preset expansion coefficient to obtain an expanded boundary; and generating the expanded region in the non-first frame image according to the expanded boundary by taking the center of the marked region as a reference.
Optionally, the first labeling module 603 is further configured to label the to-be-labeled region if the similarity is greater than a preset similarity threshold; if the similarity is less than or equal to the similarity threshold, deleting the tracking model corresponding to the labeled area.
Optionally, referring to fig. 8, the apparatus further includes:
an obtaining module 605, configured to obtain a first frame image in the image data;
a second labeling module 606, configured to label the first frame image according to a labeling operation triggered by a user, to obtain at least one labeled region of the first frame image, where a historical tracking model corresponding to each labeled region of the first frame image is a preset initial tracking model.
Optionally, referring to fig. 9, the apparatus further includes:
a stopping module 607, configured to, in the process of labeling the image data, stop labeling the image data if a pause operation triggered by a user is detected;
and a third labeling module 608, configured to label the current frame image according to a labeling operation triggered again by the user, so as to obtain a new labeling area, where a history tracking model corresponding to the new labeling area is a preset initial tracking model.
In summary, the data annotation device provided in the embodiment of the present application can, for each non-first frame image in the image data, update the historical tracking model corresponding to each labeled region according to the region feature corresponding to each labeled region in the previous frame image of the non-first frame image to obtain each tracking model corresponding to the non-first frame image, and then perform region tracking on the non-first frame image through each tracking model to obtain the region to be labeled corresponding to each tracking model. For each region to be labeled, the region to be labeled can be labeled according to the similarity between the region to be labeled and the corresponding labeled region, without training a labeling model matching the current labeling requirement or manually labeling a large number of frame images. Each region to be labeled in the non-first frame image can be identified and labeled through the tracking model obtained based on the region feature update, which reduces the workload of labeling the image data, reduces the time spent on labeling the image data, and improves the efficiency of labeling the image data.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the system embodiments described above are merely illustrative: the division into modules or units is only a logical functional division, and other divisions are possible in actual implementation; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses or units, and may be in electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the above method embodiments are implemented. The computer program includes computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include at least any entity or apparatus capable of carrying the computer program code to the terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk or an optical disk. In some jurisdictions, according to legislation and patent practice, the computer-readable medium may not include electrical carrier signals and telecommunications signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (12)

1. A method for annotating data, comprising:
for each non-first frame image in image data, updating a historical tracking model corresponding to each labeled region according to the region features corresponding to each labeled region in a previous frame image of the non-first frame image, to obtain tracking models corresponding to the non-first frame image, wherein the historical tracking models correspond one-to-one to the labeled regions in the previous frame image;
performing region tracking on the non-first frame image through each tracking model to obtain a region to be labeled corresponding to each tracking model;
and for each region to be labeled, labeling the region to be labeled according to the similarity between the region to be labeled and the corresponding labeled region.
2. The data annotation method of claim 1, wherein the performing region tracking on the non-first frame image through each tracking model to obtain a region to be labeled corresponding to each tracking model comprises:
performing region expansion in the non-first frame image based on the labeled region corresponding to each historical tracking model, to obtain a plurality of expanded regions;
for each expanded region, determining the similarity between each sub-region in the expanded region and the labeled region through the tracking model corresponding to the expanded region, wherein the sub-regions are obtained by dividing the expanded region according to a preset identification manner;
and determining the region to be labeled corresponding to the expanded region from the plurality of sub-regions according to the plurality of similarities.
3. The data annotation method of claim 2, wherein the determining the similarity between each sub-region in the expanded region and the labeled region through the tracking model corresponding to the expanded region comprises:
performing feature extraction on each sub-region through the tracking model corresponding to the expanded region, to obtain sub-region features of each sub-region;
and for each sub-region, comparing the sub-region features of the sub-region with the pre-stored region features of the labeled region through the tracking model corresponding to the expanded region, to obtain the similarity between the sub-region and the labeled region.
4. The data annotation method of claim 2, wherein the determining, according to the plurality of similarities, the region to be labeled corresponding to the expanded region from among the plurality of sub-regions comprises:
selecting the sub-region corresponding to the maximum similarity among the plurality of similarities as the region to be labeled corresponding to the expanded region.
5. The data annotation method of claim 4, wherein after the determining the region to be labeled corresponding to the expanded region from the plurality of sub-regions according to the plurality of similarities, the method further comprises:
storing the sub-region features corresponding to the region to be labeled.
6. The data annotation method of claim 2, wherein the performing region expansion in the non-first frame image based on the labeled region corresponding to each historical tracking model to obtain a plurality of expanded regions comprises:
for each labeled region, expanding the boundary of the labeled region according to a preset expansion coefficient, to obtain an expanded boundary;
and generating the expanded region in the non-first frame image according to the expanded boundary, with the center of the labeled region as a reference.
7. The data annotation method of claim 1, wherein the labeling the region to be labeled according to the similarity between the region to be labeled and the corresponding labeled region comprises:
if the similarity is greater than a preset similarity threshold, labeling the region to be labeled;
and if the similarity is less than or equal to the similarity threshold, deleting the tracking model corresponding to the labeled region.
8. The data annotation method according to any one of claims 1 to 7, wherein before the updating the historical tracking model corresponding to each labeled region according to the region features corresponding to each labeled region in the previous frame image of the non-first frame image to obtain the tracking models corresponding to the non-first frame image, the method further comprises:
acquiring a first frame image in the image data;
and labeling the first frame image according to a labeling operation triggered by a user, to obtain at least one labeled region of the first frame image, wherein the historical tracking model corresponding to each labeled region of the first frame image is a preset initial tracking model.
9. The data annotation method according to any one of claims 1 to 7, further comprising:
in the process of labeling the image data, if a pause operation triggered by a user is detected, stopping labeling the image data;
and labeling the current frame image according to a labeling operation triggered again by the user, to obtain a newly added labeled region, wherein the historical tracking model corresponding to the newly added labeled region is a preset initial tracking model.
10. A data annotation device, comprising:
an updating module, configured to, for each non-first frame image in image data, update a historical tracking model corresponding to each labeled region according to the region features corresponding to each labeled region in a previous frame image of the non-first frame image, to obtain tracking models corresponding to the non-first frame image, wherein the historical tracking models correspond one-to-one to the labeled regions in the previous frame image;
a tracking module, configured to perform region tracking on the non-first frame image through each tracking model, to obtain a region to be labeled corresponding to each tracking model;
and a first labeling module, configured to label each region to be labeled according to the similarity between the region to be labeled and the corresponding labeled region.
11. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 9 when executing the computer program.
12. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 9.
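As an illustration of claims 2 to 6 only, the sketch below shows one way the region tracking inside an expanded region could be realized, assuming rectangular regions, a sliding-window division into sub-regions, and cosine similarity between feature vectors; the expansion coefficient value, the window stride and the helper names (expand_region, extract_features, cosine_similarity) are assumptions of this sketch rather than details fixed by the claims.

```python
# Illustrative sketch of region expansion and sub-region matching (claims 2-6).
import numpy as np

EXPANSION_COEFFICIENT = 1.5  # preset expansion coefficient (value chosen for illustration)

def expand_region(region, frame_shape, coeff=EXPANSION_COEFFICIENT):
    """Expand the labeled region's boundary about its center, clipped to the image."""
    x, y, w, h = region
    cx, cy = x + w / 2.0, y + h / 2.0
    new_w, new_h = w * coeff, h * coeff
    h_img, w_img = frame_shape[:2]
    nx = int(max(0, cx - new_w / 2.0))
    ny = int(max(0, cy - new_h / 2.0))
    nw = int(min(w_img - nx, new_w))
    nh = int(min(h_img - ny, new_h))
    return nx, ny, nw, nh

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def track_in_expanded_region(frame, labeled_region, stored_features, extract_features, stride=4):
    """Slide a window of the labeled region's size over the expanded region, compare each
    sub-region's features with the pre-stored region features, and return the most similar
    sub-region, its similarity and its features (to be stored for the next frame)."""
    ex, ey, ew, eh = expand_region(labeled_region, frame.shape)
    _, _, w, h = labeled_region
    best_region, best_sim, best_feat = None, -1.0, None
    for yy in range(ey, ey + eh - h + 1, stride):
        for xx in range(ex, ex + ew - w + 1, stride):
            sub = frame[yy:yy + h, xx:xx + w]
            feat = extract_features(sub)
            sim = cosine_similarity(feat, stored_features)
            if sim > best_sim:
                best_region, best_sim, best_feat = (xx, yy, w, h), sim, feat
    return best_region, best_sim, best_feat
```

The returned similarity can then be compared against the preset similarity threshold of claim 7 to decide whether the best-matching sub-region is labeled or the corresponding tracking model is deleted.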
CN202010604308.2A 2020-06-29 2020-06-29 Data labeling method and device Active CN111832549B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010604308.2A CN111832549B (en) 2020-06-29 2020-06-29 Data labeling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010604308.2A CN111832549B (en) 2020-06-29 2020-06-29 Data labeling method and device

Publications (2)

Publication Number Publication Date
CN111832549A true CN111832549A (en) 2020-10-27
CN111832549B CN111832549B (en) 2024-04-23

Family

ID=72898270

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010604308.2A Active CN111832549B (en) 2020-06-29 2020-06-29 Data labeling method and device

Country Status (1)

Country Link
CN (1) CN111832549B (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103927762A (en) * 2013-01-11 2014-07-16 浙江大华技术股份有限公司 Target vehicle automatic tracking method and device
CN104794737A (en) * 2015-04-10 2015-07-22 电子科技大学 Depth-information-aided particle filter tracking method
CN106530327A (en) * 2016-11-07 2017-03-22 北京航空航天大学 Quick real-time discrimination type tracing method based on multi-local-feature learning
WO2018133666A1 (en) * 2017-01-17 2018-07-26 腾讯科技(深圳)有限公司 Method and apparatus for tracking video target
CN108446585A (en) * 2018-01-31 2018-08-24 深圳市阿西莫夫科技有限公司 Method for tracking target, device, computer equipment and storage medium
CN108694724A (en) * 2018-05-11 2018-10-23 西安天和防务技术股份有限公司 A kind of long-time method for tracking target
CN109035299A (en) * 2018-06-11 2018-12-18 平安科技(深圳)有限公司 Method for tracking target, device, computer equipment and storage medium
WO2019237516A1 (en) * 2018-06-11 2019-12-19 平安科技(深圳)有限公司 Target tracking method and apparatus, computer device, and storage medium
CN109190635A (en) * 2018-07-25 2019-01-11 北京飞搜科技有限公司 Target tracking method, device and electronic equipment based on classification CNN
CN109325964A (en) * 2018-08-17 2019-02-12 深圳市中电数通智慧安全科技股份有限公司 A kind of face tracking methods, device and terminal
CN109671103A (en) * 2018-12-12 2019-04-23 易视腾科技股份有限公司 Method for tracking target and device
CN109743497A (en) * 2018-12-21 2019-05-10 创新奇智(重庆)科技有限公司 A kind of dataset acquisition method, system and electronic device
CN109829397A (en) * 2019-01-16 2019-05-31 创新奇智(北京)科技有限公司 A kind of video labeling method based on image clustering, system and electronic equipment
CN111223128A (en) * 2020-01-17 2020-06-02 深圳大学 Target tracking method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHA Yufei et al.: "Research on Moving Target Tracking Based on Region Active Contours", Journal of Image and Graphics, vol. 11, no. 12, 31 December 2006 (2006-12-31), pages 1844-1848 *
WEI Yanxin; FAN Xiujuan: "Tracking and Recognition of Human Motion Postures Based on GMM", Journal of Beijing Institute of Fashion Technology (Natural Science Edition), vol. 38, no. 2, 30 June 2018 (2018-06-30), pages 43-51 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112488222A (en) * 2020-12-05 2021-03-12 武汉中海庭数据技术有限公司 Crowdsourcing data labeling method, system, server and storage medium

Also Published As

Publication number Publication date
CN111832549B (en) 2024-04-23

Similar Documents

Publication Publication Date Title
Luo et al. Towards efficient and objective work sampling: Recognizing workers' activities in site surveillance videos with two-stream convolutional networks
CN110874594B (en) Human body appearance damage detection method and related equipment based on semantic segmentation network
CN110033018B (en) Graph similarity judging method and device and computer readable storage medium
CN110852285A (en) Object detection method and device, computer equipment and storage medium
WO2020244075A1 (en) Sign language recognition method and apparatus, and computer device and storage medium
WO2020232909A1 (en) Pedestrian visual tracking method, model training method and device, apparatus and storage medium
CN109492576B (en) Image recognition method and device and electronic equipment
CN111832468A (en) Gesture recognition method and device based on biological recognition, computer equipment and medium
CN111401318B (en) Action recognition method and device
CN110222582B (en) Image processing method and camera
CN111612822B (en) Object tracking method, device, computer equipment and storage medium
US9355338B2 (en) Image recognition device, image recognition method, and recording medium
Yang et al. Binary descriptor based nonparametric background modeling for foreground extraction by using detection theory
CN111027507A (en) Training data set generation method and device based on video data identification
CN111241961B (en) Face detection method and device and electronic equipment
CN111832549A (en) Data labeling method and device
CN109508575A (en) Face tracking method and device, electronic equipment and computer readable storage medium
CN112966687B (en) Image segmentation model training method and device and communication equipment
CN109523573A (en) The tracking and device of target object
CN110728172B (en) Point cloud-based face key point detection method, device and system and storage medium
CN112232203A (en) Pedestrian recognition method and device, electronic equipment and storage medium
KR102029860B1 (en) Method for tracking multi objects by real time and apparatus for executing the method
CN114519804A (en) Human body skeleton labeling method and device and electronic equipment
CN112364683A (en) Case evidence fixing method and device
CN110287943B (en) Image object recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant