CN111832549B - Data labeling method and device - Google Patents


Info

Publication number
CN111832549B
CN111832549B
Authority
CN
China
Prior art keywords
region
labeling
frame image
area
tracking model
Prior art date
Legal status
Active
Application number
CN202010604308.2A
Other languages
Chinese (zh)
Other versions
CN111832549A (en)
Inventor
胡淑萍
程骏
张惊涛
郭渺辰
王东
顾在旺
庞建新
熊友军
Current Assignee
Ubtech Robotics Corp
Original Assignee
Ubtech Robotics Corp
Application filed by Ubtech Robotics Corp
Priority to CN202010604308.2A
Publication of CN111832549A
Application granted
Publication of CN111832549B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V 10/225 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on a marking or identifier characterising the area
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Abstract

The application belongs to the technical field of image processing and provides a data labeling method and device. The data labeling method includes: for each non-first frame image in the image data, updating the historical tracking model corresponding to each labeling area according to the area features corresponding to that labeling area in the previous frame image of the non-first frame image, to obtain the tracking models corresponding to the non-first frame image; performing area tracking on the non-first frame image through each tracking model to obtain the area to be labeled corresponding to each tracking model; and, for each area to be labeled, labeling that area according to the similarity between it and the corresponding labeling area. Because the tracking models obtained by updating with the area features can identify and label each area to be labeled in the non-first frame images, the workload of labeling the image data is reduced, the time spent on labeling is shortened, and the labeling efficiency is improved.

Description

Data labeling method and device
Technical Field
The application belongs to the technical field of image processing, and particularly relates to a data labeling method and device.
Background
With the continuous development of image processing technology, each frame image in image data can be labeled on a terminal device, so that a supervised learning approach can be adopted: a detection model is trained on the labeled image data.
In the related art, the terminal device may first label part of the image data according to operations triggered by a user and train a labeling model on the labeled part; the remaining image data is then input into the labeling model, which labels it to obtain labeled image data.
However, with this data labeling method, a large number of frame images must be labeled manually each time data is labeled in order to train the labeling model, which results in a large workload for labeling image data.
Disclosure of Invention
The embodiment of the application provides a data labeling method and a data labeling device, which can solve the problem of large workload of labeling image data.
In a first aspect, an embodiment of the present application provides a data labeling method, including:
For each non-first frame image in image data, updating a history tracking model corresponding to each labeling area according to the area characteristic corresponding to each labeling area in a previous frame image of the non-first frame image to obtain each tracking model corresponding to the non-first frame image, wherein the history tracking models are in one-to-one correspondence with the labeling areas in the previous frame image;
Carrying out region tracking on the non-initial frame image through each tracking model to obtain a region to be marked corresponding to each tracking model;
And for each region to be marked, marking the region to be marked according to the similarity between the region to be marked and the corresponding marking region.
Optionally, the performing region tracking on the non-first frame image through each tracking model to obtain a region to be marked corresponding to each tracking model includes:
performing region expansion in the non-first frame image based on the labeling region corresponding to each history tracking model to obtain a plurality of expansion regions;
For each expansion region, determining the similarity between each subarea in the expansion region and the marking region through a tracking model corresponding to the expansion region, wherein the subareas are obtained by dividing the expansion region according to a preset recognition mode;
And determining the region to be marked corresponding to the expansion region from the plurality of sub-regions according to the similarity.
Optionally, the determining, by using the tracking model corresponding to the extended area, the similarity between each sub-area in the extended area and the labeling area includes:
extracting features of each sub-region through a tracking model corresponding to the extension region to obtain sub-region features of each sub-region;
and comparing the subarea characteristics of the subareas with the prestored area characteristics of the marked areas through the tracking model corresponding to the extension area for each subarea to obtain the similarity between the subareas and the marked areas.
Optionally, the determining, according to the similarities, the region to be marked corresponding to the extension region from the multiple sub-regions includes:
And selecting a sub-region corresponding to the maximum similarity from the multiple similarities as a region to be marked corresponding to the expansion region.
Optionally, after determining the region to be annotated corresponding to the extension region from the plurality of sub-regions according to the plurality of similarities, the method further includes:
And storing the sub-region characteristics corresponding to the region to be marked.
Optionally, the performing area expansion in the non-first frame image based on the labeling area corresponding to each history tracking model to obtain a plurality of expansion areas includes:
for each labeling area, expanding the boundary of the labeling area according to a preset expansion coefficient to obtain an expanded boundary;
And generating the expansion area in the non-first frame image according to the expanded boundary by taking the center of the labeling area as a reference.
Optionally, the labeling the to-be-labeled area according to the similarity between the to-be-labeled area and the corresponding labeling area includes:
if the similarity is larger than a preset similarity threshold, labeling the region to be labeled;
and if the similarity is smaller than or equal to the similarity threshold, deleting the tracking model corresponding to the marked area.
Optionally, before updating the historical tracking models corresponding to the labeling areas according to the area characteristics corresponding to each labeling area in the previous frame image of the non-first frame image to obtain each tracking model corresponding to the non-first frame image, the method further includes:
Acquiring a first frame image in the image data;
And marking the first frame image according to marking operation triggered by a user to obtain at least one marking area of the first frame image, wherein a historical tracking model corresponding to each marking area of the first frame image is a preset initial tracking model.
Optionally, the method further comprises:
in the process of marking the image data, stopping marking the image data if a pause operation triggered by a user is detected;
And marking the current frame image according to marking operation triggered by the user again to obtain a newly added marking area, wherein a historical tracking model corresponding to the newly added marking area is a preset initial tracking model.
In a second aspect, an embodiment of the present application provides a data labeling apparatus, including:
The updating module is used for, for each non-first frame image in the image data, updating the historical tracking model corresponding to each labeling area according to the area characteristics corresponding to each labeling area in the previous frame image of the non-first frame image, to obtain each tracking model corresponding to the non-first frame image, wherein the historical tracking models are in one-to-one correspondence with the labeling areas in the previous frame image;
The tracking module is used for carrying out area tracking on the non-first frame image through each tracking model to obtain a region to be marked corresponding to each tracking model;
The first labeling module is used for labeling each region to be labeled according to the similarity between the region to be labeled and the corresponding labeling region.
Optionally, the tracking module is specifically configured to perform region expansion in the non-first frame image based on the labeling region corresponding to each history tracking model, so as to obtain a plurality of expansion regions; for each expansion region, determining the similarity between each subarea in the expansion region and the marking region through a tracking model corresponding to the expansion region, wherein the subareas are obtained by dividing the expansion region according to a preset recognition mode; and determining the region to be marked corresponding to the expansion region from the plurality of sub-regions according to the similarity.
Optionally, the tracking module is further specifically configured to perform feature extraction on each sub-region through a tracking model corresponding to the extended region, so as to obtain a sub-region feature of each sub-region; and comparing the subarea characteristics of the subareas with the prestored area characteristics of the marked areas through the tracking model corresponding to the extension area for each subarea to obtain the similarity between the subareas and the marked areas.
Optionally, the tracking module is further specifically configured to select, from the multiple similarities, a sub-region corresponding to the maximum similarity as a region to be marked corresponding to the extension region.
Optionally, the apparatus further includes:
and the storage module is used for storing the sub-region characteristics corresponding to the region to be marked.
Optionally, the tracking module is further specifically configured to expand, for each labeling area, a boundary of the labeling area according to a preset expansion coefficient, to obtain an expanded boundary; and generating the expansion area in the non-first frame image according to the expanded boundary by taking the center of the labeling area as a reference.
Optionally, the first labeling module is further configured to label the region to be labeled if the similarity is greater than a preset similarity threshold; and if the similarity is smaller than or equal to the similarity threshold, deleting the tracking model corresponding to the marked area.
Optionally, the apparatus further includes:
the acquisition module is used for acquiring a first frame image in the image data;
The second labeling module is used for labeling the first frame image according to labeling operation triggered by a user to obtain at least one labeling area of the first frame image, and a historical tracking model corresponding to each labeling area of the first frame image is a preset initial tracking model.
Optionally, the apparatus further includes:
The stopping module is used for stopping marking the image data if a pause operation triggered by a user is detected in the process of marking the image data;
And the third labeling module is used for labeling the current frame image according to the labeling operation triggered by the user again to obtain a newly added labeling area, and the historical tracking model corresponding to the newly added labeling area is a preset initial tracking model.
In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor executes the computer program to implement a data labeling method according to any one of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium, where a computer program is stored, where the computer program is executed by a processor to implement a data labeling method according to any one of the first aspects above.
In a fifth aspect, an embodiment of the present application provides a computer program product, which, when run on a terminal device, causes the terminal device to perform the data annotation method according to any of the first aspects above.
Compared with the prior art, the embodiment of the application has the beneficial effects that:
In the embodiment of the application, for each non-first frame image in the image data, the terminal device can update the historical tracking model corresponding to each labeling area according to the area features corresponding to that labeling area in the previous frame image of the non-first frame image, to obtain the tracking models corresponding to the non-first frame image. It then performs area tracking on the non-first frame image through each tracking model to obtain the area to be labeled corresponding to each tracking model and, for each area to be labeled, labels that area according to the similarity between it and the corresponding labeling area. There is no need to train a labeling model matched to the current labeling requirement, and no need to manually label a large number of frame images.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a system architecture of a data labeling system according to the data labeling method provided by the embodiment of the present application;
fig. 2 is a schematic structural diagram of a terminal device according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a data labeling method provided by an embodiment of the present application;
FIG. 4 is a schematic flow chart of region tracking of a non-first frame image according to an embodiment of the present application;
FIG. 5a is a schematic diagram of an annotation region according to an embodiment of the application;
FIG. 5b is a schematic diagram of an extended area provided by an embodiment of the present application;
FIG. 6 is a block diagram of a data labeling apparatus according to an embodiment of the present application;
FIG. 7 is a block diagram of another data labeling apparatus according to an embodiment of the present application;
FIG. 8 is a block diagram of a further data annotation device according to an embodiment of the application;
fig. 9 is a block diagram of still another data labeling apparatus according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
The terminology used in the following examples is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the specification of the application and the appended claims, the singular forms "a," "an," and "the" are intended to include expressions such as "one or more," unless the context clearly indicates otherwise. It should also be understood that in embodiments of the present application, "one or more" means one, two, or more than two; "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects are in an "or" relationship.
The data labeling method provided by the embodiment of the application can be applied to terminal devices such as mobile phones, tablet computers, notebook computers, ultra-mobile personal computers (UMPC), netbooks, personal digital assistants (PDA) and the like; the embodiment of the application does not limit the specific type of the terminal device.
For example, the terminal device may be a station (ST) in a WLAN, a personal digital assistant (PDA) device, a computing device or other processing device connected to a wireless modem, a computer, a laptop computer, a handheld communication device, a handheld computing device, a satellite radio device, a wireless modem card, or the like.
Fig. 1 is a schematic diagram of a system architecture of a data labeling system related to a data labeling method according to an embodiment of the present application, and referring to fig. 1, the data labeling system may include: terminal equipment 110 and server 120, terminal equipment 110 is connected with server 120.
The server 120 may acquire the acquired image data, and forward the image data to the terminal device 110. The terminal device 110 may receive the image data sent by the server 120, and label each frame image in the image data, to obtain labeled image data for model training.
In particular, the image data may be continuous video data, for example, the image data may be video data of traffic volume acquired by cameras distributed over different streets.
In one possible implementation manner, the server 120 may receive the image data collected by each image collecting device and forward the collected image data to the terminal device 110. The terminal device 110 may then receive the image data, label the first frame image in the image data based on the labeling operation triggered by the user, generate a tracking model corresponding to each labeling area of the first frame image, and label the non-first frame images in the image data according to the tracking models, so as to obtain labeled image data.
In the process of marking each frame image in the image data according to the time sequence, if the terminal device detects the pause operation triggered by the user, the terminal device can stop marking the image data, form a new added marking area according to the marking operation triggered by the user again, and generate a tracking model corresponding to the added marking area.
In addition, in practical applications, the terminal device 110 may also receive the image data collected by the image collecting device, instead of forwarding the image data through the server 120, that is, the terminal device 110 is connected to the image collecting device, and the image collecting device may send the image data to the terminal device 110 after collecting the image data.
Fig. 2 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 2, the terminal device 2 of this embodiment includes: at least one processor 21 (only one shown in fig. 2), a memory 22, and a computer program stored in the memory 22 and executable on the at least one processor 21; when the processor 21 executes the computer program, the steps in any of the data labeling method embodiments described below are implemented.
The terminal device 2 may be a computing device such as a desktop computer, a notebook computer, a palm computer, or a cloud server. The terminal device may include, but is not limited to, a processor 21 and a memory 22. It will be appreciated by those skilled in the art that fig. 2 is merely an example of the terminal device 2 and does not constitute a limitation on the terminal device 2, which may include more or fewer components than shown, a combination of certain components, or different components; for example, it may also include input-output devices, network access devices, etc.
The processor 21 may be a central processing unit (Central Processing Unit, CPU); the processor 21 may also be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 22 may in some embodiments be an internal storage unit of the terminal device 2, such as a hard disk or a memory of the terminal device 2. The memory 22 may in other embodiments also be an external storage device of the terminal device 2, such as a plug-in hard disk provided on the terminal device 2, a smart media card (SMC), a Secure Digital (SD) card, a flash card, or the like. Further, the memory 22 may also include both an internal storage unit and an external storage device of the terminal device 2. The memory 22 is used to store an operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of a computer program. The memory 22 may also be used to temporarily store data that has been output or is to be output.
Fig. 3 is a schematic flowchart of a data labeling method according to an embodiment of the present application, which may be applied to the terminal device described above. Referring to fig. 3, by way of example and not limitation, the method includes:
And step 301, labeling the first frame image.
The image data may include a first frame image and a non-first frame image, where the first frame image is a frame image arranged in time sequence in the first position in the image data, and the non-first frame image is another frame image except the first frame image in the image data. For example, the image data is composed of frame images acquired every 1 second within 10 seconds, and the image data may include 10 frame images, where the frame image acquired at 1 st second is a first frame image, and the frame images acquired at 2 nd to 10 th seconds are non-first frame images.
In the process of marking the image data, the terminal device can firstly acquire a first frame image in the image data, mark the first frame image according to marking operation triggered by a user, and obtain at least one marking area of the first frame image, so that in the subsequent step, a corresponding tracking model can be generated according to a history tracking model corresponding to each marking area of the first frame image, and the history tracking model corresponding to each marking area of the first frame image can be a preset initial tracking model.
In one possible implementation manner, the terminal device may first acquire and display the first frame image to the user according to the time sequence corresponding to each frame image in the image data, and then detect the labeling operation triggered by the user, and determine the object corresponding to the labeling operation in the first frame image, and the region where the object is located, that is, the labeling region.
In addition, the user can also input attribute information corresponding to each marked object in the process of marking the first frame image. For example, the attribute information corresponding to the nth object is (x_n^1, y_n^1), where x_n^1 indicates the labeling area corresponding to the nth labeled object in the 1st frame image, such as the center coordinates, length and width of the labeling area, and y_n^1 indicates the label of the nth labeled object in the 1st frame image, such as its name, number, occluded state and rotation state.
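For illustration only (not part of the patent text), the following Python sketch shows one possible way to represent the attribute information (x_n^t, y_n^t) of a labeled object; the class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LabeledRegion:
    """Hypothetical record of the attribute information of one labeled object.

    The geometric fields play the role of x_n^t (the labeling area) and the
    descriptive fields play the role of y_n^t (the label of the object).
    """
    object_id: int          # n: index of the labeled object
    frame_index: int        # t: 1 for the first frame image
    cx: float               # center x coordinate of the labeling area
    cy: float               # center y coordinate of the labeling area
    width: float            # width of the labeling area
    height: float           # height of the labeling area
    name: str = ""          # object name, e.g. "pedestrian"
    occluded: bool = False  # whether the object is occluded
    rotated: bool = False   # whether the object is rotated

# Example: attribute information of the 1st labeled object in the 1st frame image.
region = LabeledRegion(object_id=1, frame_index=1,
                       cx=120.0, cy=85.0, width=40.0, height=60.0,
                       name="pedestrian")
```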
In practical application, each frame image may include at least one labeling area, and for simplicity of explanation, the embodiment of the present application only uses one labeling area as an example for explanation, and the number of labeling areas included in each frame image is not limited.
In addition, the terminal device may label the first frame image in other manners, for example, the first frame image may be labeled according to a pre-trained model, and the manner of labeling the first frame image in the embodiment of the present application is not limited.
Step 302, for each non-first frame image in the image data, updating the historical tracking model corresponding to each labeling area according to the area characteristics corresponding to each labeling area in the previous frame image of the non-first frame image, and obtaining each tracking model corresponding to the non-first frame image.
The historical tracking models are in one-to-one correspondence with the labeling areas in the previous frame of image.
The previous frame image of a non-first frame image refers to the first frame image or another non-first frame image that is adjacent to the non-first frame image and earlier in the time sequence. For example, the image data includes N frame images, N being a positive integer; if x is greater than 2 and x is less than or equal to N, the previous frame image of the xth frame image is the (x-1)th frame image; if x is equal to 2, the previous frame image of the xth frame image is the first frame image, namely the 1st frame image.
After the first frame image is marked according to the operation triggered by the user, the terminal equipment can update the preset initial tracking model according to the region characteristics corresponding to each marked region in the first frame image to obtain the tracking model corresponding to the next frame image, so that the non-first frame image in the image data is marked continuously through the tracking model, and each frame image in the image data is marked.
In the process of marking the non-first frame image, the terminal equipment can firstly acquire the area characteristics corresponding to any marking area in the previous frame image of the pre-stored non-first frame image, and then update the historical tracking model corresponding to the marking area according to the area characteristics to obtain the tracking model for carrying out area identification on the non-first frame image, so that in the follow-up step, the non-first frame image can be marked through each tracking model to obtain the marking area of the non-first frame image.
For example, corresponding to the above example, the non-first frame image acquired by the terminal device may be the xth frame image. If x is greater than 2 and x is less than or equal to N, the history tracking model corresponding to the (x-1)th frame image may be acquired and the tracking model corresponding to the xth frame image generated; if x is equal to 2, the history tracking model corresponding to the 1st frame image, that is, the preset initial tracking model, can be obtained and the tracking model corresponding to the 2nd frame image generated.
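The per-frame flow described in steps 301 to 304 can be summarized with the following minimal Python sketch. It is only an illustration of the loop structure; label_first_frame, init_tracker, update_tracker and track are hypothetical helpers standing in for the user labeling step, the preset initial tracking model, the model update and the region tracking, and the threshold value is an assumption.

```python
def annotate_video(frames, label_first_frame, init_tracker, update_tracker, track,
                   threshold=0.5):
    """Sketch of the per-frame labeling loop (steps 301 to 304).

    frames: frame images ordered by time. Returns one {object_id: region}
    dict per frame.
    """
    # Step 301: the first frame image is labeled according to the user's operation.
    labeled = [label_first_frame(frames[0])]
    trackers = {oid: init_tracker() for oid in labeled[0]}   # preset initial models

    for t in range(1, len(frames)):                          # each non-first frame image
        prev_regions = labeled[t - 1]
        current = {}
        for oid, tracker in list(trackers.items()):
            # Step 302: update the historical tracking model with the region
            # features of the corresponding labeling area in the previous frame.
            tracker = update_tracker(tracker, frames[t - 1], prev_regions[oid])
            # Step 303: region tracking gives the region to be labeled and its
            # similarity to the corresponding labeling area.
            region, similarity = track(tracker, frames[t], prev_regions[oid])
            # Step 304: label the region only if the similarity is high enough.
            if similarity > threshold:
                current[oid] = region
                trackers[oid] = tracker
            else:
                del trackers[oid]                            # delete the lost object's model
        labeled.append(current)
    return labeled
```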
And 303, carrying out region tracking on the non-first frame image through each tracking model to obtain a region to be marked corresponding to each tracking model.
The terminal equipment can carry out region tracking on the non-first frame image through the acquired tracking model, and determine the region to be marked similar to each marked region in the previous frame image, so that in the subsequent step, the non-first frame image can be marked according to each region to be marked, and the marked region of the non-first frame image is obtained.
In the process of performing region tracking on the non-first frame image, the terminal device may first expand the labeling region based on the position of the labeling region in the previous frame image, and then may find the region to be labeled similar to the labeling region from each sub-region of the expanded region, and referring to fig. 4, this step 303 may include: step 303a, step 303b, step 303c and step 303d.
And 303a, performing region expansion in the non-first frame image based on the labeling region corresponding to each history tracking model to obtain a plurality of expansion regions.
After the terminal device obtains the tracking models updated from the previous frame image, it can continue to perform region tracking on the non-first frame image according to these tracking models. Since the positions of the objects in the labeled areas may change over time, before the area to be labeled is determined, the labeling area corresponding to the previous frame image may be expanded to obtain an expansion area, so that in the subsequent step the area to be labeled can be determined from the expansion area.
In one possible implementation manner, the terminal device may first obtain the positions of the corresponding labeling areas in the previous frame image (such as the center coordinates, the length, the width, and the like of the labeling areas), and then in the non-first frame image, according to the positions of the labeling areas, the terminal device expands outwards along the boundaries of the labeling areas to obtain a plurality of expansion areas.
Optionally, in the process of expanding the labeling areas, for each labeling area, the terminal device may expand the boundary of the labeling area according to a preset expansion coefficient to obtain an expanded boundary, and then generate an expansion area in the non-first frame image according to the expanded boundary with the center of the labeling area as a reference.
Specifically, the terminal device may use a product between the expansion coefficient and the boundary length of each labeling area as each expanded boundary. And then, combining the expanded boundaries of the same marked area into a regular graph, and adjusting the position of the regular graph to enable the center of the regular graph to coincide with the position indicated by the center coordinates of the marked area, so as to obtain the expanded area in the non-first frame image.
It should be noted that, if the corresponding labeling area in the previous frame image is close to the boundary of the previous frame image, after the labeling area is expanded, a part of the expansion area may already extend beyond the boundary of the previous frame image, and then the intersecting part of the expansion area and the previous frame image may be used as the expansion area for determining the area to be labeled.
For example, referring to fig. 5a and 5b, fig. 5a shows four rectangular labeling areas A, B, C and D in the previous frame image. If the prestored expansion coefficient is 2.5, the length and width of the boundary of each labeling area can be expanded by 2.5 times, so as to obtain the expansion areas A', B', C' and D' in the non-first frame image shown in fig. 5b.
Because the labeling area D is close to the boundary of the previous frame image, the expansion area D' shown in fig. 5b is smaller than the area that would result from the expansion coefficient of 2.5: only the portion that intersects the previous frame image after expansion is used as the expansion area D'.
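A minimal sketch of the expansion described above, assuming an axis-aligned rectangular labeling area given by its center, width and height, an expansion coefficient of 2.5, and clipping to the image boundary (the function and parameter names are illustrative only):

```python
def expand_region(cx, cy, w, h, img_w, img_h, coeff=2.5):
    """Expand a labeling area around its center by `coeff` and clip it to the image.

    Returns the expansion area as (x0, y0, x1, y1) pixel coordinates.
    """
    ew, eh = w * coeff, h * coeff            # expanded boundary lengths
    x0 = max(0.0, cx - ew / 2.0)             # clip to the left / top image boundary
    y0 = max(0.0, cy - eh / 2.0)
    x1 = min(float(img_w), cx + ew / 2.0)    # clip to the right / bottom boundary
    y1 = min(float(img_h), cy + eh / 2.0)
    return x0, y0, x1, y1

# A labeling area close to the right edge of a 640x480 frame: the expansion area
# keeps only the part that intersects the image, as with D' in fig. 5b.
print(expand_region(cx=620, cy=240, w=30, h=50, img_w=640, img_h=480))
```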
In addition, in practical application, the terminal device may expand multiple labeling areas in the previous frame image according to the above manner to obtain multiple expansion areas, or after expanding a certain labeling area, execute step 303b and step 303c to determine a region to be labeled, and then return to step 303a to expand another labeling area in the previous frame image, and execute the above process in a circulating manner until the expansion of the labeling area in the previous frame image is completed. Of course, the terminal device may also extend the labeling area in the previous frame of image in other manners, which is not limited in the embodiment of the present application.
Step 303b, for each extended region, determining the similarity between each sub-region in the extended region and the labeling region through the tracking model corresponding to the extended region.
The subareas are obtained by dividing the expansion region according to a preset identification mode.
After the terminal equipment expands to obtain the expansion areas, for each expansion area, the expansion area can be compared with the corresponding labeling area in the previous frame of image through a tracking model corresponding to the expansion area, so that the similarity between each sub-area in the expansion area and the labeling area is obtained.
In one possible implementation manner, for each extended area, the terminal device may input the extended area into the tracking model, traverse the extended area through the tracking model according to the size of the preset sub-area and the sliding step length to obtain a plurality of sub-areas, and then compare each sub-area with the labeling area corresponding to the extended area to obtain the similarity between each sub-area and the labeling area.
The similarity represents how similar a sub-region is to the labeling region, and the terminal device can determine from it whether the object in the labeling region of the previous frame image has moved to the position corresponding to that sub-region of the non-first frame image. For example, if the tracking model is a correlation filter, the similarity may be the correlation coefficient output by the correlation filter; if the tracking model is a Siamese (twin) network, the confidence of the network can be used as the similarity. The embodiment of the application does not limit the tracking model or the similarity.
Optionally, in the process of comparing the subareas with the labeling areas, feature extraction can be performed on each subarea through a tracking model corresponding to the expansion area to obtain subarea features of each subarea, and then for each subarea, the subarea features of the subareas and the pre-stored area features of the labeling areas are compared through the tracking model corresponding to the expansion area to obtain the similarity between the subareas and the labeling areas.
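As one assumed realization of this feature comparison (the patent does not fix a particular model), the sketch below uses a zero-mean, unit-norm grayscale template so that the similarity is a normalized cross-correlation; a correlation filter or a Siamese network could be used instead.

```python
import numpy as np

def region_feature(patch):
    """Hypothetical feature extraction: a zero-mean, unit-norm grayscale vector."""
    v = patch.astype(np.float64).ravel()
    v -= v.mean()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def similarity(sub_patch, stored_feature):
    """Similarity between a sub-region and the labeling area, roughly in [-1, 1].

    Assumes the sub-region has the same size as the labeling-area template whose
    feature was stored earlier.
    """
    return float(np.dot(region_feature(sub_patch), stored_feature))
```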
And 303c, determining the region to be marked corresponding to the expansion region from the plurality of sub-regions according to the plurality of similarities.
After obtaining the multiple similarities output by the tracking model, the terminal device can use the sub-region corresponding to the maximum similarity in the multiple similarities as the region to be marked of the non-first frame image.
In one possible implementation manner, for a plurality of sub-regions of the same extension region, the terminal device may obtain the similarity corresponding to each sub-region, rank the similarities in order from large to small, determine the similarity ranked in the first position, and then use the sub-region corresponding to the similarity ranked in the first position as the region to be marked, that is, select the sub-region corresponding to the maximum similarity from the plurality of similarities as the region to be marked corresponding to the extension region. For example, the information such as the center coordinates, the length, the width and the like of the sub-region corresponding to the maximum similarity may be recorded, so that the sub-region is used as the region to be marked.
In practical application, the size and the position of the sub-region corresponding to the maximum similarity may be adjusted. For example, after determining the sub-region corresponding to the maximum similarity, the terminal device may expand or contract the sub-region in the up-down, left-right directions according to a preset sliding step, compare the labeling region corresponding to the previous frame of image with the adjusted sub-region through the tracking model, obtain the similarity between the labeling region and the adjusted sub-region again, and when the adjustment times of the sub-region reach the preset adjustment threshold, select the adjusted sub-region corresponding to the maximum similarity from the multiple adjusted sub-regions according to the similarity corresponding to the multiple adjusted sub-regions, as the region to be labeled.
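Building on the similarity sketch above, the following illustrates the sliding-window traversal of an expansion area and the selection of the sub-region with the maximum similarity as the region to be labeled; the window size and sliding step are assumed parameters and the grayscale frame is a numpy array.

```python
import numpy as np

def find_region_to_label(frame, expansion, stored_feature, win_h, win_w, step=4):
    """Traverse the expansion area with a sliding window and return the sub-region
    (y, x, win_h, win_w) with the highest similarity, together with that similarity."""
    x0, y0, x1, y1 = [int(round(v)) for v in expansion]
    best_box, best_sim = None, -np.inf
    for y in range(y0, max(y0 + 1, y1 - win_h + 1), step):
        for x in range(x0, max(x0 + 1, x1 - win_w + 1), step):
            patch = frame[y:y + win_h, x:x + win_w]
            if patch.shape != (win_h, win_w):
                continue                                # skip incomplete windows at the edge
            sim = similarity(patch, stored_feature)     # from the sketch above
            if sim > best_sim:
                best_sim, best_box = sim, (y, x, win_h, win_w)
    return best_box, best_sim
```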
Step 303d, storing the sub-region features corresponding to the region to be marked.
The terminal equipment can store the sub-region characteristics corresponding to the region to be marked, and can compare the stored sub-region characteristics with the sub-region characteristics corresponding to each sub-region in the expansion region in the next frame image in the process of marking the next frame image, so that the similarity of the updated tracking model output is obtained.
For example, the terminal device may establish an initial annotation file (such as an empty annotation file) when the non-first frame image is read, after identifying the to-be-annotated areas, may store attribute information corresponding to the to-be-annotated areas in the initial annotation file, and after identifying each to-be-annotated area in the non-first frame image, may obtain an annotation file for the non-first frame image including attribute information corresponding to each to-be-annotated area.
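The per-frame annotation file mentioned above might look like the following; the JSON layout and field names are hypothetical and only illustrate storing the attribute information of each identified region to be labeled.

```python
import json

def save_annotation_file(path, frame_index, regions):
    """Write one annotation file for a non-first frame image.

    regions: dict mapping object_id -> attribute information of the region,
    e.g. {"cx", "cy", "w", "h", "label", "similarity"}. The file starts from an
    empty annotation and is filled as regions are identified.
    """
    annotation = {"frame_index": frame_index, "regions": []}
    for object_id, info in sorted(regions.items()):
        annotation["regions"].append({"object_id": object_id, **info})
    with open(path, "w", encoding="utf-8") as f:
        json.dump(annotation, f, ensure_ascii=False, indent=2)

save_annotation_file("frame_0002.json", 2,
                     {1: {"cx": 124.0, "cy": 88.0, "w": 40.0, "h": 60.0,
                          "label": "pedestrian", "similarity": 0.83}})
```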
In addition, in a similar manner to step 301, the terminal device may further update the tracking model according to the stored sub-region features, so that the next frame image can be recognized through the updated tracking model and its region to be labeled obtained.
And 304, labeling each region to be labeled according to the similarity between the region to be labeled and the corresponding labeling region.
After the terminal device identifies each region to be marked of the non-first frame image, for each region to be marked, whether the region to be marked includes the object included in the corresponding marking region or not can be determined according to the similarity corresponding to each region to be marked, and whether the region to be marked is marked or not is determined according to the judging result.
In one possible implementation manner, for each region to be marked in the non-initial frame image, the terminal device may acquire a similarity corresponding to the region to be marked, and compare the similarity with a preset similarity threshold value to determine a magnitude relationship between the similarity and the similarity threshold value.
Correspondingly, if the similarity is greater than the preset similarity threshold, the area to be labeled probably contains the object corresponding to the labeling area in the previous frame image, and the terminal device can label the area to be labeled so that it becomes a labeling area of the non-first frame image. If the similarity is less than or equal to the similarity threshold, this indicates that the object labeled by the corresponding labeling area in the previous frame image may not be in the identified area to be labeled; the terminal device can then leave the area to be labeled unlabeled and delete the tracking model corresponding to that labeling area in the previous frame image, which prevents the terminal device from storing redundant tracking models.
Further, if the similarity corresponding to the region to be labeled is less than the similarity threshold, it may be determined whether the corresponding labeling area in the previous frame image is close to the boundary of that image. If so, the object labeled by that area may no longer appear in the current non-first frame image, and the region to be labeled need not be labeled. However, if the labeling area is not close to the boundary of the previous frame image, the terminal device may have lost the object during tracking; labeling may be suspended and the user reminded that the object has been lost, so that the user can re-label the current non-first frame image.
In addition, after determining the lost object, the terminal device may further decrement the corresponding number of each labeling area in the non-first frame image. If the nth object in the non-first frame image is lost, starting from the (n+1) th labeling area, the label of each labeling area and the number of the corresponding tracking model need to be decremented. For example, if the non-first frame image includes 10 marked objects and the 7 th object is lost, the number of the marked area corresponding to the 8 th, 9 th and 10 th objects and the number of the tracking model need to be reduced by 1.
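A minimal sketch of the labeling decision and of the renumbering performed after an object is lost; the threshold value and the dictionary-based data structures are assumptions made for illustration.

```python
def apply_labeling_decision(regions, similarities, trackers, threshold=0.6):
    """Keep the regions whose similarity exceeds the threshold, drop the tracking
    model of each lost object, and renumber the remaining labeling areas and
    tracking models so that the numbering stays consecutive.

    regions, similarities, trackers: dicts keyed by object number (1, 2, ...).
    """
    labeled, kept_trackers = {}, {}
    new_id = 0
    for oid in sorted(regions):
        if similarities[oid] > threshold:
            new_id += 1                  # numbers after a lost object shift down by 1
            labeled[new_id] = regions[oid]
            kept_trackers[new_id] = trackers[oid]
        # else: the object is lost and its tracking model is simply not kept
    return labeled, kept_trackers
```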
In the process of labeling the image data, the user can observe the labeled image data to determine whether the image data includes an object that was not labeled in the first frame image; if so, the user can pause the labeling and manually label the current non-first frame image.
Correspondingly, if the pause operation triggered by the user is detected, the terminal equipment can stop marking the image data, mark the current frame image according to the marking operation triggered by the user again, and obtain a new marking area.
The current frame image may be a non-first frame image that is currently displayed to the user and is used for labeling according to a labeling operation triggered by the user. Similarly, the newly added labeling area can be an area corresponding to the labeling operation triggered by the user, that is, an area corresponding to the object labeled by the user in the non-first frame image.
Moreover, similar to the first frame image, the history tracking model corresponding to the newly added labeling area may be a preset initial tracking model.
In addition, if a new labeling area and a corresponding tracking model are added in the non-first frame image, the new labeling area and the corresponding tracking model can be numbered incrementally based on the number of the labeled object. For example, if the non-first frame image includes 10 objects, and the user adds and marks 2 objects, the number of the newly added marked area corresponding to the newly added 2 objects and the number of the corresponding tracking model may be 11 and 12, respectively.
The process of generating the new labeling area and the corresponding tracking model by the terminal device according to the labeling operation is similar to step 301, and will not be described herein.
In addition, in practical application, the terminal device can acquire and label each frame image in the image data according to the time sequence. If the terminal device cannot acquire a next frame image after labeling a certain frame image, this indicates that every frame image has been labeled; the terminal device can then stop labeling the image data and remind the user that labeling is complete.
In summary, with the data labeling method provided by the embodiment of the application, for each non-first frame image in the image data, the terminal device can update the historical tracking model corresponding to each labeling area according to the area features corresponding to that labeling area in the previous frame image of the non-first frame image, to obtain the tracking models corresponding to the non-first frame image. The terminal device then performs area tracking on the non-first frame image through each tracking model to obtain the area to be labeled corresponding to each tracking model and, for each area to be labeled, labels that area according to the similarity between it and the corresponding labeling area. There is no need to train a labeling model matched to the current labeling requirement, and no need to manually label a large number of frame images: the tracking models obtained by updating with the area features can identify and label each area to be labeled in the non-first frame images, so the workload of labeling the image data is reduced, the time spent on labeling is shortened, and the labeling efficiency is improved.
In addition, in the process of labeling the image data, the terminal device can stop labeling according to a pause operation triggered by the user and label the currently displayed non-first frame image again according to a labeling operation triggered by the user, so that objects in the image data that have not yet been labeled are not omitted, which improves the flexibility and accuracy of labeling the image data.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Corresponding to the data labeling method described in the above embodiments, fig. 6 is a block diagram of a data labeling device according to an embodiment of the present application, and for convenience of explanation, only the portions related to the embodiment of the present application are shown.
Referring to fig. 6, the apparatus includes:
The updating module 601 is configured to update, for each non-first frame image in the image data, a history tracking model corresponding to each labeling area according to an area feature corresponding to each labeling area in a previous frame image of the non-first frame image, so as to obtain each tracking model corresponding to the non-first frame image, where the history tracking model corresponds to the labeling area in the previous frame image one by one;
The tracking module 602 is configured to perform area tracking on the non-first frame image through each tracking model, so as to obtain a region to be marked corresponding to each tracking model;
the first labeling module 603 is configured to label, for each of the to-be-labeled areas, the to-be-labeled area according to the similarity between the to-be-labeled area and the corresponding labeling area.
Optionally, the tracking module 602 is specifically configured to perform region expansion in the non-first frame image based on the labeling region corresponding to each of the history tracking models, so as to obtain a plurality of expansion regions; for each expansion region, determining the similarity between each sub-region in the expansion region and the marking region through a tracking model corresponding to the expansion region, wherein the sub-regions are obtained by dividing the expansion region according to a preset recognition mode; and determining the region to be marked corresponding to the expansion region from the plurality of sub-regions according to the plurality of similarities.
Optionally, the tracking module 602 is further specifically configured to perform feature extraction on each of the sub-regions through a tracking model corresponding to the extended region, to obtain a sub-region feature of each of the sub-regions; and comparing the subarea characteristics of each subarea with the prestored area characteristics of the labeling area through a tracking model corresponding to the expansion area to obtain the similarity between the subarea and the labeling area.
Optionally, the tracking module 602 is further specifically configured to select, from the multiple similarities, a sub-region corresponding to the maximum similarity as a region to be marked corresponding to the extended region.
Optionally, referring to fig. 7, the apparatus further includes:
And the storage module 604 is used for storing the sub-region features corresponding to the region to be marked.
Optionally, the tracking module 602 is further specifically configured to expand, for each labeling area, a boundary of the labeling area according to a preset expansion coefficient, to obtain an expanded boundary; and generating the expansion area in the non-first frame image according to the expanded boundary by taking the center of the labeling area as a reference.
Optionally, the first labeling module 603 is further configured to label the region to be labeled if the similarity is greater than a preset similarity threshold; and if the similarity is smaller than or equal to the similarity threshold, deleting the tracking model corresponding to the marked area.
Optionally, referring to fig. 8, the apparatus further includes:
an acquiring module 605, configured to acquire a first frame image in the image data;
The second labeling module 606 is configured to label the first frame image according to a labeling operation triggered by a user, so as to obtain at least one labeling area of the first frame image, where a history tracking model corresponding to each labeling area of the first frame image is a preset initial tracking model.
Optionally, referring to fig. 9, the apparatus further includes:
a stopping module 607, configured to stop labeling the image data if a pause operation triggered by a user is detected during the labeling of the image data;
And a third labeling module 608, configured to label the current frame image according to a labeling operation triggered by the user again, to obtain a newly added labeling area, where a history tracking model corresponding to the newly added labeling area is a preset initial tracking model.
In summary, with the data labeling device provided by the embodiment of the application, for each non-first frame image in the image data, the terminal device can update the historical tracking model corresponding to each labeling area according to the area features corresponding to that labeling area in the previous frame image of the non-first frame image, to obtain the tracking models corresponding to the non-first frame image. The terminal device then performs area tracking on the non-first frame image through each tracking model to obtain the area to be labeled corresponding to each tracking model and, for each area to be labeled, labels that area according to the similarity between it and the corresponding labeling area. There is no need to train a labeling model matched to the current labeling requirement, and no need to manually label a large number of frame images: the tracking models obtained by updating with the area features can identify and label each area to be labeled in the non-first frame images, so the workload of labeling the image data is reduced, the time spent on labeling is shortened, and the labeling efficiency is improved.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or detailed in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the system embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods of the above embodiments may also be implemented by a computer program instructing related hardware; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to a terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunications signals.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (11)

1. A method for labeling data, comprising:
For each non-first frame image in image data, updating a historical tracking model corresponding to each labeling region according to the region feature corresponding to that labeling region in a previous frame image of the non-first frame image, to obtain each tracking model corresponding to the non-first frame image, wherein the historical tracking models are in one-to-one correspondence with the labeling regions in the previous frame image, and the tracking models are a plurality of models corresponding to the plurality of labeling regions;

Performing region tracking on the non-first frame image through each tracking model to obtain a region to be labeled corresponding to each tracking model, comprising: performing region expansion in the non-first frame image based on the labeling region corresponding to each historical tracking model to obtain a plurality of expansion regions; for each expansion region, determining a similarity between each sub-region in the expansion region and the labeling region through the tracking model corresponding to the expansion region, wherein the sub-regions are obtained by dividing the expansion region according to a preset recognition mode; and determining, from the plurality of sub-regions according to the plurality of similarities, the region to be labeled corresponding to the expansion region; the expansion region being traversed through the tracking model according to a preset sub-region size and a sliding step to obtain the plurality of sub-regions;

And for each region to be labeled, labeling the region to be labeled according to the similarity between the region to be labeled and the corresponding labeling region.
2. The data labeling method according to claim 1, wherein determining, through the tracking model corresponding to the expansion region, the similarity between each sub-region in the expansion region and the labeling region comprises:

extracting a feature of each sub-region through the tracking model corresponding to the expansion region to obtain a sub-region feature of each sub-region;

and for each sub-region, comparing the sub-region feature of the sub-region with the pre-stored region feature of the labeling region through the tracking model corresponding to the expansion region, to obtain the similarity between the sub-region and the labeling region.
3. The data labeling method according to claim 1, wherein determining, from the plurality of sub-regions according to the plurality of similarities, the region to be labeled corresponding to the expansion region comprises:

selecting the sub-region corresponding to the maximum similarity among the plurality of similarities as the region to be labeled corresponding to the expansion region.
4. The data labeling method according to claim 3, wherein after determining, from the plurality of sub-regions according to the plurality of similarities, the region to be labeled corresponding to the expansion region, the method further comprises:

storing the sub-region feature corresponding to the region to be labeled.
5. The data labeling method according to claim 1, wherein performing region expansion in the non-first frame image based on the labeling region corresponding to each historical tracking model to obtain the plurality of expansion regions comprises:

for each labeling region, expanding a boundary of the labeling region according to a preset expansion coefficient to obtain an expanded boundary;

and generating the expansion region in the non-first frame image according to the expanded boundary, with a center of the labeling region as a reference.
6. The data labeling method according to claim 1, wherein labeling the region to be labeled according to the similarity between the region to be labeled and the corresponding labeling region comprises:

labeling the region to be labeled if the similarity is greater than a preset similarity threshold;

and deleting the tracking model corresponding to the labeling region if the similarity is less than or equal to the similarity threshold.
7. The data labeling method according to any one of claims 1 to 6, wherein before updating the historical tracking model corresponding to each labeling region according to the region feature corresponding to that labeling region in the previous frame image of the non-first frame image to obtain each tracking model corresponding to the non-first frame image, the method further comprises:

acquiring a first frame image in the image data;

and labeling the first frame image according to a labeling operation triggered by a user to obtain at least one labeling region of the first frame image, wherein the historical tracking model corresponding to each labeling region of the first frame image is a preset initial tracking model.
8. The data labeling method according to any one of claims 1 to 6, further comprising:

in the process of labeling the image data, stopping labeling the image data if a pause operation triggered by the user is detected;

and labeling a current frame image according to a labeling operation triggered by the user again to obtain a newly added labeling region, wherein the historical tracking model corresponding to the newly added labeling region is a preset initial tracking model.
9. A data labeling device, comprising:

an updating module, configured to, for each non-first frame image in image data, update a historical tracking model corresponding to each labeling region according to the region feature corresponding to that labeling region in a previous frame image of the non-first frame image, to obtain each tracking model corresponding to the non-first frame image, wherein the historical tracking models are in one-to-one correspondence with the labeling regions in the previous frame image, and the tracking models are a plurality of models corresponding to the plurality of labeling regions;

a tracking module, configured to perform region tracking on the non-first frame image through each tracking model to obtain a region to be labeled corresponding to each tracking model, including: performing region expansion in the non-first frame image based on the labeling region corresponding to each historical tracking model to obtain a plurality of expansion regions; for each expansion region, determining a similarity between each sub-region in the expansion region and the labeling region through the tracking model corresponding to the expansion region, wherein the sub-regions are obtained by dividing the expansion region according to a preset recognition mode; and determining, from the plurality of sub-regions according to the plurality of similarities, the region to be labeled corresponding to the expansion region; the expansion region being traversed through the tracking model according to a preset sub-region size and a sliding step to obtain the plurality of sub-regions;

and a first labeling module, configured to, for each region to be labeled, label the region to be labeled according to the similarity between the region to be labeled and the corresponding labeling region.
10. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any one of claims 1 to 8 when executing the computer program.
11. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 8.
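As a small worked example of the centered boundary expansion recited in claim 5, the sketch below scales a labeling region's width and height by an expansion coefficient while keeping its center fixed. The (x, y, w, h) box format and the coefficient of 1.5 are illustrative assumptions; the claim leaves the preset expansion coefficient unspecified.

```python
def expand_box(box, coeff=1.5):
    """Expand a labeling region's boundary by `coeff` about its center (illustrative only)."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2      # center of the labeling region stays fixed
    ew, eh = w * coeff, h * coeff      # boundary scaled by the preset expansion coefficient
    return (cx - ew / 2, cy - eh / 2, ew, eh)

# A 60x40 labeling region at (40, 30) yields a 90x60 expansion region at (25, 20):
assert expand_box((40, 30, 60, 40)) == (25.0, 20.0, 90.0, 60.0)
```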
CN202010604308.2A 2020-06-29 2020-06-29 Data labeling method and device Active CN111832549B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010604308.2A CN111832549B (en) 2020-06-29 2020-06-29 Data labeling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010604308.2A CN111832549B (en) 2020-06-29 2020-06-29 Data labeling method and device

Publications (2)

Publication Number Publication Date
CN111832549A (en) 2020-10-27
CN111832549B (en) 2024-04-23

Family

ID=72898270

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010604308.2A Active CN111832549B (en) 2020-06-29 2020-06-29 Data labeling method and device

Country Status (1)

Country Link
CN (1) CN111832549B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112488222B (en) * 2020-12-05 2022-07-01 武汉中海庭数据技术有限公司 Crowdsourcing data labeling method, system, server and storage medium

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103927762A (en) * 2013-01-11 2014-07-16 浙江大华技术股份有限公司 Target vehicle automatic tracking method and device
CN104794737A (en) * 2015-04-10 2015-07-22 电子科技大学 Depth-information-aided particle filter tracking method
CN106530327A (en) * 2016-11-07 2017-03-22 北京航空航天大学 Quick real-time discrimination type tracing method based on multi-local-feature learning
WO2018133666A1 (en) * 2017-01-17 2018-07-26 腾讯科技(深圳)有限公司 Method and apparatus for tracking video target
CN108446585A (en) * 2018-01-31 2018-08-24 深圳市阿西莫夫科技有限公司 Method for tracking target, device, computer equipment and storage medium
CN108694724A (en) * 2018-05-11 2018-10-23 西安天和防务技术股份有限公司 A kind of long-time method for tracking target
CN109035299A (en) * 2018-06-11 2018-12-18 平安科技(深圳)有限公司 Method for tracking target, device, computer equipment and storage medium
WO2019237516A1 (en) * 2018-06-11 2019-12-19 平安科技(深圳)有限公司 Target tracking method and apparatus, computer device, and storage medium
CN109190635A (en) * 2018-07-25 2019-01-11 北京飞搜科技有限公司 Target tracking method, device and electronic equipment based on classification CNN
CN109325964A (en) * 2018-08-17 2019-02-12 深圳市中电数通智慧安全科技股份有限公司 A kind of face tracking methods, device and terminal
CN109671103A (en) * 2018-12-12 2019-04-23 易视腾科技股份有限公司 Method for tracking target and device
CN109743497A (en) * 2018-12-21 2019-05-10 创新奇智(重庆)科技有限公司 A kind of dataset acquisition method, system and electronic device
CN109829397A (en) * 2019-01-16 2019-05-31 创新奇智(北京)科技有限公司 A kind of video labeling method based on image clustering, system and electronic equipment
CN111223128A (en) * 2020-01-17 2020-06-02 深圳大学 Target tracking method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Tracking and Recognition of Human Motion Posture Based on GMM; Wei Yanxin; Fan Xiujuan; Journal of Beijing Institute of Fashion Technology (Natural Science Edition); 2018-06-30; Vol. 38, No. 2; pp. 43-51 *
Research on a Moving Target Tracking Method Based on Region Active Contours; Zha Yufei et al.; Journal of Image and Graphics; 2006-12-31; Vol. 11, No. 12; pp. 1844-1848 *

Also Published As

Publication number Publication date
CN111832549A (en) 2020-10-27

Similar Documents

Publication Publication Date Title
CN110852285B (en) Object detection method and device, computer equipment and storage medium
CN107545262B (en) Method and device for detecting text in natural scene image
CN110874594A (en) Human body surface damage detection method based on semantic segmentation network and related equipment
US10620713B1 (en) Methods and systems for touchless control with a mobile device
CN108198191B (en) Image processing method and device
WO2020232909A1 (en) Pedestrian visual tracking method, model training method and device, apparatus and storage medium
CN113379718B (en) Target detection method, target detection device, electronic equipment and readable storage medium
CN109492576B (en) Image recognition method and device and electronic equipment
WO2020244075A1 (en) Sign language recognition method and apparatus, and computer device and storage medium
CN109215037B (en) Target image segmentation method and device and terminal equipment
CN110222582B (en) Image processing method and camera
US9355338B2 (en) Image recognition device, image recognition method, and recording medium
CN110447038A (en) Image processing apparatus, image processing method and recording medium
CN112329762A (en) Image processing method, model training method, device, computer device and medium
WO2020175818A1 (en) Method and system for object tracking using on-line training
CN113112542A (en) Visual positioning method and device, electronic equipment and storage medium
CN111582032A (en) Pedestrian detection method and device, terminal equipment and storage medium
CN111832549B (en) Data labeling method and device
CN112101156A (en) Target identification method and device and electronic equipment
CN113515653A (en) Model recommendation method, device and equipment based on historical data and storage medium
CN110796145B (en) Multi-certificate segmentation association method and related equipment based on intelligent decision
CN104933688A (en) Data processing method and electronic device
US20230401809A1 (en) Image data augmentation device and method
CN108549702B (en) Method for cleaning picture library of mobile terminal and mobile terminal
US20230131717A1 (en) Search processing device, search processing method, and computer program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant