CN113034551A - Target tracking and labeling method and device, readable storage medium and computer equipment - Google Patents

Target tracking and labeling method and device, readable storage medium and computer equipment Download PDF

Info

Publication number
CN113034551A
Authority
CN
China
Prior art keywords
target
tracking
mask
outline
position attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110604174.9A
Other languages
Chinese (zh)
Inventor
毛凤辉
郭振民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang Virtual Reality Institute Co Ltd
Original Assignee
Nanchang Virtual Reality Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanchang Virtual Reality Institute Co Ltd filed Critical Nanchang Virtual Reality Institute Co Ltd
Priority to CN202110604174.9A priority Critical patent/CN113034551A/en
Publication of CN113034551A publication Critical patent/CN113034551A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/60 Analysis of geometric attributes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/60 Analysis of geometric attributes
    • G06T 7/66 Analysis of geometric attributes of image moments or centre of gravity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/107 Static hand or arm
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G06V 40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person

Abstract

The invention discloses a target tracking and labeling method and device, a readable storage medium and computer equipment. The method comprises the following steps: acquiring a target object to be tracked and labeled from a video, and tracking the target object with the SiamMask target tracking algorithm to obtain a tracking mask; screening a target mask out of the tracking mask, and obtaining the contour position attribute of the target mask; and converting the contour position attribute of the target mask into the format required by YOLO-series target detection to obtain a YOLO-series target detection label file and its corresponding image. The invention addresses the poor tracking quality and low labeling efficiency of prior-art video target tracking and labeling.

Description

Target tracking and labeling method and device, readable storage medium and computer equipment
Technical Field
The invention relates to the technical field of computers, in particular to a target tracking and labeling method, a target tracking and labeling device, a readable storage medium and computer equipment.
Background
Target detection is one of the key technologies for realizing VR (Virtual Reality) human-computer interaction. In target detection, the labeling of target objects is very important: labeling marks the position of the target object in the original image and generates, for each picture, a corresponding file representing the position of the target's ground-truth bounding box.
In the prior art, labeling tools commonly used for target detection include labelImg, labelme, DarkLabel and the like. labelImg and labelme are convenient for labeling single-frame images but offer poor support for video: the video must first be split into individual frames and then labeled, which is time-consuming when the amount of data is large. DarkLabel can track and label a video target, but its tracking quality is poor; the tracking frame must be adjusted continually, so labeling efficiency is low.
Disclosure of Invention
Therefore, an object of the present invention is to provide a target tracking and labeling method that solves the problems of poor tracking quality and low labeling efficiency when tracking and labeling video targets in the prior art.
The invention provides a target tracking and labeling method, which comprises the following steps:
acquiring a target object to be tracked and labeled from a video, and tracking the target object with the SiamMask target tracking algorithm to obtain a tracking mask;
screening a target mask out of the tracking mask, and obtaining the contour position attribute of the target mask;
and converting the contour position attribute of the target mask into the format required by YOLO-series target detection to obtain a YOLO-series target detection label file and its corresponding image.
According to the target tracking and labeling method provided by the invention, the SiamMask algorithm is combined with video labeling. SiamMask tracks well and is strongly resistant to interference. The target object is tracked by the SiamMask target tracking algorithm; after the tracking mask is obtained, the target mask is screened out of it and its contour position attribute obtained; the contour position attribute is then converted into the format required by YOLO-series target detection, i.e., the target object in the video is labeled. Automatic target tracking and labeling is thereby completed, and labeling efficiency is improved.
In addition, according to the above-mentioned target tracking and labeling method of the present invention, the following additional technical features may also be provided:
further, the step of acquiring a target object to be tracked and labeled from the video, and tracking the target object by using a siammask target tracking algorithm to obtain a tracking mask specifically includes:
selecting a target object in a frame of a video containing the target object to be tracked and labeled;
tracking the target object by using a siammask target tracking algorithm;
and acquiring a tracking mask obtained when the target object is tracked by a siammask target tracking algorithm.
Further, the step of screening a target mask out of the tracking mask and obtaining the contour position attribute of the target mask specifically comprises:
finding the contour of each object in the tracking mask through the findContours() function of the opencv library to obtain a contour list;
acquiring the position attribute and the area of each contour in the contour list through the boundingRect() function of the python opencv library;
and acquiring the contour with the largest area, taking it as the contour of the target mask, and taking its position attribute as the contour position attribute of the target mask.
Further, the step of converting the contour position attribute of the target mask into the format required by YOLO-series target detection to obtain a YOLO-series target detection label file and its corresponding image specifically comprises:
calculating the ratio of the target object to the original image size according to the contour position attribute of the target mask;
and converting the position of the target object relative to the image size into the format required by YOLO-series target detection to obtain a YOLO-series target detection label file and its corresponding image.
Further, in the step of calculating the ratio of the target object to the original image size according to the contour position attribute of the target mask, the ratio is calculated with the following formulas:
p_kx = (x_k + width_k / 2) / W
p_ky = (y_k + height_k / 2) / H
p_kw = width_k / W
p_kh = height_k / H
where W and H denote the width and height of the original image; x_k and y_k denote the coordinates of the top-left vertex of the largest-area contour k; width_k and height_k denote the width and height of contour k; p_kx and p_ky denote the ratios of the center position of the minimum bounding rectangle of contour k to the original image width and height; and p_kw and p_kh denote the ratios of the width and height of the minimum bounding rectangle of contour k to the original image width and height.
Further, after the step of converting the contour position attribute of the target mask into the format required by YOLO-series target detection to obtain a YOLO-series target detection label file and its corresponding image, the method further comprises:
storing the YOLO-series target detection label file and its corresponding image in a preset folder;
dividing the labeled files in the preset folder into a training set, a validation set and a test set according to a preset division ratio;
and training a YOLO-series neural network model on the training set, validation set and test set, the trained model being used for target detection.
Another object of the invention is to provide a target tracking and labeling device that solves the problems of poor tracking quality and low labeling efficiency when tracking and labeling video targets in the prior art.
The invention provides a target tracking and labeling device, which comprises:
an acquisition and tracking module for acquiring a target object to be tracked and labeled from a video and tracking the target object with the SiamMask target tracking algorithm to obtain a tracking mask;
a screening and acquisition module for screening a target mask out of the tracking mask and obtaining the contour position attribute of the target mask;
and a conversion and generation module for converting the contour position attribute of the target mask into the format required by YOLO-series target detection to obtain a YOLO-series target detection label file and its corresponding image.
According to the target tracking and labeling device provided by the invention, the SiamMask algorithm is combined with video labeling. SiamMask tracks well and is strongly resistant to interference. The target object is tracked by the SiamMask target tracking algorithm; after the tracking mask is obtained, the target mask is screened out of it and its contour position attribute obtained; the contour position attribute is then converted into the format required by YOLO-series target detection, i.e., the target object in the video is labeled. Automatic target tracking and labeling is thereby completed, and labeling efficiency is improved.
In addition, the above target tracking and labeling device according to the present invention may further have the following additional technical features:
further, the acquisition tracking module is specifically configured to:
selecting a target object in a frame of a video containing the target object to be tracked and labeled;
tracking the target object by using a siammask target tracking algorithm;
and acquiring a tracking mask obtained when the target object is tracked by a siammask target tracking algorithm.
Further, the screening acquisition module is specifically configured to:
searching the outline of each object in the tracking mask through a findContours () function of an opencv library to obtain an outline list;
acquiring the position attribute and the area of each outline in the outline list through a bounngselect () function in python library opencv;
and acquiring the outline with the largest area, taking the outline with the largest area as the outline of the target mask, and taking the position attribute of the outline with the largest area as the outline position attribute of the target mask.
Further, the conversion generation module is specifically configured to:
calculating the proportion of the target object relative to the size of the original image according to the contour position attribute of the target mask;
and converting the position of the target object relative to the size of the image into a format required by the yolo series target detection to obtain a yolo series target detection label file and a corresponding image thereof.
Further, the conversion generation module is specifically configured to calculate a ratio of the target object to the original image size using the following formula:
p_kx = (x_k + width_k / 2) / W
p_ky = (y_k + height_k / 2) / H
p_kw = width_k / W
p_kh = height_k / H
where W and H denote the width and height of the original image; x_k and y_k denote the coordinates of the top-left vertex of the largest-area contour k; width_k and height_k denote the width and height of contour k; p_kx and p_ky denote the ratios of the center position of the minimum bounding rectangle of contour k to the original image width and height; and p_kw and p_kh denote the ratios of the width and height of the minimum bounding rectangle of contour k to the original image width and height.
Further, the device further comprises:
a storage module for storing the YOLO-series target detection label file and its corresponding image in a preset folder;
a division module for dividing the labeled files in the preset folder into a training set, a validation set and a test set according to a preset division ratio;
and a training module for training a YOLO-series neural network model on the training set, validation set and test set, the trained model being used for target detection.
The invention also proposes a readable storage medium on which a computer program is stored which, when executed by a processor, carries out the steps of the above method.
The invention also proposes a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the program.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of embodiments of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow diagram of a target tracking and labeling method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of human hand tracking;
FIG. 3 is a detailed flowchart of step S101 in FIG. 1;
FIG. 4 is a schematic diagram of a human hand tracking mask;
FIG. 5 is a detailed flowchart of step S102 in FIG. 1;
FIG. 6 is a detailed flowchart of step S103 in FIG. 1;
FIG. 7 is a flow diagram of a target tracking and labeling method according to another embodiment of the present invention;
FIG. 8 is a block diagram of a target tracking and labeling apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a target tracking and labeling method according to an embodiment of the present invention includes steps S101 to S103.
S101, acquiring a target object to be tracked and labeled from a video, and tracking the target object with the SiamMask target tracking algorithm to obtain a tracking mask.
Referring to fig. 2, a human hand is tracked as an example. In addition, referring to fig. 3, step S101 specifically includes:
S1011, box-selecting the target object in a frame of the video that contains the target object to be tracked and labeled.
S1012, tracking the target object with the SiamMask target tracking algorithm.
SiamMask is an existing, open-source and mature target tracking technique; the target object selected in step S1011 is tracked by the SiamMask target tracking algorithm.
S1013, acquiring the tracking mask produced while the SiamMask target tracking algorithm tracks the target object.
It should be noted that at this stage the tracking mask may still contain interference regions. For example, as shown in fig. 4, when the target object is a human hand, the tracking mask may contain, besides the hand region, some other interference regions (the oval regions in fig. 4).
In a specific implementation, if tracking fails or becomes inaccurate during SiamMask tracking (which is detected by manual supervision), the tracking target can be re-selected and the tracker re-initialized to resume tracking.
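As an illustrative sketch (not part of the patent text), steps S1011 to S1013 could be wired together as follows. SiamMaskTracker is a hypothetical wrapper around the open-source SiamMask implementation whose init/update interface is assumed here, and the video path is a placeholder:

```python
import cv2

# Hypothetical wrapper around the open-source SiamMask implementation;
# its init/update interface is assumed for illustration only.
from siammask_wrapper import SiamMaskTracker

cap = cv2.VideoCapture("input_video.mp4")  # placeholder video path
ok, frame = cap.read()

# S1011: box-select the target object in the first frame.
x, y, w, h = cv2.selectROI("select target", frame, fromCenter=False)

# S1012: initialize the tracker on the selected region.
tracker = SiamMaskTracker()
tracker.init(frame, (x, y, w, h))

masks = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # S1013: each update yields a binary tracking mask for the current frame.
    mask = tracker.update(frame)  # uint8 mask: 255 inside the tracked region
    masks.append((frame, mask))
cap.release()
```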
S102, screening a target mask out of the tracking mask, and obtaining the contour position attribute of the target mask.
Referring to fig. 5, step S102 specifically includes:
S1021, finding the contour of each object in the tracking mask through the findContours() function of the opencv library to obtain a contour list;
S1022, acquiring the position attribute and the area of each contour in the contour list through the boundingRect() function of the python opencv library;
for example, the location attributes and areas of the contours are as follows:
(x_i, y_i, width_i, height_i) = cv2.boundingRect(contours[i])
area_i = width_i × height_i
where x_i and y_i denote the position of the top-left corner of the i-th contour, width_i and height_i denote the width and height of the i-th contour, area_i denotes the area of the i-th contour, and i = 0, 1, 2, ...; cv2 denotes the opencv library, and boundingRect denotes a function in the opencv library.
S1023, acquiring the contour with the largest area, taking it as the contour of the target mask, and taking its position attribute as the contour position attribute of the target mask.
All contour areas (area_0, area_1, area_2, ...) are obtained in step S1022. Since the target mask is typically the larger region in the tracking mask, the contour with the largest area is sought. Suppose the largest area found is area_k; then the k-th contour is the contour with the largest area, its top-left vertex coordinates are (x_k, y_k), and its width and height are width_k and height_k respectively.
For example, when the target object is a human hand, the hand contour is the largest of all contours and the contour areas of the other interference regions are all smaller, so the target mask can be quickly screened out by comparing contour areas.
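A minimal sketch of steps S1021 to S1023 (illustrative, not from the patent; it assumes OpenCV 4.x, whose findContours() returns two values, and a binary uint8 mask):

```python
import cv2

def largest_contour_attributes(mask):
    """Return (x, y, width, height) of the largest-area contour in a binary
    mask, or None if the mask contains no contour."""
    # S1021: find the contour of each object in the tracking mask.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    # S1022: bounding rectangle (position attribute) of every contour.
    rects = [cv2.boundingRect(c) for c in contours]
    # S1023: the rectangle with the largest area belongs to the target mask;
    # the smaller interference regions are thereby screened out.
    return max(rects, key=lambda r: r[2] * r[3])
```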
S103, converting the contour position attribute of the target mask into the format required by YOLO-series target detection to obtain a YOLO-series target detection label file and its corresponding image.
Referring to fig. 6, step S103 specifically includes:
S1031, calculating the ratio of the target object to the original image size according to the contour position attribute of the target mask;
Specifically, the ratio of the target object to the original image size is calculated with the following formulas:
p_kx = (x_k + width_k / 2) / W
p_ky = (y_k + height_k / 2) / H
p_kw = width_k / W
p_kh = height_k / H
where W and H denote the width and height of the original image; x_k and y_k denote the coordinates of the top-left vertex of the largest-area contour k; width_k and height_k denote the width and height of contour k; p_kx and p_ky denote the ratios of the center position of the minimum bounding rectangle of contour k to the original image width and height; and p_kw and p_kh denote the ratios of the width and height of the minimum bounding rectangle of contour k to the original image width and height.
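A direct transcription of these formulas (an illustrative sketch; the function name is ours, and the variable names mirror the symbols above):

```python
def to_yolo_ratios(x_k, y_k, width_k, height_k, W, H):
    """Convert the largest contour's bounding box to YOLO center/size ratios."""
    p_kx = (x_k + width_k / 2) / W   # box center x as a fraction of image width
    p_ky = (y_k + height_k / 2) / H  # box center y as a fraction of image height
    p_kw = width_k / W               # box width as a fraction of image width
    p_kh = height_k / H              # box height as a fraction of image height
    return p_kx, p_ky, p_kw, p_kh
```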
S1032, converting the position of the target object relative to the image size into the format required by YOLO-series target detection to obtain a YOLO-series target detection label file and its corresponding image.
The YOLO series comprises versions YOLOv1 through YOLOv5, whose label files all share the same format. The content of a YOLO-series target detection label file is the target label together with the ratios of the target object to the original image size, one line per object, in the form:
label p_kx p_ky p_kw p_kh
For example, a line beginning with 2 denotes target label 2, and the decimals that follow are the ratios of the target object to the original image size (i.e., the values of p_kx, p_ky, p_kw, p_kh). That is, label, p_kx, p_ky, p_kw and p_kh are written into a text file (a .txt file), where label denotes the target's class tag. Target labels are denoted by numbers, and different targets use different labels.
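Writing such a label line could look like the following sketch (the six-decimal formatting and the example path/values are assumptions for illustration):

```python
def write_yolo_label(path, label, p_kx, p_ky, p_kw, p_kh):
    """Append one 'label p_kx p_ky p_kw p_kh' line to a YOLO-format label file."""
    with open(path, "a") as f:
        f.write(f"{label} {p_kx:.6f} {p_ky:.6f} {p_kw:.6f} {p_kh:.6f}\n")

# Example (illustrative values only): target label 2, as above.
# write_yolo_label("datasets/labels/1.txt", 2, 0.51, 0.46, 0.13, 0.28)
```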
In addition, referring to fig. 7, as a specific example, after step S103, the method further includes steps S201 to S203:
S201, storing the YOLO-series target detection label file and its corresponding image in a preset folder;
Specifically, the label file from step S103 is saved in the datasets/labels folder and the corresponding image in the datasets/images folder, the label file having the same name as the image file; for example, label file 1.txt corresponds to image 1.jpg.
S202, dividing the labeled files in the preset folder into a training set, a validation set and a test set according to a preset division ratio;
The labeled files under the datasets folder are divided into a training set, a validation set and a test set; the division ratio is chosen according to the actual conditions of the project.
S203, training a YOLO-series neural network model on the training set, validation set and test set, the trained model being used for target detection.
In summary, according to the target tracking and labeling method provided by this embodiment, the SiamMask algorithm is combined with video labeling. SiamMask tracks well and is strongly resistant to interference. The target object is tracked by the SiamMask target tracking algorithm; after the tracking mask is obtained, the target mask is screened out of it and its contour position attribute obtained; the contour position attribute is then converted into the format required by YOLO-series target detection, i.e., the target object in the video is labeled. Automatic target tracking and labeling is thereby completed, and labeling efficiency is improved.
Referring to fig. 8, a target tracking and labeling device according to an embodiment of the present invention comprises:
an acquisition and tracking module for acquiring a target object to be tracked and labeled from a video and tracking the target object with the SiamMask target tracking algorithm to obtain a tracking mask;
a screening and acquisition module for screening a target mask out of the tracking mask and obtaining the contour position attribute of the target mask;
and a conversion and generation module for converting the contour position attribute of the target mask into the format required by YOLO-series target detection to obtain a YOLO-series target detection label file and its corresponding image.
In this embodiment, the acquisition and tracking module is specifically configured to:
box-select the target object in a frame of the video that contains the target object to be tracked and labeled;
track the target object with the SiamMask target tracking algorithm;
and acquire the tracking mask produced while the SiamMask target tracking algorithm tracks the target object.
In this embodiment, the screening and acquisition module is specifically configured to:
find the contour of each object in the tracking mask through the findContours() function of the opencv library to obtain a contour list;
acquire the position attribute and the area of each contour in the contour list through the boundingRect() function of the python opencv library;
and acquire the contour with the largest area, take it as the contour of the target mask, and take its position attribute as the contour position attribute of the target mask.
In this embodiment, the conversion and generation module is specifically configured to:
calculate the ratio of the target object to the original image size according to the contour position attribute of the target mask;
and convert the position of the target object relative to the image size into the format required by YOLO-series target detection to obtain a YOLO-series target detection label file and its corresponding image.
In this embodiment, the conversion and generation module is specifically configured to calculate the ratio of the target object to the original image size with the following formulas:
p_kx = (x_k + width_k / 2) / W
p_ky = (y_k + height_k / 2) / H
p_kw = width_k / W
p_kh = height_k / H
where W and H denote the width and height of the original image; x_k and y_k denote the coordinates of the top-left vertex of the largest-area contour k; width_k and height_k denote the width and height of contour k; p_kx and p_ky denote the ratios of the center position of the minimum bounding rectangle of contour k to the original image width and height; and p_kw and p_kh denote the ratios of the width and height of the minimum bounding rectangle of contour k to the original image width and height.
In this embodiment, the device further comprises:
a storage module for storing the YOLO-series target detection label file and its corresponding image in a preset folder;
a division module for dividing the labeled files in the preset folder into a training set, a validation set and a test set according to a preset division ratio;
and a training module for training a YOLO-series neural network model on the training set, validation set and test set, the trained model being used for target detection.
According to the target tracking and labeling device provided by this embodiment, the SiamMask algorithm is combined with video labeling. SiamMask tracks well and is strongly resistant to interference. The target object is tracked by the SiamMask target tracking algorithm; after the tracking mask is obtained, the target mask is screened out of it and its contour position attribute obtained; the contour position attribute is then converted into the format required by YOLO-series target detection, i.e., the target object in the video is labeled. Automatic target tracking and labeling is thereby completed, and labeling efficiency is improved.
Furthermore, an embodiment of the present invention also proposes a readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the above-mentioned method.
Furthermore, an embodiment of the present invention also provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the steps of the above method when executing the program.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (10)

1. A target tracking and labeling method, the method comprising:
acquiring a target object to be tracked and labeled from a video, and tracking the target object with the SiamMask target tracking algorithm to obtain a tracking mask;
screening a target mask out of the tracking mask, and obtaining the contour position attribute of the target mask;
and converting the contour position attribute of the target mask into the format required by YOLO-series target detection to obtain a YOLO-series target detection label file and its corresponding image.
2. The target tracking and labeling method according to claim 1, wherein the step of acquiring a target object to be tracked and labeled from a video, and tracking the target object with the SiamMask target tracking algorithm to obtain a tracking mask specifically comprises:
box-selecting the target object in a frame of the video that contains the target object to be tracked and labeled;
tracking the target object with the SiamMask target tracking algorithm;
and acquiring the tracking mask produced while the SiamMask target tracking algorithm tracks the target object.
3. The target tracking and labeling method according to claim 1, wherein the step of screening a target mask out of the tracking mask and obtaining the contour position attribute of the target mask specifically comprises:
finding the contour of each object in the tracking mask through the findContours() function of the opencv library to obtain a contour list;
acquiring the position attribute and the area of each contour in the contour list through the boundingRect() function of the python opencv library;
and acquiring the contour with the largest area, taking it as the contour of the target mask, and taking its position attribute as the contour position attribute of the target mask.
4. The target tracking and labeling method according to claim 3, wherein the step of converting the contour position attribute of the target mask into the format required by YOLO-series target detection to obtain a YOLO-series target detection label file and its corresponding image specifically comprises:
calculating the ratio of the target object to the original image size according to the contour position attribute of the target mask;
and converting the position of the target object relative to the image size into the format required by YOLO-series target detection to obtain a YOLO-series target detection label file and its corresponding image.
5. The target tracking and labeling method according to claim 4, wherein in the step of calculating the ratio of the target object to the original image size according to the contour position attribute of the target mask, the ratio is calculated with the following formulas:
p_kx = (x_k + width_k / 2) / W
p_ky = (y_k + height_k / 2) / H
p_kw = width_k / W
p_kh = height_k / H
where W and H denote the width and height of the original image; x_k and y_k denote the coordinates of the top-left vertex of the largest-area contour k; width_k and height_k denote the width and height of contour k; p_kx and p_ky denote the ratios of the center position of the minimum bounding rectangle of contour k to the original image width and height; and p_kw and p_kh denote the ratios of the width and height of the minimum bounding rectangle of contour k to the original image width and height.
6. The target tracking and labeling method according to any one of claims 1 to 5, wherein after the step of converting the contour position attribute of the target mask into the format required by YOLO-series target detection to obtain a YOLO-series target detection label file and its corresponding image, the method further comprises:
storing the YOLO-series target detection label file and its corresponding image in a preset folder;
dividing the labeled files in the preset folder into a training set, a validation set and a test set according to a preset division ratio;
and training a YOLO-series neural network model on the training set, validation set and test set, the trained model being used for target detection.
7. A target tracking and labeling device, characterized in that the device comprises:
an acquisition and tracking module for acquiring a target object to be tracked and labeled from a video and tracking the target object with the SiamMask target tracking algorithm to obtain a tracking mask;
a screening and acquisition module for screening a target mask out of the tracking mask and obtaining the contour position attribute of the target mask;
and a conversion and generation module for converting the contour position attribute of the target mask into the format required by YOLO-series target detection to obtain a YOLO-series target detection label file and its corresponding image.
8. The target tracking and labeling device according to claim 7, wherein the screening and acquisition module is specifically configured to:
find the contour of each object in the tracking mask through the findContours() function of the opencv library to obtain a contour list;
acquire the position attribute and the area of each contour in the contour list through the boundingRect() function of the python opencv library;
and acquire the contour with the largest area, take it as the contour of the target mask, and take its position attribute as the contour position attribute of the target mask.
9. A readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the method according to any one of claims 1-6.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 6 when executing the program.
CN202110604174.9A 2021-05-31 2021-05-31 Target tracking and labeling method and device, readable storage medium and computer equipment Pending CN113034551A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110604174.9A CN113034551A (en) 2021-05-31 2021-05-31 Target tracking and labeling method and device, readable storage medium and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110604174.9A CN113034551A (en) 2021-05-31 2021-05-31 Target tracking and labeling method and device, readable storage medium and computer equipment

Publications (1)

Publication Number Publication Date
CN113034551A true CN113034551A (en) 2021-06-25

Family

ID=76455927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110604174.9A Pending CN113034551A (en) 2021-05-31 2021-05-31 Target tracking and labeling method and device, readable storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN113034551A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103077521A (en) * 2013-01-08 2013-05-01 天津大学 Area-of-interest extracting method used for video monitoring
CN103324955A (en) * 2013-06-14 2013-09-25 浙江智尔信息技术有限公司 Pedestrian detection method based on video processing
CN109377511A (en) * 2018-08-30 2019-02-22 西安电子科技大学 Motion target tracking method based on sample combination and depth detection network
CN109934848A (en) * 2019-03-07 2019-06-25 贵州大学 A method of the moving object precise positioning based on deep learning
US20190347806A1 (en) * 2018-05-09 2019-11-14 Figure Eight Technologies, Inc. Video object tracking
CN110929560A (en) * 2019-10-11 2020-03-27 杭州电子科技大学 Video semi-automatic target labeling method integrating target detection and tracking

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103077521A (en) * 2013-01-08 2013-05-01 天津大学 Area-of-interest extracting method used for video monitoring
CN103324955A (en) * 2013-06-14 2013-09-25 浙江智尔信息技术有限公司 Pedestrian detection method based on video processing
US20190347806A1 (en) * 2018-05-09 2019-11-14 Figure Eight Technologies, Inc. Video object tracking
US20200151884A1 (en) * 2018-05-09 2020-05-14 Figure Eight Technologies, Inc. Video object tracking
CN109377511A (en) * 2018-08-30 2019-02-22 西安电子科技大学 Motion target tracking method based on sample combination and depth detection network
CN109934848A (en) * 2019-03-07 2019-06-25 贵州大学 A method of the moving object precise positioning based on deep learning
CN110929560A (en) * 2019-10-11 2020-03-27 杭州电子科技大学 Video semi-automatic target labeling method integrating target detection and tracking

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
QIANG WANG ET AL.: "Fast Online Object Tracking and Segmentation: A Unifying Approach", arXiv:1812.05050v2 *
ZHANG FENGRUI: "Research and Implementation of Semi-automatic Video Annotation Based on Multi-object Tracking", China Master's Theses Full-text Database, Information Science and Technology *
PIAO SONGHAO ET AL.: "Intelligent Robots", Harbin Institute of Technology Press, 31 December 2012 *

Similar Documents

Publication Publication Date Title
CN108537269B (en) Weak interactive object detection deep learning method and system thereof
CN111160469B (en) Active learning method of target detection system
US11238312B2 (en) Automatically generating labeled synthetic documents
CN110796143A (en) Scene text recognition method based on man-machine cooperation
CN113378710B (en) Layout analysis method and device for image file, computer equipment and storage medium
CN108665742A (en) A kind of method and apparatus read by arrangement for reading
US8804139B1 (en) Method and system for repurposing a presentation document to save paper and ink
JP6612486B1 (en) Learning device, classification device, learning method, classification method, learning program, and classification program
CN111857893A (en) Method and device for generating label graph
CN111597628B (en) Model marking method and device, storage medium and electronic equipment
CN115601672B (en) VR intelligent shop patrol method and device based on deep learning
CN113283355A (en) Form image recognition method and device, computer equipment and storage medium
CN109740674A (en) A kind of image processing method, device, equipment and storage medium
CN113591884B (en) Method, device, equipment and storage medium for determining character recognition model
CN113822144A (en) Target detection method and device, computer equipment and storage medium
CN113034551A (en) Target tracking and labeling method and device, readable storage medium and computer equipment
CN110633251B (en) File conversion method and equipment
CN109919156B (en) Training method, medium and device of image cropping prediction model and computing equipment
US8488183B2 (en) Moving labels in graphical output to avoid overprinting
CN115147474A (en) Point cloud annotation model generation method and device, electronic equipment and storage medium
CN111160265B (en) File conversion method and device, storage medium and electronic equipment
CN114090630A (en) Commodity data integration method based on distributed micro-service cluster
CN114004918A (en) Poster generation method, device and medium
CN112232431A (en) Watermark detection model training method, watermark detection method, system, device and medium
CN117079084B (en) Sample image generation method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210625

RJ01 Rejection of invention patent application after publication