CN114550041B - Multi-target labeling method for shooting video by multiple cameras - Google Patents

Multi-target labeling method for shooting video by multiple cameras

Info

Publication number
CN114550041B
CN114550041B (application CN202210152739.9A)
Authority
CN
China
Prior art keywords
labeling
video data
target
video
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210152739.9A
Other languages
Chinese (zh)
Other versions
CN114550041A (en)
Inventor
李向阳 (Li Xiangyang)
张正 (Zhang Zheng)
张兰 (Zhang Lan)
雷佳谕 (Lei Jiayu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China (USTC)
Priority to CN202210152739.9A
Publication of CN114550041A
Application granted
Publication of CN114550041B

Landscapes

  • Studio Devices (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-target labeling method for videos shot by multiple cameras, comprising the following steps: step 1, select two segments of video data of the same area shot by a front and a rear camera; step 2, align the two segments in time and correct their lens distortion; step 3, in the same-moment frames of the two segments, select four fixed points in the common area of each frame and connect them in order into a convex quadrilateral that serves as the labeling area; step 4, label targets with their corresponding IDs via bounding boxes in the labeling area of one frame, output the positions of the ID-labeled targets in the other frame through a ReID model, and have annotators correct those positions; step 5, repeat step 4 until all targets are labeled; and step 6, advance to the frames a preset duration later, run a target tracking model, have annotators correct its output, and run the target tracking model to correct the labeling information between the two frames, repeating until the multi-target labeling of the entire video data is complete. The method saves labeling time and labor and improves efficiency.

Description

Multi-target labeling method for shooting video by multiple cameras
Technical Field
The invention relates to the field of video data analysis, and in particular to a multi-target labeling method for area surveillance video.
Background
Existing video algorithms are quite mature, but target detection and tracking in dense crowds remain difficult, mainly because effective, clean data for training models is lacking. Existing labeling methods struggle in densely occupied areas such as classrooms, where labeling is tedious and occlusion is severe. Meanwhile, detecting small targets (e.g., targets occupying a small pixel area, below roughly 32×32 pixels) has so far remained one of the difficulties of target detection, an important reason being that small targets are under-represented in datasets. In the COCO dataset, labeling many small objects is very difficult: the objects are tiny and exhibit varying degrees of occlusion and blur.
Existing video labeling methods focus on optimizing the target tracking method to save annotators' clicks, but they handle poorly the small targets present in video (e.g., targets occupying less than roughly 32×32 pixels), which are hard to find; they therefore require many manual operations, consume labor, and label inefficiently.
In view of this, the present invention has been made.
Disclosure of Invention
The invention aims to provide a multi-target labeling method for videos shot by multiple cameras that can label multiple targets in area surveillance video captured by multiple cameras, significantly reduce the number of manual operations, save labor, and improve labeling efficiency, thereby solving the above technical problems in the prior art.
The aim of the invention is achieved through the following technical solution:
the embodiment of the invention provides a multi-target labeling method for shooting videos by multiple cameras, which comprises the following steps:
step 1, from the video data of the same area shot by multiple cameras, select two segments shot by two cameras arranged front and back as the two segments of video data to be labeled, both segments containing multiple targets;
step 2, align the times of the front and rear cameras that shot the two segments, and correct the lens distortion of both segments using the intrinsic parameters of the front and rear cameras;
step 3, in the frames of the two segments at the same moment, select four fixed points in the common area of each frame and connect them in order into a convex quadrilateral that serves as the labeling area;
step 4, in the labeling area of the frame of one segment processed in step 3, label each target to be labeled with its corresponding ID via a bounding box; output, through a ReID model, the positions of the ID-labeled targets in the labeling area of the same-moment frame of the other segment; and have annotators correct the positions output by the ReID model according to ID-matching errors or bounding-box offsets;
step 5, repeat step 4 until all targets in the same-moment frames of the two segments are labeled;
step 6, advance both segments to the frames a preset duration later, run a target tracking model to track the targets, have annotators correct the tracking model's output, and run the target tracking model in reverse to correct the labeling information between the two frames, thereby completing the multi-target labeling of one stretch of video;
continue advancing the two segments to frames after the preset duration and repeat step 6 until the multi-target labeling of the entire video data is complete.
Compared with the prior art, the multi-target labeling method for videos shot by multiple cameras provided by the invention has the following beneficial effects:
four fixed points are selected in the common area of the same-moment frames of the two segments of video data to be labeled and connected in order into a convex quadrilateral that serves as an auxiliary labeling area; after the multiple targets in the labeling area of one segment's frame are labeled, the multiple targets in the labeling area of the other segment's same-moment frame are labeled through a ReID model, which significantly reduces the number of manual operations, saves labor, and improves labeling efficiency. The method is well suited to the analysis and mining of video data.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a multi-target labeling method for shooting video by a multi-camera according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a shooting area of a multi-target labeling method according to an embodiment of the present invention;
fig. 3 is a schematic diagram of correspondence between multiple target positions in video data captured by front and rear cameras according to the multiple target labeling method provided by the embodiment of the present invention;
fig. 4 is a schematic perspective view of a shooting area of a multi-target labeling method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a process flow of a multi-objective labeling method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a video data frame of a multi-target labeling method according to an embodiment of the present invention; wherein, (a) and (b) are respectively the initial pictures of the target actions in the front camera and the rear camera, and (c) and (d) are respectively the final pictures of the target actions in the front camera and the rear camera.
Detailed Description
The technical scheme in the embodiment of the invention is clearly and completely described below in combination with the specific content of the invention; it will be apparent that the described embodiments are only some embodiments of the invention, but not all embodiments, which do not constitute limitations of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
The terms that may be used herein will first be described as follows:
the term "and/or" is intended to mean that either or both may be implemented, e.g., X and/or Y are intended to include both the cases of "X" or "Y" and the cases of "X and Y".
The terms "comprises," "comprising," "includes," "including," "has," "having" or other similar referents are to be construed to cover a non-exclusive inclusion. For example: including a particular feature (e.g., a starting material, component, ingredient, carrier, formulation, material, dimension, part, means, mechanism, apparatus, step, procedure, method, reaction condition, processing condition, parameter, algorithm, signal, data, product or article of manufacture, etc.), should be construed as including not only a particular feature but also other features known in the art that are not explicitly recited.
The term "consisting of … …" is meant to exclude any technical feature element not explicitly listed. If such term is used in a claim, the term will cause the claim to be closed, such that it does not include technical features other than those specifically listed, except for conventional impurities associated therewith. If the term is intended to appear in only a clause of a claim, it is intended to limit only the elements explicitly recited in that clause, and the elements recited in other clauses are not excluded from the overall claim.
Unless specifically stated or limited otherwise, the terms "mounted," "connected," "secured," and the like should be construed broadly to include, for example: the connecting device can be fixedly connected, detachably connected or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the terms herein above will be understood by those of ordinary skill in the art as the case may be.
When concentrations, temperatures, pressures, dimensions, or other parameters are expressed as a range of values, the range is to be understood as specifically disclosing all ranges formed from any pair of upper and lower values within the range of values, regardless of whether ranges are explicitly recited; for example, if a numerical range of "2 to 8" is recited, that numerical range should be interpreted to include the ranges of "2 to 7", "2 to 6", "5 to 7", "3 to 4 and 6 to 7", "3 to 5 and 7", "2 and 5 to 7", and the like. Unless otherwise indicated, numerical ranges recited herein include both their endpoints and all integers and fractions within the numerical range.
The terms "center," "longitudinal," "transverse," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," "counterclockwise," etc. refer to an orientation or positional relationship based on that shown in the drawings, merely for ease of description and to simplify the description, and do not explicitly or implicitly indicate that the apparatus or element in question must have a particular orientation, be constructed and operated in a particular orientation, and therefore should not be construed as limiting the present disclosure.
The multi-target labeling method for videos shot by multiple cameras provided by the invention is described in detail below. Details not described in the embodiments belong to the prior art known to those skilled in the art. Where specific conditions are not noted in the examples, conventional conditions in the art or those suggested by the manufacturer apply. Reagents or instruments used without a noted manufacturer are conventional products available through commercial purchase.
As shown in fig. 1, an embodiment of the present invention provides a multi-target labeling method for capturing video by using multiple cameras, including:
step 1, from the video data of the same area shot by multiple cameras, select two segments shot by two cameras arranged front and back as the two segments of video data to be labeled, both segments containing multiple targets;
step 2, align the times of the front and rear cameras that shot the two segments, and correct the lens distortion of both segments using the intrinsic parameters of the front and rear cameras (a code sketch of this step follows this list);
step 3, in the frames of the two segments at the same moment, select four fixed points in the common area of each frame and connect them in order into a convex quadrilateral that serves as the labeling area;
step 4, in the labeling area of the frame of one segment processed in step 3, label each target to be labeled with its corresponding ID via a bounding box; output, through a ReID model, the positions of the ID-labeled targets in the labeling area of the same-moment frame of the other segment; and have annotators correct the target positions output by the ReID model according to ID-matching errors or bounding-box offsets;
step 5, repeat step 4 until all targets in the same-moment frames of the two segments are labeled;
step 6, advance both segments to the frames a preset duration later, run a target tracking model to track the targets, and have annotators correct the tracking model's output; then play the two video segments in reverse and run the target tracking model again to correct the labeling information between the two frames, thereby completing the multi-target labeling of one stretch of video;
continue advancing the two segments to frames after the preset duration and repeat step 6 until the multi-target labeling of the entire video data is complete.
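For concreteness, the following is a minimal sketch of how step 2 might be implemented with OpenCV in Python. The intrinsic matrix, distortion coefficients, frame rate, and start offset are illustrative placeholders rather than values from the invention; the patent does not prescribe a particular library or calibration procedure.

```python
import cv2
import numpy as np

# Illustrative intrinsics from a prior calibration of one camera
# (e.g. cv2.calibrateCamera with a checkerboard) -- placeholder values.
K_front = np.array([[1000.0, 0.0, 960.0],
                    [0.0, 1000.0, 540.0],
                    [0.0, 0.0, 1.0]])
dist_front = np.array([-0.28, 0.07, 0.001, 0.0005, 0.0])  # k1, k2, p1, p2, k3

def undistort_frame(frame, K, dist):
    """Correct the lens distortion of one frame using camera intrinsics."""
    h, w = frame.shape[:2]
    new_K, _ = cv2.getOptimalNewCameraMatrix(K, dist, (w, h), alpha=0)
    return cv2.undistort(frame, K, dist, None, new_K)

def aligned_frame_index(t_seconds, fps, start_offset_seconds):
    """Map an instant on the common timeline to a frame index of one camera,
    given that camera's recording start offset (time alignment)."""
    return int(round((t_seconds - start_offset_seconds) * fps))
```

The same two functions would be applied to both the front and rear camera segments before any labeling begins.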
In the above multi-target labeling method, the ReID model adopts a CLI model. The CLI model fuses information from multiple cameras, so target labeling information under multiple viewing angles can be obtained efficiently.
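The CLI model is not detailed further in this document. As a hedged illustration of the geometric prior that the four fixed points provide across views, the sketch below maps a labeled position from one view into the other with a plane homography computed from the shared quadrilateral; the coordinates are made up, and this geometric transfer is an assumption for illustration, not the CLI model's actual mechanism.

```python
import cv2
import numpy as np

# The four fixed points of the labeling quadrilateral, clicked in each view
# at the same instant (illustrative coordinates).
quad_front = np.float32([[420, 210], [1510, 225], [1630, 930], [300, 915]])
quad_rear = np.float32([[380, 250], [1480, 240], [1600, 900], [260, 880]])

# Homography relating the (approximately planar) common area between views.
H = cv2.getPerspectiveTransform(quad_front, quad_rear)

def transfer_point(pt_front, H):
    """Map a point (e.g. the bottom-center of a labeled bounding box) from
    the front view into the rear view as an initial guess for correction."""
    src = np.float32([[pt_front]])          # shape (1, 1, 2), as required
    dst = cv2.perspectiveTransform(src, H)
    return tuple(dst[0, 0])
```

Such a transfer would only seed the cross-view position; the ReID model's appearance matching and the annotator's correction remain responsible for the final ID assignment.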
In the multi-target labeling method, the target tracking model adopts a ByteTrack model.
In step 6 of the above multi-target labeling method, the preset duration by which the video is advanced is 3 seconds; the entire video is processed for multi-target labeling in 3-second stretches.
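A minimal sketch of this 3-second segment workflow follows, assuming a generic tracker object exposing update(frame) -> {track_id: box} and an annotator_fix callback; both the interface and the final merge heuristic are assumptions for illustration and differ from the reference ByteTrack implementation's actual API.

```python
SEGMENT_SECONDS = 3  # the preset duration named above

def label_segment(frames, fps, make_tracker, annotator_fix):
    """Label one 3-second stretch: track forward, let the annotator correct
    the final frame, then track backward so both endpoints constrain labels."""
    seg = frames[:int(SEGMENT_SECONDS * fps)]

    # Forward pass: propagate labels from the already-verified first frame.
    fwd_tracker = make_tracker()
    fwd = [fwd_tracker.update(f) for f in seg]

    # The annotator corrects the tracker's output on the segment's last frame.
    fwd[-1] = annotator_fix(fwd[-1])

    # Backward pass from the corrected last frame refines in-between frames.
    bwd_tracker = make_tracker()
    bwd = [bwd_tracker.update(f) for f in reversed(seg)]
    bwd.reverse()

    # Illustrative merge: trust the forward pass near the segment start and
    # the backward pass near its corrected end.
    half = len(seg) // 2
    return [f if i < half else b for i, (f, b) in enumerate(zip(fwd, bwd))]
```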
In summary, in the multi-target labeling method provided by the embodiment of the invention, the introduction of the ReID model and the target tracking model lets the frames of the two cameras assist each other's labeling; combined with annotators' fine-tuning, this significantly reduces the number of manual operations, saves labor, and improves labeling efficiency.
To clearly present the technical solution and its effects, the multi-target labeling method for videos shot by multiple cameras provided by the embodiment of the invention is described in detail below.
Examples
As shown in fig. 2 and 6, an embodiment of the present invention provides a multi-target labeling method for videos shot by multiple cameras, used for labeling area video data captured by multiple cameras, such as surveillance video of public areas including classrooms and offices. The method comprises the following steps (see fig. 1):
step 1, from the video data of the same area shot by multiple cameras (see fig. 3), select two segments shot by two cameras arranged front and back as the two segments of video data to be labeled, both containing multiple targets (see fig. 4);
step 2, load the two segments to be labeled into the labeling device, align the times of the front and rear cameras that shot them, and correct the lens distortion of both segments using the intrinsics of the front and rear cameras;
step 3, in the two same-moment frames of the two segments, select four fixed points in the common area and connect them in order into a convex quadrilateral that serves as the labeling area (see fig. 6 (a), (b), (c) and (d)); a membership test for this area is sketched after this list;
step 4, for each target to be labeled, label its ID information on the frame of one segment (i.e., via a bounding box); a ReID model outputs the position of that ID in the frame labeling area of the other segment at the same moment, and an annotator fine-tunes it, specifically by correcting the target position output by the ReID model according to ID-matching errors or bounding-box offsets (see fig. 5);
step 5, repeat step 4 until all targets in the two segments of video data are labeled;
step 6, advance the two segments to the frames a few seconds later (generally preset to 3 seconds), run a target tracking model to track the labeled targets, have an annotator correct the tracking model's output, and run the target tracking model in reverse to correct the labeling information between the two frames of the two segments, thereby completing the multi-target labeling of one stretch of video;
continue advancing the two segments to frames after the preset duration and repeat step 6 until the multi-target labeling of the entire video data is complete.
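As referenced in step 3 above, the sketch below tests whether a target falls inside the convex labeling quadrilateral, using OpenCV's point-in-polygon test on the bounding box's bottom-center point; the coordinates and the choice of anchor point are assumptions for illustration.

```python
import cv2
import numpy as np

# Four fixed points connected in order into a convex quadrilateral
# (illustrative coordinates for one view).
labeling_area = np.array([[420, 210], [1510, 225], [1630, 930], [300, 915]],
                         dtype=np.int32)

def in_labeling_area(box):
    """Return True if a bounding box (x, y, w, h) lies in the labeling area,
    judged by its bottom-center ("foot") point."""
    foot = (float(box[0] + box[2] / 2.0), float(box[1] + box[3]))
    # pointPolygonTest returns >0 inside, 0 on the edge, <0 outside.
    return cv2.pointPolygonTest(labeling_area, foot, measureDist=False) >= 0
```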
In summary, thanks to the participation of several models, the multi-target labeling method of the embodiment of the invention completes automatically, through the models, the multi-target labeling that used to be done manually, with annotators performing only correction tasks. This greatly reduces annotator workload, leaves ample fault tolerance for model errors, efficiently yields multiple accurately labeled video segments, and effectively lowers the labeling difficulty for annotators.
The foregoing is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims. The information disclosed in the background section herein is only for enhancement of understanding of the general background of the invention and is not to be taken as an admission or any form of suggestion that this information forms the prior art already known to those of ordinary skill in the art.

Claims (4)

1. A multi-target labeling method for video shot by multiple cameras, characterized by comprising the following steps:
step 1, from the video data of the same area shot by multiple cameras, selecting two segments shot by two cameras arranged front and back as the two segments of video data to be labeled, both segments containing multiple targets;
step 2, aligning the times of the front and rear cameras that shot the two segments, and correcting the lens distortion of both segments using the intrinsic parameters of the front and rear cameras;
step 3, in the frames of the two segments at the same moment, selecting four fixed points in the common area of each frame and connecting them in order into a convex quadrilateral that serves as the labeling area;
step 4, in the labeling area of the frame of one segment processed in step 3, labeling each target to be labeled with its corresponding ID via a bounding box, outputting, through a ReID model, the positions of the ID-labeled targets in the labeling area of the same-moment frame of the other segment, and having annotators correct the positions output by the ReID model according to ID-matching errors or bounding-box offsets;
step 5, repeating step 4 until all targets in the same-moment frames of the two segments are labeled;
step 6, advancing both segments to the frames a preset duration later, running a target tracking model to track the targets, having annotators correct the tracking model's output, then playing the two video segments in reverse and running the target tracking model again to correct the labeling information between the two frames, thereby completing the multi-target labeling of one stretch of video;
and continuing to advance the two segments to frames after the preset duration and repeating step 6 until the multi-target labeling of the entire video data is complete.
2. The multi-target labeling method for video shot by multiple cameras according to claim 1, wherein the ReID model adopts a CLI model.
3. The multi-target labeling method for video shot by multiple cameras according to claim 1 or 2, wherein the target tracking model adopts a ByteTrack model.
4. The multi-target labeling method for video shot by multiple cameras according to claim 1 or 2, wherein in step 6 the preset duration by which the video is advanced is 3 seconds.
CN202210152739.9A 2022-02-18 2022-02-18 Multi-target labeling method for shooting video by multiple cameras Active CN114550041B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210152739.9A CN114550041B (en) 2022-02-18 2022-02-18 Multi-target labeling method for shooting video by multiple cameras

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210152739.9A CN114550041B (en) 2022-02-18 2022-02-18 Multi-target labeling method for shooting video by multiple cameras

Publications (2)

Publication Number Publication Date
CN114550041A CN114550041A (en) 2022-05-27
CN114550041B (en) 2024-03-29

Family

ID=81675390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210152739.9A Active CN114550041B (en) 2022-02-18 2022-02-18 Multi-target labeling method for shooting video by multiple cameras

Country Status (1)

Country Link
CN (1) CN114550041B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109743497A * 2018-12-21 2019-05-10 AInnovation (Chongqing) Technology Co., Ltd. Dataset acquisition method, system and electronic device
CN110782484A * 2019-10-25 2020-02-11 Shanghai Pudong Lingang Smart City Development Center Unmanned aerial vehicle video personnel identification and tracking method
WO2021196294A1 * 2020-04-03 2021-10-07 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences Cross-video person location tracking method and system, and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on intelligent multi-view target localization and tracking methods in video surveillance; Du Lijuan, Lu Xiaoya; Science Technology and Engineering; 2017-06-08 (No. 16); pp. 270-274 *

Also Published As

Publication number Publication date
CN114550041A (en) 2022-05-27

Similar Documents

Publication Publication Date Title
CN108760767B (en) Large-size liquid crystal display defect detection method based on machine vision
WO2021098081A1 (en) Trajectory feature alignment-based multispectral stereo camera self-calibration algorithm
CN107918927A (en) A kind of matching strategy fusion and the fast image splicing method of low error
CN103971352A (en) Rapid image splicing method based on wide-angle lenses
CN106780303A (en) A kind of image split-joint method based on local registration
CN105872345A (en) Full-frame electronic image stabilization method based on feature matching
CN109919007A (en) A method of generating infrared image markup information
CN107580186A (en) A kind of twin camera panoramic video joining method based on suture space and time optimization
CN105894443A (en) Method for splicing videos in real time based on SURF (Speeded UP Robust Features) algorithm
CN107462182A (en) A kind of cross section profile deformation detecting method based on machine vision and red line laser
CN109596054A (en) The size detection recognition methods of strip workpiece
CN114550041B (en) Multi-target labeling method for shooting video by multiple cameras
CN106550229A (en) A kind of parallel panorama camera array multi-view image bearing calibration
CN105701515A (en) Face super-resolution processing method and system based on double-layer manifold constraint
CN101272450B (en) Global motion estimation exterior point removing and kinematic parameter thinning method in Sprite code
CN111145220A (en) Tunnel target track tracking method based on visual information
CN108508022B (en) Multi-camera splicing imaging detection method
CN108489989B (en) Photovoltaic module double-sided appearance detector based on multi-camera splicing imaging detection
CN112308887B (en) Multi-source image sequence real-time registration method
CN106878628A (en) A kind of method that video-splicing is carried out by camera
CN106546196B (en) A kind of optical axis real-time calibration method and system
CN112465702A (en) Synchronous self-adaptive splicing display processing method for multi-channel ultrahigh-definition video
CN215587186U (en) Visual inspection equipment of product package
CN109887027A (en) A kind of method for positioning mobile robot based on image
CN112102301A (en) Stamping frame cooperative detection system and detection method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant