CN109743497B - Data set acquisition method and system and electronic device - Google Patents

Data set acquisition method and system and electronic device

Info

Publication number
CN109743497B
CN109743497B
Authority
CN
China
Prior art keywords
target
targets
frame
image
tracked
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811559113.XA
Other languages
Chinese (zh)
Other versions
CN109743497A (en
Inventor
张发恩
赵江华
秦永强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ainnovation Chongqing Technology Co ltd
Original Assignee
Ainnovation Chongqing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ainnovation Chongqing Technology Co ltd filed Critical Ainnovation Chongqing Technology Co ltd
Priority to CN201811559113.XA priority Critical patent/CN109743497B/en
Publication of CN109743497A publication Critical patent/CN109743497A/en
Application granted granted Critical
Publication of CN109743497B publication Critical patent/CN109743497B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The present invention relates to the field of data acquisition technologies, and in particular to a data set acquisition method, system, and electronic device. The method comprises the following steps: providing a plurality of targets; shooting the plurality of targets under preset shooting angles, lighting conditions and background conditions to obtain a video; reading each frame of image in the video with a multi-target tracking model and obtaining the position information and/or feature information of each target in each frame of image; and labeling the targets with category information representing the targets to form a data set corresponding to each target. By shooting a video of the plurality of targets for which a data set is to be acquired, reading the position information and/or feature information of the targets in each frame of the video with a multi-target tracking algorithm to form the data set, and parallelizing the acquisition tasks through a standardized data acquisition process, data acquisition and labeling can be carried out on multiple targets simultaneously, which greatly improves efficiency.

Description

Data set acquisition method and system and electronic device
[ technical field ]
The invention relates to the technical field of data acquisition, and in particular to a data set acquisition method and system based on multi-target tracking, and an electronic device.
[ background of the invention ]
Nowadays, industries such as freight transportation, goods production and goods sales all need to classify and detect different kinds of goods based on the characteristic information of the goods themselves. When facing a large number of goods to be detected and classified, it is therefore usually necessary to manually collect multiple pieces of characteristic data for each item, in multiple directions and under different scenes, and to label each item one by one with this characteristic data; multiple items cannot be labeled in batches at the same time. Such an operation is often inefficient, costly and error-prone.
[ summary of the invention ]
In view of the problems of the existing data acquisition methods, the invention provides a data set acquisition method, a data set acquisition system and an electronic device.
To solve the above technical problems, the invention provides a data set acquisition method, which comprises the following steps:
S1, providing a plurality of targets, wherein the targets are goods;
S2, shooting the plurality of targets under preset shooting angles, lighting conditions and background conditions to obtain a video;
A, a detection model reads the first frame image in the video and determines a plurality of targets to be tracked; the detection model is trained on feature information matched with each target to be tracked and stores a feature template for identifying the plurality of targets to be tracked, and when the detection model reads the first frame image, the feature information of the current picture is compared with the information in the feature template to determine the plurality of targets to be tracked;
S3, reading each frame of image in the video with the multi-target tracking model and obtaining the position information and/or feature information of each target in each frame of image,
the step S3 specifically comprising the following steps:
S31, the multi-target tracking model reads the first frame image, obtains the position information and feature information of the plurality of targets to be tracked in the first frame image, tracks the plurality of targets based on the position information and feature information of the targets in the first frame, and updates the feature information and position information of the targets in each frame; the feature information comprises contour information, and the multi-target tracking model tracks the targets in subsequent frames based on the contour information and position information of each target;
S32, the multi-target tracking model judges, when reading each subsequent frame of image, whether a tracked target is lost; when the tracking model judges that a tracked target is lost, the detection model confirms whether the target is lost, and the targets that are not lost continue to be tracked until the video ends;
and S4, labeling the targets with category information representing the targets to form a data set corresponding to each target.
Preferably, when reading the images after the first frame, the multi-target tracking model judges whether a target is lost and also detects whether a new target appears; if a new target is detected, the multi-target tracking model obtains the feature information corresponding to the new target, the feature information corresponding to the new target is input to label the new target, and the new target is tracked in subsequent frames.
Preferably, when the tracking model reads each subsequent frame of image, the detection model either detects every frame of image to confirm whether a target is lost, or detects one frame at intervals of at least one frame to confirm whether a target is lost.
Preferably, judging by the multi-target tracking model whether a new target appears comprises the following steps: when the multi-target tracking model obtains the feature information corresponding to the new target, the feature information of the new target is compared with the feature information of all targets in the previous frame, and if the feature information of the new target does not match the feature information of any target in the previous frame, a new target is considered to have appeared.
The present invention further provides a system for executing the data set acquisition method, which comprises a shooting module and a multi-target tracking model. The shooting module is used for shooting a video of the plurality of targets under preset shooting angles, lighting conditions and background conditions; the multi-target tracking model is used for reading each frame of image in the video and obtaining the position information and feature information of each target in each frame of image to form a data set, the targets being goods. The data set acquisition system further comprises a detection model capable of exchanging signals with the multi-target tracking model; the detection model is used for determining the plurality of targets to be tracked in the first frame of the video, is trained on feature information matched with each target to be tracked, and stores a feature template for identifying the plurality of targets to be tracked, and when the detection model reads the first frame image, the feature information of the current picture is compared with the information in the feature template to determine the plurality of targets to be tracked. The multi-target tracking model is used for tracking and labeling the plurality of targets to be tracked obtained by the detection model; it judges, when reading each subsequent frame of image, whether a tracked target is lost, and when it judges that a tracked target is lost, it sends a signal to the detection model, the detection model confirms whether the target is lost, and the targets that are not lost continue to be tracked until the video ends; whether a new target appears is also judged when each frame of image is read.
The present invention also provides an electronic device, which comprises a memory and a processor, wherein the memory stores a computer program and the processor is configured to execute, through the computer program, any step of the data set acquisition method described above.
Compared with the prior art, the invention has the following beneficial effects:
the method comprises the steps of performing video shooting on a plurality of targets needing data set acquisition, reading position information and/or characteristic information of the plurality of targets in each frame of image of a video by utilizing a multi-target tracking algorithm to form a data set, and parallelizing acquisition tasks through a standardized data acquisition process, so that the data acquisition can be performed on the plurality of targets at the same time, the efficiency is greatly improved, the problems of high manual labeling cost, long consumed time, inaccuracy and the like are solved, and the method is suitable for rapidly acquiring a batch of image data with accurate labels in an actual industrial application scene; moreover, the applicability is high, specific samples are not limited, the target can be tracked and data can be acquired based on the relevance of the characteristic information and the position information of the target in different frames on the video without acquiring a data training model in advance, and the process is saved; meanwhile, the process of acquiring the position information and the characteristic information of each frame of image and the operation of labeling the target are asynchronously carried out, so that the flexibility of the data acquisition process is higher, the target can be labeled when the first frame is read, and the target can be labeled after the video reading is finished and the position information of the target in each frame is calculated through a tracking model.
When the multi-target tracking model reads an image after a first frame, whether a target is lost or not is judged, whether a new target appears or not is also detected, when the new target appears is judged, the new target is calibrated according to information corresponding to the new target and tracked in a subsequent frame, the types of collected data can be well enriched, and the collected data are suitable for screening more products.
The data set acquisition system comprises a shooting module and a multi-target tracking module, wherein the shooting module is used for carrying out video shooting on a target; the multi-target tracking module is used for reading each frame of image in the video and obtaining the position information and the characteristic information of the target in each frame of image to form a data set, the position information and the characteristic information corresponding to the targets can be collected through the matching of the shooting module and the multi-target tracking module to form the data set, the video does not need to be shot for each target respectively, data are collected respectively, the process is simple, the efficiency is high, the cost is low, and the accuracy of the data is high.
[ description of the drawings ]
Fig. 1 is a schematic flow-chart structure diagram of a data set acquisition method provided in a first embodiment of the present invention;
FIG. 2 is a schematic diagram of the detection model detecting the targets to be tracked in the first frame of image in the data set acquisition method according to the first embodiment of the present invention;
fig. 3 is a schematic diagram illustrating a detailed flow chart of step S3 in the data set acquisition method provided in the first embodiment of the present invention;
fig. 4 is a schematic diagram of the steps included in step S32 of the data set acquisition method according to the first embodiment of the present invention;
FIG. 5 is a schematic flow diagram of a variant of the first embodiment of the present invention;
FIG. 6 is a diagram illustrating a tracking model detecting the presence of a new target according to a first embodiment of the present invention;
FIG. 7 is a schematic block diagram of a data set acquisition system according to a second embodiment of the present invention;
FIG. 8 is a schematic block diagram of a data set acquisition system according to a second embodiment of the present invention;
FIG. 9 is a diagram illustrating modules included in the multi-object tracking model in the data set acquisition system according to the second embodiment;
fig. 10 is a schematic block diagram of an electronic device according to a third embodiment.
[ detailed description of the embodiments ]
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, the present invention provides a data set acquisition method for acquiring data of a plurality of targets in a video, comprising the following steps:
S1, providing a plurality of targets;
S2, shooting the plurality of targets to obtain a video;
S3, reading each frame of image in the video with a multi-target tracking model, and obtaining the position information and/or feature information of each target in each frame of image; and
S4, labeling the plurality of targets.
In step S1, the targets are articles whose characteristic data need to be acquired to form a data set, including commodities, food, documents and the like.
In step S2, the plurality of targets are photographed according to a preset shooting standard to obtain one video containing the plurality of targets. The preset shooting standard refers to shooting the plurality of targets from multiple angles, under multiple lighting conditions and against a simple background; the selected background is generally one that contrasts well with the colors of the targets. Shooting the targets from multiple angles and under multiple lighting conditions captures their characteristic data in different environments and scenes, so that the collected data set represents the articles more comprehensively, and when a model is later built from the collected data to detect or classify products, it can recognize the targets in more pictures taken in different scenes.
In step S3, the feature information of the target includes one or more of color features, contour features, and texture features.
In step S3, before reading each frame of image in the video with the multi-target tracking model and obtaining the position information and/or feature information of each target in each frame of image, the method further includes step A: the detection model reads the first frame image in the video, detects the first frame image, and determines the plurality of targets to be tracked.
The detection model used in step A is a multi-target detection network, specifically one of a Fast-RCNN network and an SSD network. The detection model is trained on the feature information matched with each target to be tracked and can identify the plurality of targets to be tracked; that is, a feature template for distinguishing the targets to be tracked is stored in the detection model, and when the detection model reads the first frame image, the feature information of the current image is compared with the information in the feature template to determine the targets to be tracked.
In step A, when the detection model reads the first frame image of the video, it finds the regions matching the plurality of targets to be tracked according to the feature information of the first frame image and frames those regions with rectangular frames, thereby determining the plurality of targets to be tracked; target 1, target 2 and target 3 shown in fig. 2 are the targets to be tracked. Framing different targets with separate rectangular frames helps avoid targets being confused in subsequent frames, due to similarity or occlusion between them, and the acquired data deviating as a result. Alternatively, the frame around a target to be tracked is not limited to a rectangle; it may be a diamond or another quadrilateral, and in still other embodiments other polygons may be used.
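For illustration only, the following Python sketch shows one way step A could look in practice; the `detector` callable, its (boxes, scores, labels) return format and the score threshold are assumptions for the example and are not specified by the disclosure.

```python
import cv2

def frame_targets_in_first_frame(video_path, detector, score_threshold=0.5):
    """Read the first frame of the video, run the (assumed) multi-target
    detection network on it, and frame each detected target with a
    rectangle, as described in step A."""
    cap = cv2.VideoCapture(video_path)
    ok, first_frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("could not read the first frame of the video")

    boxes, scores, labels = detector(first_frame)  # hypothetical detector call
    targets = []
    for (x, y, w, h), score, label in zip(boxes, scores, labels):
        if score < score_threshold:
            continue
        targets.append({"label": label, "box": (x, y, w, h)})
        # A separate rectangle per target helps keep similar or occluding
        # targets apart in subsequent frames.
        cv2.rectangle(first_frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    return first_frame, targets
```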
Referring to fig. 3, the step S3 specifically includes the following steps:
S31, the multi-target tracking model reads the first frame image, obtains the position information and feature information of the plurality of targets to be tracked in the first frame image, tracks the plurality of targets based on the position information and feature information of the targets in the first frame, and updates the feature information and position information of the targets in each frame; and
S32, the tracking model judges, when reading each subsequent frame of image, whether a tracked target is lost; when the tracking model judges that a tracked target is lost, the detection model confirms whether the target is lost, and the targets that are not lost continue to be tracked until the video ends.
Referring to fig. 4, in step A, after the detection model frames the plurality of targets to be tracked, the multi-target tracking model reads the first frame image, obtains the position information and feature information corresponding to each target in the first frame image, and tracks the plurality of targets and acquires their data based on the position information and feature information of the targets in the first frame. In step S31, the multi-target tracking model obtains the contour information, color features or texture features and the position information corresponding to each target in the first frame image, and the targets in subsequent frames are tracked based on the contour information and position information of each target.
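As a rough sketch of how the first-frame state of step S31 could be stored (Python; the Otsu-threshold contour extraction is an assumed stand-in, since the disclosure does not specify how the contour features are computed):

```python
import cv2

def init_target_states(first_frame, targets):
    """For each framed target, record its position (bounding box) and a
    simple contour feature cut from the boxed region; subsequent frames are
    matched against these states (OpenCV 4.x findContours signature)."""
    states = {}
    for idx, target in enumerate(targets, start=1):
        x, y, w, h = target["box"]
        roi = first_frame[y:y + h, x:x + w]
        gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
        _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        states[idx] = {
            "label": target["label"],
            "position": (x, y, w, h),
            "contour": max(contours, key=cv2.contourArea) if contours else None,
            "lost": False,
        }
    return states
```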
Step S32 specifically includes the following steps (a sketch of this per-frame loop in Python is given after the flow):
S321, the multi-target tracking model reads the next frame of the video and determines the plurality of targets to be tracked by combining the position information and feature information of the targets in the previous frame;
S322, the tracking model judges whether a target is lost;
if not, step S324 is executed: the target position is updated;
if yes, step S325 is executed: the detection model further confirms whether the target is lost, and the tracked target is labeled according to the judgment result of the detection model;
after step S324 or step S325, step S326 is executed: the tracking state is updated;
S327, whether the video has ended is judged;
if yes, step S328 is executed: the labeled data are uploaded;
if not, the process returns to step S321 until the video ends.
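A minimal sketch of the S321–S328 loop follows (Python); `tracker`, `detector_confirms` and `upload` are hypothetical stand-ins for the multi-target tracking model, the detection model's loss confirmation and the upload of labeled data:

```python
import cv2

def run_tracking_loop(video_path, tracker, detector_confirms, upload):
    """Per-frame loop corresponding to steps S321-S328 above."""
    cap = cv2.VideoCapture(video_path)
    cap.read()  # the first frame was handled by the detection model (step A)
    while True:
        ok, frame = cap.read()
        if not ok:                                   # S327: video has ended
            break
        for target_id, state in tracker.states.items():
            found, new_box = tracker.match(frame, target_id)   # S321 / S322
            if found:
                state["position"] = new_box                    # S324
            else:
                # S325: the detection model has the final say on loss
                state["lost"] = not detector_confirms(frame, target_id)
            # S326: the tracking state is updated in either branch
    cap.release()
    upload(tracker.states)                           # S328: upload labeled data
```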
In step S321, when reading each subsequent frame of image, the multi-target tracking model identifies regions matching the contour of each target in the previous frame, matches the contours in the current frame with the contours of the targets in the previous frame based on the gradual change of contour information between adjacent frames, and further determines the plurality of targets to be tracked by combining the association of the position information of each target.
In step S322, if the tracking model does not find a contour matching a target to be tracked, the target is considered lost; for example, if no contour matching target 1 is found in the second frame, target 1 is considered lost.
After step S322, if the target is not lost, the target position and the tracking state are updated; that is, the position information of the target in this frame is updated, and the tracking state corresponds to the similar contour that was found. The position information and feature information of the contour region corresponding to each target in the previous frame are then associated with the target in this frame to represent the target's position information and feature information in this frame and form the data set, and the targets in subsequent frames continue to be tracked, and their data acquired, based on the position information and feature information of this frame.
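One way the inter-frame matching described above could be realized is sketched below (Python); the shape-distance and displacement thresholds are illustrative values, not taken from the disclosure:

```python
import cv2
import numpy as np

def match_target(prev_state, candidate_contours, candidate_boxes,
                 shape_threshold=0.3, max_shift=80):
    """Match a tracked target into the current frame by comparing contours
    (cv2.matchShapes, lower means more similar) and requiring the box centre
    to have shifted only slightly between adjacent frames."""
    if prev_state["contour"] is None:
        return None
    px, py, pw, ph = prev_state["position"]
    prev_centre = np.array([px + pw / 2.0, py + ph / 2.0])
    best_idx, best_score = None, shape_threshold
    for i, (contour, (x, y, w, h)) in enumerate(zip(candidate_contours, candidate_boxes)):
        score = cv2.matchShapes(prev_state["contour"], contour,
                                cv2.CONTOURS_MATCH_I1, 0.0)
        centre = np.array([x + w / 2.0, y + h / 2.0])
        if score < best_score and np.linalg.norm(centre - prev_centre) < max_shift:
            best_idx, best_score = i, score
    return best_idx  # None: no matching contour found, the target may be lost
```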
If the tracking model judges that a target is lost, step S325 is executed: the detection model further confirms whether the target is lost, and the target is labeled according to the detection model's judgment. The detection model decides whether the target is lost based on whether feature information matching the feature template pre-stored in the detection model can be found in this frame: if it can, the target is not lost; otherwise, the target is lost. For example, if the tracking model judges that target 1 is lost in this frame, the detection model reads the feature information corresponding to several new positions related to target 1 in this frame; if the feature information of target 1 in this frame cannot be matched with the feature template pre-stored in the detection model, target 1 is considered lost. If targets 2 and 3 are not lost, the tracking model associates the feature information and position information corresponding to targets 2 and 3 in this frame and continues to track targets 2 and 3 and acquire their data in subsequent frames.
It will be appreciated that tracking of targets 2 and 3 continues after target 1 is lost, and if target 1 reappears in the images of subsequent frames, its tracking also continues.
It is understood that whether target 2 or target 3 is lost is determined in the same way as for target 1, which is not repeated here.
In step S32, when the tracking model reads each subsequent frame of image, the detection model either detects every frame of image to confirm whether a target is lost, or detects one frame at intervals of at least one frame. For example, the detection model may be set to confirm whether a target is lost every 5, 10, 15 or 30 frames.
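A trivial helper for the interval option (Python; the helper name and default interval are illustrative assumptions):

```python
def should_run_detector(frame_index, interval=5):
    """With interval=1 the detection model checks every frame; with e.g. 5,
    10, 15 or 30 it only periodically re-confirms whether targets are lost."""
    return interval <= 1 or frame_index % interval == 0
```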
Referring to fig. 5, in another embodiment of the invention, when the multi-target tracking model reads each frame of image to track the targets and the detection model detects each frame of image, in addition to judging whether a target is lost, step S329 is performed: judging whether a new target appears. If a new target appears, step S3291 is executed, inputting a new label: the feature information corresponding to the new target is entered manually to label the new target, and the new target is tracked in subsequent frames;
After step S3291 is executed, step S324 and the subsequent steps are executed until the video acquisition ends.
If no new target appears, step S324 and the subsequent steps are executed correspondingly until the video acquisition ends.
It is to be understood that in step S3291 the label may also be input after the video acquisition has ended to label the new target.
Referring to fig. 6, the specific operation for judging whether a new target appears is as follows: if an image trace that did not appear in the previous frame appears in the current frame, a new target is considered to have appeared; for example, target 4 shown in fig. 6 did not appear in the previous frame, so target 4 is a new target. The judgment of whether there is a new target is likewise based on the feature information and position information.
When a new target appears, besides manually inputting a label to label the new target, the feature template corresponding to the new target can be imported into the detection model to update the detection model's data, so that whether the new target is lost can be confirmed while tracking it.
It should be noted that when the detection model detects that a new target appears in the current frame, the new target is labeled, and a data set corresponding to it is acquired, only after it has appeared continuously in multiple subsequent frames.
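A small sketch of that confirmation rule (Python; the class name and the required number of consecutive appearances are assumptions):

```python
class NewTargetGate:
    """Promote a candidate new target to a labeled, tracked target only after
    it has appeared in several consecutive frames; a one-frame appearance is
    treated as noise."""

    def __init__(self, required_hits=5):
        self.required_hits = required_hits
        self.hits = {}  # candidate id -> consecutive appearance count

    def observe(self, candidate_id, seen_this_frame):
        if seen_this_frame:
            self.hits[candidate_id] = self.hits.get(candidate_id, 0) + 1
        else:
            self.hits[candidate_id] = 0  # the streak is broken
        return self.hits[candidate_id] >= self.required_hits  # promote now?
```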
The tracking model used in the present invention is implemented with one of OpenCV and Keras.
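For the OpenCV option, one possible realization is to create one tracker per framed target (Python; requires opencv-contrib-python, and the choice of the CSRT tracker is an assumption, since the disclosure only names OpenCV in general):

```python
import cv2

def build_trackers(first_frame, targets):
    """Create one CSRT tracker per target framed in the first frame."""
    trackers = {}
    for idx, target in enumerate(targets, start=1):
        tracker = cv2.TrackerCSRT_create()
        tracker.init(first_frame, tuple(target["box"]))
        trackers[idx] = tracker
    return trackers

def update_trackers(frame, trackers):
    """Return {target_id: box or None}; None means the tracker lost the
    target and the detection model should confirm the loss."""
    results = {}
    for idx, tracker in trackers.items():
        ok, box = tracker.update(frame)
        results[idx] = tuple(int(v) for v in box) if ok else None
    return results
```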
Referring again to fig. 1, it should be noted that in step S4 the plurality of targets are labeled, that is, the label value of each target is used as the information with which the target is labeled; the label value is usually entered manually and contains the kind of the article, such as coke, biscuit or milk. The step of labeling a target may be performed when the detection model detects the target in the first frame image, after the video has been read, or while subsequent frames of the video are being read. The whole data acquisition process is therefore more flexible, and detection and labeling can be run asynchronously.
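The per-target data set of step S4 could be assembled along the following lines (Python; the dictionary layout and argument names are assumed for illustration):

```python
def build_dataset(per_frame_positions, label_values):
    """Pair each target's manually entered label value (e.g. 'coke',
    'biscuit' or 'milk') with the positions collected for it in every frame,
    yielding one data set entry per target."""
    dataset = {}
    for target_id, label in label_values.items():
        dataset[target_id] = {
            "label": label,
            # list of (frame_index, (x, y, w, h)) tuples gathered while tracking
            "frames": per_frame_positions.get(target_id, []),
        }
    return dataset
```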
After the video acquisition is finished and all the targets appearing in the video have been labeled, each collected picture is subjected to image enhancement processing before the labeled data are uploaded. Specific image enhancement operations include, but are not limited to, noise processing and distortion processing. Training the model on the enhanced images gives the resulting model better detection and classification performance and makes it applicable to more scenes; for example, even for pictures taken with different shooting devices and under adjusted illumination, the model can still distinguish well and detect or classify the samples to be classified.
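As an example of the noise-processing enhancement mentioned above (Python; the sigma value is illustrative):

```python
import numpy as np

def add_gaussian_noise(image, sigma=10.0):
    """Add zero-mean Gaussian noise to a labeled image before it is uploaded
    for training, so the trained model tolerates different cameras and
    lighting better."""
    noisy = image.astype(np.float32) + np.random.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)
```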
Referring to fig. 7, a second embodiment of the present invention provides a data set acquisition system, which includes a shooting module 10 and a multi-target tracking model 20; the shooting module 10 is configured to shoot a video of a plurality of targets, and the multi-target tracking model 20 is configured to read each frame of image in the video and obtain the position information and feature information of each target in each frame of image to form a data set representing the targets.
Referring to fig. 8, the data set acquisition system further includes a detection model 30 that can exchange signals with the multi-target tracking model 20. The detection model 30 is configured to determine the plurality of targets to be tracked in the first frame of the video, and the multi-target tracking model 20 is configured to track the plurality of targets obtained by the detection model. When the multi-target tracking model 20 identifies that a tracked target is lost, it sends a signal to the detection model 30, and the detection model 30 determines whether the target is lost; the detection model 30 is further configured to determine whether a new target appears when each frame of image is read.
Referring to fig. 9, the multi-target tracking model 20 includes an obtaining module 201, an operation module 202, a tracking module 203, a first determining module 204 and a second determining module 205. The obtaining module 201 is configured to obtain the position information and feature information of the targets in the first frame image and to obtain the feature information of the images in subsequent frames. The operation module 202 calculates the position information and contour information of the targets in subsequent frames based on the feature information of the targets in the previous frame. The tracking module 203 is configured to track the targets based on the position information and contour information obtained by the operation module 202, that is, to update the target positions and follow the new tracking state. The first determining module 204 is configured to determine whether a target is lost, and the second determining module 205 is configured to determine whether the video has ended.
Referring to fig. 10, a third embodiment of the present invention provides an electronic device, which includes a memory 41 and a processor 42, wherein the memory 41 stores a computer program and the processor 42 is configured to execute, through the computer program, any step of the data set acquisition method according to the first embodiment.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit of the present invention are intended to be included within the scope of the present invention.

Claims (6)

1. A data set acquisition method, characterized in that the method comprises the following steps:
S1, providing a plurality of targets, wherein the targets are goods;
S2, shooting the plurality of targets under preset shooting angles, lighting conditions and background conditions to obtain a video;
A, a detection model reads the first frame image in the video and determines a plurality of targets to be tracked; the detection model is trained on feature information matched with each target to be tracked and stores a feature template for identifying the plurality of targets to be tracked, and when the detection model reads the first frame image, the feature information of the current picture is compared with the information in the feature template to determine the plurality of targets to be tracked; S3, reading each frame of image in the video with the multi-target tracking model and obtaining the position information and/or feature information of each target in each frame of image,
the step S3 specifically comprising the following steps:
S31, the multi-target tracking model reads the first frame image, obtains the position information and feature information of the plurality of targets to be tracked in the first frame image, tracks the plurality of targets based on the position information and feature information of the targets in the first frame, and updates the feature information and position information of the targets in each frame; the feature information comprises contour information, and the multi-target tracking model tracks the targets in subsequent frames based on the contour information and position information of each target;
S32, the multi-target tracking model judges, when reading each subsequent frame of image, whether a tracked target is lost; when the tracking model judges that a tracked target is lost, the detection model confirms whether the target is lost, and the targets that are not lost continue to be tracked until the video ends;
and S4, labeling the targets with category information representing the targets to form a data set corresponding to each target.
2. The data set acquisition method of claim 1, characterized in that: when reading the images after the first frame, the multi-target tracking model judges whether a target is lost and also detects whether a new target appears; if a new target appears, the multi-target tracking model obtains the feature information corresponding to the new target, the feature information corresponding to the new target is input to label the new target, and the new target is tracked in subsequent frames.
3. The data set acquisition method of claim 2, characterized in that: when the tracking model reads each subsequent frame of image, the detection model either detects every frame of image to confirm whether a target is lost, or detects one frame at intervals of at least one frame to confirm whether a target is lost.
4. The data set acquisition method of claim 2, characterized in that:
judging by the multi-target tracking model whether a new target appears comprises the following steps: when the multi-target tracking model obtains the feature information corresponding to the new target, the feature information of the new target is compared with the feature information of all targets in the previous frame, and if the feature information of the new target does not match the feature information of any target in the previous frame, a new target is considered to have appeared.
5. A system for executing the data set acquisition method of any one of claims 1-4, characterized in that: the system comprises a shooting module and a multi-target tracking model, wherein the shooting module is used for shooting a video of the plurality of targets under preset shooting angles, lighting conditions and background conditions; the multi-target tracking model is used for reading each frame of image in the video and obtaining the position information and feature information of each target in each frame of image to form a data set, the targets being goods; the data set acquisition system further comprises a detection model capable of exchanging signals with the multi-target tracking model, the detection model is used for determining the plurality of targets to be tracked in the first frame of the video, the detection model is trained on feature information matched with each target to be tracked and stores a feature template for identifying the plurality of targets to be tracked, and when the detection model reads the first frame image, the feature information of the current picture is compared with the information in the feature template to determine the plurality of targets to be tracked;
the multi-target tracking model is used for tracking the plurality of targets to be tracked obtained by the detection model; the multi-target tracking model reads the first frame image, obtains the position information and feature information of the plurality of targets to be tracked in the first frame image, tracks the plurality of targets based on the position information and feature information of the targets in the first frame, and updates the feature information and position information of the targets in each frame, the feature information comprising contour information, and the multi-target tracking model tracks the targets in subsequent frames based on the contour information and position information of each target;
the multi-target tracking model judges, when reading each subsequent frame of image, whether a tracked target is lost; when the tracking model judges that a tracked target is lost, a signal is sent to the detection model, the detection model confirms whether the target is lost, and the targets that are not lost continue to be tracked until the video ends; the detection model is further used for judging whether a new target appears when reading each frame of image.
6. An electronic device, characterized in that: the electronic device comprises a memory in which a computer program is stored and a processor arranged to perform, by means of the computer program, any step of the data set acquisition method according to any one of claims 1-4.
CN201811559113.XA 2018-12-21 2018-12-21 Data set acquisition method and system and electronic device Active CN109743497B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811559113.XA CN109743497B (en) 2018-12-21 2018-12-21 Data set acquisition method and system and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811559113.XA CN109743497B (en) 2018-12-21 2018-12-21 Data set acquisition method and system and electronic device

Publications (2)

Publication Number Publication Date
CN109743497A CN109743497A (en) 2019-05-10
CN109743497B true CN109743497B (en) 2020-06-30

Family

ID=66360585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811559113.XA Active CN109743497B (en) 2018-12-21 2018-12-21 Data set acquisition method and system and electronic device

Country Status (1)

Country Link
CN (1) CN109743497B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110288633B (en) * 2019-06-04 2021-07-23 东软集团股份有限公司 Target tracking method and device, readable storage medium and electronic equipment
CN114550041B (en) * 2022-02-18 2024-03-29 中国科学技术大学 Multi-target labeling method for shooting video by multiple cameras

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1972370A (en) * 2005-11-23 2007-05-30 中国科学院沈阳自动化研究所 Real-time multi-target marker and centroid calculation method
CN102779348A (en) * 2012-06-20 2012-11-14 中国农业大学 Method for tracking and measuring moving targets without marks
CN103997624A (en) * 2014-05-21 2014-08-20 江苏大学 Overlapped domain dual-camera target tracking system and method
CN105389584A (en) * 2015-10-13 2016-03-09 西北工业大学 Streetscape semantic annotation method based on convolutional neural network and semantic transfer conjunctive model
CN106529485A (en) * 2016-11-16 2017-03-22 北京旷视科技有限公司 Method and apparatus for obtaining training data
CN106846374A (en) * 2016-12-21 2017-06-13 大连海事大学 The track calculating method of vehicle under multi-cam scene
CN107133569A (en) * 2017-04-06 2017-09-05 同济大学 The many granularity mask methods of monitor video based on extensive Multi-label learning
CN108009473A (en) * 2017-10-31 2018-05-08 深圳大学 Based on goal behavior attribute video structural processing method, system and storage device
CN108446585A (en) * 2018-01-31 2018-08-24 深圳市阿西莫夫科技有限公司 Method for tracking target, device, computer equipment and storage medium
TW201832158A (en) * 2017-02-24 2018-09-01 香港商阿里巴巴集團服務有限公司 Determining recommended object
CN108875730A (en) * 2017-05-16 2018-11-23 中兴通讯股份有限公司 A kind of deep learning sample collection method, apparatus, equipment and storage medium
EP3407251A1 (en) * 2017-05-26 2018-11-28 Dura Operating, LLC Method of classifying objects for a perception scene graph and system for using a scene detection schema for classifying objects in a perception scene graph (psg) in a motor vehicle

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150572B (en) * 2012-12-11 2016-04-13 中国科学院深圳先进技术研究院 Online visual tracking method
CN103456030B (en) * 2013-09-08 2016-04-13 西安电子科技大学 Based on the method for tracking target of scattering descriptor
CN105787441A (en) * 2016-02-01 2016-07-20 浙江纳特智能网络工程有限公司 Intelligent video analysis software
CN106682619B (en) * 2016-12-28 2020-08-11 上海木木聚枞机器人科技有限公司 Object tracking method and device
CN108154118B (en) * 2017-12-25 2018-12-18 北京航空航天大学 A kind of target detection system and method based on adaptive combined filter and multistage detection
CN108664930A (en) * 2018-05-11 2018-10-16 西安天和防务技术股份有限公司 A kind of intelligent multi-target detection tracking

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1972370A (en) * 2005-11-23 2007-05-30 中国科学院沈阳自动化研究所 Real-time multi-target marker and centroid calculation method
CN102779348A (en) * 2012-06-20 2012-11-14 中国农业大学 Method for tracking and measuring moving targets without marks
CN103997624A (en) * 2014-05-21 2014-08-20 江苏大学 Overlapped domain dual-camera target tracking system and method
CN105389584A (en) * 2015-10-13 2016-03-09 西北工业大学 Streetscape semantic annotation method based on convolutional neural network and semantic transfer conjunctive model
CN106529485A (en) * 2016-11-16 2017-03-22 北京旷视科技有限公司 Method and apparatus for obtaining training data
CN106846374A (en) * 2016-12-21 2017-06-13 大连海事大学 The track calculating method of vehicle under multi-cam scene
TW201832158A (en) * 2017-02-24 2018-09-01 香港商阿里巴巴集團服務有限公司 Determining recommended object
CN107133569A (en) * 2017-04-06 2017-09-05 同济大学 The many granularity mask methods of monitor video based on extensive Multi-label learning
CN108875730A (en) * 2017-05-16 2018-11-23 中兴通讯股份有限公司 A kind of deep learning sample collection method, apparatus, equipment and storage medium
EP3407251A1 (en) * 2017-05-26 2018-11-28 Dura Operating, LLC Method of classifying objects for a perception scene graph and system for using a scene detection schema for classifying objects in a perception scene graph (psg) in a motor vehicle
CN108009473A (en) * 2017-10-31 2018-05-08 深圳大学 Based on goal behavior attribute video structural processing method, system and storage device
CN108446585A (en) * 2018-01-31 2018-08-24 深圳市阿西莫夫科技有限公司 Method for tracking target, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN109743497A (en) 2019-05-10

Similar Documents

Publication Publication Date Title
US10628648B2 (en) Systems and methods for tracking optical codes
CN109829397B (en) Video annotation method and system based on image clustering and electronic equipment
CN109727275B (en) Object detection method, device, system and computer readable storage medium
US10671887B2 (en) Best image crop selection
US10445868B2 (en) Method for detecting a defect on a surface of a tire
JP2008046903A (en) Apparatus and method for detecting number of objects
CN113111844B (en) Operation posture evaluation method and device, local terminal and readable storage medium
CN110533654A (en) The method for detecting abnormality and device of components
CN113962274B (en) Abnormity identification method and device, electronic equipment and storage medium
CN109743497B (en) Data set acquisition method and system and electronic device
JP2022507678A (en) Optimization of setup stage in automated visual inspection process
CN111598913B (en) Image segmentation method and system based on robot vision
CN107403444B (en) Identification system
CN109685002B (en) Data set acquisition method and system and electronic device
CN115083008A (en) Moving object detection method, device, equipment and storage medium
CN109903308B (en) Method and device for acquiring information
CN111126384A (en) Commodity classification system and method based on feature fusion
CN115908988A (en) Defect detection model generation method, device, equipment and storage medium
JP2011090708A (en) Apparatus and method for detecting the number of objects
KR20150137698A (en) Method and apparatus for movement trajectory tracking of moving object on animal farm
CN102087711A (en) Loop-locked image comparing system
CN110335274B (en) Three-dimensional mold defect detection method and device
CN102013018A (en) Closed loop image comparison method
CN113902740A (en) Construction method of image blurring degree evaluation model
CN109919863B (en) Full-automatic colony counter, system and colony counting method thereof

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant