CN112287911A - Data labeling method, device, equipment and storage medium


Info

Publication number
CN112287911A
Application CN202011555025.XA; granted publication CN112287911B
Authority
CN
China
Prior art keywords
target object
shooting device
image sequence
shooting
shot
Prior art date
Legal status
Granted
Application number
CN202011555025.XA
Other languages
Chinese (zh)
Other versions
CN112287911B (en)
Inventor
张建安
闾凡兵
姚胜
曾海文
Current Assignee
Changsha Hisense Intelligent System Research Institute Co ltd
Original Assignee
Changsha Hisense Intelligent System Research Institute Co ltd
Priority date
Filing date
Publication date
Application filed by Changsha Hisense Intelligent System Research Institute Co ltd
Priority to CN202011555025.XA
Publication of CN112287911A
Application granted
Publication of CN112287911B
Legal status: Active (granted)

Classifications

    • G06V 20/41: Scenes; scene-specific elements in video content; higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/46: Scenes; scene-specific elements in video content; extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 20/48: Scenes; scene-specific elements in video content; matching video sequences
    • G06F 18/22: Pattern recognition; analysing; matching criteria, e.g. proximity measures
    • G06F 18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/08: Neural networks; learning methods

Abstract

An embodiment of the present application provides a data annotation method, apparatus, device, and storage medium. The method includes: acquiring video data shot by at least two shooting devices; acquiring an image sequence of at least one target object in each piece of video data; extracting image features of the target object in each image sequence; calculating, according to the image features of the target objects shot by a first shooting device and a second shooting device among the at least two shooting devices, the similarity between the target object shot by the first shooting device and the target object shot by the second shooting device; and labeling the image sequence of the target object shot by the first shooting device together with the image sequence of the target object shot by the second shooting device that has the greatest similarity. In this way, image sequences of target objects shot by different shooting devices can be matched and labeled, matching-type data can be acquired quickly, and labeling efficiency is improved.

Description

Data labeling method, device, equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a data annotation method, apparatus, device, and storage medium.
Background
In the field of artificial intelligence, labeled data serves as the input for model training and plays a decisive role in model performance. As artificial intelligence technology develops, practitioners increasingly approach problems by scaling up data in order to obtain models with strong generalization capability and wide applicability, so the demand for labeled data keeps growing.
For the computer vision direction in particular, obtaining labeled data remains quite difficult. Traditional data annotation schemes can generally only label the video shot by a single shooting device; they cannot match and label videos shot by different shooting devices, for example, they cannot match and label image data of the same pedestrian seen by two cameras.
Disclosure of Invention
The embodiments of the present application provide a data annotation method, apparatus, device, and storage medium, which can match and label image sequences of target objects shot by different shooting devices, quickly acquire matching-type data, and improve labeling efficiency.
In a first aspect, an embodiment of the present application provides a data annotation method, where the method includes:
acquiring video data shot by at least two shooting devices;
acquiring an image sequence of at least one target object in each video data;
extracting image characteristics of the target object in each image sequence;
according to the image characteristics of the target object shot by a first shooting device and a second shooting device in at least two shooting devices, calculating the similarity between the target object shot by the first shooting device and the target object shot by the second shooting device;
and labeling the image sequence of the target object shot by the first shooting device and the image sequence of the target object shot by the second shooting device with the maximum similarity.
In some implementations of the first aspect, acquiring a sequence of images of at least one target object in each video data comprises:
acquiring a target object in each video data;
and tracking the target object according to each piece of video data to obtain an image sequence of the target object.
In some implementations of the first aspect, extracting image features of the target object in each image sequence includes:
and extracting the image characteristics of the target object in each image sequence according to a preset characteristic extraction model.
In some implementations of the first aspect, the sequence of images includes a start time;
according to the image characteristics of the target object respectively shot by a first shooting device and a second shooting device in at least two shooting devices, the similarity of the target object shot by the first shooting device and the target object shot by the second shooting device is calculated, and the similarity calculation method comprises the following steps:
calculating a time interval between a start time of an image sequence of a target object photographed by a first photographing apparatus and a start time of an image sequence of a target object photographed by a second photographing apparatus;
and under the condition that the time interval is less than or equal to the preset time length, calculating the similarity between the target object shot by the first shooting device and the target object shot by the second shooting device according to the image characteristics of the target objects respectively shot by the first shooting device and the second shooting device.
In some implementations of the first aspect, after labeling the image sequence of the target object shot by the first shooting device and the image sequence of the target object shot by the second shooting device with the greatest similarity, the method further includes:
receiving shooting device selection information input by a user, where the shooting device selection information includes device identifiers of a third shooting device and a fourth shooting device among the at least two shooting devices;
displaying the image sequence of the target object shot by the third shooting device and the image sequences of target objects shot by the fourth shooting device, the latter arranged by their similarity to the target object shot by the third shooting device;
and when receiving an input of a user for selecting the image sequence of the target object shot by the fourth shooting device, marking the image sequence of the target object shot by the third shooting device and the selected image sequence.
In some implementations of the first aspect, after labeling the image sequence of the target object shot by the first shooting device and the image sequence of the target object shot by the second shooting device with the greatest similarity, the method further includes:
and training a feature extraction model according to the marked data.
In a second aspect, an embodiment of the present application provides a data annotation device, where the device includes:
the image sequence acquisition module is used for acquiring video data shot by at least two shooting devices;
acquiring an image sequence of at least one target object in each video data;
the image feature extraction module is used for extracting the image features of the target object in each image sequence;
the similarity calculation module is used for calculating the similarity between the target object shot by the first shooting device and the target object shot by the second shooting device according to the image characteristics of the target object shot by the first shooting device and the second shooting device in the at least two shooting devices respectively;
and labeling the image sequence of the target object shot by the first shooting device and the image sequence of the target object shot by the second shooting device with the maximum similarity.
In some implementations of the second aspect, the image sequence acquisition module is specifically configured to: acquiring a target object in each video data;
and tracking the target object according to each piece of video data to obtain an image sequence of the target object.
In some implementations of the second aspect, the image feature extraction module is specifically configured to: and extracting the image characteristics of the target object in each image sequence according to a preset characteristic extraction model.
In some implementations of the second aspect, the sequence of images includes a start time;
the similarity calculation module is specifically configured to: calculating a time interval between a start time of an image sequence of a target object photographed by a first photographing apparatus and a start time of an image sequence of a target object photographed by a second photographing apparatus;
and under the condition that the time interval is less than or equal to the preset time length, calculating the similarity between the target object shot by the first shooting device and the target object shot by the second shooting device according to the image characteristics of the target objects respectively shot by the first shooting device and the second shooting device.
In some implementations of the second aspect, the apparatus further comprises:
the shooting device selection module is used for receiving shooting device selection information input by a user after the image sequence of the target object shot by the first shooting device and the image sequence of the target object shot by the second shooting device with the greatest similarity are labeled, wherein the shooting device selection information includes device identifiers of a third shooting device and a fourth shooting device among the at least two shooting devices;
the image sequence reading and displaying module is used for displaying the image sequence of the target object shot by the third shooting device and the image sequences of target objects shot by the fourth shooting device, the latter arranged by their similarity to the target object shot by the third shooting device;
and the image sequence labeling and submitting module is used for labeling the image sequence of the target object shot by the third shooting equipment and the selected image sequence when receiving the input of the user for selecting the image sequence of the target object shot by the fourth shooting equipment.
In some implementations of the second aspect, the apparatus further comprises:
and the model self-adaptive training module is used for training the feature extraction model according to the marked data after marking the image sequence of the target object shot by the first shooting equipment and the image sequence of the target object shot by the second shooting equipment with the maximum similarity.
In a third aspect, an embodiment of the present application provides a data annotation device, where the device includes: a processor and a memory storing computer program instructions; the data annotation method of the first aspect is implemented when the processor executes the computer program instructions.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, on which computer program instructions are stored, and the computer program instructions, when executed by a processor, implement the data annotation method according to the first aspect.
According to the data annotation method, apparatus, device, and storage medium of the present application, video data shot by at least two shooting devices is acquired, an image sequence of at least one target object in each piece of video data is acquired, and the image features of the target object in each image sequence are extracted. The similarity between a target object shot by a first shooting device and a target object shot by a second shooting device among the at least two shooting devices is then calculated from their respective image features, and the image sequence of the target object shot by the first shooting device is labeled together with the image sequence of the target object shot by the second shooting device that has the greatest similarity. In this way, image sequences of target objects shot by different shooting devices can be matched and labeled, matching-type data can be acquired quickly, and labeling efficiency is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the embodiments are briefly described below; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic view of a data annotation scenario provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of a data annotation method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a display interface provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a data annotation device according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a data annotation device according to an embodiment of the present application.
Detailed Description
Features and exemplary embodiments of various aspects of the present application will be described in detail below, and in order to make objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail below with reference to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the application and do not limit the application. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by illustrating examples thereof.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
Currently, in the computer vision direction, traditional data annotation schemes mainly label the video shot by a single shooting device manually, based on annotation tools such as LabelMe or VATIC. However, when videos shot by different shooting devices need to be matched and labeled, as in cross-camera image retrieval or pedestrian re-identification tasks, traditional data annotation schemes often fall short.
In order to solve the problems of the prior art, embodiments of the present application provide a data annotation method, apparatus, device, and storage medium. The method acquires video data shot by at least two shooting devices, acquires an image sequence of at least one target object in each piece of video data, and extracts the image features of the target object in each image sequence. It then calculates the similarity between a target object shot by a first shooting device and a target object shot by a second shooting device among the at least two shooting devices according to their respective image features, and labels the image sequence of the target object shot by the first shooting device together with the image sequence of the target object shot by the second shooting device that has the greatest similarity. Image sequences of target objects shot by different shooting devices can thus be matched and labeled, matching-type data can be acquired quickly, and labeling efficiency is improved.
The data annotation method, apparatus, device and storage medium provided in the embodiments of the present application are described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
Fig. 1 is a schematic view of a data annotation scenario provided in an embodiment of the present application. As shown in Fig. 1, an area such as a residential community, a school, a station, or an intersection contains an electronic device 110 and at least two shooting devices 120. The electronic device 110 may be mobile or non-mobile. For example, a mobile electronic device may be a notebook computer, a palmtop computer, an ultra-mobile personal computer (UMPC), or the like, and a non-mobile electronic device may be a server, a network attached storage (NAS) device, a personal computer (PC), or the like. The shooting device 120 may be a camera, a device mounted with a camera module, or the like. The electronic device 110 and the shooting devices 120 communicate through a network, which may be wired or wireless.
As shown in fig. 1, the photographing apparatus 120 may photograph a visible area thereof, generating video data. The electronic device 110 may acquire video data captured by at least two capturing devices 120 and acquire a sequence of images of at least one target object in each video data. Then, the image features of the target object in each image sequence are extracted, and the similarity between the target object photographed by the first photographing apparatus and the target object photographed by the second photographing apparatus is calculated according to the image features of the target object photographed by the first photographing apparatus and the second photographing apparatus, respectively, of the at least two photographing apparatuses 120. Then, the image sequence of the target object shot by the first shooting device and the image sequence of the target object shot by the second shooting device with the largest similarity are marked. And then matching and labeling of image sequences of target objects shot by different shooting devices are achieved, and matching type data are rapidly acquired.
The data annotation method provided by the embodiments of the present application is described in detail below. The method can be applied to the data annotation scenario shown in Fig. 1.
Fig. 2 is a schematic flowchart of a data annotation method provided in an embodiment of the present application, and as shown in fig. 2, the data annotation method may include the following steps:
and S210, acquiring video data shot by at least two shooting devices.
The at least two shooting devices may be shooting devices in the same area, such as multiple devices along the same road, at the same intersection, or in the same residential community. Each shooting device generates video data by photographing its visible region.
S220, acquiring an image sequence of at least one target object in each video data.
In one embodiment, the target object in each video data may be acquired. Specifically, target detection may be performed on each video data according to a preset target detection model to obtain a target object in each video data. The target object may be a pedestrian, a vehicle, an animal, or the like.
The target object is then tracked through each piece of video data to obtain its image sequence. Specifically, the target object in each video may be tracked according to a preset target tracking model to generate the image sequence. Optionally, a sequence identifier may be generated for each image sequence as its identity number.
For example, target detection may be performed on each piece of video data according to a preset YOLOv5 model to obtain the target objects in each video, and the target objects may then be tracked according to a preset DeepSORT model to generate an image sequence for each target object, as sketched below.
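As a minimal sketch of this detection-and-tracking step (an illustration, not the patent's prescribed implementation), the following Python code pairs the public ultralytics/yolov5 hub model with the deep-sort-realtime package to build one image sequence per tracked identity. The package choice, crop handling, and parameters such as max_age are assumptions, and exact APIs may vary between library versions.

```python
# Illustrative sketch only: per-object image sequences from one video via
# YOLOv5 detection + DeepSORT tracking. Package and parameter choices are
# assumptions, not requirements of the described method.
import cv2
import torch
from collections import defaultdict
from deep_sort_realtime.deepsort_tracker import DeepSort

detector = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # preset detection model
tracker = DeepSort(max_age=30)                              # preset tracking model

def extract_image_sequences(video_path, target_class='person'):
    """Return ({track_id: [RGB crops]}, {track_id: start time in seconds})."""
    sequences, start_times = defaultdict(list), {}
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        t = cap.get(cv2.CAP_PROP_POS_MSEC) / 1000.0
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        det = detector(rgb)
        # Keep only the target class, in DeepSORT's ([x, y, w, h], conf, class) format.
        boxes = [([x1, y1, x2 - x1, y2 - y1], conf, int(cls))
                 for x1, y1, x2, y2, conf, cls in det.xyxy[0].tolist()
                 if detector.names[int(cls)] == target_class]
        for track in tracker.update_tracks(boxes, frame=frame):
            if not track.is_confirmed():
                continue
            left, top, right, bottom = map(int, track.to_ltrb())
            sequences[track.track_id].append(rgb[max(top, 0):bottom, max(left, 0):right])
            start_times.setdefault(track.track_id, t)  # start time, used by the S240 filter
    cap.release()
    return sequences, start_times
```

Recording each sequence's start time here is what later enables the time-interval filter described under S240.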
And S230, extracting the image characteristics of the target object in each image sequence.
In one embodiment, the image features of the target object in each image sequence may be extracted according to a preset feature extraction model. The feature extraction model may be, for example, VGG, a residual neural network (ResNet), DenseNet, PCB, HPM, or FastReID. Such models have strong feature extraction capability and can accurately extract the image features of the target object. It can be understood that the feature extraction model can be adjusted flexibly according to actual needs and is not limited herein; a sketch of this step follows.
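A brief sketch under stated assumptions: torchvision's ResNet-50 (weights API as of torchvision 0.13 and later) stands in for the preset feature extraction model, and per-frame embeddings are mean-pooled into a single feature per image sequence. The crop size and pooling strategy are illustrative choices, not requirements of the method.

```python
# Illustrative sketch: ResNet-50 as the "preset feature extraction model",
# pooling per-frame embeddings into one feature per image sequence.
import torch
import torchvision.models as models
import torchvision.transforms as T

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # expose the 2048-d global feature
backbone.eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((256, 128)),  # person-shaped crop, common in re-identification
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def sequence_feature(frames):
    """Mean-pool the frame features of one image sequence into a unit vector."""
    batch = torch.stack([preprocess(f) for f in frames])  # frames: RGB uint8 crops
    feature = backbone(batch).mean(dim=0)
    return feature / feature.norm()
```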
And S240, calculating the similarity between the target object shot by the first shooting device and the target object shot by the second shooting device according to the image characteristics of the target object shot by the first shooting device and the second shooting device in the at least two shooting devices respectively.
Specifically, the similarity between the target object shot by the first shooting device and the target object shot by the second shooting device may be calculated using a preset similarity calculation model and the image features of the target objects shot by the two devices, where the first shooting device is different from the second shooting device. Optionally, the similarity calculation model may be a distance-based model such as Euclidean distance, cosine distance, Mahalanobis distance, Bhattacharyya distance, Tanimoto distance, or Jaccard distance. Alternatively, the similarity calculation model may be a metric learning model such as probabilistic linear discriminant analysis (PLDA) or KISSME, with the similarity calculated adaptively based on the learned metric.
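For instance, if sequence features are unit-normalized, the cosine variant reduces to a matrix product. The sketch below is one assumed instantiation of the similarity calculation model; any of the other distances or metric learning models named above could replace it.

```python
# Illustrative sketch: cosine similarity between all sequence features of two
# shooting devices; rows index device 1's sequences, columns index device 2's.
import torch

def similarity_matrix(features_dev1, features_dev2):
    a = torch.nn.functional.normalize(torch.stack(features_dev1), dim=1)
    b = torch.nn.functional.normalize(torch.stack(features_dev2), dim=1)
    return a @ b.T  # entry (i, j): similarity of sequence i (device 1) to j (device 2)
```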
As an example, suppose the target objects shot by shooting device 1 are pedestrians 1, 2, and 3, and the target objects shot by shooting device 2 are pedestrians 4, 5, and 6. Using a preset PLDA model and the image features of pedestrians 1 through 6, the similarity of pedestrian 1 to each of pedestrians 4, 5, and 6 is calculated from their image features, and likewise the similarities of pedestrian 2 and of pedestrian 3 to each of pedestrians 4, 5, and 6.
And S250, labeling the image sequence of the target object shot by the first shooting device and the image sequence of the target object shot by the second shooting device with the maximum similarity.
Specifically, the image sequence of the target object shot by the first shooting device and the image sequence of the target object shot by the second shooting device with the largest similarity are matched and labeled to indicate that the two are the same person.
Referring to the example in S240, suppose the similarities of pedestrian 1 to pedestrians 4, 5, and 6 are 98%, 40%, and 30%; those of pedestrian 2 are 50%, 97%, and 30%; and those of pedestrian 3 are 40%, 20%, and 95%. The image sequences of pedestrian 1 and pedestrian 4, of pedestrian 2 and pedestrian 5, and of pedestrian 3 and pedestrian 6 are then labeled as matches.
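A sketch of the labeling rule implied by this example, assuming the similarity matrix and sequence identifiers from the previous steps: each sequence from the first device is simply paired with the highest-scoring sequence from the second device.

```python
# Illustrative sketch: label each of device 1's sequences with its
# maximum-similarity counterpart from device 2.
def label_matches(sim, ids_dev1, ids_dev2):
    """Return [(device-1 sequence id, best device-2 sequence id, score)]."""
    labels = []
    for i, row in enumerate(sim):
        j = int(row.argmax())
        labels.append((ids_dev1[i], ids_dev2[j], float(row[j])))
    return labels

# With the similarities above, rows (0.98, 0.40, 0.30), (0.50, 0.97, 0.30) and
# (0.40, 0.20, 0.95) yield the pairs (1, 4), (2, 5) and (3, 6).
```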
For example, the image sequences and their similarity scores for the first and second shooting devices may be stored in association, for example in list form, and storage path information may be generated. The device identifiers of the first and second shooting devices are then stored in association with the storage path information for convenient later retrieval. The information may be stored in a database, such as a MySQL database, or in memory.
In the embodiments of the present application, video data shot by at least two shooting devices and an image sequence of at least one target object in each piece of video data are acquired, and the image features of the target object in each image sequence are extracted. The similarity between a target object shot by the first shooting device and a target object shot by the second shooting device is calculated from their respective image features, and the image sequence of the target object shot by the first shooting device is labeled together with the image sequence of the target object shot by the second shooting device that has the greatest similarity. In this way, image sequences of target objects shot by different shooting devices can be matched and labeled, matching-type data can be acquired quickly, and labeling efficiency is improved.
During labeling, the image feature of each target object shot by the first shooting device must be compared against the image features of the target objects shot by the second shooting device. In practice, the second shooting device may capture many target objects, so the amount of similarity computation can be large.
Therefore, in one embodiment, the image sequence may include a start time, a time interval between the start time of the image sequence of the target object captured by the first capturing device and the start time of the image sequence of the target object captured by the second capturing device may be calculated, and in a case where the time interval is less than or equal to a preset time period, a similarity between the target object captured by the first capturing device and the target object captured by the second capturing device is calculated according to image characteristics of the target objects captured by the first capturing device and the second capturing device, respectively. And under the condition that the time interval is greater than the preset time length, no calculation is carried out. Therefore, a part of image sequences with overlarge time intervals can be filtered during similarity calculation, and the similarity calculation efficiency is improved.
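A minimal sketch of this pre-filter, assuming the per-sequence start times collected during tracking and a user-configurable window standing in for the preset time length:

```python
# Illustrative sketch: only sequence pairs whose start times fall within a
# preset window are forwarded to the (more expensive) similarity calculation.
def candidate_pairs(starts_dev1, starts_dev2, max_gap_seconds=60.0):
    """Yield (i, j) index pairs that pass the time-interval constraint."""
    for i, t1 in enumerate(starts_dev1):
        for j, t2 in enumerate(starts_dev2):
            if abs(t1 - t2) <= max_gap_seconds:
                yield i, j
```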
In one embodiment, the labeled data may be stored, illustratively in batches, with the data sets and related tables stored in a custom MySQL format. The stored data can also be converted into multiple data formats such as JSON, HDF5, XML, Pickle, and Matlab.
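As a sketch of this conversion step (the record schema and file naming here are assumptions), the labeled pairs can be written out as JSON and Pickle; HDF5, XML, and Matlab exports would follow the same pattern using, for example, h5py and scipy.io.

```python
# Illustrative sketch: export labeled matches in two of the formats named above.
import json
import pickle

def export_labels(labels, stem='annotations'):
    records = [{'device1_sequence': a, 'device2_sequence': b, 'similarity': s}
               for a, b, s in labels]
    with open(stem + '.json', 'w', encoding='utf-8') as f:
        json.dump(records, f, indent=2)
    with open(stem + '.pkl', 'wb') as f:
        pickle.dump(records, f)
```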
The preset feature extraction model and similarity calculation model are obtained by training on public data sets, which usually differ in distribution from the data labeling scene, so the models carry a bias. Therefore, the feature extraction model and the similarity calculation model can be fine-tuned on the labeled data, reducing the models' scene bias so that they better fit the data labeling scenario.
In one embodiment, after S250, photographing apparatus selection information input by the user may be received. The shooting device selection information includes device identifiers of a third shooting device and a fourth shooting device of the at least two shooting devices.
The image sequences of the target objects shot by the third shooting device are then displayed together with the image sequences of target objects shot by the fourth shooting device, the latter arranged by their similarity to the target objects shot by the third shooting device. Illustratively, the stored image sequences can be read according to the shooting device selection information, using folder reading, a compatible interactive database query, a predefined list, or visual map selection. When displaying a sequence, one of its images may be shown at random. If the displayed image is blurry, the user can click it, and in response the corresponding image sequence can be played as a video, allowing the user to capture dynamic information for subsequent judgment.
Referring to the examples in S240 and S250, the image sequences 1, 2, and 3 of pedestrians 1, 2, and 3 are displayed in one column; next to each of them, the image sequences 4, 5, and 6 of pedestrians 4, 5, and 6 are displayed in descending order of similarity to that pedestrian. Illustratively, the display may take the form of Table 1.
Table 1 (presented as an image in the original publication; each row pairs one of image sequences 1 to 3 with image sequences 4 to 6 ranked by descending similarity)
And when receiving an input of a user for selecting the image sequence of the target object shot by the fourth shooting device, marking the image sequence of the target object shot by the third shooting device and the selected image sequence. The user can check the displayed image sequences to determine whether the machine annotation is reasonable, namely whether the initially annotated target objects of the third shooting device and the fourth shooting device are the same, and if not, the user can select the image sequences of the target objects which are the same as the target objects shot by the third shooting device from the image sequences of other target objects shot by the fourth shooting device. In response to the user's input, the image sequence of the target object photographed by the third photographing apparatus and the selected image sequence may be annotated. Therefore, manual correction can be introduced, and the marking accuracy is improved.
As an example, referring to Table 1, suppose the user views the target objects in image sequence 1 and image sequence 4, finds that they differ, and finds that the target object in image sequence 5 matches the one in image sequence 1; the user may then double-click to select image sequence 5. In response to this input, image sequence 1 and the selected image sequence 5 are labeled as a match.
Optionally, the manually annotated data may be stored, and the feature extraction model and similarity calculation model can be trained further on this manually labeled data.
In practical applications, as shown in Fig. 3, denggaolu_1_a and denggaolu_2_a are the identifiers of two shooting devices. The image sequences corresponding to the first device are shown in a column under denggaolu_1_a, and the image sequences corresponding to the second device are shown under denggaolu_2_a, arranged in descending order of similarity to the target object shot by denggaolu_1_a. The first column under denggaolu_2_a is selected by default, that is, the image sequence with the greatest similarity is labeled by default. If the user finds the default image sequence inappropriate, the user can select another image sequence in the same row, or select the NULL option if none is appropriate; after every row has been handled, the user can click the submit option to store the data.
Based on the data annotation method provided by the embodiments of the present application, an embodiment of the present application further provides a data annotation apparatus. As shown in Fig. 4, the data annotation apparatus 400 may include two parts: a user front end and an algorithm back end. The user front end may include a shooting device selection module, a time constraint module, an image sequence reading and displaying module, a video display module, and an image sequence labeling and submitting module. The algorithm back end may include a model adaptive training module, an image sequence acquisition module, an image feature extraction module, a similarity calculation module, and a data storage and format conversion module.
The image sequence acquisition module is used for acquiring video data shot by at least two shooting devices and acquiring an image sequence of at least one target object in each video data.
The image feature extraction module is used for extracting the image features of the target object in each image sequence.
The similarity calculation module is used for calculating the similarity between the target object shot by the first shooting device and the target object shot by the second shooting device according to the image characteristics of the target object shot by the first shooting device and the second shooting device in the at least two shooting devices respectively. And labeling the image sequence of the target object shot by the first shooting device and the image sequence of the target object shot by the second shooting device with the maximum similarity.
In an embodiment, the image sequence acquiring module is specifically configured to acquire a target object in each piece of video data, track the target object according to each piece of video data, and acquire an image sequence of the target object.
In one embodiment, the image feature extraction module is specifically configured to extract the image feature of the target object in each image sequence according to a preset feature extraction model.
In one embodiment, the sequence of images includes a start time. The similarity calculation module is specifically configured to calculate a time interval between a start time of an image sequence of the target object captured by the first capturing device and a start time of an image sequence of the target object captured by the second capturing device. And under the condition that the time interval is less than or equal to the preset time length, calculating the similarity between the target object shot by the first shooting device and the target object shot by the second shooting device according to the image characteristics of the target objects respectively shot by the first shooting device and the second shooting device.
In one embodiment, the photographing apparatus selection module is configured to receive photographing apparatus selection information input by a user after labeling an image sequence of a target object photographed by a first photographing apparatus and an image sequence of a target object photographed by a second photographing apparatus having a largest similarity. The shooting device selection information includes device identifiers of a third shooting device and a fourth shooting device of the at least two shooting devices.
The image sequence reading and displaying module is used for displaying the image sequence of the target object shot by the third shooting device and the image sequences of target objects shot by the fourth shooting device, the latter arranged by their similarity to the target object shot by the third shooting device.
And the image sequence labeling and submitting module is used for labeling the image sequence of the target object shot by the third shooting equipment and the selected image sequence when receiving the input of the user selecting the image sequence of the target object shot by the fourth shooting equipment.
In one embodiment, the model adaptive training module is used for training the feature extraction model according to the marked data after marking the image sequence of the target object shot by the first shooting device and the image sequence of the target object shot by the second shooting device with the maximum similarity.
As a specific example, referring to Fig. 4, the model adaptive training module may be used to train models on public data sets, obtaining a target detection model, a target tracking model, a feature extraction model, and a similarity calculation model for general scenes.
The image sequence acquisition module may be configured to acquire video data captured by at least two capturing devices, and perform target detection on each video data according to a target detection model trained by the model adaptive training module to obtain a target object in each video data. And tracking the target object in each video data according to the target tracking model trained by the model self-adaptive training module to obtain an image sequence of the target object.
The image feature extraction module may be configured to extract image features of the target object in each image sequence according to the feature extraction model trained by the model adaptive training module.
The similarity calculation module may be configured to calculate a time interval between a start time of an image sequence of the target object captured by the first capturing device and a start time of an image sequence of the target object captured by the second capturing device according to the image sequence acquired by the image sequence acquisition module, and calculate a similarity between the target object captured by the first capturing device and the target object captured by the second capturing device according to the similarity calculation model trained by the model adaptive training module and the image features of the target object captured by the first capturing device and the second capturing device, when the time interval is less than or equal to a preset time duration. And labeling the image sequence of the target object shot by the first shooting device and the image sequence of the target object shot by the second shooting device with the largest similarity.
The data storage and format conversion module may be used to store in memory, in an associated manner, the image sequences obtained by the image sequence acquisition module together with the similarities calculated by the similarity calculation module, and to generate the storage path information. The device identifiers of the first and second shooting devices may then be stored in the database in association with the storage path information.
The photographing apparatus selection module may be configured to receive photographing apparatus selection information input by a user.
The time constraint module may be configured to receive time constraint information, i.e., a preset time duration, input by a user.
The image sequence reading and displaying module can read image sequences from the data storage and format conversion module according to the shooting device selection information received by the shooting device selection module and the time constraint information received by the time constraint module, namely the image sequences of target objects shot by the third and fourth shooting devices. It then displays images from the image sequences of the target objects shot by the third shooting device alongside images from the image sequences shot by the fourth shooting device, the latter arranged by their similarity to the target objects shot by the third shooting device.
When the displayed image is blurred, the user may click on the image, and in response to user input, the video display module may be configured to display a sequence of images corresponding to the image in video form.
When receiving an input that the user selects the image sequence of the target object shot by the fourth shooting device, the image sequence labeling and submitting module may be configured to label the image sequence of the target object shot by the third shooting device and the selected image sequence.
The user can click the submission option, and in response to the user input, the image sequence labeling and submission module can submit the manually labeled data to the data storage and format conversion module and store the data by the data storage and format conversion module.
The model self-adaptive training module can be used for training the feature extraction model and the similarity calculation model according to the manual labeling data stored by the data storage and format conversion module.
It can be understood that each module/unit in the data annotation device 400 shown in fig. 4 has a function of implementing each step in the data annotation method provided in the embodiment of the present application, and can achieve the corresponding technical effect, and for brevity, no further description is provided here.
Fig. 5 is a schematic structural diagram of a data annotation device according to an embodiment of the present application.
As shown in fig. 5, the data annotation device 500 in the present embodiment includes an input device 501, an input interface 502, a central processing unit 503, a memory 504, an output interface 505, and an output device 506. The input interface 502, the central processing unit 503, the memory 504, and the output interface 505 are connected to each other through a bus 510, and the input device 501 and the output device 506 are connected to the bus 510 through the input interface 502 and the output interface 505, respectively, and further connected to other components of the data annotation device 500.
Specifically, the input device 501 receives input information from the outside and transmits the input information to the central processor 503 through the input interface 502; the central processor 503 processes input information based on computer-executable instructions stored in the memory 504 to generate output information, temporarily or permanently stores the output information in the memory 504, and then transmits the output information to the output device 506 through the output interface 505; the output device 506 outputs the output information to the exterior of the data annotation device 500 for use by a user.
In one embodiment, the data annotation device 500 shown in FIG. 5 comprises: a memory 504 for storing programs; the processor 503 is configured to execute the program stored in the memory to implement the data annotation method provided in the embodiment of the present application.
Embodiments of the present application further provide a computer-readable storage medium having computer program instructions stored thereon; the computer program instructions realize the data annotation method provided by the embodiment of the application when being executed by the processor.
It should be clear that each embodiment in this specification is described in a progressive manner, and the same or similar parts among the embodiments may be referred to each other, and for brevity, the description is omitted. The present application is not limited to the specific configurations and processes described above and shown in the figures. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions or change the order between the steps after comprehending the spirit of the present application.
The functional blocks shown in the above structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, they may be, for example, electronic circuits, application-specific integrated circuits (ASICs), suitable firmware, plug-ins, function cards, and so on. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium or transmitted over a transmission medium or communication link by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, read-only memories (ROMs), flash memories, erasable ROMs (EROMs), floppy disks, CD-ROMs, optical disks, hard disks, fiber-optic media, radio-frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the Internet or an intranet.
It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As described above, only the specific embodiments of the present application are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application.

Claims (10)

1. A method for annotating data, the method comprising:
acquiring video data shot by at least two shooting devices;
acquiring an image sequence of at least one target object in each video data;
extracting image characteristics of the target object in each image sequence;
according to the image characteristics of the target object shot by the first shooting device and the second shooting device in the at least two shooting devices, calculating the similarity between the target object shot by the first shooting device and the target object shot by the second shooting device;
and labeling the image sequence of the target object shot by the first shooting device and the image sequence of the target object shot by the second shooting device with the maximum similarity.
2. The method of claim 1, wherein the obtaining a sequence of images of at least one target object in each video data comprises:
acquiring a target object in each video data;
and tracking the target object according to each piece of video data to obtain an image sequence of the target object.
3. The method of claim 1, wherein the extracting image features of the target object in each image sequence comprises:
and extracting the image characteristics of the target object in each image sequence according to a preset characteristic extraction model.
4. The method of claim 1, wherein the sequence of images includes a start time;
the calculating the similarity between the target object shot by the first shooting device and the target object shot by the second shooting device according to the image characteristics of the target object shot by the first shooting device and the second shooting device in the at least two shooting devices respectively comprises the following steps:
calculating a time interval between a start time of an image sequence of a target object photographed by a first photographing apparatus and a start time of an image sequence of a target object photographed by a second photographing apparatus;
and under the condition that the time interval is less than or equal to the preset time length, calculating the similarity between the target object shot by the first shooting device and the target object shot by the second shooting device according to the image characteristics of the target objects respectively shot by the first shooting device and the second shooting device.
5. The method according to any one of claims 1 to 4, wherein after labeling the image sequence of the target object shot by the first shooting device and the image sequence of the target object shot by the second shooting device with the greatest similarity, the method further comprises:
receiving shooting device selection information input by a user, wherein the shooting device selection information comprises device identifiers of a third shooting device and a fourth shooting device among the at least two shooting devices;
displaying the image sequence of the target object shot by the third shooting device and the image sequences of target objects shot by the fourth shooting device, the latter arranged by their similarity to the target object shot by the third shooting device;
and when receiving an input of a user for selecting the image sequence of the target object shot by the fourth shooting device, marking the image sequence of the target object shot by the third shooting device and the selected image sequence.
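For the display order in claim 5, the fourth shooting device's image sequences would be sorted by similarity before being shown to the user. A sketch, reusing the `ImageSequence` assumption from the claim-4 illustration:

```python
import numpy as np

def ranked_for_display(target_features, candidate_sequences):
    """Order the fourth shooting device's image sequences by similarity to
    the third shooting device's target object, most similar first."""
    def sim(seq):
        f = seq.features
        return float(np.dot(target_features, f) /
                     (np.linalg.norm(target_features) * np.linalg.norm(f) + 1e-9))
    return sorted(candidate_sequences, key=sim, reverse=True)
```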
6. The method according to claim 3, wherein after the labeling of the image sequence of the target object shot by the first shooting device and the image sequence of the target object shot by the second shooting device with the maximum similarity, the method further comprises:
training the feature extraction model according to the labeled data.
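Claim 6 says only that the feature extraction model is trained on the labeled data. One hedged realization, reusing the `backbone` and `preprocess` from the claim-3 sketch, is a triplet loss in which two sequences labeled as the same object supply the anchor and positive images while a differently labeled object supplies the negatives; the loss, optimizer, and hyperparameters are all assumptions.

```python
import torch

optimizer = torch.optim.SGD(backbone.parameters(), lr=1e-3)
triplet_loss = torch.nn.TripletMarginLoss(margin=0.5)

def training_step(anchor_imgs, positive_imgs, negative_imgs):
    """One gradient update on labeled data: pull features of same-labeled
    images together and push a different object's features away."""
    backbone.train()
    a = backbone(torch.stack([preprocess(i) for i in anchor_imgs]))
    p = backbone(torch.stack([preprocess(i) for i in positive_imgs]))
    n = backbone(torch.stack([preprocess(i) for i in negative_imgs]))
    loss = triplet_loss(a, p, n)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss.detach())
```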
7. A data labeling device, the device comprising:
an image sequence acquisition module, used for acquiring video data shot by at least two shooting devices and acquiring an image sequence of at least one target object from each piece of video data;
an image feature extraction module, used for extracting the image features of the target object in each image sequence;
a similarity calculation module, used for calculating, according to the image features of the target objects respectively shot by a first shooting device and a second shooting device of the at least two shooting devices, a similarity between the target object shot by the first shooting device and the target object shot by the second shooting device; and
a labeling module, used for labeling the image sequence of the target object shot by the first shooting device and the image sequence of the target object, shot by the second shooting device, that has the maximum similarity to it.
8. The device of claim 7, further comprising:
a shooting device selection module, used for receiving, after the image sequence of the target object shot by the first shooting device and the image sequence of the target object shot by the second shooting device with the maximum similarity are labeled, shooting device selection information input by a user, wherein the shooting device selection information comprises device identifiers of a third shooting device and a fourth shooting device of the at least two shooting devices;
an image sequence reading and displaying module, used for displaying the image sequence of a target object shot by the third shooting device together with image sequences of target objects shot by the fourth shooting device, the latter sorted according to their similarity to the target object shot by the third shooting device; and
an image sequence labeling and submitting module, used for labeling, when a user input selecting one of the image sequences shot by the fourth shooting device is received, the image sequence of the target object shot by the third shooting device together with the selected image sequence.
9. A data labeling apparatus, comprising: a processor and a memory storing computer program instructions; wherein the processor, when executing the computer program instructions, implements the data labeling method of any one of claims 1 to 6.
10. A computer-readable storage medium having computer program instructions stored thereon which, when executed by a processor, implement the data labeling method of any one of claims 1 to 6.
CN202011555025.XA 2020-12-25 2020-12-25 Data labeling method, device, equipment and storage medium Active CN112287911B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011555025.XA CN112287911B (en) 2020-12-25 2020-12-25 Data labeling method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN112287911A (en) 2021-01-29
CN112287911B (en) 2021-05-28

Family

ID=74426069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011555025.XA Active CN112287911B (en) 2020-12-25 2020-12-25 Data labeling method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112287911B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106934364A (en) * 2017-03-09 2017-07-07 Tencent Technology (Shanghai) Co., Ltd. Face picture recognition method and device
CN108259840A (en) * 2018-03-22 2018-07-06 Zhejiang University of Technology Multi-channel video capture method and device suitable for a labeling platform
CN108875768A (en) * 2018-01-23 2018-11-23 Beijing Megvii Technology Co., Ltd. Data labeling method, device and system, and storage medium
CN109726756A (en) * 2018-12-25 2019-05-07 Beijing Kuangshi Technology Co., Ltd. Image processing method and device, electronic device, and storage medium
CN110443828A (en) * 2019-07-31 2019-11-12 Tencent Technology (Shenzhen) Co., Ltd. Object tracking method and device, storage medium, and electronic device
CN111522854A (en) * 2020-03-18 2020-08-11 Dazhen (Hangzhou) Technology Co., Ltd. Data labeling method and device, storage medium, and computer equipment
CN111695613A (en) * 2020-05-28 2020-09-22 Ping An Technology (Shenzhen) Co., Ltd. Data annotation system, computer-readable storage medium, and electronic device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591580A (en) * 2021-06-30 2021-11-02 Beijing Baidu Netcom Science and Technology Co., Ltd. Image annotation method and device, electronic device, and storage medium
CN113691729A (en) * 2021-08-27 2021-11-23 Vivo Mobile Communication Co., Ltd. Image processing method and device
CN113691729B (en) * 2021-08-27 2023-08-22 Vivo Mobile Communication Co., Ltd. Image processing method and device
CN113693590A (en) * 2021-09-27 2021-11-26 Jiangsu Phoenix Smart Education Research Institute Co., Ltd. Sit-and-reach monitoring device and method

Also Published As

Publication number Publication date
CN112287911B (en) 2021-05-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant