CN113657307A - Data labeling method and device, computer equipment and storage medium

Info

Publication number
CN113657307A
Authority
CN
China
Prior art keywords
data
video
image
key frame
frame image
Prior art date
Legal status
Withdrawn
Application number
CN202110963168.2A
Other languages
Chinese (zh)
Inventor
侯欣如
姜翰青
刘浩敏
王楠
盛崇山
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd
Priority to CN202110963168.2A
Publication of CN113657307A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/04815 Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object
    • G06F3/04817 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance using icons
    • G06F3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04845 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range for image manipulation, e.g. dragging, rotation, expansion or change of colour
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75 Determining position or orientation of objects or cameras using feature-based methods involving models
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G06T2207/10028 Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present disclosure provides a data labeling method, apparatus, computer device and storage medium, wherein the method includes: acquiring a video obtained by capturing images of a preset region of a target device with an image acquisition device; determining, from the video, a key frame image that includes the preset region; generating attribute annotation data of the video in response to annotation data obtained by annotating the preset region in the key frame image; and generating target acquisition data based on the attribute annotation data and the video.

Description

Data labeling method and device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data annotation method, an apparatus, a computer device, and a storage medium.
Background
To facilitate the management of physical devices, such as the electrical devices in a machine room, digital assets corresponding to the devices are currently generated from the devices in the machine room, which involves how such digital assets are produced. At present, a device management form is created and device data is filled into the form to generate the digital asset corresponding to a device. However, because a device carries many components, interfaces, control buttons and the like, recording the device data by manual entry is prone to omissions during labeling, resulting in the loss of digital assets.
Disclosure of Invention
The embodiment of the disclosure at least provides a data annotation method, a data annotation device, computer equipment and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a data annotation method, including: acquiring a video obtained by capturing images of a preset region of a target device with an image acquisition device; determining, from the video, a key frame image that includes the preset region; generating attribute annotation data of the video in response to annotation data obtained by annotating the preset region in the key frame image; and generating target acquisition data based on the attribute annotation data and the video.
In this way, the key frame image can show the corresponding annotation data alongside the target device, so that during annotation the parts of the target device for which annotation has not yet been completed, and their positions in the key frame image, are displayed more intuitively. The attribute annotation data of the video is therefore easier to obtain, and the target acquisition data can be generated from it. This reduces omissions during data labeling and prevents the loss of digital assets in a machine room. Moreover, the generated target acquisition data includes not only the attribute annotation data of the target device but also the video of the target device, so the data is more comprehensive.
In an optional implementation, obtaining the annotation data by annotating the preset region in the key frame image includes: generating a preview image based on the key frame image and displaying the preview image, where the resolution of the preview image is lower than that of the key frame image; in response to a trigger at any position in the preview image, displaying a property configuration control corresponding to that position; and receiving the annotation data entered through the property configuration control.
In this way, because the preview image is small, less data is transmitted during annotation and the transmission efficiency is higher. In addition, determining the annotation position by triggering any position of the preview image is more flexible.
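As an illustrative sketch only (not part of the original disclosure), the Python snippet below shows one way a data processing device might generate the low-resolution preview, map a trigger position in the preview back to full-resolution key frame coordinates, and record the annotation entered through a property configuration control; the function and field names are assumptions.

```python
# Illustrative sketch only; assumes OpenCV is available and that the annotation UI
# supplies (x, y) trigger positions on the preview plus user-entered attribute values.
import cv2
import numpy as np


def make_preview(key_frame, scale=0.25):
    """Downscale the key frame so that less data is transmitted for annotation."""
    h, w = key_frame.shape[:2]
    preview = cv2.resize(key_frame, (int(w * scale), int(h * scale)))
    return preview, scale


def preview_click_to_keyframe(click_xy, scale):
    """Map a trigger position in the preview back to key frame coordinates."""
    x, y = click_xy
    return int(x / scale), int(y / scale)


def record_annotation(annotations, click_xy, scale, attributes):
    """Store the annotation data entered through the property configuration control."""
    kx, ky = preview_click_to_keyframe(click_xy, scale)
    annotations.append({"keyframe_xy": (kx, ky), "attributes": attributes})
    return annotations


if __name__ == "__main__":
    key_frame = np.zeros((480, 640, 3), dtype=np.uint8)   # stands in for a real key frame
    preview, s = make_preview(key_frame)
    labels = record_annotation([], (120, 80), s,
                               {"name": "button 1", "type": "control button"})
    print(preview.shape, labels)
```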
In an optional implementation, obtaining the annotation data by annotating the preset region in the key frame image includes: performing preset recognition processing on the key frame image to obtain attribute information contained in an object to be recognized in the key frame image, the object to be recognized being located in the preset region of the target device; determining position information of the object to be recognized in the three-dimensional solid model corresponding to the target device, based on the position information of the object to be recognized in the key frame image and the pose of the image acquisition device when the key frame image was captured; and annotating the key frame image based on the object to be recognized, its position information in the three-dimensional solid model, and the attribute information it contains, to obtain the annotation data.
Therefore, through the preset identification processing, the relevant information of the object to be identified can be determined more quickly, less manual intervention or manual operation is needed, and the efficiency is higher.
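To make the pose-based position determination concrete, the following hedged sketch assumes a pinhole camera model, a per-pixel depth value for the key frame, and a camera-to-world pose estimated when the key frame was captured; it is a minimal illustration, not the disclosed implementation.

```python
# Sketch under stated assumptions: a pinhole camera model, a depth value for the
# detected pixel, and the camera-to-world pose (rotation R, translation t) estimated
# when the key frame was captured. Names are illustrative.
import numpy as np


def pixel_to_model_point(u, v, depth, K, R_cw, t_cw):
    """Lift a recognized object's pixel (u, v) into the three-dimensional solid model.

    K    : 3x3 camera intrinsics
    R_cw : 3x3 rotation, camera frame -> world (model) frame
    t_cw : 3-vector translation, camera frame -> world (model) frame
    """
    pixel = np.array([u, v, 1.0])
    ray_cam = np.linalg.inv(K) @ pixel          # direction in the camera frame
    point_cam = ray_cam * depth                 # scale by the measured depth
    return R_cw @ point_cam + t_cw              # express in the model frame


if __name__ == "__main__":
    K = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])
    R = np.eye(3)
    t = np.array([0.0, 0.0, 0.0])
    # Object to be recognized detected at pixel (350, 260) with depth 1.2 m.
    print(pixel_to_model_point(350, 260, 1.2, K, R, t))
```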
In an optional embodiment, the object to be recognized includes an icon, where the icon includes at least one of a positioning corner marker and an identification code. Performing the preset recognition processing on the key frame image to obtain the attribute information contained in the object to be recognized includes: performing object recognition on the key frame image and determining the icon corresponding to the preset region of the target device in the key frame image.
In an optional embodiment, the object to be recognized includes text. Performing the preset recognition processing on the key frame image to obtain the attribute information contained in the object to be recognized includes: performing optical character recognition (OCR) on the key frame image and determining the text corresponding to the preset region of the target device in the key frame image.
This makes the object to be recognized easy to identify and can effectively improve the efficiency of data labeling.
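As a hedged example of such recognition (the disclosure does not prescribe any particular library), the sketch below uses OpenCV's QR-code detector for identification codes and pytesseract for text; both are stand-ins chosen purely for illustration.

```python
# Hedged example only: OpenCV's QR detector and pytesseract (which needs a local
# Tesseract install) stand in for the "preset recognition processing"; the disclosure
# does not prescribe these libraries.
import cv2
import numpy as np
import pytesseract


def recognize_icon(key_frame):
    """Detect an identification code (e.g. a QR code) in the key frame."""
    detector = cv2.QRCodeDetector()
    text, points, _ = detector.detectAndDecode(key_frame)
    if points is None:
        return None
    return {"attribute_text": text, "corners": points.reshape(-1, 2).tolist()}


def recognize_text(key_frame):
    """Run OCR over the key frame to extract printed labels such as 'cabinet 1'."""
    gray = cv2.cvtColor(key_frame, cv2.COLOR_BGR2GRAY)
    return pytesseract.image_to_string(gray).strip()


if __name__ == "__main__":
    key_frame = np.zeros((480, 640, 3), dtype=np.uint8)   # stands in for a real key frame
    print(recognize_icon(key_frame))
    print(recognize_text(key_frame))
```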
In an optional implementation, acquiring the video obtained by capturing images of the preset region of the target device with the image acquisition device includes: controlling the image acquisition device to capture images of the preset region of the target device to obtain a first video; determining, based on the first video and the pose of the image acquisition device when capturing the first video, a complementary acquisition region of the target device that still needs to be captured; and controlling the image acquisition device to perform complementary acquisition of the target device based on the complementary acquisition region to obtain a second video.
In this way, the complementary acquisition region is confirmed and complementarily captured as acquisition proceeds, and the second video supplements the parts missed in the first video, so that the resulting video shows the different positions of the preset region of the target device more completely.
In an optional implementation, determining, based on the first video and the pose of the image acquisition device when capturing the first video, the complementary acquisition region of the target device that still needs to be captured includes: performing three-dimensional reconstruction of the target device based on the first video and the pose of the image acquisition device when capturing the first video, to generate a three-dimensional solid model of the target device; and determining, based on the three-dimensional solid model, the complementary acquisition region of the target device that still needs to be captured.
In this way, by using the three-dimensional solid model obtained through three-dimensional reconstruction to determine the complementary acquisition region, the parts for which the model could not be established can be identified more clearly, so the complementary acquisition region of the target device is determined more accurately.
In an optional implementation, controlling the image acquisition device to perform complementary acquisition of the target device based on the complementary acquisition region to obtain the second video includes: controlling the image acquisition device to complementarily capture the complementary acquisition region based on the current pose of the image acquisition device and the position of the complementary acquisition region in the three-dimensional solid model, to obtain the second video.
In this way, the capture strategy for the complementary acquisition region can be determined more accurately and efficiently from its position in the three-dimensional solid model and the current pose of the image acquisition device, so that the second video covering the complementary acquisition region is obtained.
In an optional implementation, determining, based on the three-dimensional solid model, the complementary acquisition region of the target device that still needs to be captured includes: detecting, based on the three-dimensional position information, in the target device, of each dense point cloud point of the three-dimensional solid model, whether the model contains an incompletely modeled region; and if such a region exists, determining the incompletely modeled region as the complementary acquisition region.
In this way, the three-dimensional position information of the dense point cloud points reflects the exact location of the complementary acquisition region, so the region that needs data complementary acquisition can be determined more accurately.
In an optional implementation, determining, based on the three-dimensional solid model, the complementary acquisition region of the target device that still needs to be captured includes: displaying the three-dimensional solid model; and, in response to the user's trigger operation on any region in the three-dimensional solid model, determining the triggered region as the complementary acquisition region.
In this way, the complementary acquisition region can be selected more flexibly in response to the user's trigger operation.
In an optional embodiment, the method further comprises: acquiring the pose of the image acquisition equipment when acquiring the video; generating target collection data based on the attribute labeling data and the video, comprising: generating the target acquisition data based on the attribute labeling data, the video, and the pose.
In an alternative embodiment, the target device includes a cabinet; the preset area of the target device comprises at least one of the following: a cabinet inner side, and a cabinet outer side.
In a second aspect, an embodiment of the present disclosure further provides a data annotation device, including: the first acquisition module is used for acquiring a video obtained by acquiring an image of a preset area of target equipment by using image acquisition equipment; the determining module is used for determining a key frame image comprising the preset area from the video; the generating module is used for responding to annotation data obtained by performing annotation processing on the preset area in the key frame image and generating attribute annotation data of the video; and the processing module is used for generating target acquisition data based on the attribute labeling data and the video.
In a third aspect, an embodiment of the present disclosure further provides a computer device, including a processor and a memory, where the memory stores machine-readable instructions executable by the processor, and the processor is configured to execute the machine-readable instructions stored in the memory; when the machine-readable instructions are executed by the processor, the processor performs the steps in the first aspect or any one of its possible implementations.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium having a computer program stored thereon, where the computer program, when run, performs the steps in the first aspect or any one of its possible implementations.
For the description of the effects of the data annotation device, the computer device, and the computer-readable storage medium, reference is made to the description of the data annotation method, which is not repeated herein.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required in the embodiments are briefly described below. The drawings, which are incorporated in and form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those skilled in the art can derive additional related drawings from them without inventive effort.
Fig. 1 shows a flowchart of a data annotation method provided by an embodiment of the present disclosure;
fig. 2 illustrates a schematic diagram of a cabinet provided by an embodiment of the present disclosure;
FIG. 3 illustrates a flowchart that corresponds to an embodiment of generating a three-dimensional solid model provided by an embodiment of the present disclosure;
FIG. 4 is a flow chart illustrating an embodiment of the present disclosure when performing data complementary mining;
FIG. 5 is a schematic diagram illustrating a video frame image displayed on a graphical display interface according to an embodiment of the disclosure;
FIG. 6 is a schematic diagram illustrating a video frame image and a three-dimensional solid model displayed on a graphical display interface according to an embodiment of the disclosure;
FIG. 7 is a schematic diagram illustrating a graphical display interface when displaying a key frame image and an annotation control according to an embodiment of the disclosure;
FIG. 8 is a schematic diagram illustrating another graphical display interface when displaying a key frame image and an annotation control according to an embodiment of the present disclosure;
FIG. 9 is a flow chart illustrating an embodiment of the present disclosure for performing data annotation;
FIG. 10 is a flowchart illustrating data tagging performed on a cabinet according to an embodiment of the present disclosure;
FIG. 11 is a schematic diagram of a data annotation device provided in an embodiment of the disclosure;
fig. 12 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of embodiments of the present disclosure, as generally described and illustrated herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
Based on the above study of the background, the present disclosure provides a data annotation method in which an image acquisition device captures images of a preset region of a target device to obtain a video, and annotation is performed on a key frame image determined from the video. The key frame image can show the corresponding annotation data alongside the target device, so that during annotation the parts of the target device for which annotation has not yet been completed, and their positions in the key frame image, are displayed more intuitively. The attribute annotation data of the video is therefore easier to obtain, and the target acquisition data can be generated. This reduces omissions during data labeling and prevents the loss of digital assets in a machine room.
The drawbacks described above are the result of the inventor's practical and careful study; therefore, the discovery of these problems and the solutions the present disclosure proposes for them should be regarded as the inventor's contribution made in the course of the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
To facilitate understanding of the present embodiment, first, a data annotation method disclosed in the embodiments of the present disclosure is described in detail, where an execution subject of the data annotation method provided in the embodiments of the present disclosure is generally a data processing device with certain computing capability, and the data processing device includes, for example: a terminal device, which may be a User Equipment (UE), a mobile device (e.g., a tablet, or a cell phone as described in the examples below), a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or a server or other processing device. In a possible case, the target space can also be equipped with a dedicated data processing device, for example a management computer in a computer room or a portable handheld management device. Specifically, the determination may be performed according to actual situations, and details are not described herein. In addition, in some possible implementations, the data annotation method can be implemented by a processor calling computer-readable instructions stored in a memory.
Referring to fig. 1, a flowchart of a data annotation method provided in the embodiment of the present disclosure is shown, where the method includes steps S101 to S104, where:
s101: acquiring a video obtained by acquiring an image of a preset area of target equipment by using image acquisition equipment;
s102: determining a key frame image comprising the preset area from the video;
s103: generating attribute annotation data of the video in response to annotation data obtained by performing annotation processing on the preset region in the key frame image;
s104: and generating target acquisition data based on the attribute labeling data and the video.
The following describes the details of S101 to S104.
For the above S101, the image capturing device may include a camera carried by a mobile device such as a mobile phone, for example. Particularly, mobile devices such as mobile phones and the like are small in size and light, so that the mobile device is more suitable for image acquisition of equipment structures on the cabinet.
Here, in a specific implementation, for example, the cabinet may be used as a target device, and positions of different device structures on the cabinet may be used as preset regions of the target device, such as an inner side of the cabinet and an outer side of the cabinet. In a possible case, if a plurality of equipment structures to be labeled are installed on the cabinet, the positions of the equipment structures to be labeled are correspondingly used as the preset areas of the target equipment one by one for image acquisition.
Illustratively, a specific scenario of data annotation is also provided in the embodiments of the present disclosure. In this scenario, reference may be made to FIG. 2 for the image acquisition performed by the image acquisition device; FIG. 2 is a schematic diagram of a cabinet provided in an embodiment of the present disclosure. FIG. 2(a) shows the equipment structures on the three outer planes of the cabinet, namely the front, the right side and the top; correspondingly, FIG. 2(b) shows the equipment structures on the left side and the two rear planes of the inside of the cabinet when the cabinet is opened.
Specifically, for the cabinet shown in fig. 2 (a), the text label "cabinet 1" and the bar two-dimensional code of the cabinet are shown on the plane where the front surface of the outer side of the cabinet is located, and three buttons are correspondingly shown, including button 1, button 2, and button 3; meanwhile, four signal interfaces are shown, including a signal interface 1, a signal interface 2, a signal interface 3, and a signal interface 4; in addition, a signal 1 display screen corresponding to signal 1 and a signal 2 display screen corresponding to signal 2 are shown. On the right side of the outside of the cabinet, four interfaces of the cabinet are shown, including interface 1, interface 2, interface 3, and interface 4. On the outside of the cabinet, two toggle switches of the cabinet are shown, including toggle switch 1 and toggle switch 2.
For the cabinet shown in fig. 2 (b), two interfaces, including interface 5 and interface 6, are provided on the left side of the inner side of the cabinet. At the rear of the inside of the cabinet, a signal 3 display screen corresponding to signal 3 and a signal 4 display screen corresponding to signal 4 are shown, and two data line groups are shown, including a data line group 1 and a data line group 2, each of which includes a plurality of data lines.
Specifically, when the image capturing device captures an image to obtain a video, for example, the image capturing device may capture an image of a target device by controlling a robot equipped with the image capturing device to obtain the video; alternatively, the target device may be image-captured by a worker such as a survey worker who holds the image capture device, so as to obtain the video. The video may include, for example, a first video described below, or a first video and a second video described below.
Here, when the image capturing device captures an image of the target device, for example, multiple video frame images corresponding to the target device may be obtained, and a video corresponding to the target space may also be obtained correspondingly. In a specific implementation, when the data processing device controls the image capturing device to capture an image to acquire a video, for example, the following manner may be adopted: controlling the image acquisition equipment to acquire an image of a preset area of the target equipment to obtain a first video; determining a complementary acquisition region corresponding to the target device and to be subjected to complementary acquisition based on the first video and the pose of the image acquisition device when acquiring the first video; and controlling the image acquisition equipment to perform complementary acquisition on the target equipment based on the complementary acquisition area to obtain a second video.
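For orientation only, the control flow just described can be sketched as follows; every helper below is a placeholder standing in for the actual SLAM, reconstruction and camera-control components, which the disclosure does not name.

```python
# Control-flow sketch only; the reconstruction and capture helpers are placeholders
# for whatever SLAM/dense-reconstruction and camera-control stack is actually used.
def reconstruct(video, poses):
    """Placeholder: build a three-dimensional solid model from the first video."""
    return {"uncovered_regions": []}            # pretend everything was covered


def find_complementary_regions(model):
    """Placeholder: regions of the target device the model failed to cover."""
    return model["uncovered_regions"]


def capture(regions=None):
    """Placeholder: drive the image acquisition device; returns (video, poses)."""
    return [], []


def acquire_target_video():
    first_video, poses = capture()                       # first video of the preset region
    model = reconstruct(first_video, poses)              # three-dimensional solid model
    regions = find_complementary_regions(model)          # complementary acquisition regions
    second_video = []
    if regions:                                          # only re-capture what is missing
        second_video, _ = capture(regions)
    return first_video + second_video


if __name__ == "__main__":
    print(len(acquire_target_video()))
```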
When the data processing device controls the image capturing device to capture the first video, if a preset area of the target device contains a device structure to be labeled, for example, the corresponding control image capturing device captures an image of the preset area to obtain the first video.
Since incomplete capturing may occur when the image capturing device is used to capture an image of the target object, for example, a captured image of a partial area such as a corner position of the target object is lacking, the data processing device may also control the image capturing device to capture the second video, for example.
In a specific implementation, when determining that there is a complementary acquisition area to be complementarily acquired corresponding to the target device based on the first video and a pose when the image acquisition device acquires the first video for a case where the data processing device controls the image acquisition device to acquire the second video, for example, the following manner may be adopted: performing three-dimensional reconstruction on the target equipment based on the first video and the pose of the image acquisition equipment when acquiring the first video to generate a three-dimensional entity model of the target equipment; and determining a complementary collection area to be subjected to complementary collection corresponding to the target equipment based on the three-dimensional solid model.
Specifically, since the pose of the image capture device when capturing the first video can be determined, and the plurality of dense cloud points can be determined in the first video after the image capture device captures the target device, the target device can be three-dimensionally reconstructed by using the first video and the pose of the image capture device when capturing the first video, so as to generate a three-dimensional solid model of the target device.
In a specific implementation, when determining the three-dimensional solid model corresponding to the target device, for example, the corresponding three-dimensional solid model may be determined for the target device according to dense point cloud points in the first video and a pose of the image acquisition device when acquiring the target device.
When the pose of the image capturing device when capturing the first video is obtained, for example, relevant data of an Inertial Measurement Unit (IMU) of the image capturing device when capturing the first video may be obtained. For example, in the inertial measurement unit IMU of the image capturing device, for example, three single-axis accelerometers and three single-axis gyroscopes may be included, where the accelerometers may detect an acceleration of the image capturing device when capturing the first video in the target device, and the gyroscopes may detect an angular velocity of the image capturing device when capturing the first video in the target device. Therefore, the pose of the image acquisition equipment when acquiring the first video can be accurately determined by acquiring the relevant data of the inertial measurement unit IMU in the image acquisition equipment.
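Purely as a hedged illustration of how IMU samples relate to pose (a real system would fuse them with the visual SLAM front end rather than integrate them in isolation), a minimal dead-reckoning sketch might look as follows:

```python
# Simplified dead-reckoning sketch; a real system would fuse these IMU samples
# with the visual SLAM front end rather than integrate them in isolation.
import numpy as np


def integrate_imu(samples, dt):
    """Integrate (angular_velocity, acceleration) samples into a rough pose.

    samples : list of (gyro_xyz, accel_xyz) tuples in the phone coordinate system
    dt      : sampling interval in seconds (e.g. 1/400 s for a 400 Hz IMU)
    """
    orientation = np.zeros(3)   # roll, pitch, yaw (small-angle approximation)
    velocity = np.zeros(3)
    position = np.zeros(3)
    for gyro, accel in samples:
        orientation += np.asarray(gyro) * dt
        velocity += np.asarray(accel) * dt
        position += velocity * dt
    return orientation, position


if __name__ == "__main__":
    fake_samples = [((0.0, 0.0, 0.01), (0.1, 0.0, 0.0))] * 400   # one second of data
    print(integrate_imu(fake_samples, 1.0 / 400.0))
```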
Here, in a specific application, taking a camera on a mobile phone as an example, since the camera is directly deployed on the mobile phone, the mobile phone can directly acquire related data of the camera, that is, related data of the image acquisition device can be directly acquired by the data processing device, and does not need to depend on other network connections or data transmission modes such as wireless transmission.
When the data processing device determines the three-dimensional solid model, for example, at least one of a simultaneous localization and mapping (SLAM) algorithm and a real-time dense reconstruction algorithm may be used. For example, while the image acquisition device captures the first video, the three-dimensional solid model covering the target device may be generated gradually as the device moves; or, after the image acquisition device finishes capturing the first video, the three-dimensional solid model corresponding to the target device may be generated from the complete first video.
In another embodiment of the present disclosure, a specific embodiment of generating a three-dimensional solid model for a target device by using a SLAM algorithm and a real-time dense reconstruction algorithm is further provided. Two cameras mounted on a mobile phone are used as the image acquisition device; the two cameras are arranged on the mobile phone in a preset pose so that a complete video of the target device can be captured. Here, the data processing device is the mobile phone, that is, the execution subject of the following steps is the mobile phone. Referring to FIG. 3, a flowchart corresponding to a specific embodiment of generating a three-dimensional solid model provided in the embodiment of the present disclosure is shown, where:
S301: Acquiring, in real time, two time-synchronized videos captured by the two cameras.
Each of the two videos contains multiple frames of video frame images. Because the two cameras capture the two videos with synchronized timing in real time, the timestamps of the video frame images contained in the two videos correspond to each other.
In addition, the precision of the timestamps and the acquisition frequency used when capturing the video frame images can be determined according to the specific instrument parameters of the two cameras. For example, the timestamp of a video frame image may be accurate to the nanosecond, and the acquisition frequency when capturing video frame images may be no lower than 30 hertz (Hz).
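As a small illustrative sketch (an assumption about how such synchronization might be checked, not the disclosed method), frames from the two cameras can be paired by nearest nanosecond timestamp:

```python
# Sketch, assuming each captured frame carries a nanosecond timestamp: pair every
# frame of camera A with the closest-in-time frame of camera B.
def pair_synchronized_frames(ts_a, ts_b, tolerance_ns=5_000_000):
    """Return index pairs (i, j) whose timestamps differ by at most `tolerance_ns`."""
    pairs, j = [], 0
    for i, t in enumerate(ts_a):
        while j + 1 < len(ts_b) and abs(ts_b[j + 1] - t) <= abs(ts_b[j] - t):
            j += 1
        if abs(ts_b[j] - t) <= tolerance_ns:
            pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    cam_a = [0, 33_333_333, 66_666_666]          # ~30 Hz, timestamps in nanoseconds
    cam_b = [1_000_000, 34_000_000, 67_500_000]
    print(pair_synchronized_frames(cam_a, cam_b))
```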
S302: and acquiring related data of the inertial measurement unit IMU when the two cameras respectively acquire videos.
Taking any one of the two cameras as an example, when the camera captures a video frame image in a video, the camera can correspondingly observe and acquire related data of an inertial measurement unit IMU between two adjacent video frames and a timestamp when the related data is acquired. Specifically, a corresponding mobile phone coordinate system (which may be composed of an X axis, a Y axis, and a Z axis, for example) may also be determined for the camera to determine relevant data of the inertial measurement unit IMU on the mobile phone coordinate system, such as acceleration and angular velocity under the X axis, the Y axis, and the Z axis of the mobile phone coordinate system.
In addition, the time stamp for acquiring the relevant data of the inertial measurement unit IMU can be determined according to the specific instrument parameters of the two cameras. For example, it may be determined that the observation frequency for acquiring the relevant data of the inertial measurement unit IMU is not lower than 400 Hz.
S303: and determining the poses of the two cameras in the world coordinate system based on the relevant data of the inertial measurement unit IMU.
Specifically, since the coordinate-system transformation between the mobile phone coordinate system and the world coordinate system can be determined, the poses of the two cameras in the world coordinate system can be determined from that transformation after the IMU data are acquired; the poses can be expressed, for example, as six-degree-of-freedom (6DoF) poses, which is not described in further detail here.
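A minimal sketch of that coordinate-system transformation, assuming the phone-to-world transform is known from calibration (the matrices below are placeholders), is:

```python
# Minimal sketch, assuming the phone-to-world transform (R_wp, t_wp) is known, e.g.
# from calibration: express a pose estimated in the phone coordinate system as a
# 6DoF pose in the world coordinate system.
import numpy as np


def to_world(R_wp, t_wp, R_phone, t_phone):
    """Compose the fixed phone->world transform with a pose in the phone frame."""
    R_world = R_wp @ R_phone
    t_world = R_wp @ t_phone + t_wp
    return R_world, t_world


if __name__ == "__main__":
    R_wp, t_wp = np.eye(3), np.array([0.0, 0.0, 1.0])     # assumed calibration
    R_cam, t_cam = np.eye(3), np.array([0.5, 0.0, 0.0])   # pose in the phone frame
    print(to_world(R_wp, t_wp, R_cam, t_cam))
```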
For the above S301 to S303, when the SLAM algorithm is adopted, since the video frame images in the video are all depth images, the 6DOF pose of the image acquisition device can be accurately solved by the processing steps of image processing, key point extraction, key point tracking, and establishment of the association relationship between the key points, that is, the acquisition and calculation of the 6DOF pose of the image acquisition device in real time are realized; and moreover, the coordinates of the dense point cloud points corresponding to the target device can be obtained.
When the video frame images in the video are processed, the key frame images can be determined in the corresponding multi-frame video frame images in the video, so that the SLAM algorithm is ensured to have enough processing data, the calculation amount is reduced, and the efficiency is improved. The specific manner of determining the key frame image from the video can be referred to the following description of S102, and will not be described in detail here.
In this way, the key frame image map can be stored in the background of the SLAM algorithm, so that after the image acquisition device is controlled to return to the acquired position again, the two frames of video frame images at the position can be compared to perform loop detection on the image acquisition device, and the positioning accumulated error of the image acquisition device under long-time and long-distance operation can be corrected.
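As an illustrative stand-in for such loop detection (the disclosure does not specify the matching method), two frames captured at the same position could be compared with ORB feature matching:

```python
# Hedged sketch: ORB feature matching stands in for whatever place-recognition
# method the SLAM back end actually uses to detect that a position was revisited.
import cv2
import numpy as np


def is_loop_closure(current_frame, stored_keyframe, min_matches=50):
    """Compare the current frame with a stored key frame from the key frame map."""
    orb = cv2.ORB_create()
    _, desc_cur = orb.detectAndCompute(current_frame, None)
    _, desc_key = orb.detectAndCompute(stored_keyframe, None)
    if desc_cur is None or desc_key is None:
        return False
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc_cur, desc_key)
    return len(matches) >= min_matches


if __name__ == "__main__":
    blank = np.zeros((240, 320), dtype=np.uint8)   # placeholder frames
    print(is_loop_closure(blank, blank))           # no features on blank frames -> False
```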
S304: and respectively acquiring a key frame image in the video and the pose of the camera by the camera, and processing the key frame image and the pose of the camera as input data of a real-time dense reconstruction algorithm.
For example, for the video captured by either camera, after a new key frame image is determined in the video by using the above S301 to S303, all currently obtained key frame images and the camera pose corresponding to the new key frame image are used as input data of the real-time dense reconstruction algorithm.
For the key frame images that were already passed in before the new key frame image was obtained, the corresponding camera poses were input into the real-time dense reconstruction algorithm together with them, so those key frame images are not input again when the new key frame image is input.
S305: and processing the input data by using a real-time dense reconstruction algorithm to obtain a three-dimensional solid model corresponding to the target equipment.
Illustratively, the resulting three-dimensional solid model may include, for example, a dense point cloud that may be shown in a preset color. In generating the three-dimensional solid model, the dense point cloud may be updated, for example, as the video is captured. The updating frequency can be determined according to the input frequency of the key frame images and the pose of the camera when the real-time dense reconstruction algorithm is input.
For the above S304 to S305, when the real-time dense reconstruction algorithm is adopted, the new keyframe image may be used to estimate a dense depth map of the scene three-dimensional model, and the corresponding pose of the camera is used to fuse the dense depth map into the three-dimensional solid model, so as to obtain the scene three-dimensional model after the target device is completely acquired. In a possible case, for the processed key frame image, by using the pose of the image capturing device corresponding to the key frame image and the pose of the image capturing device corresponding to the new key frame image adjacent to the key frame image, it can be determined whether the pose of the image capturing device at the time of capturing the target device is adjusted. If the pose is not adjusted, correspondingly continuing to carry out real-time dense reconstruction on the target equipment to obtain a three-dimensional entity model; and if the pose is adjusted, correspondingly adjusting the dense depth map according to the pose adjustment, so as to obtain an accurate three-dimensional solid model.
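For intuition only, the following rough sketch back-projects one key frame's dense depth map with its pose and appends the points to the growing dense point cloud; an actual real-time dense reconstruction would typically use TSDF-style fusion instead, so this is an assumption-laden simplification.

```python
# Rough sketch of the fusion step: back-project a dense depth map with the key
# frame's pose and append the points to the growing dense point cloud.
import numpy as np


def fuse_depth_map(depth, K, R_cw, t_cw, cloud):
    """Add one key frame's dense depth map to the model's dense point cloud."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([us, vs, np.ones_like(us)], axis=-1).reshape(-1, 3)
    rays = pixels @ np.linalg.inv(K).T                     # camera-frame directions
    points_cam = rays * depth.reshape(-1, 1)               # scale by depth
    valid = depth.reshape(-1) > 0                          # drop missing depth
    points_world = points_cam[valid] @ R_cw.T + t_cw
    return np.vstack([cloud, points_world]) if cloud.size else points_world


if __name__ == "__main__":
    K = np.array([[500.0, 0, 160.0], [0, 500.0, 120.0], [0, 0, 1.0]])
    depth = np.full((240, 320), 1.5)                       # synthetic dense depth map
    cloud = fuse_depth_map(depth, K, np.eye(3), np.zeros(3), np.empty((0, 3)))
    print(cloud.shape)
```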
After the three-dimensional solid model corresponding to the target device is generated, the complementary acquisition region of the target device may be determined, for example, by judging whether the three-dimensional solid model can completely express the equipment structures installed in the preset region of the target device.
In a possible case, if it is determined that the three-dimensional solid model can completely express the equipment structure installed in the preset area of the target equipment, it is determined that no complementary acquisition area exists, and it is determined that the first video acquired by the image acquisition equipment for acquiring the image of the target equipment is the required video.
In another possible case, if it is determined that the three-dimensional solid model cannot completely express the equipment structures installed in the preset region of the target device, it is determined that a complementary acquisition region exists. In this case, data complementary acquisition is performed, and the second video is acquired to cover the complementary acquisition region of the target device for which image acquisition was not completed. Here, the second video is the complementary video.
In a specific implementation, when the data processing device uses the three-dimensional solid model to determine whether data complementary acquisition is needed for a complementary acquisition region of the target device, for example, either of the following two ways (A1) or (A2) may be used:
(A1): Detecting, based on the three-dimensional position information, in the target device, of each dense point cloud point of the three-dimensional solid model, whether the model contains an incompletely modeled region; and, if such a region exists, determining the incompletely modeled region as the complementary acquisition region.
In this case, the generated three-dimensional solid model contains a plurality of dense point cloud points, and each dense point cloud point carries three-dimensional position information of the corresponding preset region of the target device. By examining the three-dimensional position information of the dense point cloud points in the model, regions of the model in which no dense point cloud points are distributed can be found, and it can be determined accordingly whether those regions were not captured. For example, for a preset region at a corner of the target device, the influence of the shooting angle may mean that no video frame image of the corner is captured in a single acquisition pass, so the generated three-dimensional solid model is missing that region, that is, the model has no dense point cloud points corresponding to the corner. In that case, the region where the corner is located may be taken as the incompletely modeled complementary acquisition region.
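One hedged way to mechanize this check (the voxel size, extents and thresholds below are illustrative assumptions, not values from the disclosure) is to voxelize the expected extent of the preset region and flag voxels that contain no dense point cloud points:

```python
# Hedged sketch: voxelize the expected extent of the preset region and flag voxels
# without dense point cloud points as candidate complementary acquisition regions.
import numpy as np


def find_unmodeled_voxels(points, region_min, region_max, voxel=0.05):
    """Return centers of voxels inside the preset region with no reconstructed points."""
    region_min, region_max = np.asarray(region_min), np.asarray(region_max)
    dims = np.ceil((region_max - region_min) / voxel).astype(int)
    occupied = np.zeros(dims, dtype=bool)
    idx = ((points - region_min) / voxel).astype(int)
    inside = np.all((idx >= 0) & (idx < dims), axis=1)
    occupied[tuple(idx[inside].T)] = True
    empty = np.argwhere(~occupied)
    return region_min + (empty + 0.5) * voxel              # voxel centers to re-capture


if __name__ == "__main__":
    cloud = np.random.rand(5000, 3) * [1.0, 1.0, 0.4]       # misses the upper half in z
    gaps = find_unmodeled_voxels(cloud, [0, 0, 0], [1.0, 1.0, 1.0])
    print(len(gaps), "voxels need complementary acquisition")
```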
If the complementary acquisition region exists, it is determined that data complementary acquisition needs to be performed for that region.
Specifically, after determining that data complementary acquisition needs to be performed for a complementary acquisition region, the data processing device may control the image acquisition device to complementarily capture the region based on the current pose of the image acquisition device and the position of the region in the three-dimensional solid model, in the following manner, so as to obtain the second video.
For example, in the case of data complementary acquisition by using a robot equipped with an image acquisition device, the data processing device may determine the position of the robot in the target device by using the current pose of the image acquisition device. Furthermore, by utilizing the position of the complementary acquisition area in the three-dimensional solid model, the data processing equipment can determine the movement strategy of the robot, and the image acquisition equipment acquires the complementary acquisition area again by relying on the movement of the robot in the target equipment. Therefore, the robot can be controlled to efficiently and directly move to the position where the complementary acquisition area can be acquired in the target equipment, and the image acquisition efficiency is correspondingly higher.
In addition, if the user holds the image acquisition equipment to perform data complementary acquisition, the data processing equipment can similarly send related prompt information for complementary acquisition of the complementary acquisition area to the user according to the current pose of the image acquisition equipment and the position of the complementary acquisition area in the three-dimensional solid model. Illustratively, the related prompt information may include at least one of voice prompt information, text prompt information, and image prompt information, for example. For example, a voice prompt "move up 10 centimeters with an area of complementary acquisition," or a text prompt "please move up about 10 centimeters to the next area of complementary acquisition," or an image prompt, such as when presenting a video and/or three-dimensional solid model to the user, instructing the user to move the image acquisition device 5 centimeters with a labeled arrow, may be issued to the user.
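As an assumed illustration of how such prompt information might be derived (the axis conventions and wording are placeholders, not the disclosed behaviour), a coarse movement hint can be computed from the camera's current position and the region's position in the model:

```python
# Illustrative sketch: derive a simple text prompt ("please move up about 10
# centimeters") from the camera's current position and the complementary acquisition
# region's position in the three-dimensional solid model. Axis conventions are assumed.
import numpy as np


def complementary_capture_prompt(camera_pos, region_center):
    """Build a coarse movement hint toward the region that still needs capturing."""
    offset = np.asarray(region_center) - np.asarray(camera_pos)
    axis = int(np.argmax(np.abs(offset)))
    directions = [("right", "left"), ("up", "down"), ("forward", "backward")]
    word = directions[axis][0] if offset[axis] > 0 else directions[axis][1]
    return (f"please move {word} about {abs(offset[axis]) * 100:.0f} centimeters "
            f"to reach the next complementary acquisition region")


if __name__ == "__main__":
    print(complementary_capture_prompt([0.0, 1.2, 0.5], [0.0, 1.3, 0.5]))
```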
In this case, the video and/or the three-dimensional solid model may be presented, for example, in a graphical display interface of the data processing device when the video and/or the three-dimensional solid model is presented to the user. In a possible case, the video and/or the three-dimensional solid model can be displayed directly by using a corresponding graphical display interface of the mobile phone or a special acquisition device, such as a mobile phone screen or a display screen connected with the special acquisition device. Specifically, the determination may be performed according to actual situations, and details are not described herein.
In this way, the complementary acquisition region can be determined automatically and quickly, with high efficiency; and because the user does not need to determine the region manually, the technical requirements on the staff are reduced, helping them complete the related data labeling tasks more conveniently.
(A2): Displaying the three-dimensional solid model; and, in response to the user's trigger operation on any region in the three-dimensional solid model, determining the triggered region as the complementary acquisition region.
In this case, the three-dimensional solid model may be presented to the user in a graphical user interface on the data processing device, which may be specifically referred to the description in (a1), and will not be described herein again.
In a possible case, when the three-dimensional solid model is built, some regions determined from the first video may correspond to relatively few dense point cloud points, so the modeling result for those regions is inaccurate. Automatic detection on the three-dimensional solid model may therefore miss the complementary acquisition of such regions, leaving the corresponding parts of the resulting model inaccurate. By displaying the three-dimensional solid model to the user, the user can inspect the model and flexibly select the regions that need complementary acquisition, choose regions that should be captured more carefully, or more completely and clearly re-capture regions for which the first video could not establish an accurate model. The three-dimensional solid model obtained after the complementary acquisition is thus more complete and meets the user's actual precision and labeling requirements.
Here, the manner of controlling the image capturing device to perform the supplementary capturing on the target device to obtain the second video may be referred to the description in (a1) above, and details thereof are also omitted here.
In another embodiment of the present disclosure, a specific embodiment of data complementary acquisition is further provided. In this embodiment, the data processing device is, for example, a mobile phone. The mobile phone captures videos through the image acquisition device it carries, and displays the related videos and three-dimensional solid models to the user on a graphical display interface of the mobile phone. Specifically, referring to FIG. 4, a flowchart of a specific embodiment of data complementary acquisition according to an embodiment of the present disclosure is shown, where the execution subject of the specific embodiment is the mobile phone:
S401: Displaying, on the graphical display interface, the video obtained by image acquisition of the target device by the image acquisition device.
S402: In response to the user's drag operation on the video in the graphical display interface, determining the three-dimensional solid model corresponding to a video frame image by using the video frame images in the video.
S403: Displaying, on the graphical display interface, the three-dimensional solid model together with the rendered image superimposed on the video.
In steps S401 to S403, when the video is displayed, a rendered image shown in a certain color with transparency may be superimposed on the video frame image, for example. Referring to FIG. 5, a schematic diagram of displaying a video frame image on a graphical display interface according to an embodiment of the present disclosure is provided. When the user drags to this video frame image, the captured image corresponding to the front face of the outside of the cabinet and the grey rendered image superimposed on it are displayed on the graphical display interface.
In addition, after the video frame image is processed, the three-dimensional solid model corresponding to it can be obtained and shown on the graphical display interface, and the rendered image is erased over the regions for which the three-dimensional solid model has been established. Referring to FIG. 6, a schematic diagram of a video frame image and a three-dimensional solid model displayed on a graphical display interface according to an embodiment of the present disclosure is shown; in the figure, the three-dimensional solid model is drawn with dotted lines. In FIG. 6, the positions where the rendered image has not been erased indicate that no corresponding three-dimensional solid model has been established there, namely the illustrated complementary acquisition region.
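The overlay behaviour can be sketched as follows, assuming a boolean mask marking the pixels already covered by the reconstructed model (the mask, color and blending factor are illustrative assumptions):

```python
# Sketch of the described overlay: blend a semi-transparent grey layer over the
# pixels of the current frame that the three-dimensional solid model does not yet
# cover; the remaining grey area corresponds to the complementary acquisition region.
import numpy as np


def overlay_uncovered(frame, covered_mask, alpha=0.5, grey=(128, 128, 128)):
    """Blend a semi-transparent grey layer over pixels not yet reconstructed."""
    out = frame.astype(np.float32)
    grey_arr = np.array(grey, dtype=np.float32)
    uncovered = ~covered_mask                                 # complementary acquisition region
    out[uncovered] = (1 - alpha) * out[uncovered] + alpha * grey_arr
    return out.astype(np.uint8)


if __name__ == "__main__":
    frame = np.zeros((240, 320, 3), dtype=np.uint8)
    covered = np.zeros((240, 320), dtype=bool)
    covered[:, :160] = True                                   # left half already modeled
    print(overlay_uncovered(frame, covered).mean(axis=(0, 1)))
```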
S404: and responding to the confirmation operation of the user according to the complementary mining area shown by the graphical display interface, and determining whether to perform data complementary mining on the complementary mining area.
In this step, the user can see from the displayed complementary acquisition region that image acquisition of that region is not complete, and can decide, according to the actual data acquisition requirement, whether to capture it again. In one possible case, if the user determines from the displayed region that complementary acquisition is needed, the user triggers the corresponding confirmation button, and the confirmation operation indicates that the region needs to be complementarily captured. In another possible case, if the user determines that complementary acquisition of the region is not needed, the user triggers the button confirming that everything is correct, and the confirmation operation indicates that complementary acquisition of the region is not required.
For example, according to the video frame image and the complementary acquisition region shown in the graphical display interface in fig. 6, the user may determine a position corresponding to the complementary acquisition region, that is, a position where the bar two-dimensional code is located. In response to the fact that the bar-shaped two-dimensional code needs to be subjected to data annotation, the data processing equipment controls the image acquisition equipment to perform complementary acquisition on a complementary acquisition area where the bar-shaped two-dimensional code is located again because the bar-shaped two-dimensional code is not subjected to complete image acquisition currently, namely the image acquisition equipment finishes acquisition of a second video; in response to determining that data annotation is not required for the bar two-dimensional code, the data processing device may continue to establish the three-dimensional solid model for the other regions that may be shown in the video, that is, the data processing device does not need to control the image acquisition device to acquire the second video.
For the above S102, after determining the video by using the first video, or determining the video by using the first video and the second video, the key frame image may be determined from the video.
Specifically, when determining a key frame image from a video, for example, but not limited to, any of the following two ways (B1) and (B2) may be adopted:
(B1): Determine a preset number of key frame images directly according to the number of video frame images contained in the video and the actual data annotation requirement.
For example, when a video includes 100 frames of video frame images and it is determined that 10 key frame images are to be annotated, and the annotation data in all 100 video frame images can be effectively and accurately determined, the preset number of key frame images may be set to 10, and the 10 key frame images may be selected at an equal frame interval from the 100 video frame images, for example the 1st frame, the 11th frame, the 21st frame, ..., the 81st frame, and the 91st frame.
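As an illustration of manner (B1), the following Python sketch (the function name and the zero-based frame indices are illustrative assumptions, not taken from the original disclosure) selects a preset number of key frames at an equal frame interval:

```python
def select_key_frames(total_frames: int, num_key_frames: int) -> list:
    """Pick `num_key_frames` frame indices spread at an equal interval.

    For total_frames=100 and num_key_frames=10 this returns
    [0, 10, 20, ..., 90], i.e. the 1st, 11th, 21st, ..., 91st frames.
    """
    if total_frames <= 0 or num_key_frames <= 0:
        return []
    step = max(total_frames // num_key_frames, 1)
    return list(range(0, total_frames, step))[:num_key_frames]


print(select_key_frames(100, 10))  # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```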
In this way, key frame images can be determined from the multiple video frame images contained in the video simply and conveniently, and the number of key frame images can meet the requirements of subsequent data annotation.
(B2): Determine the key frame images in the video in response to a user's selection of video frame images in the video.
In a specific implementation, for example, in response to a user's selection operation on some of the video frames, the selected video frames can be used as key frame images of the video.
Illustratively, when the video is presented to the user, a prompt for selecting key frame images may be displayed. Specifically, the user may select a video frame image in the video through a specific operation such as a long press or a double click, and the selected video frame image is then used as a key frame image. For example, the graphical display interface may present prompt information to the user, such as a message containing the text "please long press to select this video frame image"; when the user long presses any video frame image in the video, that frame is taken as a key frame image. In this way, the user can select key frame images from the video more flexibly.
In one possible case, the device structures on the target device are not concentrated: a given device structure does not appear in a long run of consecutive video frames but is concentrated in certain other frames. Manually selecting the video frame images then avoids taking frames without any device structure as key frames, which improves the efficiency of subsequent data annotation using the key frames. In another possible case, if some video frames are unclear or their data is damaged, manually selecting the video frame images likewise avoids taking such frames as key frames.
For the above S103, in the case of determining the key frame image in the video, for example, attribute labeling may be performed on the target device in the key frame image. Specifically, when performing attribute labeling on the target device in the key frame image, for example, a manner including, but not limited to, any one of the following (C1) and (C2) may be adopted:
(C1): Generate a preview image based on the key frame image and display the preview image, wherein the resolution of the preview image is lower than that of the key frame image; in response to a trigger at any position in the preview image, show a property configuration control corresponding to that position; and receive the annotation data input through the property configuration control.
The key frame image is relatively large, and the user does not need a particularly sharp image when performing data annotation. The size of the key frame image can therefore be reasonably reduced, for example by generating a lower-resolution preview image corresponding to the key frame image and displaying the preview image to the user through the graphical display interface of the data processing device. The user can still clearly identify each device structure in the preview image, while the amount of data transmitted and processed after annotating the preview image is reduced, which improves transmission efficiency and data processing efficiency.
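As a minimal sketch of how such a preview image could be generated (the use of OpenCV and the scale factor are assumptions; the disclosure does not specify an implementation):

```python
import cv2

def make_preview(key_frame_path: str, scale: float = 0.25):
    """Generate a lower-resolution preview image from a full-resolution key frame."""
    key_frame = cv2.imread(key_frame_path)              # original key frame image
    preview = cv2.resize(key_frame, None, fx=scale, fy=scale,
                         interpolation=cv2.INTER_AREA)  # reduced resolution
    return preview

cv2.imwrite("preview_0001.jpg", make_preview("key_frame_0001.jpg"))
```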
Specifically, when the key frame image is presented to the user, the annotation control required for performing the first annotation operation on the key frame image may, for example, be provided to the user at the same time. Illustratively, referring to fig. 7, a schematic diagram of a graphical display interface displaying a key frame image and an annotation control according to an embodiment of the present disclosure is provided.
Illustratively, the data processing device may determine the position of the data annotation in response to, for example, a click operation by the user on the graphical display interface. Specifically, referring to fig. 7, in the key frame image 71 the user may select a position 72 by a click operation and then fill in the corresponding annotation data in the data annotation area 73 associated with the key frame image 71.
For the data annotation area 73, fig. 7 shows some of the annotation data types it may contain, such as device name, service life, specific function, device person in charge, device manufacturer, device size specification, and related text notes. When filling in data of different types, as shown in the data annotation area 73 in fig. 7, the data may be entered directly as text, for example entering "button 1" in the text input box under the device name, or entering "device start button" in the text input box under specific function. Alternatively, a number of selectable input options may be provided to the user; for example, in response to the user clicking the selection box under service life, the data processing device may present a pull-down menu containing options such as "1 year", "2 years", and "3 years", so that the user can determine the input data for service life by selecting one of them.
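The annotation fields above can be represented, for example, by a simple record; the field names below mirror the labels in fig. 7 but are otherwise illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class DeviceAnnotation:
    # free-text fields entered directly by the user
    device_name: str = ""          # e.g. "button 1"
    specific_function: str = ""    # e.g. "device start button"
    person_in_charge: str = ""
    manufacturer: str = ""
    size_specification: str = ""
    text_notes: str = ""
    # field chosen from a pull-down menu
    service_life: str = ""         # e.g. "1 year", "2 years", "3 years"
    # user-defined fields added through the custom annotation field
    custom_fields: dict = field(default_factory=dict)

annotation = DeviceAnnotation(device_name="button 1",
                              specific_function="device start button",
                              service_life="2 years")
```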
In one possible case, the data annotation area 73 may also include a custom annotation field, such as the "custom annotation field 1" shown in the data annotation area 73. In response to the user editing the custom annotation field, a new annotation data type can be defined and a new input box generated, which makes data annotation more flexible. In another possible case, because the content displayable on the graphical display interface is limited, a slider bar 74 is included in the data annotation area 73 for the user. In response to the user sliding the screen up and down on the graphical display interface, the slider bar 74 also moves up and down to indicate the current position within the data annotation area. By sliding the screen in this way, the limitation of the size of the graphical display interface is removed and more space is available for entering annotation data.
In another possible case, the key frame image 71 may also be part of the data annotation area 73. Illustratively, a selection box corresponding to "whether to use the key frame image as annotation data" may be displayed in the data annotation area 73, and the key frame image is automatically used as annotation data in response to the user's selection of this box. In this way, the original key frame image is retained in the annotation data, so that annotation errors can later be checked and corrected by retrieving the key frame image.
In this way, the data processing device can respond to the marking operation of the user and generate the marking data corresponding to the marking operation.
(C2): Perform preset recognition processing on the key frame image to obtain attribute information contained in an object to be recognized in the key frame image, the object to be recognized being located in the preset region of the target device; determine the position information of the object to be recognized in the three-dimensional solid model corresponding to the target device based on the position information of the object to be recognized in the key frame image and the pose of the image acquisition device when acquiring the key frame image; and perform annotation processing on the key frame image based on the object to be recognized, its position information in the three-dimensional solid model corresponding to the target device, and the attribute information it contains, to obtain the annotation data.
The object to be recognized shown in the key frame image includes at least one of icons and text, where the icon includes at least one of a positioning corner mark and an identification code.
Here, the object to be recognized shown in the key frame image may, for example, be the device structure described above, i.e. the object to be annotated itself; alternatively, it may be a recognizable object distinct from the device structures of the target device, in which case the object to be recognized may be used, for example, for data annotation of the target device as a whole.
In addition, the positioning corner mark can also be used to crop the region corresponding to the target device from the key frame image, and this region can likewise serve as part of the annotation information.
Illustratively, referring to fig. 8, a schematic diagram of another graphical display interface displaying a key frame image and an annotation control according to an embodiment of the present disclosure is provided. Specifically, fig. 8 shows two objects to be recognized in the key frame image 81: the identification two-dimensional code corresponding to the dashed annotation box 82, and the text "button 1" corresponding to the dashed annotation box 83.
Here, when performing attribute labeling, since the video acquired by image capturing the target device includes a key frame image capable of performing attribute labeling, labeling the key frame image may also be regarded as a process of performing attribute labeling on the video.
In a specific implementation, when performing the preset recognition processing on the key frame image to obtain the attribute information included in the object to be recognized in the key frame image, for example, the following two manners (D1) and (D2) may be adopted:
(D1): Perform object recognition on the key frame image and determine the icon corresponding to the preset region of the target device in the key frame image.
Specifically, when performing object recognition, semantic segmentation processing may, for example, be applied to the key frame image, using at least one of the following: a convolutional neural network (CNN) or a Transformer network.
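A full implementation would use a trained CNN or Transformer segmentation network as described above. As a lightweight, runnable stand-in that only covers the identification-code case from fig. 8 (an assumption, not the disclosed model), OpenCV's built-in detector can locate a two-dimensional code in the key frame:

```python
import cv2

def locate_two_dimensional_code(key_frame_path: str):
    """Locate (and, if possible, decode) a two-dimensional code in the key frame.

    Stand-in for the semantic segmentation step: it returns the decoded text
    and the corner points of the code region, or (None, None) if none is found.
    """
    image = cv2.imread(key_frame_path)
    detector = cv2.QRCodeDetector()
    decoded_text, points, _ = detector.detectAndDecode(image)
    if points is None:
        return None, None
    return decoded_text, points.reshape(-1, 2)  # corner points as (x, y) pairs

text, corners = locate_two_dimensional_code("key_frame_0001.jpg")
```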
After object recognition is performed on the key frame image, the device structure type corresponding to a region can be determined; for example, after object recognition on the region corresponding to the dashed annotation box 82 shown in fig. 8, it can be determined that the region contains a two-dimensional code.
(D2): Perform optical character recognition (OCR) on the key frame image and determine the text corresponding to the preset region of the target device in the key frame image.
Specifically, after performing OCR, the text information corresponding to a region can be determined; for example, after recognizing the region corresponding to the dashed annotation box 83 shown in fig. 8, the text "button 1" marked in that region can be determined.
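As a sketch of this OCR step (the use of pytesseract and the example region coordinates are assumptions; the disclosure does not name an OCR engine):

```python
import cv2
import pytesseract

def ocr_region(key_frame_path: str, box):
    """Run OCR on a rectangular region (x, y, w, h) of the key frame,
    such as the area inside dashed annotation box 83, and return the text."""
    image = cv2.imread(key_frame_path)
    x, y, w, h = box
    crop = image[y:y + h, x:x + w]
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)   # simple preprocessing
    return pytesseract.image_to_string(gray).strip()

label_text = ocr_region("key_frame_0001.jpg", (120, 340, 160, 60))  # e.g. "button 1"
```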
In addition, in one possible case, before object recognition or OCR is performed, the recognition type can be selected in the recognition annotation area 84. Referring to fig. 8, the recognition annotation area 84 also contains an area 85 for selecting the recognition type, which includes a selection button for object recognition and a selection button for text recognition; fig. 8 shows the state in which the object recognition button is selected.
Illustratively, in response to the user selecting "object recognition", the device structures in the key frame image are recognized specifically, including object recognition of the various buttons, signal interfaces, and signal display screens in the key frame image 81; alternatively, in response to the user selecting "text recognition", the text shown in the key frame image is recognized specifically.
In another possible case, object recognition and OCR may also be performed on the key frame image at the same time. Based on the object recognition result, the OCR result, and the position information of the objects to be recognized in the key frame image, objects can then be associated with text to obtain a more accurate recognition result. For example, for the region 83 and the object to be recognized 86 shown in the key frame image in fig. 8, the text corresponding to the region 83 can be determined to be "button 1" by text recognition, the object to be recognized 86 adjacent to the region 83 can be determined to be a button by object recognition, and by associating the two it can be determined that the object to be recognized 86 is button 1.
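One simple way to realize this association, sketched below under the assumption that both recognizers return axis-aligned boxes, is to pair each recognized text with the nearest detected object by bounding-box center distance:

```python
import numpy as np

def box_center(box):
    """Center (cx, cy) of a box given as (x, y, w, h)."""
    x, y, w, h = box
    return np.array([x + w / 2.0, y + h / 2.0])

def associate_text_with_objects(text_boxes, object_boxes):
    """Pair each OCR result with the nearest detected object.

    text_boxes:   list of (text, (x, y, w, h)) tuples from OCR
    object_boxes: list of (label, (x, y, w, h)) tuples from object recognition
    Returns a list of (label, text) pairs, e.g. ("button", "button 1").
    """
    pairs = []
    for text, t_box in text_boxes:
        distances = [np.linalg.norm(box_center(t_box) - box_center(o_box))
                     for _, o_box in object_boxes]
        label, _ = object_boxes[int(np.argmin(distances))]
        pairs.append((label, text))
    return pairs
```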
Here, only some examples of possible portions are listed, and may be specifically determined according to actual situations, and are not described herein again.
In addition, after the preset recognition processing is performed on the key frame image, the recognition result may further include the attribute information contained in the object to be recognized. In one possible case, when the object to be recognized is itself the object to be annotated, its corresponding attribute information, such as the service life and specific function of the two-dimensional code, may be retrieved directly. In another possible case, when the object to be recognized is used for data annotation of the target device as a whole, the corresponding attribute information becomes attribute information of the whole target device, such as the device name and service life of the cabinet.
In addition, after the key frame image is identified, the attribute information of the object to be identified can be correspondingly called according to the identification result of the object to be identified and used as a part of the identification result.
For example, after the object to be recognized 86 in fig. 8 is recognized, it may be determined that its attribute information includes the device name, the service life, and the specific function, and this attribute information of the object to be recognized 86 is then displayed in the recognition annotation area 84.
While the attribute information contained in the object to be recognized in the key frame image is determined using either of the above manners (D1) and (D2), the position information of the object to be recognized in the three-dimensional solid model corresponding to the target device may also be determined based on the position information of the object to be recognized in the key frame image and the pose of the image acquisition device when acquiring the key frame image. Since the position of the object to be recognized in the key frame image and the pose of the image acquisition device when acquiring the key frame image can be determined directly, the position of the object to be recognized in the three-dimensional solid model can be obtained by back-projection.
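A simplified version of this back-projection is sketched below; it assumes known camera intrinsics K, a camera-to-world pose (R, t) for the key frame, and a per-pixel depth taken from the dense reconstruction, conventions that are not spelled out in the disclosure:

```python
import numpy as np

def pixel_to_model_point(u, v, depth, K, R, t):
    """Back-project an image position (u, v) of the object to be recognized
    into the coordinate frame of the three-dimensional solid model.

    K      : 3x3 camera intrinsic matrix
    R, t   : camera-to-world rotation (3x3) and translation (3,) of the key frame
    depth  : depth of the pixel, e.g. read from the dense reconstruction
    """
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # viewing ray in camera frame
    point_cam = depth * ray_cam                         # 3D point in camera frame
    return R @ point_cam + t                            # 3D point in model frame
```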
In this way, because the target device may have a large amount of associated annotation data, associating the annotation data with the three-dimensional solid model means that, when other key frame images are used to annotate other device structures of the target device, the three-dimensional solid model moves together with those key frame images as they are dragged, and the associated annotation data moves and is displayed correspondingly in the graphical display interface. This largely avoids missed or misplaced annotations when adjacent regions carry a lot of annotation data, reduces the loss of digital assets, and improves data annotation efficiency.
Under the condition that the position information of the object to be recognized in the three-dimensional solid model corresponding to the target device and the attribute information contained in the object to be recognized are determined, the labeling processing of the key frame image can be completed, so that the labeling data can be obtained.
After the annotation data has been synchronized to all non-key-frame video frame images in the video, a video with completed data annotation is obtained.
In another embodiment of the present disclosure, a specific example of the data processing device performing data annotation is further provided. Referring to fig. 9, a flowchart of data annotation according to a specific embodiment of the present disclosure is provided, wherein:
S901: Open the data annotation environment.
Specifically, the data annotation method provided by the embodiment of the present disclosure may be applied to an Application (APP). After the user opens the APP, the data annotation environment can be correspondingly opened, for example, an entrance for data acquisition and annotation is provided; after the portal is started, a Software Development Kit (SDK) is called.
S902: The data processing device acquires the site data and determines the acquisition task list corresponding to the site.
Here, a site is, for example, a location where many devices are installed; a site may include, but is not limited to, at least one machine room, and at least one of indoor control cabinet equipment deployed in the machine room, a tower installed on the roof of the machine room, and an outdoor control cabinet.
After the SDK is called, the site data can be acquired accordingly. One site may also include multiple different target devices, and different target devices may correspond to different sites; for example, site 1 contains cabinet 1 and site 2 contains cabinet 2. Therefore, when data annotation is performed, multiple target devices in one site can, for example, be annotated.
Specifically, for different sites, an identifier (ID) corresponding to the site may be used as its unique identification. The site to be annotated is determined by entering the identifier or scanning the two-dimensional code of the site, and the data required by the asset platform and the generation platform corresponding to the site is transmitted to create the current acquisition task list. When data annotation is performed for the site, the data to be annotated can be determined according to the relevant tasks in the acquisition task list.
S903: The data processing device requests the attribute platform over the network to determine the latest data annotation attribute package.
Specifically, after the current acquisition task list is created, the attribute platform may be requested over the network to determine whether the data annotation attribute package has been updated. When an updated data annotation attribute package exists, the latest package may, for example, be downloaded in the form of an APP update, so that the annotation attributes and other data in the package can be called later during data annotation. If there is no updated data annotation attribute package, S904 is executed, that is, the acquisition process proceeds normally.
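The overall request-compare-download flow could look like the sketch below; the endpoint path, the version field and the download URL are illustrative assumptions, not part of the disclosure:

```python
import requests

def update_attribute_package(platform_url: str, local_version: str, save_path: str) -> str:
    """Ask the attribute platform whether a newer data annotation attribute
    package exists, download it if so, and return the version now in use."""
    info = requests.get(f"{platform_url}/attribute-package/latest", timeout=10).json()
    if info.get("version", local_version) == local_version:
        return local_version                      # no update; proceed with acquisition
    package = requests.get(info["download_url"], timeout=60)
    with open(save_path, "wb") as f:
        f.write(package.content)                  # store the latest attribute package
    return info["version"]
```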
S904: The data processing device controls the image acquisition device to perform image acquisition on the target device to obtain a video.
For a specific description of this step, reference may be made to the above description of S101, and details are not repeated here.
S905: The data processing device performs data annotation on the video frame images contained in the video by using the data annotation attribute package, to obtain the attribute annotation data of the video.
For a specific description of this step, reference may be made to the above description of S102 and S103, and details are not repeated here again.
Here, in steps S904 and S905, both image acquisition and data annotation can be performed independently of a network connection, for example by using the image acquisition device to acquire images in an offline state and/or performing attribute annotation in an offline state using the data annotation attribute package.
S906: The data processing device confirms whether the attribute annotation data is correct; if yes, step S907 is executed; if not, step S908 is executed.
S907: The data processing device uploads the attribute annotation data.
When uploading the attribute annotation data, the device may, for example, wait for a network connection and upload the attribute annotation data in sequence once the network connection succeeds. The specific manner of uploading the attribute annotation data is described below for S104 and is not repeated in detail here.
S908: the data processing equipment determines whether an image acquisition error exists; if yes, return to step S904; if not, the process returns to step S905.
Through the process, the data acquisition process of the target equipment can be realized.
For the above S104, in another embodiment of the present disclosure, a pose when the image capturing device captures the video may also be obtained; in addition, when generating the target capture data based on the attribute labeling data and the video, for example, the following manner may be adopted: generating the target acquisition data based on the attribute labeling data, the video, and the pose.
Specifically, for example, the attribute annotation data, the video and the pose may be combined, according to at least one of their timestamps, into corresponding target acquisition data under different timestamps; or the attribute annotation data, the video and the pose may be packed directly, according to at least one of their timestamps, into the acquisition data package that finally needs to be submitted, and this acquisition data package is used as the target acquisition data.
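As a sketch of the second option (the file layout inside the package is an assumption), the annotation data, the video and the poses can be packed into one archive, with annotations and poses keyed by timestamp:

```python
import json
import zipfile

def pack_target_acquisition_data(annotations: dict, video_path: str,
                                 poses: dict, out_path: str) -> None:
    """Pack the attribute annotation data, the video and the poses into one
    acquisition data package to be submitted as the target acquisition data.

    `annotations` and `poses` are dicts keyed by frame timestamp (or index).
    """
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("annotations.json", json.dumps(annotations, ensure_ascii=False))
        zf.writestr("poses.json", json.dumps(poses))
        zf.write(video_path, arcname="video.mp4")
```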
In addition, in one possible case, because the number of video frames contained in a video may be large and the video frame images themselves are large, the amount of data to be transmitted is also large; uploading the complete video would therefore take a long time and lead to low transmission efficiency. For this reason, frame extraction can be performed on the captured original video, and the frame-extracted video data can be used as part of the target acquisition data, which effectively reduces the amount of data to be transmitted and improves data transmission efficiency.
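A possible frame-extraction step is sketched below; keeping every fifth frame is an illustrative choice, not a value given in the disclosure:

```python
import cv2

def extract_frames(video_path: str, out_path: str, keep_every: int = 5) -> int:
    """Write a decimated copy of the video that keeps every `keep_every`-th frame,
    reducing the amount of data to upload; returns the number of frames kept."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps / keep_every, (width, height))
    kept, index = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % keep_every == 0:
            writer.write(frame)
            kept += 1
        index += 1
    cap.release()
    writer.release()
    return kept
```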
In another embodiment of the present disclosure, a specific example of data annotation of a cabinet using a mobile phone is further provided. In this example, the data processing device is a mobile phone and the image acquisition device is the camera on the mobile phone; since the camera is part of the mobile phone, the mobile phone is used directly as the execution subject when describing the image acquisition steps. Applying the data annotation method provided by the embodiments of the present disclosure to this example, the process can be divided into three stages: stage I, the data acquisition stage; stage II, the data annotation stage; and stage III, the data uploading stage. Referring to fig. 10, a flowchart corresponding to an embodiment of data annotation of a cabinet according to an embodiment of the present disclosure is shown, wherein:
Stage I, the data acquisition stage, includes the following steps S1001 to S1014.
S1001: Enter cabinet acquisition.
S1002: the acquisition is started.
In step S1002, for example, the image capturing environment and the data annotation environment may be turned on accordingly.
S1003: the image acquisition equipment acquires videos in real time.
In this step S1003, the execution subject is an image capture device, that is, a camera on a mobile phone.
S1004: data is continuously generated.
In step S1004, for example, the video captured in real time and the pose of the camera on the mobile phone may be generated; the generated data, namely data a marked in fig. 10, consists of the real-time captured video and the pose.
S1005: Perform real-time SLAM calculation.
S1006: Perform real-time dense reconstruction.
S1007: Generate a visualized reconstruction result in real time according to the real-time SLAM calculation and the real-time dense reconstruction results.
S1008: The graphical display interface displays a real-time preview of the visualized reconstruction result.
In this step S1008, the graphic display interface includes, for example, a graphic display interface configured on a mobile phone.
S1009: Complete the acquisition.
In step S1009, for example, the captured video and the pose may be generated; this generated data is data b labeled in fig. 10.
S1010: the acquisition coverage area is determined manually.
S1011: Determine whether complementary acquisition is needed; if yes, execute step S1012; if not, execute step S1014.
S1012: Confirm the complementary acquisition region.
S1013: Start the complementary acquisition and jump to step S1002.
In step S1013, a captured video set and a pose set may, for example, be generated from the captured video and the pose of data b; this generated data is data c labeled in fig. 10.
S1014: End the acquisition.
Stage II, the data annotation stage, includes the following steps S1015 to S1018.
S1015: In response to the staff's operation of viewing the preview video set, confirm that the acquisition has ended.
S1016: Perform data annotation on the key frame images.
S1017: Mark the device structures for which data annotation has been completed.
In step S1017, since there may be many objects to be annotated on the cabinet, marking the device structures whose data annotation has been completed helps to determine the next object to be annotated more intuitively.
S1018: Synchronize the annotation data to the video.
In step S1018, for example, annotation data d labeled in fig. 10, which matches the video, may be obtained.
Stage III, the data uploading stage, includes the following steps S1019 to S1024.
S1019: Determine the data to be uploaded in response to the user's selection of the annotation data to be uploaded.
S1020: Modify the file name according to at least one of the site, the time and the target device.
The file referred to here may be, for example, a file storing the annotation data, site information, the video, and the like.
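The file name of step S1020 could, for example, be composed as follows; the exact pattern is an illustrative assumption:

```python
from datetime import datetime

def build_upload_filename(site_id: str, target_device: str, ext: str = "zip") -> str:
    """Compose an upload file name from the site, the time and the target device."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"{site_id}_{target_device}_{timestamp}.{ext}"

print(build_upload_filename("site1", "cabinet2"))  # e.g. "site1_cabinet2_20210820_153012.zip"
```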
S1021: Perform frame extraction on the captured video.
In step S1021, for example, the data labeled e in the figure can be obtained: the frame-extracted video, the pose, and the annotation data.
S1022: Package all the files.
S1023: Determine the uploading progress.
S1024: Determine that the uploading is finished.
It will be understood by those skilled in the art that, in the method of the present disclosure, the order in which the steps are written does not imply a strict order of execution or impose any limitation on the implementation; the specific order of execution of the steps should be determined by their functions and possible internal logic.
Based on the same inventive concept, a data labeling device corresponding to the data labeling method is also provided in the embodiments of the present disclosure, and as the principle of solving the problem of the device in the embodiments of the present disclosure is similar to the data labeling method in the embodiments of the present disclosure, the implementation of the device can refer to the implementation of the method, and repeated details are not repeated.
Referring to fig. 11, a schematic diagram of a data annotation device provided in an embodiment of the present disclosure is shown, where the device includes: a first obtaining module 111, a determining module 112, a generating module 113, and a processing module 114; wherein:
the first obtaining module 111 is configured to obtain a video obtained by performing image acquisition on a preset area of a target device by using an image acquisition device; a determining module 112, configured to determine, from the video, a key frame image including the preset region; a generating module 113, configured to generate attribute annotation data of the video in response to annotation data obtained by performing annotation processing on the preset region in the key frame image; and the processing module 114 is configured to generate target collection data based on the attribute labeling data and the video.
In an optional implementation manner, when the generation module 113 performs annotation processing on the preset region in the key frame image to obtain annotation data, the generation module is configured to: generating a preview image based on the key frame image, and displaying the preview image; wherein the preview image has a resolution lower than the resolution of the key frame image; responding to the trigger of any position in the preview image, and showing a property configuration control corresponding to the position; receiving the annotation data input from the property configuration control.
In an optional implementation manner, when the generation module 113 performs annotation processing on the preset region in the key frame image to obtain annotation data, the generation module is configured to: performing preset identification processing on the key frame image to obtain attribute information contained in an object to be identified in the key frame image; the object to be identified is located in a preset area of the target device; determining the position information of the object to be identified in the three-dimensional entity model corresponding to the target equipment based on the position information of the object to be identified in the key frame image and the pose of the image acquisition equipment when acquiring the key frame image; and performing labeling processing on the key frame image based on the object to be recognized, the position information of the object to be recognized in the three-dimensional solid model corresponding to the target equipment and the attribute information contained in the object to be recognized to obtain the labeling data.
In an alternative embodiment, the object to be identified comprises an icon; wherein the icon comprises at least one of: positioning corner marks and identification codes; the generating module 113 is configured to, when performing preset identification processing on the key frame image to obtain attribute information included in an object to be identified in the key frame image,: and carrying out object recognition on the key frame image, and determining an icon corresponding to a preset area of the target equipment in the key frame image.
In an optional embodiment, the object to be recognized includes a text; the generating module 113 is configured to, when performing preset identification processing on the key frame image to obtain attribute information included in an object to be identified in the key frame image,: and performing optical character OCR recognition on the key frame image, and determining characters corresponding to a preset area of the target equipment in the key frame image.
In an optional implementation manner, when acquiring a video obtained by image capturing a preset area of a target device by using an image capturing device, the first acquiring module 111 is configured to: controlling the image acquisition equipment to acquire an image of a preset area of the target equipment to obtain a first video; determining a complementary acquisition region corresponding to the target device and to be subjected to complementary acquisition based on the first video and the pose of the image acquisition device when acquiring the first video; and controlling the image acquisition equipment to perform complementary acquisition on the target equipment based on the complementary acquisition area to obtain a second video.
In an optional implementation manner, when determining that there is a complementary acquisition region to be complementarily acquired corresponding to the target device based on the first video and a pose of the image acquisition device when acquiring the first video, the first acquisition module 111 is configured to: performing three-dimensional reconstruction on the target equipment based on the first video and the pose of the image acquisition equipment when acquiring the first video to generate a three-dimensional entity model of the target equipment; and determining a complementary collection area to be subjected to complementary collection corresponding to the target equipment based on the three-dimensional solid model.
In an optional implementation manner, when the first obtaining module 111 controls the image capturing device to perform the complementary capture on the target device based on the complementary capture area to obtain the second video, the first obtaining module is configured to: and controlling the image acquisition equipment to perform complementary acquisition on the complementary acquisition region based on the current pose of the image acquisition equipment and the corresponding position of the complementary acquisition region in the three-dimensional solid model to obtain the second video.
In an optional implementation manner, when determining, based on the three-dimensional solid model, the complementary acquisition region to be complementarily acquired corresponding to the target device, the first obtaining module 111 is configured to: detect whether an incompletely modeled region exists in the three-dimensional solid model based on the three-dimensional position information, in the target device, of each dense point cloud point in the three-dimensional solid model; and if an incompletely modeled region exists, determine the incompletely modeled region as the complementary acquisition region.
In an optional implementation manner, when determining, based on the three-dimensional solid model, the complementary acquisition region to be complementarily acquired corresponding to the target device, the first obtaining module 111 is configured to: display the three-dimensional solid model; and in response to a trigger operation of the user on any region in the three-dimensional solid model, determine the triggered region as the complementary acquisition region.
In an optional implementation manner, the data annotation apparatus further includes a second obtaining module 115, configured to: acquiring the pose of the image acquisition equipment when acquiring the video; the processing module 114, when generating target capture data based on the attribute annotation data and the video, is configured to: generating the target acquisition data based on the attribute labeling data, the video, and the pose.
In an alternative embodiment, the target device includes a cabinet; the preset area of the target device comprises at least one of the following: a cabinet inner side, and a cabinet outer side.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
An embodiment of the present disclosure further provides a computer device, as shown in fig. 12, which is a schematic structural diagram of the computer device provided in the embodiment of the present disclosure, and includes:
a processor 10 and a memory 20; the memory 20 stores machine-readable instructions executable by the processor 10, the processor 10 being configured to execute the machine-readable instructions stored in the memory 20, the processor 10 performing the following steps when the machine-readable instructions are executed by the processor 10:
acquiring a video obtained by acquiring an image of a preset area of target equipment by using image acquisition equipment; determining a key frame image comprising the preset area from the video; generating attribute annotation data of the video in response to annotation data obtained by performing annotation processing on the preset region in the key frame image; and generating target acquisition data based on the attribute labeling data and the video.
The storage 20 includes a memory 210 and an external storage 220; the memory 210 is also referred to as an internal memory, and temporarily stores operation data in the processor 10 and data exchanged with the external memory 220 such as a hard disk, and the processor 10 exchanges data with the external memory 220 through the memory 210.
The specific execution process of the instruction may refer to the steps of the data labeling method described in the embodiments of the present disclosure, and details are not repeated here.
The embodiments of the present disclosure also provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps of the data annotation method in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product, where the computer program product carries a program code, and instructions included in the program code may be used to execute the steps of the data labeling method in the foregoing method embodiments, which may be referred to specifically in the foregoing method embodiments, and are not described herein again.
The computer program product may be implemented by hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
The disclosure relates to the field of augmented reality, and aims to detect or identify relevant features, states and attributes of a target object by means of various visual correlation algorithms by acquiring image information of the target object in a real environment, so as to obtain an AR effect combining virtual and reality matched with specific applications. For example, the target object may relate to a face, a limb, a gesture, an action, etc. associated with a human body, or a marker, a marker associated with an object, or a sand table, a display area, a display item, etc. associated with a venue or a place. The vision-related algorithms may involve visual localization, SLAM, three-dimensional reconstruction, image registration, background segmentation, key point extraction and tracking of objects, pose or depth detection of objects, and the like. The specific application can not only relate to interactive scenes such as navigation, explanation, reconstruction, virtual effect superposition display and the like related to real scenes or articles, but also relate to special effect treatment related to people, such as interactive scenes such as makeup beautification, limb beautification, special effect display, virtual model display and the like. The detection or identification processing of the relevant characteristics, states and attributes of the target object can be realized through the convolutional neural network. The convolutional neural network is a network model obtained by performing model training based on a deep learning framework.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method embodiment, and is not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, and for example, some features may be omitted or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above-mentioned embodiments are merely specific embodiments of the present disclosure, which are used for illustrating the technical solutions of the present disclosure and not for limiting the same, and the scope of the present disclosure is not limited thereto, and although the present disclosure is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive of the technical solutions described in the foregoing embodiments or equivalent technical features thereof within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure, and should be construed as being included therein. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (15)

1. A method for annotating data, comprising:
acquiring a video obtained by acquiring an image of a preset area of target equipment by using image acquisition equipment;
determining a key frame image comprising the preset area from the video;
generating attribute annotation data of the video in response to annotation data obtained by performing annotation processing on the preset region in the key frame image;
and generating target acquisition data based on the attribute labeling data and the video.
2. The data annotation method according to claim 1, wherein performing annotation processing on the preset region in the key frame image to obtain the annotation data comprises:
generating a preview image based on the key frame image, and displaying the preview image; wherein the preview image has a resolution lower than the resolution of the key frame image;
responding to the trigger of any position in the preview image, and showing a property configuration control corresponding to the position;
receiving the annotation data input from the property configuration control.
3. The data annotation method according to claim 1 or 2, wherein performing annotation processing on the preset region in the key frame image to obtain the annotation data comprises:
performing preset identification processing on the key frame image to obtain attribute information contained in an object to be identified in the key frame image; the object to be identified is located in a preset area of the target device;
determining the position information of the object to be identified in the three-dimensional entity model corresponding to the target equipment based on the position information of the object to be identified in the key frame image and the pose of the image acquisition equipment when acquiring the key frame image;
and performing labeling processing on the key frame image based on the object to be recognized, the position information of the object to be recognized in the three-dimensional solid model corresponding to the target equipment and the attribute information contained in the object to be recognized to obtain the labeling data.
4. The data annotation method of claim 3, wherein the object to be identified comprises an icon; wherein the icon comprises at least one of: positioning corner marks and identification codes;
the preset identification processing is performed on the key frame image to obtain attribute information contained in an object to be identified in the key frame image, and the preset identification processing comprises the following steps:
and carrying out object recognition on the key frame image, and determining an icon corresponding to a preset area of the target equipment in the key frame image.
5. The data annotation method of claim 3 or 4, wherein the object to be identified comprises text;
the preset identification processing is performed on the key frame image to obtain attribute information contained in an object to be identified in the key frame image, and the preset identification processing comprises the following steps:
and performing optical character OCR recognition on the key frame image, and determining characters corresponding to a preset area of the target equipment in the key frame image.
6. The data annotation method according to any one of claims 1 to 5, wherein the acquiring a video obtained by image-capturing a preset region of a target device by using an image-capturing device comprises:
controlling the image acquisition equipment to acquire an image of a preset area of the target equipment to obtain a first video;
determining a complementary acquisition region corresponding to the target device and to be subjected to complementary acquisition based on the first video and the pose of the image acquisition device when acquiring the first video;
and controlling the image acquisition equipment to perform complementary acquisition on the target equipment based on the complementary acquisition area to obtain a second video.
7. The data annotation method of claim 6, wherein the determining, based on the first video and a pose of the image capture device when capturing the first video, a complementary acquisition region corresponding to the target device to be complementarily acquired comprises:
performing three-dimensional reconstruction on the target equipment based on the first video and the pose of the image acquisition equipment when acquiring the first video to generate a three-dimensional entity model of the target equipment;
and determining, based on the three-dimensional solid model, the complementary acquisition region corresponding to the target device to be complementarily acquired.
8. The data annotation method according to claim 6 or 7, wherein controlling the image capture device to perform the complementary capture on the target device based on the complementary capture region to obtain a second video comprises:
and controlling the image acquisition equipment to perform complementary acquisition on the complementary acquisition region based on the current pose of the image acquisition equipment and the corresponding position of the complementary acquisition region in the three-dimensional solid model to obtain the second video.
9. The data annotation method of claim 7, wherein the determining, based on the three-dimensional solid model, the complementary acquisition region corresponding to the target device to be complementarily acquired comprises:
detecting, based on the three-dimensional position information, in the target device, of each dense point cloud point in the three-dimensional solid model, whether an incompletely modeled region exists in the three-dimensional solid model;
and if an incompletely modeled region exists, determining the incompletely modeled region as the complementary acquisition region.
10. The data annotation method of claim 8, wherein the determining, based on the three-dimensional solid model, the complementary acquisition region corresponding to the target device to be complementarily acquired comprises:
displaying the three-dimensional solid model;
and determining, in response to a trigger operation of the user on any region in the three-dimensional solid model, the triggered region as the complementary acquisition region.
11. The data annotation method of any one of claims 1-10, wherein the method further comprises:
acquiring the pose of the image acquisition equipment when acquiring the video;
generating target collection data based on the attribute labeling data and the video, comprising:
generating the target acquisition data based on the attribute labeling data, the video, and the pose.
12. The data annotation method of any one of claims 1-11, wherein the target device comprises a cabinet; the preset area of the target device comprises at least one of the following: a cabinet inner side, and a cabinet outer side.
13. A data annotation device, comprising:
the first acquisition module is used for acquiring a video obtained by acquiring an image of a preset area of target equipment by using image acquisition equipment;
the determining module is used for determining a key frame image comprising the preset area from the video;
the generating module is used for responding to annotation data obtained by performing annotation processing on the preset area in the key frame image and generating attribute annotation data of the video;
and the processing module is used for generating target acquisition data based on the attribute labeling data and the video.
14. A computer device, comprising: a processor, a memory storing machine readable instructions executable by the processor, the processor for executing the machine readable instructions stored in the memory, the processor performing the steps of the data annotation method of any one of claims 1 to 12 when the machine readable instructions are executed by the processor.
15. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when executed by a computer device, performs the steps of the data annotation method according to any one of claims 1 to 12.
CN202110963168.2A 2021-08-20 2021-08-20 Data labeling method and device, computer equipment and storage medium Withdrawn CN113657307A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110963168.2A CN113657307A (en) 2021-08-20 2021-08-20 Data labeling method and device, computer equipment and storage medium


Publications (1)

Publication Number Publication Date
CN113657307A true CN113657307A (en) 2021-11-16

Family

ID=78491870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110963168.2A Withdrawn CN113657307A (en) 2021-08-20 2021-08-20 Data labeling method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113657307A (en)



Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109829397A (en) * 2019-01-16 2019-05-31 创新奇智(北京)科技有限公司 A kind of video labeling method based on image clustering, system and electronic equipment
CN110705405A (en) * 2019-09-20 2020-01-17 阿里巴巴集团控股有限公司 Target labeling method and device
CN111581433A (en) * 2020-05-18 2020-08-25 Oppo广东移动通信有限公司 Video processing method and device, electronic equipment and computer readable medium
CN112291627A (en) * 2020-10-12 2021-01-29 广州市百果园网络科技有限公司 Video editing method and device, mobile terminal and storage medium
CN112163552A (en) * 2020-10-14 2021-01-01 北京达佳互联信息技术有限公司 Labeling method and device for key points of nose, electronic equipment and storage medium
CN112507860A (en) * 2020-12-03 2021-03-16 上海眼控科技股份有限公司 Video annotation method, device, equipment and storage medium
CN112699945A (en) * 2020-12-31 2021-04-23 青岛海尔科技有限公司 Data labeling method and device, storage medium and electronic device
CN113141498A (en) * 2021-04-09 2021-07-20 深圳市慧鲤科技有限公司 Information generation method and device, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHAN, H. J. et al.: "Tagging the Shoe Images by Semantic Attributes", 2015 IEEE International Conference on Digital Signal Processing (DSP), pages 892-895 *
CAI, Li et al.: "Survey on Data Annotation" (数据标注研究综述), Journal of Software (软件学报), vol. 31, no. 02, pages 302-320 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023197705A1 (en) * 2022-04-11 2023-10-19 日立楼宇技术(广州)有限公司 Image processing method and apparatus, computer device, storage medium and computer program
CN116071336A (en) * 2023-02-14 2023-05-05 北京博维仕科技股份有限公司 Intelligent video analysis method and system
CN116071336B (en) * 2023-02-14 2023-08-11 北京博维仕科技股份有限公司 Intelligent video analysis method and system
CN117036480A (en) * 2023-08-17 2023-11-10 脉得智能科技(无锡)有限公司 Carotid plaque positioning method, carotid plaque detection device and electronic equipment
CN117036480B (en) * 2023-08-17 2024-06-04 脉得智能科技(无锡)有限公司 Carotid plaque positioning method, carotid plaque detection device and electronic equipment
CN117234285A (en) * 2023-09-25 2023-12-15 苏州市职业大学(苏州开放大学) Big data acquisition system of financial platform
CN117234285B (en) * 2023-09-25 2024-05-07 苏州市职业大学(苏州开放大学) Big data acquisition system of financial platform
CN117156108A (en) * 2023-10-31 2023-12-01 中海物业管理有限公司 Enhanced display system and method for machine room equipment monitoring picture
CN117156108B (en) * 2023-10-31 2024-03-15 中海物业管理有限公司 Enhanced display system and method for machine room equipment monitoring picture

Similar Documents

Publication Publication Date Title
CN113657307A (en) Data labeling method and device, computer equipment and storage medium
KR102417645B1 (en) AR scene image processing method, device, electronic device and storage medium
KR101841668B1 (en) Apparatus and method for producing 3D model
CN104160369B Method, apparatus and computer-readable storage medium for providing interactive navigation assistance using a removable leader label
JP6329343B2 (en) Image processing system, image processing apparatus, image processing program, and image processing method
WO2023093217A1 (en) Data labeling method and apparatus, and computer device, storage medium and program
CN106355153A (en) Virtual object display method, device and system based on augmented reality
CN106598229A (en) Virtual reality scene generation method and equipment, and virtual reality system
EP3640889A1 (en) In situ creation of planar natural feature targets
CN104268939A Transformer substation virtual-reality management system based on three-dimensional panoramic view and implementation method thereof
CN112684894A (en) Interaction method and device for augmented reality scene, electronic equipment and storage medium
CN111701238A (en) Virtual picture volume display method, device, equipment and storage medium
US20240078703A1 (en) Personalized scene image processing method, apparatus and storage medium
CN103279187A (en) Method for constructing multi-scene virtual panorama space and intelligent terminal
CN113345108B (en) Augmented reality data display method and device, electronic equipment and storage medium
JP7150894B2 (en) AR scene image processing method and device, electronic device and storage medium
CN113660469A (en) Data labeling method and device, computer equipment and storage medium
CN113838193A (en) Data processing method and device, computer equipment and storage medium
JP5616622B2 (en) Augmented reality providing method and augmented reality providing system
US11030359B2 (en) Method and system for providing mixed reality service
JP2011060254A (en) Augmented reality system and device, and virtual object display method
CN114140536A (en) Pose data processing method and device, electronic equipment and storage medium
CN113178017A (en) AR data display method and device, electronic equipment and storage medium
KR101909994B1 (en) Method for providing 3d animating ar contents service using nano unit block
CN117826976A (en) XR-based multi-person collaboration method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 2021-11-16)