WO2024087067A1 - Image annotation method and apparatus, and neural network training method and apparatus - Google Patents

Image annotation method and apparatus, and neural network training method and apparatus

Info

Publication number
WO2024087067A1
Authority
WO
WIPO (PCT)
Prior art keywords
annotated
image
annotation
images
dimensional model
Prior art date
Application number
PCT/CN2022/127769
Other languages
English (en)
French (fr)
Inventor
Li Humin
Wang Huan
Original Assignee
Beijing Xiaomi Mobile Software Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co., Ltd.
Priority to PCT/CN2022/127769
Publication of WO2024087067A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods

Definitions

  • the present disclosure relates to the field of artificial intelligence technology, and in particular to an image annotation method and device, and a neural network training method and device.
  • artificial intelligence makes image processing more accurate and efficient, and realizes functions such as automatic image recognition.
  • the image to be processed can be input into a pre-trained neural network for processing to obtain the result of image processing.
  • a large number of annotated training images are required, and the number of training images will affect the accuracy of the trained neural network.
  • the annotation of training images is mostly done by manual annotation, which is inefficient and the number of annotated images is limited.
  • the embodiments of the present disclosure provide an image annotation method and device, and a neural network training method and device to solve the defects in the related art.
  • an image annotation method comprising: acquiring a set of images to be annotated, wherein the set of images to be annotated includes a plurality of images to be annotated collected for a target space; generating a three-dimensional model of the target space according to the set of images to be annotated; and
  • the three-dimensional model is annotated according to the annotation instruction, and the image to be annotated is annotated according to the annotation result of the three-dimensional model.
  • the image to be annotated includes a panoramic image.
  • it further includes: generating a plurality of extended images according to the panoramic image and pre-configured imaging parameters; and
  • the extended image is annotated according to the annotation result of the panoramic image.
  • the imaging parameters include at least one of the following: field of view, resolution, imaging angle, and noise ratio.
  • the position of the target object is marked in the three-dimensional model.
  • labeling the three-dimensional model according to the labeling instruction further includes:
  • the attributes of the target object are labeled in the three-dimensional model.
  • the set of images to be annotated includes a plurality of images to be annotated that are taken from different viewing angles and are aimed at the target object.
  • the step of labeling the image to be labeled according to the labeling result of the three-dimensional model includes:
  • the annotation result is projected onto the image to be annotated.
  • a neural network training method which uses training images in a training set to train a neural network to be trained, wherein the training images are pre-annotated using the image annotation method described in the first aspect.
  • an image annotation device including:
  • An acquisition module used for acquiring a set of images to be annotated, wherein the set of images to be annotated includes a plurality of images to be annotated collected for a target space;
  • a modeling module used for generating a three-dimensional model of the target space according to the set of images to be annotated
  • the annotation module is used to annotate the three-dimensional model according to the annotation instruction, and annotate the image to be annotated according to the annotation result of the three-dimensional model.
  • the image to be annotated includes a panoramic image.
  • an expansion module is further included, configured to: generate a plurality of extended images according to the panoramic image and pre-configured imaging parameters; and
  • the extended image is annotated according to the annotation result of the panoramic image.
  • the imaging parameters include at least one of the following: field of view, resolution, imaging angle, and noise ratio.
  • when the annotation module is used to annotate the three-dimensional model according to the annotation instruction, it is specifically configured to:
  • the position of the target object is marked in the three-dimensional model.
  • when the annotation module is used to annotate the three-dimensional model according to the annotation instruction, it is further specifically configured to:
  • the attributes of the target object are labeled in the three-dimensional model.
  • the set of images to be annotated includes a plurality of images to be annotated that are taken from different viewing angles and are aimed at the target object.
  • when the annotation module is used to annotate the image to be annotated according to the annotation result of the three-dimensional model, it is specifically configured to:
  • the annotation result is projected onto the image to be annotated.
  • a neural network training device which uses training images in a training set to train a neural network to be trained, wherein the training images are pre-annotated using the image annotation device described in the third aspect.
  • an electronic device comprising a memory and a processor, wherein the memory is used to store computer instructions executable on the processor, and the processor is used to implement the image annotation method described in the first aspect when executing the computer instructions.
  • a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the method described in the first aspect is implemented.
  • the image annotation method provided by the embodiments of the present disclosure acquires a plurality of images to be annotated collected for a target space, generates a three-dimensional model of the target space from the set of images to be annotated, and finally annotates the three-dimensional model according to an annotation instruction and annotates the images to be annotated according to the annotation result of the three-dimensional model. That is, a single annotation of the three-dimensional model completes the annotation of every image to be annotated, avoiding annotating each image in turn, thereby improving the efficiency of image annotation and increasing the number of annotated images.
  • FIG. 1 is a flow chart of an image annotation method according to an exemplary embodiment of the present disclosure;
  • FIG. 2 is a schematic structural diagram of an image annotation device according to an exemplary embodiment of the present disclosure;
  • FIG. 3 is a structural block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
  • although the terms first, second, third, etc. may be used in the present disclosure to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other.
  • first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information.
  • the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".
  • artificial intelligence makes image processing more accurate and efficient, and realizes functions such as automatic image recognition.
  • the image to be processed can be input into a pre-trained neural network for processing to obtain the result of image processing.
  • a large number of annotated training images are required, and the number of training images will affect the accuracy of the trained neural network.
  • the annotation of training images is mostly done by manual annotation, which is inefficient and the number of annotated images is limited.
  • annotating training images means annotating the regions of interest in the images. For example, when training a car detection network, the car regions in multiple images at different angles and positions need to be annotated, and the annotated data are used as ground truth to train the network.
  • At least one embodiment of the present disclosure provides an image annotation method. Please refer to FIG. 1, which shows the flow of the method, including steps S101 to S103.
  • the method can be used to annotate the training images of the neural network, that is, to add labels to the training images.
  • the method can be executed by an electronic device such as a terminal device or a server.
  • the terminal device can be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA) handheld device, a computing device, a vehicle-mounted device, a wearable device, etc.
  • the method can be implemented by a processor calling a computer-readable instruction stored in a memory.
  • the method can be executed by a server, which can be a local server, a cloud server, etc.
  • step S101 a set of images to be annotated is obtained, wherein the set of images to be annotated includes a plurality of images to be annotated collected for a target space.
  • multiple images to be annotated in the set of images to be annotated can be collected by the user in advance for the target space.
  • the target space is the space where the 3D model is to be constructed; the target space is related to the purpose of the neural network corresponding to the image to be annotated. For example, if the neural network is used to detect cars, the target space can be the space where cars exist, so that the neural network can be trained to detect cars from the images to be annotated.
  • the image to be annotated is a panoramic image, such as a VR (Virtual Reality) panoramic image.
  • the panoramic image has a wider viewing angle and richer content, which is convenient for building a three-dimensional model in step S102. Compared with ordinary two-dimensional images, a smaller number of panoramic images can be collected to complete the construction of the three-dimensional model.
  • the target object may be an object for the neural network to detect, such as a car, a pedestrian, etc.
  • the set of images to be annotated may include a plurality of images to be annotated taken from different perspectives for the target object.
  • step S102 a three-dimensional model of the target space is generated according to the set of images to be annotated.
  • the three-dimensional model may include a model structure composed of three-dimensional points, and each three-dimensional point in the three-dimensional model has a corresponding pixel point in at least one image to be annotated.
  • Feature extraction can be performed on each image to be annotated in the image set to be annotated, and feature matching and optimization can be performed between every two images to be annotated, and finally a three-dimensional model of the target space can be constructed according to the feature matching results.
  • step S103 the three-dimensional model is annotated according to the annotation instruction, and the image to be annotated is annotated according to the annotation result of the three-dimensional model.
  • the marking instruction may be generated according to the user's operation. For example, when the user selects and marks a certain position in the three-dimensional model, a corresponding marking instruction is generated.
  • the position of the target object can be marked in the three-dimensional model according to the marking instruction.
  • the attributes of the target object can also be marked in the three-dimensional model according to the marking instruction.
  • a scenario covering both of the above annotation cases may be: the user selects the position of the target object in the three-dimensional model using a three-dimensional selection box, and further adds the attributes of the target object, thereby generating a corresponding annotation instruction and completing the annotation of the position and attributes of the target object.
  • the attribute can be the type, name, etc. of the target object.
  • the annotation result can be projected onto the image to be annotated according to the correspondence between the three-dimensional points in the three-dimensional model and the pixel points in the image to be annotated.
  • for example, if the annotation result in the three-dimensional model is a three-dimensional selection box representing the position of the target object, the eight vertices of the three-dimensional selection box can be projected onto the corresponding image to be annotated, thereby forming a two-dimensional rectangular selection box on the image to be annotated to represent the position of the target object in that image.
  • the image annotation method provided by the embodiments of the present disclosure acquires a plurality of images to be annotated collected for a target space, generates a three-dimensional model of the target space from the set of images to be annotated, and finally annotates the three-dimensional model according to an annotation instruction and annotates the images to be annotated according to the annotation result of the three-dimensional model. That is, a single annotation of the three-dimensional model completes the annotation of every image to be annotated, avoiding annotating each image in turn, thereby improving the efficiency of image annotation and increasing the number of annotated images.
  • the image to be annotated includes a panoramic image, and multiple (two-dimensional) extended images can be generated according to the panoramic image and pre-configured imaging parameters; and the extended image is annotated according to the annotation result of the panoramic image.
  • the imaging parameters include at least one of the following: field of view angle, resolution, imaging angle, and noise ratio.
  • the panoramic image can be used to render a two-dimensional image with different imaging parameters, thereby further increasing the number and diversity of annotated images.
  • the image rendering and the process of annotating the extended image in this embodiment are both automated, which can improve efficiency and save time in constructing training data compared to manual image acquisition and manual annotation.
  • a neural network training method which uses training images in a training set to train a neural network to be trained, wherein the training images are pre-annotated using the image annotation method described in the first aspect.
  • an image annotation device is provided. Please refer to FIG. 2.
  • the device includes:
  • An acquisition module 201 is used to acquire a set of images to be annotated, wherein the set of images to be annotated includes a plurality of images to be annotated collected for a target space;
  • a modeling module 202 configured to generate a three-dimensional model of the target space according to the set of images to be annotated
  • the labeling module 203 is used to label the three-dimensional model according to the labeling instruction, and to label the image to be labeled according to the labeling result of the three-dimensional model.
  • the image to be annotated includes a panoramic image.
  • an expansion module is further included, configured to: generate a plurality of extended images according to the panoramic image and pre-configured imaging parameters; and
  • the extended image is annotated according to the annotation result of the panoramic image.
  • the imaging parameters include at least one of the following: field of view, resolution, imaging angle, and noise ratio.
  • when the annotation module is used to annotate the three-dimensional model according to the annotation instruction, it is specifically configured to:
  • the position of the target object is marked in the three-dimensional model.
  • when the annotation module is used to annotate the three-dimensional model according to the annotation instruction, it is further specifically configured to:
  • the attributes of the target object are labeled in the three-dimensional model.
  • the set of images to be annotated includes a plurality of images to be annotated taken from different viewing angles and targeting the target object.
  • when the annotation module is used to annotate the image to be annotated according to the annotation result of the three-dimensional model, it is specifically configured to:
  • the annotation result is projected onto the image to be annotated.
  • a neural network training device which uses training images in a training set to train a neural network to be trained, wherein the training images are pre-annotated using the image annotation device described in the third aspect.
  • the device 300 can be a mobile phone, a computer, a digital broadcast terminal, a message transceiver device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, etc.
  • the device 300 may include one or more of the following components: a processing component 302, a memory 304, a power component 306, a multimedia component 308, an audio component 310, an input/output (I/O) interface 312, a sensor component 314, and a communication component 316.
  • the processing component 302 generally controls the overall operation of the device 300, such as operations associated with display, phone calls, data communications, camera operations, and recording operations.
  • the processing component 302 may include one or more processors 320 to execute instructions to complete all or part of the steps of the above-mentioned method.
  • the processing component 302 may include one or more modules to facilitate the interaction between the processing component 302 and other components.
  • the processing component 302 may include a multimedia module to facilitate the interaction between the multimedia component 308 and the processing component 302.
  • the memory 304 is configured to store various types of data to support operations on the device 300. Examples of such data include instructions for any application or method operating on the device 300, contact data, phone book data, messages, pictures, videos, etc.
  • the memory 304 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
  • the power component 306 provides power to the various components of the device 300.
  • the power component 306 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 300.
  • the multimedia component 308 includes a screen that provides an output interface between the device 300 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundaries of the touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
  • the multimedia component 308 includes a front camera and/or a rear camera. When the device 300 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
  • the audio component 310 is configured to output and/or input audio signals.
  • the audio component 310 includes a microphone (MIC), and when the device 300 is in an operating mode, such as a call mode, a recording mode, and a speech recognition mode, the microphone is configured to receive an external audio signal.
  • the received audio signal can be further stored in the memory 304 or sent via the communication component 316.
  • the audio component 310 also includes a speaker for outputting audio signals.
  • I/O interface 312 provides an interface between processing component 302 and peripheral interface modules, such as keyboards, click wheels, buttons, etc. These buttons may include but are not limited to: a home button, a volume button, a start button, and a lock button.
  • the sensor assembly 314 includes one or more sensors for providing various aspects of the status assessment of the device 300.
  • the sensor assembly 314 can detect the open/closed state of the device 300 and the relative positioning of components (for example, the display and keypad of the device 300); the sensor assembly 314 can also detect a change in position of the device 300 or one of its components, the presence or absence of user contact with the device 300, the orientation or acceleration/deceleration of the device 300, and temperature changes of the device 300.
  • the sensor assembly 314 can also include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
  • the sensor assembly 314 can also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor assembly 314 can also include an accelerometer, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • the communication component 316 is configured to facilitate wired or wireless communication between the device 300 and other devices.
  • the device 300 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, 4G or 5G or a combination thereof.
  • the communication component 316 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel.
  • the communication component 316 also includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • the apparatus 300 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the above-described method.
  • in an exemplary embodiment, the present disclosure further provides a non-transitory computer-readable storage medium including instructions, such as the memory 304 including instructions, which can be executed by the processor 320 of the device 300 to complete the above-described method.
  • the non-transitory computer-readable storage medium can be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to an image annotation method and apparatus, and a neural network training method and apparatus. The method comprises: acquiring a set of images to be annotated, wherein the set of images to be annotated comprises a plurality of images to be annotated collected for a target space; generating a three-dimensional model of the target space according to the set of images to be annotated; and annotating the three-dimensional model according to an annotation instruction, and annotating the images to be annotated according to the annotation result of the three-dimensional model. That is, a single annotation of the three-dimensional model completes the annotation of every image to be annotated, avoiding annotating each image in turn, thereby improving the efficiency of image annotation and increasing the number of annotated images.

Description

Image annotation method and apparatus, and neural network training method and apparatus
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and in particular to an image annotation method and apparatus, and a neural network training method and apparatus.
Background Art
In recent years, artificial intelligence has advanced considerably and has gradually driven technological innovation in various fields. For example, artificial intelligence makes image processing more accurate and efficient, enabling functions such as automatic image recognition. Specifically, an image to be processed can be input into a pre-trained neural network for processing to obtain an image processing result. Training a neural network requires a large number of annotated training images, and the number of training images affects the accuracy of the trained network. In the related art, however, training images are mostly annotated manually, which is inefficient and limits the number of annotated images.
Summary of the Invention
To overcome the problems in the related art, embodiments of the present disclosure provide an image annotation method and apparatus, and a neural network training method and apparatus, to remedy the deficiencies of the related art.
According to a first aspect of the embodiments of the present disclosure, an image annotation method is provided, comprising:
acquiring a set of images to be annotated, wherein the set of images to be annotated comprises a plurality of images to be annotated collected for a target space;
generating a three-dimensional model of the target space according to the set of images to be annotated; and
annotating the three-dimensional model according to an annotation instruction, and annotating the images to be annotated according to the annotation result of the three-dimensional model.
In one embodiment, the images to be annotated include a panoramic image.
In one embodiment, the method further comprises:
generating a plurality of extended images according to the panoramic image and pre-configured imaging parameters; and
annotating the extended images according to the annotation result of the panoramic image.
In one embodiment, the imaging parameters include at least one of the following: field of view, resolution, imaging angle, and noise ratio.
In one embodiment, a target object exists in the target space;
annotating the three-dimensional model according to the annotation instruction comprises:
annotating, according to the annotation instruction, the position of the target object in the three-dimensional model.
In one embodiment, annotating the three-dimensional model according to the annotation instruction further comprises:
annotating, according to the annotation instruction, the attributes of the target object in the three-dimensional model.
In one embodiment, the set of images to be annotated includes a plurality of images to be annotated taken of the target object from different viewing angles.
In one embodiment, annotating the images to be annotated according to the annotation result of the three-dimensional model comprises:
projecting the annotation result onto the images to be annotated according to the correspondence between three-dimensional points in the three-dimensional model and pixels in the images to be annotated.
According to a second aspect of the embodiments of the present disclosure, a neural network training method is provided, in which a neural network to be trained is trained using training images in a training set, wherein the training images are annotated in advance using the image annotation method described in the first aspect.
According to a third aspect of the embodiments of the present disclosure, an image annotation apparatus is provided, comprising:
an acquisition module, configured to acquire a set of images to be annotated, wherein the set of images to be annotated comprises a plurality of images to be annotated collected for a target space;
a modeling module, configured to generate a three-dimensional model of the target space according to the set of images to be annotated; and
an annotation module, configured to annotate the three-dimensional model according to an annotation instruction, and to annotate the images to be annotated according to the annotation result of the three-dimensional model.
In one embodiment, the images to be annotated include a panoramic image.
In one embodiment, the apparatus further comprises an expansion module, configured to:
generate a plurality of extended images according to the panoramic image and pre-configured imaging parameters; and
annotate the extended images according to the annotation result of the panoramic image.
In one embodiment, the imaging parameters include at least one of the following: field of view, resolution, imaging angle, and noise ratio.
In one embodiment, a target object exists in the target space;
when the annotation module is used to annotate the three-dimensional model according to the annotation instruction, it is specifically configured to:
annotate, according to the annotation instruction, the position of the target object in the three-dimensional model.
In one embodiment, when the annotation module is used to annotate the three-dimensional model according to the annotation instruction, it is further specifically configured to:
annotate, according to the annotation instruction, the attributes of the target object in the three-dimensional model.
In one embodiment, the set of images to be annotated includes a plurality of images to be annotated taken of the target object from different viewing angles.
In one embodiment, when the annotation module is used to annotate the images to be annotated according to the annotation result of the three-dimensional model, it is specifically configured to:
project the annotation result onto the images to be annotated according to the correspondence between three-dimensional points in the three-dimensional model and pixels in the images to be annotated.
According to a fourth aspect of the embodiments of the present disclosure, a neural network training apparatus is provided, which trains a neural network to be trained using training images in a training set, wherein the training images are annotated in advance using the image annotation apparatus described in the third aspect.
According to a fifth aspect of the embodiments of the present disclosure, an electronic device is provided, the electronic device comprising a memory and a processor, wherein the memory is configured to store computer instructions executable on the processor, and the processor is configured to implement the image annotation method described in the first aspect when executing the computer instructions.
According to a sixth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, wherein when the program is executed by a processor, the method described in the first aspect is implemented.
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effects:
In the image annotation method provided by the embodiments of the present disclosure, by acquiring a set of images to be annotated composed of a plurality of images to be annotated collected for a target space, a three-dimensional model of the target space can be generated using the set of images to be annotated; finally, the three-dimensional model can be annotated according to an annotation instruction, and the images to be annotated can be annotated according to the annotation result of the three-dimensional model. That is, a single annotation of the three-dimensional model completes the annotation of every image to be annotated, avoiding annotating each image in turn, thereby improving the efficiency of image annotation and increasing the number of annotated images.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present invention and, together with the description, serve to explain the principles of the present invention.
FIG. 1 is a flow chart of an image annotation method according to an exemplary embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of an image annotation apparatus according to an exemplary embodiment of the present disclosure;
FIG. 3 is a structural block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Exemplary embodiments will be described in detail here, examples of which are shown in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. The singular forms "a", "said", and "the" used in the present disclosure and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the present disclosure to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of the present disclosure, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".
In recent years, artificial intelligence has advanced considerably and has gradually driven technological innovation in various fields. For example, artificial intelligence makes image processing more accurate and efficient, enabling functions such as automatic image recognition. Specifically, an image to be processed can be input into a pre-trained neural network for processing to obtain an image processing result. Training a neural network requires a large number of annotated training images, and the number of training images affects the accuracy of the trained network. In the related art, however, training images are mostly annotated manually, which is inefficient and limits the number of annotated images.
Annotating training images means annotating the regions of interest in the images. For example, when training a car detection network, the car regions in multiple images at different angles and positions need to be annotated, and the annotated data are used as ground truth to train the network.
On this basis, in a first aspect, at least one embodiment of the present disclosure provides an image annotation method. Please refer to FIG. 1, which shows the flow of the method, including steps S101 to S103.
The method can be used to annotate training images for a neural network, that is, to add labels to the training images. The method can be executed by an electronic device such as a terminal device or a server; the terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA) handheld device, a computing device, a vehicle-mounted device, a wearable device, etc. The method can be implemented by a processor calling computer-readable instructions stored in a memory. Alternatively, the method can be executed by a server, which may be a local server, a cloud server, etc.
In step S101, a set of images to be annotated is acquired, wherein the set of images to be annotated comprises a plurality of images to be annotated collected for a target space.
The plurality of images to be annotated in the set may be collected in advance by the user for the target space. The target space is the space for which the three-dimensional model is to be constructed; it is related to the purpose of the neural network corresponding to the images to be annotated. For example, if the neural network is used to detect cars, the target space may be a space in which cars exist, so that the neural network can be trained to detect cars in the images to be annotated.
Exemplarily, the images to be annotated are panoramic images, such as VR (Virtual Reality) panoramic images. A panoramic image has a wider viewing angle and richer content, which facilitates building the three-dimensional model in step S102. Compared with ordinary two-dimensional images, fewer panoramic images need to be collected to complete the construction of the three-dimensional model.
Exemplarily, a target object exists in the target space. The target object may be an object that the neural network is intended to detect, such as a car or a pedestrian. Correspondingly, the set of images to be annotated may include a plurality of images to be annotated taken of the target object from different viewing angles.
The above examples can be combined with one another to form further examples.
In step S102, a three-dimensional model of the target space is generated according to the set of images to be annotated.
The three-dimensional model may include a model structure composed of three-dimensional points, and each three-dimensional point in the three-dimensional model has a corresponding pixel in at least one image to be annotated.
Feature extraction can be performed on each image to be annotated in the set, feature matching and optimization can be performed between every two images to be annotated, and finally a three-dimensional model of the target space can be constructed according to the feature matching results. It should be understood that the above way of generating the three-dimensional model is merely illustrative and does not limit how the model is generated; other model generation methods in the related art may also be used in this step.
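As a non-limiting illustration of the feature extraction and pairwise matching described in this step, the following sketch uses OpenCV; the choice of ORB features and a brute-force matcher is an assumption of this illustration only, since the present disclosure does not prescribe a specific feature type, matcher, or reconstruction pipeline.

    # A minimal sketch of the pairwise feature-matching building block.
    # Downstream, the matched pixel pairs would feed pose estimation and
    # triangulation to produce the 3D points whose pixel correspondences
    # step S103 relies on.
    import cv2
    import numpy as np

    def match_image_pair(img_a, img_b):
        orb = cv2.ORB_create(nfeatures=4000)  # feature extraction
        kp_a, desc_a = orb.detectAndCompute(img_a, None)
        kp_b, desc_b = orb.detectAndCompute(img_b, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(desc_a, desc_b), key=lambda m: m.distance)
        # Corresponding pixel coordinates in the two views.
        pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches])
        pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches])
        return pts_a, pts_b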
In step S103, the three-dimensional model is annotated according to the annotation instruction, and the images to be annotated are annotated according to the annotation result of the three-dimensional model.
The annotation instruction may be generated according to the user's operation. For example, when the user selects and annotates a certain position in the three-dimensional model, a corresponding annotation instruction is generated.
When a target object exists in the target space, the position of the target object can be annotated in the three-dimensional model according to the annotation instruction; the attributes of the target object can also be annotated in the three-dimensional model according to the annotation instruction. A scenario covering both annotation cases may be: the user selects the position of the target object in the three-dimensional model using a three-dimensional selection box, and further adds the attributes of the target object, thereby generating the corresponding annotation instruction and completing the annotation of the position and attributes of the target object. The attributes may be the type, name, etc. of the target object.
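For concreteness, one possible in-memory record for such an annotation is sketched below; the field names are illustrative assumptions, as the present disclosure does not define a storage format.

    # An illustrative record for one 3D annotation: the position as a
    # 3D selection box plus the attributes of the target object.
    from dataclasses import dataclass, field
    import numpy as np

    @dataclass
    class Box3DAnnotation:
        corners: np.ndarray                             # (8, 3) box vertices
        attributes: dict = field(default_factory=dict)  # e.g. {"type": "car"}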
When the images to be annotated are annotated according to the annotation result of the three-dimensional model, the annotation result can be projected onto the images to be annotated according to the correspondence between three-dimensional points in the three-dimensional model and pixels in the images to be annotated. For example, if the annotation result in the three-dimensional model is a three-dimensional selection box representing the position of the target object, the eight vertices of the selection box can be projected onto the corresponding image to be annotated, thereby forming a two-dimensional rectangular selection box on the image that represents the position of the target object in that image.
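The following sketch illustrates this projection under the assumption of a pinhole camera with known intrinsics K and world-to-camera pose (R, t); the present disclosure only requires a correspondence between three-dimensional points and pixels, so the camera model here is an assumption of the illustration.

    # A minimal sketch: project the eight vertices of a 3D selection box
    # into one view and take their bounding rectangle as the 2D selection
    # box marking the target object in that image.
    import numpy as np

    def project_box_to_rect(corners, K, R, t):
        # corners: (8, 3) world-space vertices of the 3D selection box.
        cam = R @ corners.T + t.reshape(3, 1)  # world -> camera coordinates
        uv = K @ cam
        uv = uv[:2] / uv[2]                    # perspective division -> pixels
        x_min, y_min = uv.min(axis=1)
        x_max, y_max = uv.max(axis=1)
        return x_min, y_min, x_max, y_max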
In the image annotation method provided by the embodiments of the present disclosure, by acquiring a set of images to be annotated composed of a plurality of images to be annotated collected for a target space, a three-dimensional model of the target space can be generated using the set; finally, the three-dimensional model can be annotated according to an annotation instruction, and the images to be annotated can be annotated according to the annotation result of the three-dimensional model. That is, a single annotation of the three-dimensional model completes the annotation of every image to be annotated, avoiding annotating each image in turn, thereby improving the efficiency of image annotation and increasing the number of annotated images.
There are usually two key issues in data annotation: data volume and diversity. Typically, training a neural network requires a large amount of annotated data, in diverse forms covering different viewing angles, sizes, and categories.
In some embodiments of the present disclosure, the images to be annotated include a panoramic image; a plurality of (two-dimensional) extended images can then be generated according to the panoramic image and pre-configured imaging parameters, and the extended images can be annotated according to the annotation result of the panoramic image. The imaging parameters include at least one of the following: field of view, resolution, imaging angle, and noise ratio.
In this embodiment, the panoramic image can be used to render two-dimensional images with different imaging parameters, further increasing the number and diversity of annotated images. Moreover, both the image rendering and the annotation of the extended images in this embodiment are automated, which improves efficiency and saves the time of constructing training data compared with manual image acquisition and manual annotation.
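As an illustration of rendering such an extended image for a given field of view, imaging angle, and resolution, the sketch below assumes an equirectangular panorama layout and nearest-neighbour sampling; both are assumptions of the illustration, since the present disclosure fixes neither a panorama format nor a rendering scheme.

    # A minimal sketch of rendering a perspective "extended image" from an
    # equirectangular panorama; the panorama's annotation can then be
    # carried over to the rendered view through the same ray geometry.
    import numpy as np

    def render_extended_image(pano, fov_deg, yaw_deg, pitch_deg, out_w, out_h):
        ph, pw = pano.shape[:2]
        f = 0.5 * out_w / np.tan(np.radians(fov_deg) / 2)  # focal length, px
        # Ray directions through each output pixel (camera looks along +z).
        xs, ys = np.meshgrid(np.arange(out_w) - out_w / 2,
                             np.arange(out_h) - out_h / 2)
        dirs = np.stack([xs, ys, np.full_like(xs, f)], axis=-1)
        dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
        # Rotate the rays by the configured imaging angle (yaw, then pitch).
        yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
        ry = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                       [0, 1, 0],
                       [-np.sin(yaw), 0, np.cos(yaw)]])
        rx = np.array([[1, 0, 0],
                       [0, np.cos(pitch), -np.sin(pitch)],
                       [0, np.sin(pitch), np.cos(pitch)]])
        d = dirs @ (ry @ rx).T
        # Ray direction -> spherical angles -> equirectangular pixel lookup.
        lon = np.arctan2(d[..., 0], d[..., 2])      # in [-pi, pi]
        lat = np.arcsin(np.clip(d[..., 1], -1, 1))  # in [-pi/2, pi/2]
        u = ((lon / np.pi + 1) / 2 * (pw - 1)).astype(int)
        v = ((lat / (np.pi / 2) + 1) / 2 * (ph - 1)).astype(int)
        return pano[v, u]  # nearest-neighbour sampling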
According to a second aspect of the embodiments of the present disclosure, a neural network training method is provided, in which a neural network to be trained is trained using training images in a training set, wherein the training images are annotated in advance using the image annotation method described in the first aspect.
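The training procedure itself is left generic by the present disclosure. A minimal supervised loop over images pre-annotated by the above method, assuming a PyTorch setup and a box-regression loss (both assumptions of the illustration), could look like:

    # A minimal sketch of the second-aspect training method: standard
    # supervised training on images whose 2D boxes were produced by
    # projecting the 3D annotation.
    import torch

    def train(network, loader, epochs=10):
        opt = torch.optim.Adam(network.parameters(), lr=1e-4)
        loss_fn = torch.nn.SmoothL1Loss()
        for _ in range(epochs):
            for images, boxes in loader:  # boxes: projected 2D annotations
                opt.zero_grad()
                loss = loss_fn(network(images), boxes)
                loss.backward()
                opt.step()
        return network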
According to a third aspect of the embodiments of the present disclosure, an image annotation apparatus is provided. Please refer to FIG. 2. The apparatus comprises:
an acquisition module 201, configured to acquire a set of images to be annotated, wherein the set of images to be annotated comprises a plurality of images to be annotated collected for a target space;
a modeling module 202, configured to generate a three-dimensional model of the target space according to the set of images to be annotated; and
an annotation module 203, configured to annotate the three-dimensional model according to an annotation instruction, and to annotate the images to be annotated according to the annotation result of the three-dimensional model.
In some embodiments of the present disclosure, the images to be annotated include a panoramic image.
In some embodiments of the present disclosure, the apparatus further comprises an expansion module, configured to:
generate a plurality of extended images according to the panoramic image and pre-configured imaging parameters; and
annotate the extended images according to the annotation result of the panoramic image.
In some embodiments of the present disclosure, the imaging parameters include at least one of the following: field of view, resolution, imaging angle, and noise ratio.
In some embodiments of the present disclosure, a target object exists in the target space;
when the annotation module is used to annotate the three-dimensional model according to the annotation instruction, it is specifically configured to:
annotate, according to the annotation instruction, the position of the target object in the three-dimensional model.
In some embodiments of the present disclosure, when the annotation module is used to annotate the three-dimensional model according to the annotation instruction, it is further specifically configured to:
annotate, according to the annotation instruction, the attributes of the target object in the three-dimensional model.
In some embodiments of the present disclosure, the set of images to be annotated includes a plurality of images to be annotated taken of the target object from different viewing angles.
In some embodiments of the present disclosure, when the annotation module is used to annotate the images to be annotated according to the annotation result of the three-dimensional model, it is specifically configured to:
project the annotation result onto the images to be annotated according to the correspondence between three-dimensional points in the three-dimensional model and pixels in the images to be annotated.
According to a fourth aspect of the embodiments of the present disclosure, a neural network training apparatus is provided, which trains a neural network to be trained using training images in a training set, wherein the training images are annotated in advance using the image annotation apparatus described in the third aspect.
With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method of the first aspect, and will not be elaborated here.
According to a fifth aspect of the embodiments of the present disclosure, please refer to FIG. 3, which exemplarily shows a block diagram of an electronic device. For example, the apparatus 300 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, etc.
Referring to FIG. 3, the apparatus 300 may include one or more of the following components: a processing component 302, a memory 304, a power component 306, a multimedia component 308, an audio component 310, an input/output (I/O) interface 312, a sensor component 314, and a communication component 316.
The processing component 302 generally controls the overall operation of the apparatus 300, such as operations associated with display, phone calls, data communication, camera operation, and recording operation. The processing component 302 may include one or more processors 320 to execute instructions to complete all or part of the steps of the above method. In addition, the processing component 302 may include one or more modules to facilitate interaction between the processing component 302 and other components. For example, the processing component 302 may include a multimedia module to facilitate interaction between the multimedia component 308 and the processing component 302.
The memory 304 is configured to store various types of data to support operation on the apparatus 300. Examples of such data include instructions for any application or method operating on the apparatus 300, contact data, phone book data, messages, pictures, videos, etc. The memory 304 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
The power component 306 provides power to the various components of the apparatus 300. The power component 306 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 300.
The multimedia component 308 includes a screen that provides an output interface between the apparatus 300 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 308 includes a front camera and/or a rear camera. When the apparatus 300 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 310 is configured to output and/or input audio signals. For example, the audio component 310 includes a microphone (MIC), and when the apparatus 300 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode, the microphone is configured to receive external audio signals. The received audio signals may be further stored in the memory 304 or sent via the communication component 316. In some embodiments, the audio component 310 also includes a speaker for outputting audio signals.
The I/O interface 312 provides an interface between the processing component 302 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include but are not limited to: a home button, volume buttons, a start button, and a lock button.
The sensor component 314 includes one or more sensors for providing status assessments of various aspects of the apparatus 300. For example, the sensor component 314 can detect the open/closed state of the apparatus 300 and the relative positioning of components (for example, the display and keypad of the apparatus 300); the sensor component 314 can also detect a change in position of the apparatus 300 or one of its components, the presence or absence of user contact with the apparatus 300, the orientation or acceleration/deceleration of the apparatus 300, and temperature changes of the apparatus 300. The sensor component 314 may also include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 314 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 314 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 316 is configured to facilitate wired or wireless communication between the apparatus 300 and other devices. The apparatus 300 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, 4G or 5G, or a combination thereof. In an exemplary embodiment, the communication component 316 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 316 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 300 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the above-described method.
In an exemplary embodiment, the present disclosure further provides a non-transitory computer-readable storage medium including instructions, such as the memory 304 including instructions, which can be executed by the processor 320 of the apparatus 300 to complete the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.
Those skilled in the art will readily conceive of other embodiments of the present disclosure after considering the specification and practicing the disclosure herein. The present application is intended to cover any variations, uses, or adaptations of the present disclosure that follow the general principles of the present disclosure and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (12)

  1. An image annotation method, characterized by comprising:
    acquiring a set of images to be annotated, wherein the set of images to be annotated comprises a plurality of images to be annotated collected for a target space;
    generating a three-dimensional model of the target space according to the set of images to be annotated; and
    annotating the three-dimensional model according to an annotation instruction, and annotating the images to be annotated according to the annotation result of the three-dimensional model.
  2. The image annotation method according to claim 1, characterized in that the images to be annotated include a panoramic image.
  3. The image annotation method according to claim 2, characterized by further comprising:
    generating a plurality of extended images according to the panoramic image and pre-configured imaging parameters; and
    annotating the extended images according to the annotation result of the panoramic image.
  4. The image annotation method according to claim 3, characterized in that the imaging parameters include at least one of the following: field of view, resolution, imaging angle, and noise ratio.
  5. The image annotation method according to claim 1, characterized in that a target object exists in the target space;
    annotating the three-dimensional model according to the annotation instruction comprises:
    annotating, according to the annotation instruction, the position of the target object in the three-dimensional model.
  6. The image annotation method according to claim 5, characterized in that annotating the three-dimensional model according to the annotation instruction further comprises:
    annotating, according to the annotation instruction, the attributes of the target object in the three-dimensional model.
  7. The image annotation method according to claim 5, characterized in that the set of images to be annotated includes a plurality of images to be annotated taken of the target object from different viewing angles.
  8. The image annotation method according to claim 1 or 5, characterized in that annotating the images to be annotated according to the annotation result of the three-dimensional model comprises:
    projecting the annotation result onto the images to be annotated according to the correspondence between three-dimensional points in the three-dimensional model and pixels in the images to be annotated.
  9. A neural network training method, characterized in that a neural network to be trained is trained using training images in a training set, wherein the training images are annotated in advance using the image annotation method according to any one of claims 1 to 8.
  10. An image annotation apparatus, characterized by comprising:
    an acquisition module, configured to acquire a set of images to be annotated, wherein the set of images to be annotated comprises a plurality of images to be annotated collected for a target space;
    a modeling module, configured to generate a three-dimensional model of the target space according to the set of images to be annotated; and
    an annotation module, configured to annotate the three-dimensional model according to an annotation instruction, and to annotate the images to be annotated according to the annotation result of the three-dimensional model.
  11. A terminal device, characterized in that the terminal device comprises a memory and a processor, the memory being configured to store computer instructions executable on the processor, and the processor being configured to, when executing the computer instructions, perform the image annotation method according to any one of claims 1 to 8 or the neural network training method according to claim 9.
  12. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method according to any one of claims 1 to 9 is implemented.
PCT/CN2022/127769 2022-10-26 2022-10-26 Image annotation method and apparatus, and neural network training method and apparatus WO2024087067A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/127769 2022-10-26 2022-10-26 Image annotation method and apparatus, and neural network training method and apparatus WO2024087067A1 (zh)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/127769 2022-10-26 2022-10-26 Image annotation method and apparatus, and neural network training method and apparatus WO2024087067A1 (zh)

Publications (1)

Publication Number Publication Date
WO2024087067A1 true WO2024087067A1 (zh) 2024-05-02

Family

ID=90829750

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/127769 2022-10-26 2022-10-26 Image annotation method and apparatus, and neural network training method and apparatus WO2024087067A1 (zh)

Country Status (1)

Country Link
WO (1) WO2024087067A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109697468A (zh) * 2018-12-24 2019-04-30 Suzhou Keda Technology Co., Ltd. Method and apparatus for annotating sample images, and storage medium
CN112348122A (zh) * 2020-12-03 2021-02-09 Suzhou Zhitu Technology Co., Ltd. Method and apparatus for annotating drivable areas, and electronic device
CN112950667A (zh) * 2021-02-10 2021-06-11 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences Video annotation method, apparatus, device, and computer-readable storage medium
JP2022119067A (ja) * 2021-02-03 2022-08-16 Image processing apparatus and method, image processing system, and program
US20220300738A1 (en) * 2021-03-19 2022-09-22 International Business Machines Corporation Ar-based labeling tool for 3d object detection model training


Similar Documents

Publication Publication Date Title
WO2022043741A1 (zh) Network training and pedestrian re-identification methods and apparatuses, storage medium, and computer program
JP2016531362A (ja) Skin color adjustment method, skin color adjustment apparatus, program, and recording medium
CN109584362B (zh) Three-dimensional model construction method and apparatus, electronic device, and storage medium
WO2021237590A1 (zh) Image acquisition method, apparatus, device, and storage medium
CN111159449B (zh) Image display method and electronic device
US20210407052A1 (en) Method for processing image, related device and storage medium
US20200402321A1 (en) Method, electronic device and storage medium for image generation
WO2020098431A1 (zh) Method and apparatus for constructing a map model
CN114140536A (zh) Pose data processing method and apparatus, electronic device, and storage medium
US11606531B2 (en) Image capturing method, apparatus, and storage medium
CN113190307A (zh) Control adding method, apparatus, device, and storage medium
CN113642551A (zh) Nail key point detection method and apparatus, electronic device, and storage medium
WO2024087067A1 (zh) Image annotation method and apparatus, and neural network training method and apparatus
WO2022110801A1 (zh) Data processing method and apparatus, electronic device, and storage medium
CN107239490B (zh) Method and apparatus for naming face images, and computer-readable storage medium
JP2018503988A (ja) User information push method and apparatus
WO2021237592A1 (zh) Anchor point information processing method, apparatus, device, and storage medium
US11308702B2 (en) Method and apparatus for displaying an image, electronic device and computer-readable storage medium
CN114387622A (zh) Animal re-identification method and apparatus, electronic device, and storage medium
CN113989424A (zh) Three-dimensional avatar generation method and apparatus, and electronic device
CN108769513B (zh) Camera photographing method and apparatus
CN112432636A (zh) Positioning method and apparatus, electronic device, and storage medium
WO2024087066A1 (zh) Image positioning method and apparatus, electronic device, and storage medium
CN113364966B (zh) Photographing method, photographing apparatus, and storage medium for group photographing
US20230097879A1 (en) Method and apparatus for producing special effect, electronic device and storage medium