WO2024020774A1 - Model generation method, object detection method, controller and electronic device - Google Patents

Model generation method, object detection method, controller and electronic device

Info

Publication number
WO2024020774A1
WO2024020774A1 (PCT/CN2022/107858)
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
convolutional neural
network model
feature extraction
object detection
Prior art date
Application number
PCT/CN2022/107858
Other languages
English (en)
French (fr)
Inventor
董学章
于春生
Original Assignee
江苏树实科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 江苏树实科技有限公司 filed Critical 江苏树实科技有限公司
Priority to CN202280005479.0A priority Critical patent/CN115968488A/zh
Priority to PCT/CN2022/107858 priority patent/WO2024020774A1/zh
Publication of WO2024020774A1 publication Critical patent/WO2024020774A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • the present invention relates to the field of image processing technology, and specifically to a model generation method, an object detection method, a controller and an electronic device.
  • with the advancement of computer hardware technology, deep learning models can run on the latest 32-bit microcontrollers.
  • the power consumption of commonly used microcontrollers (MCUs) is only a few milliwatts; thanks to this low power consumption, devices using microcontrollers can be powered by button batteries or small solar cells.
  • Microcontrollers are an important part of the development of the Internet of Things.
  • real-time operating systems (RTOS) are widely used on the STMicroelectronics STM32, Espressif Systems ESP32 and Arduino platforms; a real-time operating system enables the microcontroller to support multi-processor (CPU), multi-threaded applications.
  • the convolutional neural network (CNN), used for deep-learning-based image classification, is a feed-forward neural network whose artificial neurons can respond to surrounding units within part of their coverage area; it performs exceptionally well in large-scale image processing.
  • the convolutional neural network model architecture is a multi-layer structure: after the first input layer, several convolutional layers, batch normalization layers and downsampling layers are arranged in various orders, and finally the output layer outputs the category and location of the target in the image.
  • the purpose of the present invention is to provide a model generation method, an object detection method, a controller and an electronic device, which can obtain a high-precision convolutional neural network model without a large amount of labeled training data, while saving the manpower and time required to label training data.
  • the present invention provides a model generation method: construct a convolutional neural network model for multi-scale object detection and divide the convolutional neural network model into multiple modules, the multiple modules including a feature extraction module and several detection head modules of different scales; pre-train the feature extraction module with unlabeled training data to obtain the parameters and model of the feature extraction module; and connect the trained feature extraction module to the multiple detection head modules respectively and train the connected modules with labeled training data to obtain the parameters and model of each module.
  • the present invention also provides an object detection method, applied to a controller.
  • the method includes: obtaining a convolutional neural network model for multi-scale object detection on the image to be detected, the convolutional neural network model being generated by the above model generation method; and using the convolutional neural network model to perform object detection on the image to be detected.
  • the present invention also provides a controller for executing the above-mentioned model generation method and/or the above-mentioned object detection method.
  • the invention also provides an electronic device, including: the above-mentioned controller and a memory communicatively connected with the controller.
  • This embodiment provides a model generation method.
  • first, a convolutional neural network model for multi-scale object detection is constructed and divided into multiple modules, the multiple modules including a feature extraction module and several detection head modules of different scales; the feature extraction module is then pre-trained with unlabeled training data to obtain its parameters and model, so that it learns the characteristics of the unlabeled training data in advance; the trained feature extraction module is next combined with the multiple detection head modules of different scales to obtain the convolutional neural network model, and the combined model, comprising the multiple modules (the feature extraction module and the detection head modules), is trained with labeled training data to obtain the parameters and model of each module (the feature extraction module and the detection head modules).
  • because the feature extraction module has already learned the characteristics of the unlabeled training data in advance, only a small amount of labeled training data is needed at this point for supervised training of the combined convolutional neural network model to obtain the final model; a high-precision convolutional neural network model is thus created without a large amount of labeled training data, while the manpower and time required to annotate training data are saved.
  • pre-training the feature extraction module with unlabeled training data to obtain the parameters and model of the feature extraction module includes: using the feature extraction module as the encoding module of an autoencoder, designing the decoding module of the autoencoder, and training the autoencoder with unlabeled training data to obtain the parameters and model of the feature extraction module.
  • for each module, the memory occupied by the parameters of the module's corresponding multi-layer structure model is smaller than the on-chip storage of the controller running the convolutional neural network model.
  • after the trained feature extraction module is connected to the multiple detection head modules respectively and the connected modules are trained with labeled training data to obtain the parameters and model of each module, the method further includes: converting the parameters and model of each module into a format for running on the controller.
  • constructing a convolutional neural network model for object detection includes: generating, based on the attributes of the image to be detected and the system parameters of the controller, a convolutional neural network model for performing object detection on the image to be detected.
  • in the obtained convolutional neural network model, the memory occupied by the parameters of each module's corresponding multi-layer structure model is smaller than the on-chip storage of the controller; using the convolutional neural network model to perform object detection on the image to be detected includes: running the multiple modules of the convolutional neural network model in parallel in multiple threads of the controller to perform object detection on the image to be detected.
  • alternatively, using the convolutional neural network model to perform object detection on the image to be detected includes: running the multiple modules of the convolutional neural network model in parallel in multiple processors of the controller to perform object detection on the image to be detected.
  • Figure 1 is a specific flow chart of a model generation method according to the first embodiment of the present invention
  • Figure 2 is a schematic diagram of a convolutional neural network model according to the first embodiment of the present invention.
  • Figure 3 is a flow chart of step 102 of the model generation method in Figure 1;
  • Figure 4 is a specific flow chart of an object detection method according to the second embodiment of the present invention.
  • the first embodiment of the present invention relates to a model generation method for training a generated convolutional neural network model.
  • the trained convolutional neural network can be used for multi-scale object detection in images.
  • Step 101: Construct a convolutional neural network model for multi-scale object detection, and divide the convolutional neural network model into multiple modules.
  • the multiple modules include: a feature extraction module and several detection head modules of different scales.
  • the convolutional neural network model is used for multi-scale object detection and can be constructed based on the attributes of the image to be detected and the parameters of the controller that will run the model.
  • the constructed convolutional neural network model can be used for object detection at multiple scales, i.e., it includes multiple detection heads. For example, if an image to be detected needs to be detected at the 1x1, 2x2, 3x3 and 4x4 scales, the constructed model includes a 1x1-scale detection head, a 2x2-scale detection head, a 3x3-scale detection head and a 4x4-scale detection head.
  • the multi-layer structure of the convolutional neural network model is divided into multiple modules, the multiple modules including a feature extraction module and several detection head modules of different scales; the feature extraction module extracts features from the input image to be detected, and each detection head module detects objects at the corresponding scale.
  • each module includes multiple layers of the convolutional neural network model; combining the multiple modules yields the complete convolutional neural network model.
  • the controller can be an MCU microcontroller.
  • for each module, the memory occupied by the parameters of the module's corresponding multi-layer structure model is smaller than the on-chip storage of the controller running the convolutional neural network model. That is, when dividing the convolutional neural network model, it must be ensured that the parameters of each divided module's multi-layer structure occupy less storage than the controller's on-chip storage, so that a single module can run on the controller.
  • the convolutional neural network model includes an input layer for receiving the input image; after the input layer, several convolutional layers, batch normalization layers and downsampling layers are arranged for feature extraction; the convolutional neural network includes N detection heads of different scales.
  • the features extracted by the feature extraction module are connected through a fully connected layer or a convolutional layer to the output layer of each scale's detection head, and the output layer outputs the category of the objects in the image.
  • when dividing the model, the input layer is cascaded with the several groups of convolutional, batch normalization and downsampling layers used for feature extraction to form the feature extraction module, and each scale's output layer together with its adjacent fully connected or convolutional layer forms a detection head module, yielding N detection head modules of different scales, where N is an integer greater than 1; that is, the convolutional neural network model is divided into one feature extraction module and N detection head modules of different scales.
  • Step 102: Use unlabeled training data to pre-train the feature extraction module to obtain the parameters and model of the feature extraction module.
  • specifically, after the division of the convolutional neural network model is completed in step 101, the feature extraction module is pre-trained with unlabeled data, and the resulting parameters and model of the feature extraction module are saved, where the parameters of the feature extraction module include the connection weights between the layers within the module.
  • referring to Figure 3, step 102 (using unlabeled training data to pre-train the feature extraction module to obtain its parameters and model) includes: using the feature extraction module as the encoding module of an autoencoder, designing the decoding module of the autoencoder, and training the autoencoder with unlabeled training data to obtain the parameters and model of the feature extraction module.
  • taking the convolutional neural network model of Figure 2 as an example, the feature extraction module obtained by dividing the model is trained as follows: the feature extraction module is used as the encoding module 11 of the autoencoder, and the decoding module 12 of the autoencoder is designed accordingly.
  • the encoding module 11 (feature extraction module) and the decoding module 12 form an autoencoder; since the autoencoder performs unsupervised learning and does not rely on annotations of the training data, it can automatically discover the relationships within the training data by mining its intrinsic features, so it can be trained with unlabeled training data.
  • the unlabeled training data is input to the encoding module 11 (feature extraction module), which maps it to the feature space; the decoding module 12 then maps the sampled features obtained by the encoding module 11 back to the original space to obtain reconstructed data; the reconstructed data is compared with the training data to obtain the reconstruction error, and the encoding module 11 (feature extraction module) and decoding module 12 are optimized with minimization of the reconstruction error as the optimization objective to obtain the final required encoding module 11 (feature extraction module), whose parameters and model are saved; the encoding module 11 thereby learns an abstract feature representation of the training data input.
  • Step 103: Connect the trained feature extraction module to multiple detection head modules respectively, and use the labeled training data to train the connected multiple modules to obtain the parameters and models of each module.
  • specifically, after the above pre-training, the feature extraction module is combined with the multiple untrained detection head modules to obtain a complete convolutional neural network model, which is then trained by supervised learning on labeled training data.
  • since the feature extraction module already learned the characteristics of the training data in step 102, only a small amount of labeled training data is needed for the supervised training in this step; after the training of the combined model is completed, the final convolutional neural network model is obtained, and the parameters and models of the feature extraction module and of each detection head module are saved separately.
  • in one example, after step 103, the method also includes:
  • Step 104: Convert the parameters and models of each module into a format for running on the controller.
  • specifically, after the parameters and models of the feature extraction module and each detection head module are saved in step 103, they are converted respectively so that the feature extraction module and each detection head module can run on the controller; for example, the parameters and models are converted into code so that each module can be compiled directly into the controller, which reduces the modules' memory footprint in the controller and improves running speed.
  • This embodiment provides a model generation method.
  • first, a convolutional neural network model for multi-scale object detection is constructed and divided into multiple modules, the multiple modules including a feature extraction module and several detection head modules of different scales; the feature extraction module is then pre-trained with unlabeled training data to obtain its parameters and model, so that it learns the characteristics of the unlabeled training data in advance.
  • the trained feature extraction module is combined with the multiple detection head modules of different scales to obtain the convolutional neural network model, and the combined model, comprising the multiple modules (the feature extraction module and the detection head modules), is trained with labeled training data to obtain the parameters and model of each module. Since the feature extraction module has already learned the characteristics of the unlabeled training data in advance, only a small amount of labeled training data is needed to perform supervised training of the combined model and obtain the final convolutional neural network model; a high-precision convolutional neural network model is obtained without a large amount of labeled training data, while the manpower and time required for annotation are saved.
  • the second embodiment of the present invention discloses an object detection method, which is applied to a controller (which can be an MCU microcontroller).
  • the controller runs a convolutional neural network model for multi-scale object detection on images, enabling it to identify target objects of multiple scales contained in the input image to be detected.
  • Step 201: Obtain a convolutional neural network model used for multi-scale object detection on the image to be detected.
  • the convolutional neural network model is generated based on the model generation method in the first embodiment.
  • the convolutional neural network model used for object detection is generated based on the model generation method in the first embodiment. After the convolutional neural network model is generated, it can be run in the controller.
  • Step 202: Use the convolutional neural network model to perform object detection on the image to be detected.
  • in one example, in the obtained convolutional neural network model, the memory occupied by the parameters of each module's corresponding multi-layer structure model is smaller than the on-chip storage of the controller; using the convolutional neural network model to detect objects in the image to be detected includes: running the multiple modules of the convolutional neural network model in parallel in multiple threads or processors of the controller to perform object detection on the image to be detected.
  • that is, in the convolutional neural network model generated in the first embodiment, the memory required to run each module (the feature extraction module and the detection head modules of multiple scales) is smaller than the controller's on-chip storage, so each module can run on the controller; the multiple modules can then run in parallel in multiple threads of the controller or, for a controller with multiple processors, in multiple processors, i.e., the feature extraction module and each detection head module run in different threads or processors, which speeds up the controller's computation and improves the speed of object detection on the image to be detected.
  • for example, the feature extraction module and each detection head module run in different processors; after the processor running the feature extraction module finishes extracting the features of the current image, it inputs the extracted features to the processors running the detection head modules and can then proceed to acquire and extract the features of the next image.
  • the third embodiment of the present invention discloses a controller, for example an MCU controller, for executing the model generation method of the first embodiment and/or the object detection method of the second embodiment; that is, the controller can run both the model generation method and the object detection method, or the two methods can be implemented by different controllers.
  • for example, the model training process in the model generation method demands high computing power and can be handed over to a controller with stronger processing capability; that controller sends the generated convolutional neural network model to the microcontroller, and the microcontroller performs multi-scale object detection on the image to be detected based on the convolutional neural network.
  • the fourth embodiment of the present invention discloses an electronic device.
  • the electronic device includes the controller in the third embodiment and a memory communicatively connected to the controller.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

A model generation method, an object detection method, a controller and an electronic device. The model generation method includes: constructing a convolutional neural network model for multi-scale object detection and dividing the convolutional neural network model into multiple modules, the multiple modules including a feature extraction module and several detection head modules of different scales (101); pre-training the feature extraction module with unlabeled training data to obtain the parameters and model of the feature extraction module (102); and connecting the trained feature extraction module to the multiple detection head modules respectively and training the connected modules with labeled training data to obtain the parameters and model of each module (103). A high-precision convolutional neural network model can thus be obtained without a large amount of labeled training data, while the manpower and time required to annotate training data are saved.

Description

Model generation method, object detection method, controller and electronic device
Technical Field
The present invention relates to the field of image processing technology, and specifically to a model generation method, an object detection method, a controller and an electronic device.
Background Art
With the advancement of computer hardware technology, deep learning models can run on the latest 32-bit microcontrollers. The power consumption of commonly used microcontrollers (MCUs) is only a few milliwatts; thanks to this low power consumption, devices using microcontrollers can be powered by button batteries or some solar cells. Microcontrollers are an important part of the development of the Internet of Things, and real-time operating systems (RTOS) are already widely used on the STMicroelectronics STM32, Espressif Systems ESP32 and Arduino platforms; a real-time operating system enables the microcontroller to support multi-processor (CPU), multi-threaded applications.
Object detection means, given an image, separating the target of interest from the background and determining the target's category and location; that is, for a given image, determining the categories and locations of the targets it contains. The convolutional neural network (CNN), used for deep-learning-based image classification, is a feed-forward neural network whose artificial neurons can respond to surrounding units within part of their coverage area; it performs exceptionally well in large-scale image processing. The convolutional neural network model architecture is a multi-layer structure: after the first input layer, the image passes through several convolutional layers, batch normalization layers and downsampling layers arranged in various orders, and finally the output layer outputs the categories and locations of the targets in the image.
The more convolutional layers a convolutional neural network model has, the greater its representational capacity; but the more layers the model has, the more parameters it involves. For example, MobileNetV2, an image classification model that can be used on mobile phones, has roughly 3.5M parameters (about 3.5 MB even at 8-bit precision, and 14 MB as 32-bit floats), while current microcontrollers have only about 256 KB to 512 KB of on-chip memory, so such a model cannot be used on a microcontroller; a microcontroller can therefore only run image classification convolutional neural networks with few layers.
Summary of the Invention
The purpose of the present invention is to provide a model generation method, an object detection method, a controller and an electronic device that can obtain a high-precision convolutional neural network model without a large amount of labeled training data, while saving the manpower and time required to label training data.
To achieve the above purpose, the present invention provides a model generation method: constructing a convolutional neural network model for multi-scale object detection and dividing the convolutional neural network model into multiple modules, the multiple modules including: a feature extraction module and several detection head modules of different scales; pre-training the feature extraction module with unlabeled training data to obtain the parameters and model of the feature extraction module; and connecting the trained feature extraction module to the multiple detection head modules respectively, and training the connected multiple modules with labeled training data to obtain the parameters and model of each module.
The present invention also provides an object detection method applied to a controller, the method including: obtaining a convolutional neural network model for multi-scale object detection on an image to be detected, the convolutional neural network model being generated by the above model generation method; and using the convolutional neural network model to perform object detection on the image to be detected.
The present invention also provides a controller for executing the above model generation method and/or the above object detection method.
The present invention also provides an electronic device, including: the above controller and a memory communicatively connected to the controller.
This embodiment provides a model generation method. First, a convolutional neural network model for multi-scale object detection is constructed and divided into multiple modules, the multiple modules including: a feature extraction module and several detection head modules of different scales. The feature extraction module is then pre-trained with unlabeled training data to obtain its parameters and model, so that it learns the characteristics of the unlabeled training data in advance. The trained feature extraction module is next combined with the multiple detection head modules of different scales to obtain the convolutional neural network model, and the combined model, comprising the multiple modules (the feature extraction module and the detection head modules), is trained with labeled training data to obtain the parameters and model of each module (the feature extraction module and the detection head modules). Since the feature extraction module has already learned the characteristics of the unlabeled training data in advance, only a small amount of labeled training data is needed at this point for supervised training of the combined convolutional neural network model to obtain the final convolutional neural network model; a high-precision convolutional neural network model is obtained without a large amount of labeled training data, while the manpower and time required to annotate training data are saved.
In one embodiment, pre-training the feature extraction module with unlabeled training data to obtain the parameters and model of the feature extraction module includes: using the feature extraction module as the encoding module of an autoencoder, designing the decoding module of the autoencoder, and training the autoencoder with unlabeled training data to obtain the parameters and model of the feature extraction module.
In one embodiment, for each module, the memory occupied by the parameters of the module's corresponding multi-layer structure model is smaller than the on-chip storage of the controller running the convolutional neural network model.
In one embodiment, after the trained feature extraction module is connected to the multiple detection head modules respectively and the connected multiple modules are trained with labeled training data to obtain the parameters and model of each module, the method further includes: converting the parameters and model of each module into a format for running on the controller.
In one embodiment, constructing the convolutional neural network model for object detection includes: generating, based on the attributes of the image to be detected and the system parameters of the controller, a convolutional neural network model for performing object detection on the image to be detected.
In one embodiment, in the obtained convolutional neural network model, the memory occupied by the parameters of each module's corresponding multi-layer structure model is smaller than the on-chip storage of the controller; using the convolutional neural network model to perform object detection on the image to be detected includes: running the multiple modules of the convolutional neural network model in parallel in multiple threads of the controller to perform object detection on the image to be detected.
In one embodiment, in the obtained convolutional neural network model, the memory occupied by the parameters of each module's corresponding multi-layer structure model is smaller than the on-chip storage of the controller; using the convolutional neural network model to perform object detection on the image to be detected includes: running the multiple modules of the convolutional neural network model in parallel in multiple processors of the controller to perform object detection on the image to be detected.
Brief Description of the Drawings
Figure 1 is a specific flow chart of the model generation method according to the first embodiment of the present invention;
Figure 2 is a schematic diagram of a convolutional neural network model according to the first embodiment of the present invention;
Figure 3 is a flow chart of step 102 of the model generation method of Figure 1;
Figure 4 is a specific flow chart of the object detection method according to the second embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the purpose, features and advantages of the present invention can be understood more clearly. It should be understood that the embodiments shown in the drawings do not limit the scope of the present invention, but merely illustrate the essential spirit of its technical solution.
In the following description, certain specific details are set forth for the purpose of describing the various disclosed embodiments in order to provide a thorough understanding of them. However, those skilled in the relevant art will recognize that the embodiments may be practiced without one or more of these specific details. In other cases, well-known devices, mechanisms and techniques associated with this application may not be shown or described in detail in order to avoid unnecessarily obscuring the description of the embodiments.
Unless the context requires otherwise, throughout the specification and claims the word "comprise" and its variants, such as "include" and "have", are to be understood in an open, inclusive sense, i.e., as "including, but not limited to".
Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, mechanism or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, appearances of "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, mechanisms or characteristics may be combined in any manner in one or more embodiments.
As used in this specification and the appended claims, the singular forms "a" and "the" include plural referents unless the context clearly dictates otherwise. It should be noted that the term "or" is generally employed in its sense including "and/or" unless the context clearly dictates otherwise.
In the following description, many directional words are used in order to clearly present the mechanisms and operation of the present invention; words such as "front", "rear", "left", "right", "outer", "inner", "outward", "inward", "upper" and "lower" should be understood as terms of convenience rather than as limiting terms.
The first embodiment of the present invention relates to a model generation method for training a generated convolutional neural network model; the trained convolutional neural network can be used for multi-scale object detection in images.
The specific flow of the model generation method in this embodiment is shown in Figure 1.
Step 101: Construct a convolutional neural network model for multi-scale object detection, and divide the convolutional neural network model into multiple modules, the multiple modules including: a feature extraction module and several detection head modules of different scales.
Specifically, the convolutional neural network model is used for multi-scale object detection and can be constructed based on the attributes of the image to be detected and the parameters of the controller that will run the model. The constructed convolutional neural network model can perform object detection at multiple scales, that is, it includes multiple detection heads. For example, if an image to be detected needs to be detected at the 1x1, 2x2, 3x3 and 4x4 scales, the constructed model includes a 1x1-scale detection head, a 2x2-scale detection head, a 3x3-scale detection head and a 4x4-scale detection head.
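By way of illustration only (the patent presents no code of its own), a minimal PyTorch sketch of such a model follows, with a shared feature extraction backbone and one detection head per scale; the layer widths, the 1x1 to 4x4 grids, and the number of classes are assumptions, not values fixed by the patent:

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Input layer followed by groups of conv / batch-norm / downsampling layers."""
    def __init__(self, in_ch=3, width=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.BatchNorm2d(width), nn.ReLU(),
            nn.MaxPool2d(2),  # downsampling layer
            nn.Conv2d(width, 2 * width, 3, padding=1), nn.BatchNorm2d(2 * width), nn.ReLU(),
            nn.MaxPool2d(2),
        )

    def forward(self, x):
        return self.net(x)

class DetectionHead(nn.Module):
    """One head per scale: pools shared features to an SxS grid and predicts per cell."""
    def __init__(self, in_ch, grid, num_classes):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(grid)           # SxS grid for this scale
        self.out = nn.Conv2d(in_ch, num_classes + 4, 1)  # class scores + box offsets per cell

    def forward(self, feats):
        return self.out(self.pool(feats))

class MultiScaleDetector(nn.Module):
    def __init__(self, grids=(1, 2, 3, 4), num_classes=2):
        super().__init__()
        self.backbone = FeatureExtractor()
        self.heads = nn.ModuleList(DetectionHead(32, g, num_classes) for g in grids)

    def forward(self, x):
        feats = self.backbone(x)               # feature extraction module
        return [h(feats) for h in self.heads]  # one output per scale
```

Under this sketch the division described in step 101 falls out naturally: `model.backbone` plays the feature extraction module and each entry of `model.heads` plays one detection head module.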
After the convolutional neural network model for multi-scale detection has been constructed, its multi-layer structure is divided in order into multiple modules, the multiple modules including: a feature extraction module and several detection head modules of different scales; the feature extraction module extracts features from the input image to be detected, and each detection head module performs object detection at the corresponding scale. Each module includes multiple layers of the convolutional neural network model, and combining the multiple modules yields a complete convolutional neural network model. The controller may be an MCU microcontroller.
In one example, for each module, the memory occupied by the parameters of the module's corresponding multi-layer structure model is smaller than the on-chip storage of the controller running the convolutional neural network model. That is, when dividing the convolutional neural network model, it must be ensured that the parameters of each divided module's multi-layer structure occupy less storage than the controller's on-chip storage, so that a single module can run on the controller. Furthermore, the multiple modules can subsequently run in parallel in multiple threads of the controller or, for a controller with multiple processors, in multiple processors, that is, the feature extraction module and each detection head module run in different threads or processors, which speeds up the controller's computation and improves the speed of object detection on the image to be detected.
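Continuing the sketch above, this constraint can be checked mechanically; the 256 KB on-chip budget and the 1-byte (8-bit quantized) parameter width below are illustrative assumptions, not values prescribed by the patent:

```python
ON_CHIP_BYTES = 256 * 1024  # assumed on-chip storage of the target MCU
BYTES_PER_PARAM = 1         # assumed 8-bit quantized parameters

def module_fits(module: nn.Module) -> bool:
    """True if the module's parameter memory fits in the controller's on-chip storage."""
    n_params = sum(p.numel() for p in module.parameters())
    return n_params * BYTES_PER_PARAM < ON_CHIP_BYTES

model = MultiScaleDetector()
for m in [model.backbone, *model.heads]:
    assert module_fits(m), "re-divide the model into smaller modules"
```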
Taking the convolutional neural network model of Figure 2 as an example, the model includes an input layer for receiving the input image; after the input layer, several convolutional layers, batch normalization layers and downsampling layers are arranged for feature extraction; the convolutional neural network includes N detection heads of different scales, and the features extracted by the feature extraction module are connected through a fully connected layer or a convolutional layer to the output layer of each scale's detection head, which outputs the categories of the objects in the image.
When dividing the convolutional neural network model of Figure 2, the input layer is cascaded with the several groups of convolutional, batch normalization and downsampling layers used for feature extraction to form the feature extraction module, and each scale's output layer together with its adjacent fully connected or convolutional layer forms a detection head module, yielding N detection head modules of different scales, where N is an integer greater than 1; that is, the convolutional neural network model is divided into one feature extraction module and N detection head modules of different scales.
Step 102: Use unlabeled training data to pre-train the feature extraction module to obtain the parameters and model of the feature extraction module.
Specifically, after the division of the convolutional neural network model is completed in step 101, the feature extraction module is pre-trained with unlabeled data, and the resulting parameters and model of the feature extraction module are saved, where the parameters of the feature extraction module include the connection weights between the layers within the feature extraction module.
In one example, referring to Figure 3, step 102 (using unlabeled training data to pre-train the feature extraction module to obtain its parameters and model) includes: using the feature extraction module as the encoding module of an autoencoder, designing the decoding module of the autoencoder, and training the autoencoder with unlabeled training data to obtain the parameters and model of the feature extraction module.
Taking the convolutional neural network model of Figure 2 as an example, the feature extraction module obtained by dividing the model is trained as follows. The feature extraction module is first used as the encoding module 11 of an autoencoder to design the decoding module 12 of the autoencoder, so that the encoding module 11 (feature extraction module) and the decoding module 12 together form an autoencoder. Since an autoencoder performs unsupervised learning and does not rely on annotations of the training data, it can automatically discover the relationships within the training data by mining its intrinsic features, and the autoencoder can therefore be trained with unlabeled training data. The unlabeled training data is input to the encoding module 11 (feature extraction module), which maps the training data to the feature space; the decoding module 12 then maps the sampled features obtained by the encoding module 11 back to the original space to obtain reconstructed data; the reconstructed data is compared with the training data to obtain the reconstruction error, and the encoding module 11 (feature extraction module) and decoding module 12 are optimized with minimization of the reconstruction error as the optimization objective to obtain the final required encoding module 11 (feature extraction module), whose parameters and model are saved; the encoding module 11 (feature extraction module) thereby learns an abstract feature representation of the training data input.
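A minimal sketch of this pre-training step, continuing the code above: the FeatureExtractor plays encoding module 11, a roughly mirror-image decoder plays decoding module 12, and training minimizes the mean-squared reconstruction error between each input x and its reconstruction D(E(x)). The decoder layout, optimizer and learning rate are assumptions:

```python
class Decoder(nn.Module):
    """Decoding module 12: maps sampled features back to the original image space."""
    def __init__(self, in_ch=32, out_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(in_ch, 16, 2, stride=2), nn.ReLU(),  # undo first downsampling
            nn.ConvTranspose2d(16, out_ch, 2, stride=2),            # undo second downsampling
        )

    def forward(self, z):
        return self.net(z)

def pretrain_encoder(encoder, decoder, unlabeled_loader, epochs=10):
    """Unsupervised pre-training of the feature extraction module as an autoencoder."""
    opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)
    loss_fn = nn.MSELoss()               # reconstruction error
    for _ in range(epochs):
        for x in unlabeled_loader:       # images only, no annotations needed
            recon = decoder(encoder(x))  # map to feature space and back
            loss = loss_fn(recon, x)     # compare reconstruction with input
            opt.zero_grad()
            loss.backward()
            opt.step()
    torch.save(encoder.state_dict(), "feature_extractor_pretrained.pt")
```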
Step 103: Connect the trained feature extraction module to the multiple detection head modules respectively, and use the labeled training data to train the connected multiple modules to obtain the parameters and model of each module.
Specifically, after the above pre-training of the feature extraction module, the feature extraction module is combined with the multiple untrained detection head modules to obtain a complete convolutional neural network model, which is then trained by supervised learning on the labeled training data. Since the feature extraction module has already learned the characteristics of the training data in step 102, only a small amount of labeled training data is needed for the supervised training of the convolutional neural network model in this step. After the training of the combined convolutional neural network model is completed, the final convolutional neural network model is obtained, and the parameters and models of the feature extraction module and of each detection head module are saved separately.
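Continuing the sketch, the supervised stage might look as follows; `detection_loss` stands in for whatever multi-scale detection loss is chosen, which the patent does not specify, and the optimizer settings are assumptions:

```python
def finetune(model, labeled_loader, detection_loss, epochs=5):
    """Supervised training of the combined model (pre-trained backbone + untrained heads)."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(epochs):
        for images, targets in labeled_loader:  # a small labeled set suffices
            outputs = model(images)             # one prediction per scale
            loss = sum(detection_loss(o, t) for o, t in zip(outputs, targets))
            opt.zero_grad()
            loss.backward()
            opt.step()
    # save the parameters and model of each module separately
    torch.save(model.backbone.state_dict(), "feature_extractor.pt")
    for i, head in enumerate(model.heads):
        torch.save(head.state_dict(), f"det_head_{i}.pt")
```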
In one example, after step 103, the method further includes:
Step 104: Convert the parameters and model of each module into a format for running on the controller.
Specifically, after the final parameters and models of the feature extraction module and each detection head module are saved in step 103, the parameters and models of the feature extraction module and each detection head module are converted respectively so that they can run on the controller; for example, the parameters and models are converted into code, so that the feature extraction module and each detection head module can be compiled directly into the controller, which reduces the modules' memory footprint in the controller and improves running speed.
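One plausible form of such a conversion, offered purely as an assumption since the patent does not prescribe a format, is to quantize each module's weights to 8 bits and emit them as a C array that the controller firmware compiles in:

```python
import numpy as np

def export_module_as_c_array(module, name, path):
    """Write a module's parameters as an 8-bit C array header for MCU firmware."""
    flat = torch.cat([p.detach().flatten() for p in module.parameters()])
    scale = float(flat.abs().max()) / 127.0 or 1.0  # symmetric linear quantization
    q = np.clip(np.round(flat.numpy() / scale), -127, 127).astype(np.int8)
    with open(path, "w") as f:
        f.write(f"// auto-generated; dequantize by multiplying with {scale}\n")
        f.write(f"const signed char {name}[{q.size}] = {{")
        f.write(",".join(str(v) for v in q))
        f.write("};\n")

export_module_as_c_array(model.backbone, "feat_extractor_w", "feat_extractor_w.h")
```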
This embodiment provides a model generation method. First, a convolutional neural network model for multi-scale object detection is constructed and divided into multiple modules, the multiple modules including: a feature extraction module and several detection head modules of different scales. The feature extraction module is then pre-trained with unlabeled training data to obtain its parameters and model, so that it learns the characteristics of the unlabeled training data in advance. The trained feature extraction module is next combined with the multiple detection head modules of different scales to obtain the convolutional neural network model, and the combined model, comprising the multiple modules (the feature extraction module and the detection head modules), is trained with labeled training data to obtain the parameters and model of each module (the feature extraction module and the detection head modules). Since the feature extraction module has already learned the characteristics of the unlabeled training data in advance, only a small amount of labeled training data is needed at this point for supervised training of the combined convolutional neural network model to obtain the final convolutional neural network model; a high-precision convolutional neural network model is obtained without a large amount of labeled training data, while the manpower and time required to annotate training data are saved.
The second embodiment of the present invention discloses an object detection method applied to a controller (which may be an MCU microcontroller); the controller runs a convolutional neural network model for multi-scale object detection on images and can thereby identify target objects of multiple scales contained in the input image to be detected.
The specific flow of the object detection method in this example is shown in Figure 4.
Step 201: Obtain a convolutional neural network model for multi-scale object detection on the image to be detected, the convolutional neural network model being generated by the model generation method of the first embodiment.
Specifically, the convolutional neural network model used for object detection is generated by the model generation method of the first embodiment; after the convolutional neural network model is generated, it can run in the controller.
Step 202: Use the convolutional neural network model to perform object detection on the image to be detected.
In one example, in the obtained convolutional neural network model, the memory occupied by the parameters of each module's corresponding multi-layer structure model is smaller than the on-chip storage of the controller; using the convolutional neural network model to perform object detection on the image to be detected includes: running the multiple modules of the convolutional neural network model in parallel in multiple threads or processors of the controller to perform object detection on the image to be detected. That is, in the convolutional neural network model generated in the first embodiment, the memory required to run each module (the feature extraction module and the detection head modules of multiple scales) is smaller than the controller's on-chip storage, so every module can run on the controller; the multiple modules can then run in parallel in multiple threads of the controller or, for a controller with multiple processors, in multiple processors, that is, the feature extraction module and each detection head module run in different threads or processors, which speeds up the controller's computation and improves the speed of object detection on the image to be detected. For example, with the feature extraction module and each detection head module running in different processors, after the processor running the feature extraction module finishes extracting the features of the current image, it inputs the extracted features to the processors running the detection head modules and can then proceed to acquire the next image and extract its features.
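The pipelining idea can be illustrated with Python threads standing in for RTOS threads or separate processors (illustration only; on the MCU this would be RTOS tasks). `camera_frames` below is a hypothetical iterable of input images:

```python
import queue
import threading

feature_qs = [queue.Queue(maxsize=1) for _ in model.heads]  # one channel per detection head
results = queue.Queue()

def run_feature_extractor(image_source):
    for image in image_source:         # acquire the next image
        feats = model.backbone(image)  # extract its features
        for q in feature_qs:           # hand the features to every detection head,
            q.put(feats)               # then loop on to the next image

def run_detection_head(head, q):
    while True:
        feats = q.get()                # wait for the current image's features
        results.put(head(feats))       # detect objects at this head's scale

threading.Thread(target=run_feature_extractor, args=(camera_frames,), daemon=True).start()
for head, q in zip(model.heads, feature_qs):
    threading.Thread(target=run_detection_head, args=(head, q), daemon=True).start()
```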
The third embodiment of the present invention discloses a controller, for example an MCU controller, for executing the model generation method of the first embodiment and/or the object detection method of the second embodiment; that is, the controller can run both the model generation method and the object detection method, or the two methods can be implemented by different controllers. For example, the model training process in the model generation method demands high computing power and can be handed over to a controller with stronger processing capability; that controller sends the generated convolutional neural network model to the microcontroller, and the microcontroller performs multi-scale object detection on the image to be detected based on the convolutional neural network.
The fourth embodiment of the present invention discloses an electronic device; the electronic device includes the controller of the third embodiment and a memory communicatively connected to the controller.
The preferred embodiments of the present invention have been described in detail above, but it should be understood that, if desired, aspects of the embodiments can be modified to employ aspects, features and concepts of various patents, applications and publications to provide further embodiments.
These and other changes can be made to the embodiments in light of the above detailed description. In general, the terms used in the claims should not be construed as limited to the specific embodiments disclosed in the specification and claims, but should be understood to include all possible embodiments together with the full scope of equivalents to which such claims are entitled.

Claims (10)

  1. A model generation method, characterized by comprising:
    constructing a convolutional neural network model for multi-scale object detection, and dividing the convolutional neural network model into multiple modules, the multiple modules including: a feature extraction module and several detection head modules of different scales;
    pre-training the feature extraction module with unlabeled training data to obtain parameters and a model of the feature extraction module;
    connecting the trained feature extraction module to the multiple detection head modules respectively, and training the connected multiple modules with labeled training data to obtain parameters and a model of each of the modules.
  2. The model generation method according to claim 1, characterized in that pre-training the feature extraction module with unlabeled training data to obtain the parameters and model of the feature extraction module comprises:
    using the feature extraction module as the encoding module of an autoencoder to design the decoding module of the autoencoder, and training the autoencoder with unlabeled training data to obtain the parameters and model of the feature extraction module.
  3. The model generation method according to claim 1, characterized in that, for each of the modules, the memory occupied by the parameters of the module's corresponding multi-layer structure model is smaller than the on-chip storage of the controller running the convolutional neural network model.
  4. The model generation method according to claim 1, characterized in that, after connecting the trained feature extraction module to the multiple detection head modules respectively and training the connected multiple modules with labeled training data to obtain the parameters and model of each of the modules, the method further comprises:
    converting the parameters and model of each of the modules into a format for running on a controller.
  5. The model generation method according to claim 1, characterized in that constructing the convolutional neural network model for object detection comprises:
    generating, based on attributes of an image to be detected and system parameters of a controller, a convolutional neural network model for performing object detection on the image to be detected.
  6. An object detection method, characterized in that the method is applied to a controller and comprises:
    obtaining a convolutional neural network model for multi-scale object detection on an image to be detected, the convolutional neural network model being generated by the model generation method according to any one of claims 1 to 5;
    performing object detection on the image to be detected by using the convolutional neural network model.
  7. The object detection method according to claim 6, characterized in that, in the obtained convolutional neural network model, the memory occupied by the parameters of each module's corresponding multi-layer structure model is smaller than the on-chip storage of the controller, and performing object detection on the image to be detected by using the convolutional neural network model comprises:
    running the multiple modules of the convolutional neural network model in parallel in multiple threads of the controller to perform object detection on the image to be detected.
  8. The object detection method according to claim 6, characterized in that, in the obtained convolutional neural network model, the memory occupied by the parameters of each module's corresponding multi-layer structure model is smaller than the on-chip storage of the controller, and performing object detection on the image to be detected by using the convolutional neural network model comprises:
    running the multiple modules of the convolutional neural network model in parallel in multiple processors of the controller to perform object detection on the image to be detected.
  9. A controller, characterized by being configured to execute the model generation method according to any one of claims 1 to 5 and/or the object detection method according to any one of claims 6 to 8.
  10. An electronic device, characterized by comprising: the controller according to claim 9 and a memory communicatively connected to the controller.
PCT/CN2022/107858 2022-07-26 2022-07-26 Model generation method, object detection method, controller and electronic device WO2024020774A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202280005479.0A 2022-07-26 2022-07-26 Model generation method, object detection method, controller and electronic device
PCT/CN2022/107858 2022-07-26 2022-07-26 Model generation method, object detection method, controller and electronic device WO2024020774A1 (zh)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/107858 WO2024020774A1 (zh) 2022-07-26 2022-07-26 Model generation method, object detection method, controller and electronic device

Publications (1)

Publication Number Publication Date
WO2024020774A1 true WO2024020774A1 (zh) 2024-02-01

Family

ID=87355026

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/107858 WO2024020774A1 (zh) 2022-07-26 2022-07-26 Model generation method, object detection method, controller and electronic device

Country Status (2)

Country Link
CN (1) CN115968488A (zh)
WO (1) WO2024020774A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117496362B (zh) * 2024-01-02 2024-03-29 环天智慧科技股份有限公司 Land cover change detection method based on adaptive convolution kernels and cascaded detection heads

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614985A (zh) * 2018-11-06 2019-04-12 华南理工大学 Object detection method based on densely connected feature pyramid network
CN110414380A (zh) * 2019-07-10 2019-11-05 上海交通大学 Student behavior detection method based on object detection
WO2021097728A1 (en) * 2019-11-20 2021-05-27 Nvidia Corporation Identification of multi-scale features using neural network
CN112183591A (zh) * 2020-09-15 2021-01-05 上海电机学院 Transformer fault diagnosis method based on stacked sparse denoising autoencoder network
CN114429578A (zh) * 2022-01-28 2022-05-03 北京建筑大学 Inspection method for ridge beast ornaments on ancient buildings

Also Published As

Publication number Publication date
CN115968488A (zh) 2023-04-14

Similar Documents

Publication Publication Date Title
CN110197276B (zh) Data volume sculptor for deep learning acceleration
Sze Designing hardware for machine learning: The important role played by circuit designers
WO2019007406A1 (zh) Data processing device and method
CN110383300A (zh) Computing device and method
WO2024020774A1 (zh) Model generation method, object detection method, controller and electronic device
CN112069804B (zh) Implicit discourse relation recognition method based on interactive capsule network with dynamic routing
CN113392749A (zh) Rolling bearing fault diagnosis method and device based on GAF-VGG
CN110968235A (zh) Signal processing device and related products
Zhang et al. Graph-pbn: Graph-based parallel branch network for efficient point cloud learning
Wu et al. uSystolic: Byte-crawling unary systolic array
Yuan et al. CTIF-Net: A CNN-Transformer Iterative Fusion Network for Salient Object Detection
Belabed et al. Low cost and low power stacked sparse autoencoder hardware acceleration for deep learning edge computing applications
Yan et al. Acceleration and optimization of artificial intelligence CNN image recognition based on FPGA
CN116630753A (zh) Multi-scale few-shot object detection method based on contrastive learning
Yang et al. Gated res2net for multivariate time series analysis
WO2024020773A1 (zh) Model generation method, image classification method, controller and electronic device
Luo et al. Achieving green ai with energy-efficient deep learning using neuromorphic computing
Lan et al. Efficient converted spiking neural network for 3d and 2d classification
Xu et al. Unsupervised representation learning for large-scale wafer maps in micro-electronic manufacturing
Zhang et al. A Hybrid Neural Network-Based Intelligent Forecasting Approach for Capacity of Photovoltaic Electricity Generation
Langroudi et al. Digital neuromorphic chips for deep learning inference: a comprehensive study
Zu Deep learning parallel computing and evaluation for embedded system clustering architecture processor
Cordeiro et al. Efficient Machine Learning execution with Near-Data Processing
He et al. Healthcare entity recognition based on deep learning
Liu et al. Overview of deep learning research