WO2022237061A1 - Embedded object cognitive system based on image processing - Google Patents

Embedded object cognitive system based on image processing

Info

Publication number
WO2022237061A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
embedded
module
model
image processing
Prior art date
Application number
PCT/CN2021/122781
Other languages
French (fr)
Chinese (zh)
Inventor
王宜怀
刘纯平
王进
施连敏
胡展鹏
常诚
Original Assignee
Soochow University (苏州大学)
Priority date
Filing date
Publication date
Application filed by Soochow University (苏州大学)
Publication of WO2022237061A1 publication Critical patent/WO2022237061A1/en

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/20: Scenes; Scene-specific elements in augmented reality scenes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00: Computing arrangements using knowledge-based models
    • G06N 5/04: Inference or reasoning models

Definitions

  • the invention relates to the technical field of embedded artificial intelligence, in particular to an embedded object recognition system based on image processing.
  • Embedded Artificial Intelligence is the product of deep integration of embedded computer technology, artificial intelligence technology and the actual needs of each application scenario.
  • in addition to the technical advantages of artificial intelligence, embedded artificial intelligence also offers the excellent real-time performance, applicability, robustness and stability of embedded technology.
  • the traditional embedded intelligent software and hardware platform centers on a cloud server: the raw data collected by the terminal is transmitted to the cloud, where all storage and analysis are completed, while the embedded terminal merely collects data and acts on the returned results, completing one data cycle.
  • This kind of intelligent embedded software and hardware platform with cloud computing as the core has problems such as high overhead, poor real-time performance, and data privacy, and cannot meet most practical application needs.
  • edge computing and fog computing, which aim to overcome the shortcomings of cloud-centric embedded intelligent software and hardware platforms, have been proposed.
  • this type of method only divides the propagation process of the network model into two parts: the terminal and the cloud.
  • the embedded terminal does not have complete cognitive capabilities, and the complete reasoning process still requires cloud computing.
  • the system collects images in real time and obtains recognition results after inference through a lightweight convolutional neural network model in the model training module; the system can greatly reduce the demand for hardware resources while ensuring object-recognition accuracy and inference speed, realizing the recognition and classification of different types of objects.
  • the purpose of the embodiments of the present invention is to provide an embedded object recognition system based on image processing, which collects images in real time and obtains recognition results after inference through a lightweight convolutional neural network model in the model training module;
  • the system can greatly reduce the demand for hardware resources while ensuring object-recognition accuracy and inference speed, and realize the recognition and classification of different types of objects.
  • an embedded object recognition system based on image processing includes: an image acquisition module for collecting image features of training objects; a model training module, connected with the image acquisition module, which takes the image features obtained by the image acquisition module as training material and uses a preset algorithm to generate a cognitive model parameter component that can be compiled and used directly under the embedded engineering framework; a model terminal deployment module for deploying the cognitive model parameter component obtained by the model training module on the embedded terminal; and a terminal reasoning module, which recognizes the target object from the image of the target object collected by the image acquisition module using the cognitive model parameter component provided by the model terminal deployment module.
  • the model training module includes two modes: a PC model training mode and an embedded terminal real-time reasoning training mode.
  • the PC model training mode includes the steps: the embedded terminal acquires image features and transmits them to the PC; the PC builds a data set from the image features and, following the preset algorithm in the model training module, generates a cognitive model parameter component that can be compiled and used directly under the embedded engineering framework; the cognitive model parameter component is deployed to the embedded terminal by burning.
  • the real-time inference training mode of the embedded terminal includes the steps: the embedded terminal acquires image features; the terminal inference module performs image processing on them and, following the preset algorithm in the model training module, generates a cognitive model parameter component that can be compiled and used directly under the embedded engineering framework; the component is stored in the embedded terminal.
  • the hardware configuration in the model terminal deployment module is different according to the nature of the parameters in the embedded object recognition system.
  • constant parameters are stored in FLASH memory, and variable parameters are stored in RAM memory.
  • the constant parameters include the filter (convolution kernel) parameters, the bias (BIAS) parameters, and the propagation structure functions;
  • the variable parameters include image features, input variables and output variables.
  • the embedded object recognition system replaces reads and writes of dynamic arrays in RAM with erase and read/write operations on the contiguous address space of designated FLASH sectors.
  • the parameter format used in the preset algorithm in the model training module is a multidimensional array form in C language.
  • an optimized camera driving algorithm is used in the image acquisition module to drive the camera to acquire image features of the training object.
  • the image acquisition module includes an image processing unit, which uses a threshold filtering method to process the original image of the training object obtained by the image acquisition module to obtain the image features of the training object.
  • the model training module uses a fusion rolling convolution algorithm to generate a cognitive model parameter component that can be directly compiled and used under the embedded engineering framework.
  • the purpose of the embodiments of the present invention is to provide an embedded object recognition system based on image processing, which collects images in real time and obtains recognition results after reasoning through a lightweight convolutional neural network model in the model training module.
  • the model training module in the system adopts the fusion rolling convolution algorithm, effectively reducing the embedded system's demand for image buffer space.
  • the system uses an optimized camera driving algorithm to drive the camera during the image acquisition process, which effectively improves the speed of image reading and display.
  • the system can greatly reduce the demand for hardware resources while ensuring the accuracy of object recognition and reasoning speed, and realize the recognition and classification of different types of objects.
  • Fig. 1 is a block diagram of the embedded object recognition system provided by an embodiment of the present invention;
  • Fig. 2 is a schematic diagram of the data flow model of the embedded object recognition system of the embodiment shown in Fig. 1;
  • Fig. 3(a) is a schematic flowchart of the PC model training mode in the embodiment shown in Fig. 1;
  • Fig. 3(b) is a schematic flowchart of the embedded terminal real-time inference training mode in the embodiment shown in Fig. 1;
  • Fig. 4 is a diagram of the hardware configuration allocation framework in the model terminal deployment module of the embodiment shown in Fig. 1;
  • Fig. 5 is a flowchart of the optimized camera driving algorithm in an embodiment of the present invention;
  • Figs. 6(a), 6(b), and 6(c) are schematic diagrams of the fusion rolling convolution algorithm in an embodiment of the present invention.
  • Reference numerals: 100. embedded object recognition system; 10. image acquisition module; 20. model training module; 30. model terminal deployment module; 40. terminal reasoning module.
  • the embedded object recognition system 100 based on image processing includes an image acquisition module 10 , a model training module 20 , a model terminal deployment module 30 and a terminal reasoning module 40 .
  • the image acquisition module 10 is used to acquire image features of the training object.
  • the model training module 20 is connected with the image acquisition module 10, uses the image features obtained by the image acquisition module 10 as the training material, and adopts a preset algorithm to generate cognitive model parameter components that can be directly compiled and used under the embedded engineering framework.
  • the model terminal deployment module 30 is used to deploy the cognitive model parameter components obtained by the model training module 20 on the embedded terminal.
  • the terminal inference module 40 recognizes the target object from the image collected by the image acquisition module 10, using the cognitive model parameter components provided by the model terminal deployment module 30.
  • the image acquisition module 10 first collects the feature data of the corresponding object as material for training the cognitive model; after a sufficient amount of image data has been collected, the model training module 20 trains the model and finally, through the related algorithm, generates cognitive model parameter components that can be compiled and used directly under a general embedded engineering framework.
  • the model terminal deployment module 30 deploys the cognitive model parameter components obtained by the model training module 20 on the embedded terminal; that is, after recompiling and burning, the new cognitive model is deployed on the terminal, at which point the terminal can recognize the target object through the terminal reasoning module 40 and obtain a cognitive result.
  • according to the direction of data transmission and the functions performed by the system, the model training module 20 can operate in two modes: PC model training mode and embedded terminal real-time inference training mode.
  • the PC model training mode includes the steps: the embedded terminal acquires image features and transmits them to the PC, i.e., the image acquisition module 10 collects the image feature data of the corresponding object and transmits it to the PC; the PC builds a data set from the image features and, following the preset algorithm in the model training module, generates a cognitive model parameter component that can be compiled and used directly under the embedded engineering framework; the component is deployed to the embedded terminal by burning.
  • the embedded terminal real-time inference training mode includes the steps: the embedded terminal acquires image features, i.e., the image acquisition module 10 collects the image feature data of the corresponding object; the terminal inference module performs image processing on the features and, following the preset algorithm in the model training module, generates a cognitive model parameter component that can be compiled and used directly under the embedded engineering framework; the component is stored in the embedded terminal.
  • the physical memory of an MCU in an embedded system is divided into volatile and non-volatile memory, represented by random access memory (RAM) and flash memory (FLASH), respectively.
  • RAM holds the temporary data, such as local variables, that the processor needs at run time;
  • FLASH stores the program itself and the read-only data it uses.
  • in general, RAM is much smaller than FLASH: on the STM32L431RC chip, for example, RAM is 64 KB while FLASH is 256 KB.
  • the size of the main control chip's RAM determines the system's data-processing capacity. In network model inference, the temporary data generated by each sub-network does not affect subsequent layers; a feed-forward network propagates layer by layer only through the output feature matrix, so the total data volume of all parameters used by a single network layer should not exceed the RAM size of the main control chip.
  • the resource consumption of the network model in the embedded terminal comprises the deployed parameter model itself plus the additional data consumed by model propagation at run time.
  • in the forward propagation from input to output, five categories of data resources are involved: the input image, the network model parameters, the temporary space generated during computation, the inputs and outputs of each layer transfer, and the final model output.
  • the image processing-based embedded object recognition system 100 designs a reasonable model resource configuration framework according to the spatial resource characteristics of the embedded terminal chip.
  • the hardware configuration in the model terminal deployment module 30 differs according to the nature of the parameters in the embedded object recognition system. Specifically, constant parameters are stored in FLASH and variable parameters in RAM. Adapting model parameters of different characteristics to the different physical storage resources of the embedded terminal rationalizes the terminal's resource allocation and reduces unnecessary resource consumption.
  • the constant parameters include the filter (convolution kernel) parameters, the bias (BIAS) parameters, and the propagation structure functions;
  • the variable parameters include image features, input variables, and output variables.
  • network model parameters that do not change during propagation, such as convolution kernel parameters and biases, usually occupy the most storage and cannot be updated during inference, so the system stores them in FLASH as constant arrays; FLASH offers more space than RAM and is better suited to such data.
  • the inputs and outputs of each layer have a fixed structure and exist in the embedded terminal as fixed-size multidimensional arrays whose member values change during propagation, so the system stores them in RAM as dynamic arrays, which favors computation speed. Since the input image and the output array are processed jointly with other software-layer components, the system stores them in RAM as global arrays.
  • the size of RAM is relatively small, but RAM is responsible for the main data calculations.
  • the resources occupied by each layer are different, and the space resources occupied by different network structures of each layer are quite different.
  • in the VGG16 network, for example, the network parameters and input/output of the first convolutional layer occupy nearly 100 KB, while the final fully connected layer needs less than 1 KB of computing space. Partitioning the most resource-hungry network layer and reducing its footprint by other means is one of the problems urgently needing solution.
  • the embedded object recognition system 100 replaces reads and writes of dynamic arrays in RAM with erase and read/write operations on the contiguous address space of designated FLASH sectors.
  • this reduces the system's consumption of RAM resources, improves the main control chip's ability to accommodate larger network models, and trades a small time penalty for improved model inference accuracy.
  • the specific erase-and-replace algorithm is shown in Table 1.
  • the embedded object recognition system 100 first reads the parameters of the H5-format file produced by training on the network model fitting platform and then, according to the designed algorithm, generates a C-language component of the model parameters that can be compiled directly in the embedded engineering framework.
  • the data format in the H5 file is a tree structure, which is divided into two types of data: weight and bias.
  • the concrete representation of convolution kernel element a at row h, column w of the nth dimension of the layer-l network is given in formula 1, and that of the kth bias term b of the layer-l network in formula 2 (the formulas themselves are not reproduced in this text).
  • the embedded object recognition system 100 designs a model inference parameter format conversion algorithm that converts the parameters used in model inference into C multidimensional arrays in their storage form and turns the algorithm model into an embedded engineering component. That is, the parameter format used by the preset algorithm in the model training module 20 is the multidimensional array form of the C language.
  • the embodiment first converts the model parameters into C multidimensional arrays, then extracts the commonalities of embedded projects, adds the accompanying header file components and elements such as the variable declarations required at the start of the file, and converts the result into a general embedded engineering component that can be deployed directly on the embedded terminal, providing model parameter data support for terminal inference.
  • the embedded object recognition system 100 designs an image acquisition acceleration algorithm and a feature extraction algorithm to ensure that the terminal can quickly acquire high-quality image data.
  • an optimized camera driving algorithm is used in the image acquisition module 10 to drive the camera to acquire image features of the training object.
  • the image transmitted from the camera to the buffer chip is 80 × 60 pixels, so pixel information must be read 4,800 times; even small savings in the per-pixel validity check and in the camera-to-buffer communication flow are therefore amplified considerably.
  • Fig. 5 is a flowchart of the optimized camera driving algorithm in the embodiment of the present invention.
  • the embedded object recognition system 100 must output 2 clock signals, i.e., 4 GPIO high/low level operations, to obtain one pixel, and reading a complete QVGA image requires outputting 19,200 such level operations. The system therefore first accelerates the clock-signal output operation.
  • the traditional embedded GPIO operation usually calls the packaged GPIO interface function, and the interface function often places parameter verification and judgment operations in the function to ensure the robustness of the program.
  • in STMicroelectronics' low-level HAL library, for example, a single GPIO set (pull-up) operation entails a series of steps such as port-correctness verification and GPIO register verification. These checks guard against malformed parameter input and ensure the function's reusability and robustness, but they are ill-suited to the single repetitive GPIO operation of image acquisition and add to the system's run-time burden.
  • the embedded object recognition system 100 therefore writes directly to the input/output registers of the camera module's communication GPIO ports, avoiding parameter passing and checking and improving the efficiency and speed of image acquisition.
  • by cropping and compressing the image captured by the camera unit, the space occupied by the image data can be reduced while losing as little image feature data as possible.
  • because the image data acquisition module transmits pixel by pixel, the embedded terminal performs the relevant operations on a one-dimensional image array during image acquisition.
  • assuming the image processing algorithm expects an H × H-pixel input, the algorithm compresses QVGA-sized image data to the H × H format.
  • the compression algorithm first crops the collected image to 60 × 60 and then judges whether the ordinal number n of each incoming pixel corresponds to a target pixel: if so, the pixel is stored at the corresponding position A(x, y) of the target two-dimensional array; otherwise it is discarded.
  • the specific compression algorithm is shown in Table 2.
  • the embodiment of the present invention also performs LCD display acceleration.
  • displaying a single pixel on the LCD is a Serial Peripheral Interface (SPI) communication between the chip and the LCD. The point-by-point display process is changed to first setting the LCD display area and then calling the SPI to send the pixel data directly to the LCD, eliminating the coordinate-setting step for every transmission.
  • the LCD originally displays point by point: as soon as one pixel's data is received it is shown on the LCD, so a complete image is displayed while occupying and reusing only a single pixel's worth of resources.
  • the traditional LCD pixel-display function positions and then displays each pixel, i.e., it first determines the relative position on the LCD and then draws the corresponding point; this is inefficient when a designated area is drawn repeatedly.
  • each displayed pixel has positional continuity with the pixels displayed before and after it, so it is not necessary to position every pixel in order to display a complete image.
  • the image acquisition module 10 further includes an image processing unit (not shown in the figure).
  • the image processing unit uses a threshold filtering method to process the original image of the training object obtained by the image acquisition module to obtain the image features of the training object, thereby effectively filtering out the image background and retaining the image features of the target object itself as much as possible.
  • the threshold filtering method specifically includes an edge mean method, a bimodal mean method, and a bimodal valley method.
  • the edge mean method averages all the edge pixels and uses the resulting mean as the threshold for filtering the image (see the sketch after this list).
  • the bimodal mean method is based on the idea of iterative updating. The algorithm first builds a histogram of the number of occurrences of each gray value in the input image, then checks whether the histogram is bimodal, i.e., whether there are exactly two local maxima; if so, the average of the two local maxima is taken as the filtering threshold. Otherwise, each data point is smoothed with a span of n, with a limit on the number of smoothing passes.
  • the bimodal valley method does not take the mean of the two peak gray values; instead it takes the lowest valley between the two peaks, i.e., the gray value with the lowest frequency of occurrence between the two peak gray values, as the threshold for filtering the image.
  • the usual network-model processing approach is to obtain the entire image first and then feed it into the network, which is reasonable when resources are plentiful.
  • but the input image is large: at the 224 × 224 pixels commonly used by network models, a single image occupies 49 KB. In a system where image space and model space trade off against each other, this also limits the number of model parameters and ultimately reduces recognition accuracy. Minimizing the space occupied by the input image is therefore the key to improving the system's space utilization efficiency.
  • the model training module 20 in the embedded object recognition system 100 adopts a dynamic rolling convolution algorithm to generate cognitive model parameter components that can be directly compiled and used under the embedded engineering framework.
  • the system integrates the image acquisition process with the first convolutional layer of the network, and designs a dynamic rolling convolution algorithm.
  • according to the decomposability and controllability of the image acquisition process of the embedded terminal camera, the dynamic rolling convolution algorithm divides the fused acquisition-plus-convolution process into the steps illustrated in Figs. 6(a) to 6(c) (a sketch follows this list).
  • with the fusion convolution algorithm of the embodiment of the present invention, the required input-image storage space becomes S × (H + 1), whereas the traditional convolution method occupies S × T.
  • the embedded object recognition system based on image processing can collect images in real time and obtain recognition results after reasoning through a lightweight convolutional neural network model in the model training module.
  • the model training module in the system adopts the fusion rolling convolution algorithm to effectively optimize the embedded system's demand for the size and space of the image area.
  • the system uses an optimized camera driving algorithm to drive the camera during the image acquisition process, which effectively improves the speed of image reading and display. The system can greatly reduce the demand for hardware resources while ensuring the accuracy of object recognition and reasoning speed, and realize the recognition and classification of different types of objects.
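Two of the techniques above can be illustrated with short sketches. First, a minimal sketch of the edge mean threshold method: average all border pixels and use the mean as the filtering threshold. The image size, 8-bit grayscale format, and keep-dark-pixels polarity are illustrative assumptions, not details taken from the patent.

```c
#include <stdint.h>

#define IMG_W 60
#define IMG_H 60

/* Edge mean method: the threshold is the mean of all border pixels. */
uint8_t edge_mean_threshold(const uint8_t img[IMG_H][IMG_W])
{
    uint32_t sum = 0, cnt = 0;
    for (uint32_t x = 0; x < IMG_W; x++) {        /* top and bottom rows */
        sum += img[0][x] + img[IMG_H - 1][x];
        cnt += 2;
    }
    for (uint32_t y = 1; y + 1 < IMG_H; y++) {    /* left and right columns */
        sum += img[y][0] + img[y][IMG_W - 1];
        cnt += 2;
    }
    return (uint8_t)(sum / cnt);
}

/* Filter out the background: pixels on the background side of the
 * threshold are zeroed (which side is foreground is an assumption). */
void threshold_filter(uint8_t img[IMG_H][IMG_W], uint8_t thr)
{
    for (uint32_t y = 0; y < IMG_H; y++)
        for (uint32_t x = 0; x < IMG_W; x++)
            if (img[y][x] >= thr)
                img[y][x] = 0;
}
```

Second, a hedged sketch of the rolling (fused) convolution idea: instead of buffering the whole T-row image (S × T), keep a circular buffer of H + 1 rows (S × (H + 1), with H the kernel height) and run the first convolutional layer as each camera row arrives. All sizes, the single-channel kernel, and the output handling are assumptions rather than the patent's exact steps.

```c
#include <stdint.h>

#define S 60                       /* row width (assumed) */
#define K 3                        /* kernel side length H (assumed 3) */

static uint8_t rows[K + 1][S];     /* rolling buffer: S x (H + 1) */
extern const float conv1_kernel[K][K];
extern void emit_output_row(const float *out, uint32_t len);

/* Called once per camera row r (0-based), with S pixels in 'row'. */
void on_camera_row(uint32_t r, const uint8_t *row)
{
    for (uint32_t x = 0; x < S; x++)
        rows[r % (K + 1)][x] = row[x];

    if (r + 1 < K)                 /* not enough rows buffered yet */
        return;

    float out[S - K + 1];
    for (uint32_t x = 0; x + K <= S; x++) {   /* valid convolution, stride 1 */
        float acc = 0.0f;
        for (uint32_t i = 0; i < K; i++)
            for (uint32_t j = 0; j < K; j++)
                acc += conv1_kernel[i][j] *
                       (float)rows[(r + 1 - K + i) % (K + 1)][x + j];
        out[x] = acc;
    }
    emit_output_row(out, S - K + 1);
}
```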

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An embedded object cognitive system (100) based on image processing. The embedded object cognitive system (100) based on image processing comprises an image acquisition module (10), a model training module (20), a model terminal deployment module (30), and a terminal reasoning module (40). The system can collect images in real time and obtain the recognition result after reasoning by means of a lightweight convolutional neural network model in the model training module (20). The system can greatly reduce the demand for hardware resources while ensuring the accuracy of object recognition and the reasoning speed, and realizes the recognition and classification of different types of objects.

Description

An Embedded Object Recognition System Based on Image Processing
Technical Field
The invention relates to the technical field of embedded artificial intelligence, and in particular to an embedded object recognition system based on image processing.
Background Art
Embedded Artificial Intelligence (EAI) is the product of the deep integration of embedded computer technology, artificial intelligence technology, and the actual needs of each application scenario. In addition to the technical advantages of artificial intelligence, embedded artificial intelligence offers the excellent real-time performance, applicability, robustness, and stability of embedded technology.
The traditional embedded intelligent software and hardware platform centers on a cloud server: the raw data collected by the terminal is transmitted to the cloud, where all storage and analysis are completed, while the embedded terminal merely collects data and acts on the returned results, completing one data cycle. Such cloud-centric intelligent embedded platforms suffer from high overhead, poor real-time performance, and data privacy problems, and cannot meet most practical application needs. As technology develops and terminals grow more powerful, emerging technologies such as edge computing and fog computing have been proposed to overcome these drawbacks. However, such methods merely split the network model's propagation between the terminal and the cloud; the embedded terminal lacks complete cognitive capability, and full inference still requires cloud computing.
Therefore, in view of the above technical problems, it is necessary to provide an embedded object recognition system based on image processing. The system collects images in real time and obtains recognition results after inference through a lightweight convolutional neural network model in the model training module; it can greatly reduce the demand for hardware resources while maintaining object-recognition accuracy and inference speed, realizing the recognition and classification of different types of objects.
Technical Solution
In view of this, the purpose of the embodiments of the present invention is to provide an embedded object recognition system based on image processing that collects images in real time and obtains recognition results after inference through a lightweight convolutional neural network model in the model training module; the system can greatly reduce the demand for hardware resources while maintaining object-recognition accuracy and inference speed, realizing the recognition and classification of different types of objects.
To achieve the above object, the technical solution provided by the embodiments of the present invention is as follows. An embedded object recognition system based on image processing includes: an image acquisition module for collecting image features of training objects; a model training module, connected with the image acquisition module, which takes the image features obtained by the image acquisition module as training material and uses a preset algorithm to generate a cognitive model parameter component that can be compiled and used directly under the embedded engineering framework; a model terminal deployment module for deploying the cognitive model parameter component obtained by the model training module on the embedded terminal; and a terminal reasoning module, which recognizes the target object from the image of the target object collected by the image acquisition module using the cognitive model parameter component provided by the model terminal deployment module.
As a further improvement of the invention, the model training module includes two modes: a PC model training mode and an embedded terminal real-time inference training mode.
As a further improvement of the invention, the PC model training mode includes the steps: the embedded terminal acquires image features and transmits them to the PC; the PC builds a data set from the image features and, following the preset algorithm in the model training module, generates a cognitive model parameter component that can be compiled and used directly under the embedded engineering framework; the cognitive model parameter component is deployed to the embedded terminal by burning.
As a further improvement of the invention, the embedded terminal real-time inference training mode includes the steps: the embedded terminal acquires image features; the terminal inference module performs image processing on them and, following the preset algorithm in the model training module, generates a cognitive model parameter component that can be compiled and used directly under the embedded engineering framework; the component is stored in the embedded terminal.
As a further improvement of the invention, the hardware configuration in the model terminal deployment module differs according to the nature of the parameters in the embedded object recognition system.
As a further improvement of the invention, constant parameters are stored in FLASH memory and variable parameters are stored in RAM.
As a further improvement of the invention, the constant parameters include the filter (convolution kernel) parameters, the bias (BIAS) parameters, and the propagation structure functions; the variable parameters include image features, input variables, and output variables.
As a further improvement of the invention, the embedded object recognition system replaces reads and writes of dynamic arrays in RAM with erase and read/write operations on the contiguous address space of designated FLASH sectors.
As a further improvement of the invention, the parameter format used by the preset algorithm in the model training module is the multidimensional array form of the C language.
As a further improvement of the invention, the image acquisition module uses an optimized camera driving algorithm to drive the camera and collect image features of the training object.
As a further improvement of the invention, the image acquisition module includes an image processing unit that uses a threshold filtering method to process the original image of the training object obtained by the image acquisition module to obtain the image features of the training object.
As a further improvement of the invention, the model training module uses a fusion rolling convolution algorithm to generate a cognitive model parameter component that can be compiled and used directly under the embedded engineering framework.
Beneficial Effects
The present invention has the following advantages:
The embodiments of the present invention provide an embedded object recognition system based on image processing that collects images in real time and obtains recognition results after inference through a lightweight convolutional neural network model in the model training module. Further, the model training module adopts the fusion rolling convolution algorithm, effectively reducing the embedded system's demand for image buffer space. Further, the system uses an optimized camera driving algorithm to drive the camera during image acquisition, effectively improving the speed of image reading and display. The system can greatly reduce the demand for hardware resources while maintaining object-recognition accuracy and inference speed, realizing the recognition and classification of different types of objects.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a block diagram of the embedded object recognition system provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the data flow model of the embedded object recognition system of the embodiment shown in Fig. 1;
Fig. 3(a) is a schematic flowchart of the PC model training mode in the embodiment shown in Fig. 1;
Fig. 3(b) is a schematic flowchart of the embedded terminal real-time inference training mode in the embodiment shown in Fig. 1;
Fig. 4 is a diagram of the hardware configuration allocation framework in the model terminal deployment module of the embodiment shown in Fig. 1;
Fig. 5 is a flowchart of the optimized camera driving algorithm in an embodiment of the present invention;
Figs. 6(a), 6(b), and 6(c) are schematic diagrams of the fusion rolling convolution algorithm in an embodiment of the present invention.
Reference numerals: 100. embedded object recognition system; 10. image acquisition module; 20. model training module; 30. model terminal deployment module; 40. terminal reasoning module.
Embodiments of the Invention
To help those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
Fig. 1 is a block diagram of an embedded object recognition system based on image processing provided by an embodiment of the present invention. In this embodiment, the embedded object recognition system 100 includes an image acquisition module 10, a model training module 20, a model terminal deployment module 30, and a terminal reasoning module 40. The image acquisition module 10 collects image features of the training object. The model training module 20, connected with the image acquisition module 10, takes the image features obtained by the image acquisition module 10 as training material and uses a preset algorithm to generate cognitive model parameter components that can be compiled and used directly under the embedded engineering framework. The model terminal deployment module 30 deploys the cognitive model parameter components obtained by the model training module 20 on the embedded terminal. The terminal reasoning module 40 recognizes the target object from the image collected by the image acquisition module 10, using the cognitive model parameter components provided by the model terminal deployment module 30.
Referring to Fig. 2, the image acquisition module 10 first collects the feature data of the corresponding object as material for training the cognitive model. After a sufficient amount of image data has been collected, the model training module 20 trains the model and finally, through the related algorithm, generates cognitive model parameter components that can be compiled and used directly under a general embedded engineering framework. The model terminal deployment module 30 deploys these components on the embedded terminal; that is, after recompiling and burning, the new cognitive model is deployed on the terminal, at which point the terminal can recognize the target object through the terminal reasoning module 40 and obtain a cognitive result.
In the overall application system, according to the direction of data transmission and the functions performed, the model training module 20 can operate in two modes: PC model training mode and embedded terminal real-time inference training mode.
The flow of the PC model training mode is shown in Fig. 3(a). It includes the steps: the embedded terminal acquires image features and transmits them to the PC, i.e., the image acquisition module 10 collects the image feature data of the corresponding object and transmits it to the PC; the PC builds a data set from the image features and, following the preset algorithm in the model training module, generates a cognitive model parameter component that can be compiled and used directly under the embedded engineering framework; the component is deployed to the embedded terminal by burning.
The flow of the embedded terminal real-time inference training mode is shown in Fig. 3(b). It includes the steps: the embedded terminal acquires image features, i.e., the image acquisition module 10 collects the image feature data of the corresponding object; the terminal inference module performs image processing on the features and, following the preset algorithm in the model training module, generates a cognitive model parameter component that can be compiled and used directly under the embedded engineering framework; the component is stored in the embedded terminal.
The physical memory of an MCU in an embedded system is divided into volatile and non-volatile memory, represented by random access memory (RAM) and flash memory (FLASH), respectively. RAM holds the temporary data, such as local variables, that the processor needs at run time; FLASH stores the program itself and the read-only data it uses. In general, RAM is much smaller than FLASH: on the STM32L431RC chip, for example, RAM is 64 KB while FLASH is 256 KB.
The size of the main control chip's RAM determines the system's data-processing capacity. In network model inference, the temporary data generated by each sub-network does not affect subsequent layers; a feed-forward network propagates layer by layer only through the output feature matrix, so the total data volume of all parameters used by a single network layer should not exceed the RAM size of the main control chip. The resource consumption of the network model in the embedded terminal comprises the deployed parameter model itself plus the additional data consumed by model propagation at run time. In the forward propagation from input to output, five categories of data resources are involved: the input image, the network model parameters, the temporary space generated during computation, the inputs and outputs of each layer transfer, and the final model output.
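This per-layer constraint can be made explicit at build time. A minimal sketch, assuming C11 and purely illustrative feature-map sizes (none of these values come from the patent):

```c
#include <stdint.h>

#define RAM_BUDGET_BYTES  (64u * 1024u)  /* e.g. STM32L431RC: 64 KB of RAM */

/* Illustrative working set of the largest layer: the input and output
 * feature maps that must coexist in RAM during that layer's inference. */
#define L1_IN_BYTES   (28u * 28u * 1u * sizeof(float))
#define L1_OUT_BYTES  (26u * 26u * 8u * sizeof(float))

/* C11 compile-time check: the single-layer working set must fit in RAM. */
_Static_assert(L1_IN_BYTES + L1_OUT_BYTES < RAM_BUDGET_BYTES,
               "largest layer's working set exceeds the RAM budget");
```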
In this embodiment, the embedded object recognition system 100 based on image processing designs a reasonable model resource configuration architecture according to the spatial resource characteristics of the embedded terminal chip. The hardware configuration in the model terminal deployment module 30 differs according to the nature of the parameters in the system: constant parameters are stored in FLASH and variable parameters in RAM. Adapting model parameters of different characteristics to the different physical storage resources of the embedded terminal rationalizes the terminal's resource allocation and reduces unnecessary resource consumption.
As shown in Fig. 4, in this embodiment the constant parameters include the filter (convolution kernel) parameters, the bias (BIAS) parameters, and the propagation structure functions; the variable parameters include image features, input variables, and output variables. Network model parameters that do not change during propagation, such as convolution kernel parameters and biases, usually occupy the most storage and cannot be updated during inference, so the system stores them in FLASH as constant arrays; FLASH offers more space than RAM and is better suited to such data. The inputs and outputs of each layer have a fixed structure and exist in the embedded terminal as fixed-size multidimensional arrays whose member values change during propagation, so the system stores them in RAM, which favors computation speed. Since the input image and the output array are processed jointly with other software-layer components, they are stored in RAM as global arrays.
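A minimal sketch of this storage split, with illustrative names and shapes (the patent does not publish its actual layer sizes); on typical MCU toolchains, const-qualified objects are placed in FLASH (.rodata) while mutable globals are allocated in RAM:

```c
#include <stdint.h>

/* Constant parameters: fixed during inference, placed in FLASH. */
static const float conv1_kernel[3][3][1][8] = { {{{ 0.1034f, -0.0721f /* ... */ }}} };
static const float conv1_bias[8]            = { 0.0123f, -0.0456f /* ... */ };

/* Variable parameters: per-layer input/output feature maps, overwritten
 * on every inference pass, kept in RAM. */
static float layer_in [28][28][1];
static float layer_out[26][26][8];

/* The camera image and the final result are shared with other software
 * layers, so they live in RAM as global arrays. */
uint8_t input_image[28][28];
float   model_output[10];
```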
In an embedded system, RAM is relatively small yet responsible for the main data computation. During network model inference, the resources occupied by each layer differ, and different network structures occupy very different amounts of space. Taking the VGG16 [54] network as an example, the network parameters and input/output of the first convolutional layer occupy nearly 100 KB, while the final fully connected layer needs less than 1 KB of computing space. Partitioning the most resource-hungry network layer and reducing its footprint by other means is one of the problems urgently needing solution.
Preferably, the embedded object recognition system 100 replaces reads and writes of dynamic arrays in RAM with erase and read/write operations on the contiguous address space of designated FLASH sectors. This reduces the system's consumption of RAM resources, improves the main control chip's ability to accommodate larger network models, and trades a small time penalty for improved model inference accuracy. The specific erase-and-replace algorithm is shown in Table 1.
[Table 1 is not reproduced in the source text.]
Because of differences in compilation frameworks, ordinary low-resource embedded terminals cannot directly support neural network algorithm libraries such as Keras. In this embodiment, the embedded object recognition system 100 first reads the parameters of the H5-format file produced by training on the network model fitting platform and then, according to the designed algorithm, generates a C-language component of the model parameters that can be compiled directly in the embedded engineering framework.
Analysis of the H5 file shows that its data format is a tree structure divided into two kinds of data: weights and biases. The concrete representation of convolution kernel element a at row h, column w of the nth dimension of the layer-l network is shown in formula 1.
[Formula 1 is not reproduced in the source text.]
The concrete representation of the kth bias term b of the layer-l network is shown in formula 2.
[Formula 2 is not reproduced in the source text.]
From these expressions the location of the data can be determined directly, providing the theoretical basis for the design of the preset algorithm in the embodiment of the present invention. The embedded object recognition system 100 designs a model inference parameter format conversion algorithm that converts the parameters used in model inference into C multidimensional arrays in their storage form and turns the algorithm model into an embedded engineering component; that is, the parameter format used by the preset algorithm in the model training module 20 is the multidimensional array form of the C language.
Preferably, the embodiment first converts the model parameters into C multidimensional arrays, then extracts the commonalities of embedded projects, adds the accompanying header file components and elements such as the variable declarations required at the start of the file, and converts the result into a general embedded engineering component that can be deployed directly on the embedded terminal, providing model parameter data support for terminal inference.
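The exact output of the conversion tool is not published in the patent; the following only illustrates the general shape such a generated C component could take (all names and values are assumptions). In Keras-style H5 files, kernels conventionally carry shape (h, w, input channels, n) under tree paths such as model_weights/&lt;layer&gt;/&lt;layer&gt;/kernel:0, which is what makes a direct path-to-array mapping possible.

```c
/* model_params.h (generated) */
#ifndef MODEL_PARAMS_H
#define MODEL_PARAMS_H

extern const float conv1_kernel[3][3][1][8];  /* [h][w][in_ch][n] */
extern const float conv1_bias[8];             /* [k] */

#endif /* MODEL_PARAMS_H */

/* model_params.c (generated) */
const float conv1_kernel[3][3][1][8] = {
    {{{ 0.1034f, -0.0721f, 0.0458f, -0.0112f,
        0.0893f, -0.0347f, 0.0266f,  0.0019f }}},
    /* ... remaining rows emitted by the converter ... */
};

const float conv1_bias[8] = {
    0.0123f, -0.0456f, 0.0078f, 0.0311f,
   -0.0204f, 0.0150f, -0.0087f, 0.0042f
};
```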
图像数据的采集过程相对于传统的环境传感器采集过程来说更加复杂、繁琐处理的数据量也更加庞大。针对这些问题,嵌入式物体认知系统100设计图像采集加速算法与特征提取算法,确保终端能够快速地采集到高质量的图像数据。在优选地实施例中,图像采集模块10中采用优化摄像头驱动算法驱动摄像头而采集训练物体的图像特征。Compared with the traditional environmental sensor acquisition process, the image data acquisition process is more complicated, and the amount of tediously processed data is also larger. To address these problems, the embedded object recognition system 100 designs an image acquisition acceleration algorithm and a feature extraction algorithm to ensure that the terminal can quickly acquire high-quality image data. In a preferred embodiment, an optimized camera driving algorithm is used in the image acquisition module 10 to drive the camera to acquire image features of the training object.
以图像格式QVGA为例,摄像头传送给缓存芯片的图像尺寸为80×60个像素,需要读取4800次像素点信息数据,所以对于每一次像素有效性的判断方法和摄像头与缓存芯片通信流程经过放大之后所带来的速度提升是相当可观的。如图5所示,本发明实施例中优化摄像头驱动算法的流程图。嵌入式物体认知系统100获取一个像素点共需要输出2次时钟信号,即4次GPIO口高/低电平信号,读取完整一张QVGA图像需要输出19200次时钟信号。于是系统首先针对输出时钟信号这一操作进行加速优化操作。Taking the image format QVGA as an example, the size of the image transmitted by the camera to the cache chip is 80×60 pixels, and the pixel information data needs to be read 4800 times, so the method of judging the validity of each pixel and the communication process between the camera and the cache chip The speed increase after zooming in is considerable. As shown in FIG. 5 , the flow chart of optimizing the camera driving algorithm in the embodiment of the present invention. The embedded object recognition system 100 needs to output 2 clock signals in total to obtain a pixel point, that is, 4 GPIO port high/low level signals, and 19,200 clock signals need to be output to read a complete QVGA image. Therefore, the system first performs an acceleration optimization operation for the operation of outputting the clock signal.
Conventional embedded GPIO operation usually goes through packaged GPIO interface functions, and to guarantee program robustness these functions embed parameter-validation checks. In STMicroelectronics' low-level HAL library, for example, a single GPIO pull-high entails a series of operations such as port-correctness verification and GPIO register verification. Such checks reject malformed parameters and ensure the reusability and robustness of the function, but they are unsuited to the single-purpose GPIO operations of image acquisition and add load to the running system. The embedded object recognition system 100 instead sets the input/output registers corresponding to the camera module's communication GPIO ports directly, avoiding parameter passing and validation and improving the efficiency and speed of image acquisition.
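As a minimal sketch of this direct-register approach (the STM32F1 device header, port B, pin 6, and the PA0-PA7 data bus are all assumptions; only the BSRR set/reset mechanism itself is standard on STM32-class parts):

/* Direct register toggling of the camera read clock: each macro compiles
 * to a single store, with none of the argument checks a HAL call performs. */
#include "stm32f1xx.h"   /* assumed device header */
#include <stdint.h>

#define CAM_RCLK_PIN    6u
#define CAM_RCLK_HIGH() (GPIOB->BSRR = (uint32_t)1u << CAM_RCLK_PIN)
#define CAM_RCLK_LOW()  (GPIOB->BSRR = (uint32_t)1u << (CAM_RCLK_PIN + 16u))

static inline uint8_t cam_read_byte(void)
{
    CAM_RCLK_HIGH();                            /* clock the buffer chip */
    uint8_t v = (uint8_t)(GPIOA->IDR & 0xFFu);  /* assumed 8-bit bus on PA0..PA7 */
    CAM_RCLK_LOW();
    return v;
}

Compared with HAL_GPIO_WritePin(), which (with asserts enabled) validates its port and pin arguments on every call, the macros above reduce each level change to one register write.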
The pixel-stream image captured by the camera unit of the image acquisition module 10 is cropped and compressed, reducing the space the image data occupies while losing as little image feature data as possible. Unlike traditional image processing algorithms that operate on two-dimensional arrays, the image data acquisition module transmits pixel by pixel, so during acquisition the embedded terminal operates on a one-dimensional image array. Suppose the adopted image processing algorithm expects an input of H×H pixels; the algorithm then compresses the QVGA-sized image data down to H×H. In this embodiment, the compression algorithm first crops the captured image to 60×60 and then judges whether the ordinal number n of each incoming pixel marks a target pixel: if so, the pixel is stored at the corresponding position A(x, y) of the target two-dimensional array; otherwise it is discarded. The compression algorithm is given in Table 2, and a sketch in C follows the table placeholder below:
[Table 2]
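Because Table 2 survives here only as an image placeholder, the following is a minimal C sketch consistent with the surrounding description; the 80×60 source size follows the QVGA example above, while the central-crop offsets, the choice H = 30, and every identifier are assumptions:

/* Streamed crop-and-compress: called once per incoming pixel, in
 * transmission order, so no full-frame buffer is ever allocated. */
#include <stdint.h>

#define SRC_W  80   /* width of the image the camera delivers (QVGA example) */
#define SRC_H  60
#define CROP   60   /* central 60x60 crop, dropping 10 columns on each side */
#define H_OUT  30   /* assumed network input size; CROP must be divisible */
#define STRIDE (CROP / H_OUT)

static uint8_t A[H_OUT][H_OUT];   /* compressed target image */

void compress_pixel(uint32_t n, uint8_t value)
{
    uint32_t row = n / SRC_W;     /* position of pixel n in the source frame */
    uint32_t col = n % SRC_W;

    /* Crop: keep only the central CROP columns (all rows already fit). */
    if (col < (SRC_W - CROP) / 2 || col >= (SRC_W + CROP) / 2)
        return;                   /* not a target pixel: discard */
    col -= (SRC_W - CROP) / 2;

    /* Compress: keep one pixel per STRIDE x STRIDE block. */
    if (row % STRIDE != 0 || col % STRIDE != 0)
        return;
    A[row / STRIDE][col / STRIDE] = value;
}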
Further, the embodiment of the present invention also accelerates the LCD display. Displaying a single pixel on the LCD is a process in which the chip communicates with the LCD over the Serial Peripheral Interface (SPI). To make maximal use of MCU resources, the LCD is driven point by point: as soon as one pixel of data has been received it is displayed on the LCD, so a complete image is shown while occupying, and repeatedly reusing, only a single pixel's worth of resources. The traditional LCD pixel-display function, however, positions every pixel before drawing it, i.e. it first sets the relative position on the LCD and then draws the corresponding point. That is inefficient when the same region is drawn many times over: successive pixels of an image have a fixed positional continuity, so it is unnecessary to position each one individually to display a complete image. The point-by-point procedure is therefore changed to first set the LCD display area once and then call SPI to send the pixel data directly to the LCD, eliminating the coordinate setup on every transmission.
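A minimal sketch of this windowed streaming (the 0x2A/0x2B/0x2C command bytes follow common ILI9341-style controllers, which the disclosure does not actually name, and the SPI helper functions are assumed wrappers around the MCU's SPI transmit routine):

/* Set the drawing window once, then stream pixels with no further
 * coordinate commands. */
#include <stdint.h>

void spi_cmd(uint8_t c);      /* assumed: send one command byte over SPI */
void spi_data16(uint16_t d);  /* assumed: send one 16-bit data word over SPI */

void lcd_set_window(uint16_t x0, uint16_t y0, uint16_t x1, uint16_t y1)
{
    spi_cmd(0x2A); spi_data16(x0); spi_data16(x1);  /* column address range */
    spi_cmd(0x2B); spi_data16(y0); spi_data16(y1);  /* row address range */
    spi_cmd(0x2C);                                  /* start memory write */
}

void lcd_stream_pixel(uint16_t rgb565)
{
    spi_data16(rgb565);  /* the controller advances the write pointer itself */
}

/* Usage: lcd_set_window(0, 0, 59, 59); then call lcd_stream_pixel()
 * once per pixel as each value arrives from the camera buffer. */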
Preferably, the image acquisition module 10 further includes an image processing unit (not shown in the figures). The image processing unit applies threshold filtering to the raw image of the training object obtained by the image acquisition module to extract the training object's image features, effectively filtering out the image background while preserving as much of the target object's own image features as possible.
In this embodiment, the threshold filtering methods are the edge mean method, the bimodal mean method, and the bimodal valley method. The edge mean method averages all edge pixels and uses the resulting mean as the threshold for filtering the image. The bimodal mean method is based on iterative updating: the algorithm first builds a histogram of how often each gray value occurs in the input image, then tests the histogram for bimodality, i.e. whether exactly two local maxima appear. If so, the average of the two local maxima is taken as the filtering threshold; otherwise each data point is smoothed over a span of n, up to a given maximum of N smoothing passes, and if N is exceeded the image is judged unfilterable. The bimodal valley method, after finding the two most frequent gray values, takes not their midpoint but the lowest valley between the two peaks, i.e. the least frequent gray value lying between them, as the threshold for filtering the image.
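As a concrete sketch of the simplest of the three methods, the edge mean filter can be implemented in a few lines (the array size and the keep-or-zero policy for thresholded pixels are assumptions; whether the object is brighter or darker than the background determines the comparison direction):

/* Edge mean thresholding: average the border pixels, then binarize,
 * keeping foreground pixels and zeroing the background. */
#include <stdint.h>

#define N 60

void edge_mean_filter(uint8_t img[N][N])
{
    uint32_t sum = 0, count = 0;

    /* Mean over the border (edge) pixels only. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (i == 0 || i == N - 1 || j == 0 || j == N - 1) {
                sum += img[i][j];
                count++;
            }

    uint8_t threshold = (uint8_t)(sum / count);

    /* Filter: suppress pixels at or below the threshold. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            img[i][j] = (img[i][j] > threshold) ? img[i][j] : 0;
}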
A typical network model first acquires the whole image and only then feeds it to the network, which is reasonable when resources are plentiful. Under low-resource conditions, however, a large input image is costly: at the 224×224 pixel size commonly used by network models, a single image already occupies 49 KB. In a system where memory gained in one place is lost in another, this also constrains the number of model parameters and ultimately lowers the system's recognition accuracy. Minimizing the space occupied by the input image is therefore the key to improving the system's space efficiency. To this end, the model training module 20 of the embedded object recognition system 100 adopts a dynamic rolling convolution algorithm and generates cognitive model parameter components that can be compiled and used directly under the embedded engineering framework. Exploiting the controllability and decomposability of the embedded terminal's image acquisition process, the system fuses image acquisition with the first convolutional layer of the network, yielding the dynamic rolling convolution algorithm.
Based on the decomposability and controllability of the embedded terminal camera's image acquisition process, the dynamic rolling convolution algorithm divides the fused rolling convolution into the following steps:
(1) Acquire the first k+H rows of pixel data of the image (initially k = 1, so the first H+1 rows, with k ∈ [1, T-H-1], where T is the number of image rows) and store them in the corresponding two-dimensional array G[H+1][S], where S is the size of one row of acquired data. Once acquisition is complete, pause the reception of image data through the corresponding control interface.
(2) Convolve the kernel A[H][H] with rows k through k+H-1 of G and store the resulting values, in order, in row k of the feature-map array; this step matches the traditional method, as shown in FIG. 6(a) and FIG. 6(b).
(3) Exchange the members of rows k+1 through k+H of the two-dimensional array G with those of rows k through k+H-1, discarding the original row-k data. To save memory, only one extra variable is defined for the exchange: the elements of adjacent rows k and k+1 are swapped in column order until all elements have been exchanged, and likewise up the window; the specific exchange method adopted here is shown in FIG. 6(c).
(4) Enable reading from the buffer chip, read the image data of row k+H+1, and store it in row k+H of G.
(5) Increment k by 1 and repeat steps (2), (3) and (4) in turn until the convolution is complete and the full output array has been obtained.
In terms of space consumption, the input-image storage required by the fused convolution algorithm of this embodiment becomes S×(H+1), versus the S×T occupied by the traditional convolution method. For example, with S = T = 60 and H = 3, the buffer shrinks from 3,600 pixels to 240.
The concrete code procedure of the fused rolling convolution algorithm of this embodiment is shown in Table 3:
[Table 3]
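Because Table 3 likewise survives only as an image placeholder, the following is a minimal C sketch of steps (1) through (5) under stated assumptions: a single input channel, stride 1, no padding, and illustrative values of S, T and H; camera_read_row() stands in for the buffer-chip read path:

/* Fused rolling convolution: only H+1 image rows are buffered at any
 * time instead of the full S x T frame. */
#include <stdint.h>

#define S 60   /* pixels per row */
#define T 60   /* rows per frame */
#define H 3    /* kernel size */

extern void camera_read_row(uint8_t row[S]);  /* assumed: blocks until one row arrives */

void rolling_conv(const float A[H][H], float out[T - H + 1][S - H + 1])
{
    static uint8_t G[H + 1][S];               /* sliding window buffer */

    /* Step (1): fill the window with the first H+1 rows. */
    for (int r = 0; r < H + 1; r++)
        camera_read_row(G[r]);

    for (int k = 0; ; k++) {
        /* Step (2): convolve window rows 0..H-1 into output row k. */
        for (int c = 0; c <= S - H; c++) {
            float acc = 0.0f;
            for (int i = 0; i < H; i++)
                for (int j = 0; j < H; j++)
                    acc += A[i][j] * (float)G[i][c + j];
            out[k][c] = acc;
        }
        if (k == T - H)                       /* last output row produced */
            break;

        /* Step (3): slide the window up one row, discarding the oldest,
         * using a single temporary variable per element swap. */
        for (int r = 0; r < H; r++)
            for (int c = 0; c < S; c++) {
                uint8_t tmp = G[r][c];
                G[r][c] = G[r + 1][c];
                G[r + 1][c] = tmp;
            }

        /* Step (4): read the next image row into the freed bottom row. */
        if (k + H + 1 < T)
            camera_read_row(G[H]);
    }
}

With these illustrative parameters the window buffer occupies S×(H+1) = 240 bytes, matching the space analysis above.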
The image-processing-based embedded object recognition system provided by the embodiments of the present invention can capture images in real time and obtain recognition results after inference through the lightweight convolutional neural network model in the model training module. Further, the model training module adopts the fused rolling convolution algorithm, effectively reducing the embedded system's demand for image buffer space. Further, the system drives the camera with the optimized camera driver algorithm during image acquisition, effectively raising the speed of image reading and display. The system can thus sharply reduce the demand on hardware resources while preserving recognition accuracy and inference speed, and achieves recognition and classification of objects of different kinds.
It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments described above and can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive, the scope of the invention being defined by the appended claims rather than by the foregoing description; all changes falling within the meaning and range of equivalents of the claims are intended to be embraced by the invention. No reference sign in a claim should be construed as limiting the claim concerned.
Furthermore, it should be understood that although this specification is described in terms of implementations, not every implementation contains only one independent technical solution; the specification is narrated this way purely for clarity, and those skilled in the art should read it as a whole, since the technical solutions of the various embodiments may also be suitably combined to form other implementations understandable to those skilled in the art.

Claims (10)

  1. An embedded object recognition system based on image processing, characterized by comprising: an image acquisition module for acquiring image features of a training object, wherein the image acquisition module drives a camera with an optimized camera driver algorithm to acquire the image features of the training object; a model training module connected to the image acquisition module, which takes the image features obtained by the image acquisition module as training material and uses a fused rolling convolution algorithm to generate cognitive model parameter components that can be compiled and used directly under an embedded engineering framework; a model terminal deployment module for deploying the cognitive model parameter components obtained by the model training module on an embedded terminal; and a terminal reasoning module that performs target-object recognition based on the image of the target object acquired by the image acquisition module, using the cognitive model parameter components provided by the model terminal deployment module.
  2. The embedded object recognition system based on image processing according to claim 1, characterized in that the model training module comprises two modes: a PC model training mode and an embedded-terminal real-time reasoning training mode.
  3. The embedded object recognition system based on image processing according to claim 2, characterized in that the PC model training mode comprises the steps of: the embedded terminal acquiring image features and transmitting them to a PC; the PC building a data set from the image features and, according to the preset algorithm in the model training module, generating cognitive model parameter components that can be compiled and used directly under the embedded engineering framework; and deploying the cognitive model parameter components to the embedded terminal by burning.
  4. The embedded object recognition system based on image processing according to claim 2, characterized in that the embedded-terminal real-time reasoning training mode comprises the steps of: the embedded terminal acquiring image features; the terminal reasoning module performing image processing on the image features and, according to the preset algorithm in the model training module, generating cognitive model parameter components that can be compiled and used directly under the embedded engineering framework; and storing the cognitive model parameter components on the embedded terminal.
  5. The embedded object recognition system based on image processing according to claim 1, characterized in that the hardware configuration in the model terminal deployment module differs according to the nature of the parameters in the embedded object recognition system.
  6. The embedded object recognition system based on image processing according to claim 5, characterized in that constant parameters are stored in FLASH memory and variable parameters are stored in RAM.
  7. The embedded object recognition system based on image processing according to claim 6, characterized in that the constant parameters comprise filter parameters, BIAS offset parameters and propagation structure functions, and the variable parameters comprise image features, input variables and output variables.
  8. The embedded object recognition system based on image processing according to claim 6, characterized in that the embedded object recognition system replaces the reading and writing of dynamic arrays in RAM with the erasure, reading and writing of a contiguous address space in a preset designated FLASH sector.
  9. The embedded object recognition system based on image processing according to claim 1, characterized in that the parameter format used in the preset algorithm in the model training module is that of a multidimensional array in the C language.
  10. The embedded object recognition system based on image processing according to claim 1, characterized in that the image acquisition module comprises an image processing unit, and the image processing unit processes the raw image of the training object obtained by the image acquisition module with a threshold filtering method to obtain the image features of the training object.
PCT/CN2021/122781 2021-05-10 2021-10-09 Embedded object cognitive system based on image processing WO2022237061A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110505690.6A CN113158968A (en) 2021-05-10 2021-05-10 Embedded object cognitive system based on image processing
CN202110505690.6 2021-05-10

Publications (1)

Publication Number Publication Date
WO2022237061A1 true WO2022237061A1 (en) 2022-11-17

Family

ID=76874165

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/122781 WO2022237061A1 (en) 2021-05-10 2021-10-09 Embedded object cognitive system based on image processing

Country Status (2)

Country Link
CN (1) CN113158968A (en)
WO (1) WO2022237061A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113158968A (en) * 2021-05-10 2021-07-23 苏州大学 Embedded object cognitive system based on image processing
CN115526217A (en) * 2022-11-28 2022-12-27 陕西公众电气股份有限公司 Partial discharge mode identification method and system based on embedded platform

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103595456A (en) * 2013-10-16 2014-02-19 南京邮电大学 Method for achieving multimedia sensor network data transmission system
CN109685017A (en) * 2018-12-26 2019-04-26 中山大学 A kind of ultrahigh speed real-time target detection system and detection method based on light weight neural network
WO2019144575A1 (en) * 2018-01-24 2019-08-01 中山大学 Fast pedestrian detection method and device
CN110349146A (en) * 2019-07-11 2019-10-18 中原工学院 The building method of fabric defect identifying system based on lightweight convolutional neural networks
CN113112431A (en) * 2021-05-10 2021-07-13 苏州大学 Image processing method in embedded system
CN113158968A (en) * 2021-05-10 2021-07-23 苏州大学 Embedded object cognitive system based on image processing

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11443178B2 (en) * 2017-12-15 2022-09-13 International Business Machines Corporation Deep neural network hardening framework
CN109961009B (en) * 2019-02-15 2023-10-31 平安科技(深圳)有限公司 Pedestrian detection method, system, device and storage medium based on deep learning


Also Published As

Publication number Publication date
CN113158968A (en) 2021-07-23

Similar Documents

Publication Publication Date Title
WO2022237061A1 (en) Embedded object cognitive system based on image processing
US20200104690A1 (en) Neural processing unit (npu) direct memory access (ndma) hardware pre-processing and post-processing
CN102665049B (en) Programmable visual chip-based visual image processing system
US20230334632A1 (en) Image recognition method and device, and computer-readable storage medium
US11763141B2 (en) Neural processing unit (NPU) direct memory access (NDMA) memory bandwidth optimization
WO2021139197A1 (en) Image processing method and apparatus
US20220262093A1 (en) Object detection method and system, and non-transitory computer-readable medium
WO2019222889A1 (en) Image feature extraction method and device
CN110807362A (en) Image detection method and device and computer readable storage medium
CN109598250A (en) Feature extracting method, device, electronic equipment and computer-readable medium
CN114169362A (en) Event stream data denoising method based on space-time correlation filtering
CN111860483B (en) Target detection method based on Haisi platform
CN111226226A (en) Motion-based object detection method, object detection device and electronic equipment
CN104978749A (en) FPGA (Field Programmable Gate Array)-based SIFT (Scale Invariant Feature Transform) image feature extraction system
CN111767947A (en) Target detection model, application method and related device
WO2022237062A1 (en) Image processing method in embedded system
CN110555865B (en) Dynamic visual sensor sample set modeling method based on frame image
CN117275086A (en) Gesture recognition method, gesture recognition device, computer equipment and storage medium
CN112101366A (en) Real-time segmentation system and method based on hybrid expansion network
CN116403200A (en) License plate real-time identification system based on hardware acceleration
WO2020107267A1 (en) Image feature point matching method and device
CN115577747A (en) High-parallelism heterogeneous convolutional neural network accelerator and acceleration method
O’Mahony et al. Convolutional Neural Networks for 3D Vision System Data: A review
CN113850814A (en) CNN model-based litchi leaf pest and disease identification method
CN112381102A (en) Image noise reduction model generation method, image noise reduction method, device, storage medium and equipment

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21941627

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21941627

Country of ref document: EP

Kind code of ref document: A1