CN112435270B - Portable burn depth identification equipment and design method thereof - Google Patents


Info

Publication number
CN112435270B
Authority
CN
China
Prior art keywords
data
module
network
image
burn
Legal status
Active
Application number
CN202011629075.8A
Other languages
Chinese (zh)
Other versions
CN112435270A (en)
Inventor
王超
岳克强
李文钧
李宇航
陈石
沈皓哲
张汝林
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Application filed by Hangzhou Dianzi University
Priority to CN202011629075.8A
Publication of CN112435270A
Application granted
Publication of CN112435270B
Current legal status: Active


Classifications

    • G06T7/11 Region-based segmentation
    • G06N3/045 Combinations of networks
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06T5/70 Denoising; Smoothing
    • G06T9/005 Statistical coding, e.g. Huffman, run length coding
    • G16H30/20 ICT specially adapted for the handling or processing of medical images for handling medical images, e.g. DICOM, HL7 or PACS
    • G06T2207/20024 Filtering details
    • G06T2207/20052 Discrete cosine transform [DCT]
    • G06T2207/20081 Training; Learning
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a portable burn depth identification device based on a local semantic segmentation network and FPGA hardware acceleration, belonging to the fields of intelligent hardware and computer-aided medical diagnosis. The device comprises a central control and processing module, a local semantic segmentation network acceleration module based on a field programmable gate array (FPGA), a high-resolution image acquisition and processing module, a data storage module, a network communication module and a user interaction module. The invention also discloses a design method for this device. Because the system carries a dedicated local semantic segmentation network accelerator, burn images can be identified and segmented in real time.

Description

Portable burn depth identification equipment and design method thereof
Technical Field
The invention relates to the fields of intelligent hardware and computer-aided medical diagnosis, and in particular to a portable burn depth identification device based on a local semantic segmentation network and FPGA hardware acceleration, and a design method thereof.
Background
Burn depth diagnosis is a key step in judging the condition of a burn, and reliable, effective assessment of wound severity is the basis of clinical decisions. Clinical evaluation is currently the most common approach; however, subjective judgment is affected by many factors, and diagnostic skill varies between clinicians. Objective diagnostic methods based on various detection tools exist, but they are limited by cost, mode of operation, accuracy and other factors, and no satisfactory diagnostic tool exists so far. Artificial intelligence has opened up a new approach to intelligent medical diagnosis, but the industry does not yet have a complete solution for delivering AI-assisted diagnosis conveniently, quickly, accurately and at low cost.
Image semantic segmentation is an important part of image processing and image understanding in machine vision, and an important branch of artificial intelligence. Semantic segmentation classifies every pixel in an image, determining the category of each point in order to delineate target regions. Because it classifies individual pixels, semantic segmentation plays an important role in AI-assisted diagnosis. However, compared with a conventional convolutional neural network, a semantic segmentation network contains not only the basic downsampling convolutional layers but also an equal number of upsampling (deconvolution) layers for image reconstruction, which increases the computational load. Computationally intensive neural network inference is currently accelerated with dedicated hardware circuits. Common hardware accelerators include graphics processing units (GPUs), application-specific integrated circuits (ASICs) and field programmable gate arrays (FPGAs); FPGAs are well suited as acceleration units in portable devices because they are reconfigurable, flexible, configurable and low-power.
Disclosure of Invention
In view of the above, the invention provides a portable burn depth identification device based on a local semantic segmentation network and FPGA hardware acceleration, and a design method thereof. A semantic segmentation network segments burn images and identifies the burn degree of each region; the FPGA provides hardware acceleration for the network's forward inference, so that burn images are processed locally. Compared with an inference scheme deployed on a GPU or on a large dedicated deep learning server, the technical scheme of the invention offers lower latency and lower power consumption.
The technical scheme of the invention is as follows:
a portable burn depth identification device based on a local semantic segmentation network and FPGA hardware acceleration uses a heterogeneous system on chip (SoC) that combines a general-purpose processor (CPU) with a field programmable gate array (FPGA) and carries a dedicated convolutional neural network accelerator. The accelerator runs the semantic segmentation network locally on burn images: burn data from the patient's skin surface, captured by the image acquisition module, are segmented and classified in real time and uploaded to a server through the network module. The user operates the device through the graphical interaction subsystem;
in a preferred technical scheme, the heterogeneous SoC uses the general-purpose processor as the processing system (PS) side, which handles system scheduling; a dedicated embedded Linux operating system built on it schedules all modules and subsystems. The FPGA serves as the programmable logic (PL) side, which handles data processing; a semantic segmentation network hardware accelerator for burn images is built on it to optimize the image data stream captured by the camera module and to speed up forward inference of the neural network;
in a preferred technical scheme, the burn image semantic segmentation network hardware accelerator comprises the following units:
an image preprocessing unit for preprocessing an input burn image;
a convolutional neural network inference unit, which performs forward inference of the convolutional neural network and outputs the segmented image;
a data flow instruction analysis unit, which parses data flow control instructions and outputs calculation instructions;
and a data loading and control unit, which reads and stores burn images and controls the internal data flow.
In a preferred technical scheme, the input to the convolutional neural network inference unit is the preprocessed burn image; the unit integrates multiple acceleration core units, which can all operate in parallel within one clock cycle;
in a preferred technical scheme, each acceleration core unit of the convolutional neural network inference unit contains multiple processing elements; each processing element Booth-encodes its operands and performs multiplication with a Wallace tree. Within one clock cycle, an acceleration core unit can perform multiply-accumulate, maximum-selection and minimum-selection operations on multiple data items and output multiple results;
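The Booth/Wallace arithmetic described above can be modeled in software to check its behavior. The sketch below is a minimal behavioral model, not the patent's circuit: it assumes radix-4 Booth recoding of an 8-bit signed operand and models the Wallace tree at word level with 3:2 carry-save adders, whereas real hardware compresses individual bit columns. All function names are hypothetical.

```python
def booth_radix4_digits(y: int, bits: int) -> list:
    """Radix-4 Booth recoding of a signed `bits`-wide multiplier into digits in -2..2."""
    if bits % 2:
        raise ValueError("bits must be even")
    digits, prev = [], 0  # prev holds the implicit bit y[-1] = 0
    for i in range(0, bits, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def carry_save_add(a: int, b: int, c: int):
    """3:2 compressor on whole words: a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def wallace_sum(terms):
    """Reduce partial products with layers of 3:2 compressors until two words remain."""
    terms = list(terms)
    while len(terms) > 2:
        full = len(terms) - len(terms) % 3
        layer = []
        for i in range(0, full, 3):
            layer.extend(carry_save_add(terms[i], terms[i + 1], terms[i + 2]))
        layer.extend(terms[full:])  # leftovers pass through to the next layer
        terms = layer
    return sum(terms)  # final carry-propagate addition


def booth_wallace_multiply(x: int, y: int, bits: int = 8) -> int:
    """Signed multiply: Booth partial products summed by the carry-save tree."""
    partials = [(d * x) << (2 * i) for i, d in enumerate(booth_radix4_digits(y, bits))]
    return wallace_sum(partials)


def mac_max_min(xs, ws, acc=0):
    """Hypothetical PE step: multiply-accumulate plus max/min selection (8-bit signed ws)."""
    for x, w in zip(xs, ws):
        acc += booth_wallace_multiply(x, w)
    return acc, max(xs), min(xs)
```

Radix-4 recoding halves the number of partial products (four instead of eight for an 8-bit multiplier), which keeps the compressor tree shallow.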
in a preferred technical scheme, the data flow instruction analysis unit of the burn image semantic segmentation network hardware accelerator parses data flow instructions with a finite state machine (FSM). The parsed instruction types are: multiply-accumulate, maximum select, minimum select, load data, read data, temporarily store data and output data. The instructions output by the analysis unit control how each acceleration core unit operates;
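An FSM-based instruction parser of the kind described above might look like the following sketch. The 16-bit instruction format (4-bit opcode, 12-bit operand), the opcode values and the three states are all assumptions made for illustration; the source names only the seven instruction types.

```python
from enum import Enum, auto


class Op(Enum):
    # Hypothetical 4-bit opcodes for the seven instruction types
    MAC = 0x1      # multiply-accumulate
    MAX = 0x2      # maximum select
    MIN = 0x3      # minimum select
    LOAD = 0x4     # load data
    READ = 0x5     # read data
    STASH = 0x6    # temporarily store data
    OUTPUT = 0x7   # output data


class State(Enum):
    IDLE = auto()    # wait for the next instruction word
    DECODE = auto()  # extract and validate the opcode
    ISSUE = auto()   # emit a command to the acceleration cores


def parse_instructions(words):
    """Run the FSM over 16-bit words; return (op name, 12-bit operand) commands."""
    state, stream = State.IDLE, iter(words)
    word, op, commands = 0, None, []
    while True:
        if state is State.IDLE:
            nxt = next(stream, None)
            if nxt is None:
                break
            word, state = nxt, State.DECODE
        elif state is State.DECODE:
            try:
                op = Op((word >> 12) & 0xF)
                state = State.ISSUE
            except ValueError:
                state = State.IDLE  # unknown opcode: drop the word
        else:  # State.ISSUE
            commands.append((op.name, word & 0xFFF))
            state = State.IDLE
    return commands
```

For example, the words `0x1003, 0x0000, 0x7010` decode to a multiply-accumulate with operand 3 and an output with operand 16; the all-zero word carries no valid opcode and is dropped.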
in a preferred technical scheme, the data loading and control unit of the burn image semantic segmentation network hardware accelerator comprises the following parts:
a direct memory access (DMA) controller, which reads input burn image data from DDR memory and writes the processed segmentation results back to DDR;
double buffer queues, which temporarily store data and adjust the input order. They are implemented with partitioned BRAM, so the output data bit width can be adjusted flexibly and the extra power consumed by BRAM reads is reduced;
an entropy encoding/decoding module, which run-length encodes and decodes the image data; the encoding reduces on-chip storage requirements;
a zero detection module, which checks whether each image pixel value is 0 and, if so, skips that pixel and loads the next one, reducing the on-chip computation load;
a normalization module, which normalizes the data of each network layer to reduce drift errors caused by an excessively large data range;
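The run-length coding and zero-skipping ideas in the data loading and control unit can be illustrated with a short software model. This is a sketch under assumptions: the `(value, run)` pair format and the helper names are hypothetical, and the zero detector is modeled as a generator that yields only non-zero pixels, so a multiply-accumulate over its output gives the same result as the dense version while skipping the zero operands.

```python
def rle_encode(data):
    """Run-length encode a sequence into (value, run_length) pairs."""
    runs = []
    for v in data:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


def nonzero_pixels(pixels):
    """Zero detection: yield (index, value) only for non-zero pixels."""
    for i, p in enumerate(pixels):
        if p != 0:
            yield i, p


def sparse_dot(pixels, weights):
    """Multiply-accumulate that skips zero pixels instead of multiplying by them."""
    return sum(p * weights[i] for i, p in nonzero_pixels(pixels))
```

Both techniques pay off in proportion to how many zeros the data contain, which is why they pair naturally with the pruning of step S01.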
in a preferred technical scheme, the graphical interaction subsystem of the device outputs the classification and identification results for burn images in real time, marks each region with a mask of a different color according to its burn degree, and outputs a burn depth prediction and an overall diagnostic suggestion;
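The color-mask display can be sketched as an alpha blend of a per-pixel class map over the camera image. The class indices, the RGB palette and the 0.5 opacity below are hypothetical; the source does not specify how burn degrees map to colors.

```python
# Hypothetical class index -> RGB mask color (not specified in the source)
PALETTE = {
    1: (255, 255, 0),
    2: (255, 128, 0),
    3: (255, 0, 0),
    4: (128, 0, 128),
}


def overlay_mask(image, labels, alpha=0.5):
    """Blend a class color over each labeled pixel; label 0 (background) is left unchanged."""
    out = []
    for img_row, lab_row in zip(image, labels):
        row = []
        for pixel, label in zip(img_row, lab_row):
            color = PALETTE.get(label)
            if color is None:
                row.append(pixel)  # background or unknown label: keep the original pixel
            else:
                row.append(tuple(round((1 - alpha) * c + alpha * m)
                                 for c, m in zip(pixel, color)))
        out.append(row)
    return out
```

Blending rather than overwriting keeps the underlying wound texture visible beneath the mask.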
in a preferred technical scheme, the network module of the device communicates with a server over a wireless network and transmits both the original and the identified burn images to the server, which facilitates later comparison;
the invention also discloses a design method for the portable burn depth identification device based on a local semantic segmentation network and FPGA hardware acceleration, comprising the following steps:
S01: build a semantic segmentation network for burn depth identification in PyTorch, train the network parameters, and compress the model by quantization and pruning;
S02: design an FPGA hardware acceleration unit according to the constructed network and the available FPGA resources, generate a bitstream file, and program it into the hardware platform for local network acceleration;
S03: design and tailor an operating system for the hardware platform, and write drivers for the image acquisition and processing module, the data storage module, the network communication module and the user interaction module;
S04: write user-space programs that perform system scheduling and invoke the hardware acceleration unit.
In a preferred technical scheme, step S01 uses quantization-aware training to convert floating-point data to fixed point; the number of model parameters is reduced by combining static and dynamic pruning, which sets most network parameters to 0, and the pruned data are then entropy-encoded to further reduce storage space;
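Step S01's compression can be illustrated with two common techniques: fake quantization (quantize to fixed point, then dequantize), which is the core operation inserted into the forward pass during quantization-aware training, and magnitude-based static pruning. This is a generic sketch, not the authors' exact procedure; the signed 8-bit format with 6 fractional bits and the helper names are assumptions.

```python
def fake_quantize(values, frac_bits=6, total_bits=8):
    """Quantize floats to signed fixed point and dequantize them again.

    Returns (integer codes, dequantized floats). Values outside the
    representable range are clipped (saturated).
    """
    scale = 1 << frac_bits
    qmin, qmax = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    codes = [max(qmin, min(qmax, round(v * scale))) for v in values]
    return codes, [c / scale for c in codes]


def magnitude_prune(values, sparsity):
    """Static pruning: zero the smallest-magnitude fraction `sparsity` of weights.

    Ties at the threshold are also pruned, so slightly more than the
    requested fraction may be removed.
    """
    k = int(len(values) * sparsity)
    if k == 0:
        return list(values)
    threshold = sorted(abs(v) for v in values)[k - 1]
    return [0.0 if abs(v) <= threshold else v for v in values]
```

Python's `round` rounds halves to even; a hardware implementation may use a different rounding mode, and the training loop should match whichever the accelerator uses.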
in a preferred technical scheme, the FPGA hardware acceleration unit of step S02 comprises the following modules:
a bus interface and interconnect matrix, comprising a data bus and a control bus, which carries control signals and data between the accelerator and the system scheduling (PS) side;
an on-chip memory, which temporarily stores network data on the FPGA;
a DMA interface, which reads and writes DDR data;
a forward inference module, which performs forward inference of the convolutional neural network and contains the functional units this requires: 1) a vector multiply-accumulate unit, which computes the product of the input vector and the weight matrix; 2) an accumulation unit, which accumulates the matrix multiplication results; 3) an activation function unit, which computes the activation values of each network layer; 4) an internal temporary storage unit, which holds the result of each matrix multiplication.
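The cooperation of the vector multiply-accumulate unit, accumulation unit, activation function unit and internal temporary storage can be modeled for one fully connected step. The sketch assumes a ReLU activation and a tile width of 4 input elements per pass, neither of which the source specifies; the function names are hypothetical.

```python
def relu(x):
    """Assumed activation function."""
    return x if x > 0 else 0


def forward_layer(x, weights, bias, tile=4):
    """Compute relu(W @ x + b) tile by tile, as a vector MAC unit might."""
    partial = [0] * len(weights)  # internal temporary storage, one entry per output
    for start in range(0, len(x), tile):
        x_tile = x[start:start + tile]
        for r, row in enumerate(weights):
            # vector multiply-accumulate over one tile, then accumulate into storage
            partial[r] += sum(w * v for w, v in zip(row[start:start + tile], x_tile))
    # activation function unit
    return [relu(p + b) for p, b in zip(partial, bias)]
```

Tiling lets a fixed-width MAC unit handle input vectors of any length, at the cost of keeping the partial sums in temporary storage between passes.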
In a preferred technical scheme, image preprocessing by the FPGA hardware acceleration unit in step S02 comprises the following steps:
S01: apply moving-average filtering to the input image to remove noise;
S02: divide the input image into blocks and apply a discrete cosine transform (DCT) to each sub-block to obtain frequency-domain image data;
S03: low-pass filter the frequency-domain image data, keeping the low-frequency components and removing the high-frequency ones;
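The three preprocessing steps can be sketched in plain Python. Assumptions, all hypothetical: the moving-average filter is a 3×3 box filter that averages only in-bounds neighbors, blocks are 8×8 with an orthonormal DCT-II, and the low-pass filter keeps coefficients whose index sum u + v is below a cutoff of 4. Image dimensions are assumed to be multiples of the block size.

```python
import math


def box_filter(img, k=3):
    """Moving-average (box) filter; border pixels average only in-bounds neighbors."""
    h, w, r = len(img), len(img[0]), k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [img[y][x]
                    for y in range(max(0, i - r), min(h, i + r + 1))
                    for x in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block."""
    n = len(block)
    scale = [math.sqrt(1 / n)] + [math.sqrt(2 / n)] * (n - 1)
    return [[scale[u] * scale[v] * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


def low_pass(coeffs, cutoff):
    """Keep coefficients with u + v < cutoff; zero the high-frequency rest."""
    n = len(coeffs)
    return [[coeffs[u][v] if u + v < cutoff else 0.0 for v in range(n)] for u in range(n)]


def preprocess(img, block=8, cutoff=4):
    """Smooth, split into blocks, DCT each block, and low-pass the coefficients."""
    h, w = len(img), len(img[0])
    assert h % block == 0 and w % block == 0, "image size must be a multiple of block"
    smoothed = box_filter(img)
    return [low_pass(dct2([row[bj:bj + block] for row in smoothed[bi:bi + block]]), cutoff)
            for bi in range(0, h, block) for bj in range(0, w, block)]
```

For a constant 8×8 image of value 10 the only surviving coefficient is the DC term, 10 × 8 = 80, since a constant block has no higher-frequency content.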
Compared with the prior art, the invention has the following advantages:
the invention performs real-time segmentation of burn regions and identification of burn degree locally, and the hardware acceleration unit built on the heterogeneous system effectively speeds up burn image processing and assists in quantifying burn degree and diagnosing burn cases. Compared with a semantic segmentation inference scheme deployed on a GPU or a large dedicated deep learning server, the technical scheme of the invention offers lower latency and power consumption, portability and ease of operation.
Drawings
The invention is further described below with reference to the accompanying drawings and examples:
FIG. 1 is a block diagram of the overall system of the portable burn depth identification device based on a local semantic segmentation network and FPGA hardware acceleration provided by an embodiment of the present invention;
FIG. 2 is a block diagram of the heterogeneous SoC system of the portable burn depth identification device based on a local semantic segmentation network and FPGA hardware acceleration provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of the data path of the portable burn depth identification device based on a local semantic segmentation network and FPGA hardware acceleration provided by an embodiment of the present invention;
FIG. 4 is a block diagram of the FPGA acceleration core unit used in an embodiment of the present invention;
FIG. 5 is a schematic diagram of the serial-to-matrix conversion circuit used in an embodiment of the present invention;
FIG. 6 is a timing diagram of the convolution pipeline used in an embodiment of the present invention;
FIG. 7 is a flow chart of the method for designing the FPGA hardware accelerator used in an embodiment of the present invention;
FIG. 8 is a flow chart of the method for designing the image preprocessing module of the FPGA hardware accelerator used in an embodiment of the present invention;
FIG. 9 is a flowchart of the overall software design of the design method of the portable burn depth identification device based on a local semantic segmentation network and FPGA hardware acceleration according to an embodiment of the present invention.
Detailed Description
The portable burn depth identification device of this embodiment, based on a local semantic segmentation network and FPGA hardware acceleration, identifies and segments burn images in real time by means of the semantic segmentation network and the local semantic segmentation network accelerator. The embodiment also provides a system design method suitable for the device.
Embodiments of the present invention are described in detail below with reference to the drawings. These examples illustrate the invention and are not intended to limit its scope. The implementation conditions used in the embodiments can be adjusted to the specific hardware and application scenario; conditions that are not stated are those of routine experiments.
Example 1:
The overall system of the portable burn depth identification device based on a local semantic segmentation network and FPGA hardware acceleration is shown in FIG. 1. It comprises communication devices, a core processing system and external devices. The communication devices comprise a WIFI module 1, a serial communication interface module 2 and a USB communication interface module 3; the core processing system comprises an ARM-A9 processing system module 4 and an FPGA hardware acceleration system module 5; the external devices comprise a DDR module 6, a touch screen module 7 and a camera module 8.
The WIFI module 1 is connected to module 4 through an SDIO interface and uploads the processed burn image data;
the serial communication interface module 2 is connected to module 4 through a UART interface and receives and outputs system debugging data;
the USB communication interface module 3 is connected to module 4 through a USB interface and connects other USB devices;
the ARM-A9 processing system module 4 and the FPGA hardware acceleration system module 5 cooperate to realize the system functions and are connected by an AXI bus, which carries image data and control instructions internally;
the DDR module 6 is connected to modules 4 and 5 through the AXI bus and stores data;
the touch screen module 7 is connected to module 4 through an HDMI interface and presents burn images in real time and handles user interaction;
the camera module 8 is connected to module 4 through an HDMI interface and captures burn image data.
FIG. 2 is a block diagram of the heterogeneous SoC system of the portable burn depth identification device based on a local semantic segmentation network and FPGA hardware acceleration provided by an embodiment of the present invention. In this embodiment, the heterogeneous SoC is a computing system that integrates a system scheduling side (Processing System, PS) and a programmable logic side (Programmable Logic, PL) on one chip. The PS in this embodiment is a general-purpose ARM processor, and the PL is a field programmable gate array (Field Programmable Gate Array, FPGA). The PS and PL communicate through an on-chip bus system and corresponding bus bridges; the embodiment and drawings illustrate the data path with the AMBA bus system, but the invention is not limited to it.
The heterogeneous SoC in the embodiment of the invention comprises PS, an off-chip memory, a bus system and PL. The PS comprises a low-speed peripheral module 11 and an ARM core module 12; the off-chip memory comprises a DDR3 module 13; the bus system comprises an APB bus module 14 and an AXI bus module 15; the PL includes a control register block 16, a PE array block 17, a DMA block 18, an input buffer block 19, and an output buffer block 20.
The low-speed peripheral module 11 is connected with the APB bus module 14 through an APB bus interface and is used for interaction between the system and the outside;
the ARM core module 12 is connected with the AXI bus module 15 through an AXI bus interface and is used for overall dispatching and data flow control of the system;
the DDR3 module 13 is connected with the AXI bus module 15 through an AXI bus interface and is used for data storage;
the APB bus module 14 is connected to the AXI bus module 15 through an AXI-APB bus bridge, and the two bus modules together carry system data;
the control register module 16 is connected with an AXI bus through an AXI interface and is used for temporarily storing control commands;
the PE array module 17 is connected with the input buffer module 19 through an internal data queue interface and is used for calculating and processing image data;
the DMA module 18 is connected with the AXI bus module 15 through an AXI bus interface, and is connected with the input buffer module 19 and the output buffer module 20 through an internal data queue interface, and is used for automatically loading and storing image data;
the input buffer module 19 is connected with the DMA module 18 and the PE array module 17 through an internal data queue interface and is used for buffering input data;
the output buffer module 20 is connected with the DMA module 18 and the PE array module 17 through an internal data queue interface and is used for buffering output data;
FIG. 3 is a schematic diagram of the data path of the portable burn depth identification device based on a local semantic segmentation network and FPGA hardware acceleration according to an embodiment of the present invention. Data are transferred between system modules over a high-speed bus and a low-speed bus at different rates. The ARM processor core, external storage, high-speed peripherals and the accelerator need higher communication rates and are attached to the high-speed bus; low-speed peripherals need lower rates and are attached to the low-speed bus. The two buses exchange data through a bus bridge.
The core of this embodiment is the acceleration core in the FPGA hardware acceleration system module; FIG. 4 is a block diagram of the FPGA acceleration core unit used in the embodiment. The external input feature map enters a zero-padding circuit through an external data queue; the zero-padding circuit adjusts the feature map size, after which the feature map is divided into 32×32-pixel sub-blocks that are fed to a computing unit (a convolution core, for example). After the convolution core finishes, the data pass through a data processing unit (an accumulation unit, for example) to an output queue and are finally output as the processed feature map under control of an external read signal;
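The zero-padding and 32×32 blocking step can be modeled as follows. Padding each dimension up to the next multiple of 32 is an assumption about what adjusting the feature map size means; the tile size itself comes from the description above, and the function names are hypothetical.

```python
def pad_to_multiple(fmap, tile=32):
    """Zero-pad a 2-D feature map so both dimensions are multiples of `tile`."""
    h, w = len(fmap), len(fmap[0])
    H, W = -(-h // tile) * tile, -(-w // tile) * tile  # ceiling to the next multiple
    padded = [list(row) + [0] * (W - w) for row in fmap]
    padded += [[0] * W for _ in range(H - h)]
    return padded


def split_tiles(fmap, tile=32):
    """Return the padded map's tile x tile sub-blocks in row-major order."""
    p = pad_to_multiple(fmap, tile)
    return [[row[j:j + tile] for row in p[i:i + tile]]
            for i in range(0, len(p), tile)
            for j in range(0, len(p[0]), tile)]
```

A 40×70 map, for instance, is padded to 64×96 and yields six tiles. For a K×K convolution, adjacent tiles would normally also need an overlap (halo) of K − 1 pixels, or the boundary results must be corrected separately; this sketch omits that.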
FIG. 5 is a schematic diagram of the serial-to-matrix conversion circuit according to an embodiment of the present invention. The circuit converts the serial input image data stream into a two-dimensional image window, supplying data for window operations such as convolution and pooling. Its structure and operating principle in this embodiment are described in detail with reference to FIG. 5.
Data enter the serial-to-matrix conversion circuit through the serial image input port, one item per clock cycle. At the same time, the data in all internal RAMs shift to the right, and the datum at the tail of each row vector moves to the head of the next row vector. The tail datum of each row vector is output as a tap, and together the taps form a window column vector. Let the input feature map size be …, and the window size be …; the structure size of the serial-to-matrix conversion circuit is then …
Before the circuit outputs the first window column vector of the first convolution, its internal RAM must be filled, which takes … clock cycles; outputting the last window column vector of the first convolution then takes an additional … clock cycles. In cycle …, the data of the first window can be sent to the convolution module for convolution; by cycle …, all window data have been output and the window has finished sliding over the feature map.
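Because the sizing formulas are missing from the text above, the following is only a generic line-buffer model of a serial-to-matrix conversion circuit, assuming a shift register of (K − 1)·W + K entries for a W-pixel-wide feature map and a K×K window. One pixel enters per simulated clock cycle; once the register is full, the taps at offsets r·W + c form a window, and windows that would wrap across a row boundary are discarded. The function name is hypothetical.

```python
from collections import deque


def stream_windows(pixels, width, k):
    """Feed one pixel per 'clock cycle' into a line buffer and collect K x K windows."""
    length = (k - 1) * width + k          # assumed shift-register size
    buf = deque(maxlen=length)            # buf[0] is the oldest pixel
    windows = []
    for t, p in enumerate(pixels):
        buf.append(p)                     # oldest pixel falls out when full
        if len(buf) < length:
            continue                      # still filling the buffer
        if t % width < k - 1:
            continue                      # window would wrap across a row boundary
        windows.append([[buf[r * width + c] for c in range(k)] for r in range(k)])
    return windows
```

With W = 4 and K = 3 the register holds 11 entries, so the first window appears on the 11th input pixel, and a 4×4 input yields four valid windows.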
The computation in the acceleration core unit must be timing-optimized to raise the system's operating frequency. The operation sequence of the acceleration core unit is therefore described in detail with reference to FIG. 6, using convolution as an example. In this embodiment, convolution is divided into three steps (multiplication, partial-sum generation and convolution-result generation), each taking one clock cycle; they compute, respectively, the products of the window data with the corresponding kernel data, the sums of the column-vector elements, and the final convolution result. In the first clock cycle, the acceleration core unit multiplies the corresponding elements of window 1. In the second, it multiplies the elements of window 2 and generates the partial sums of window 1. In the third, it multiplies the elements of window 3, generates the partial sums of window 2 and produces the convolution result of window 1. Thus the first window's result takes three clock cycles, but because the three steps are pipelined, each subsequent window needs only one additional clock cycle.
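The three-stage convolution pipeline can be checked with a cycle-level simulation. In this sketch (names hypothetical), the stages are evaluated from last to first each cycle so that every stage reads its input register before the previous stage overwrites it, mimicking registers clocked on the same edge. For N windows it reports N + 2 cycles: three for the first result and one for each result after that.

```python
def pipelined_conv(windows, kernel):
    """Simulate multiply -> column partial sums -> final sum, one stage per cycle."""
    stage1 = stage2 = None   # pipeline registers after stage 1 and stage 2
    pending, results, cycles = list(windows), [], 0
    while pending or stage1 is not None or stage2 is not None:
        cycles += 1
        # Stage 3: convolution result = sum of the column partial sums
        if stage2 is not None:
            results.append(sum(stage2))
        # Stage 2: sum each column of the product matrix
        stage2 = [sum(col) for col in zip(*stage1)] if stage1 is not None else None
        # Stage 1: element-wise products of the next window with the kernel
        if pending:
            window = pending.pop(0)
            stage1 = [[a * b for a, b in zip(wr, kr)] for wr, kr in zip(window, kernel)]
        else:
            stage1 = None
    return results, cycles
```

Two 2×2 windows with an all-ones kernel, for example, finish in 4 cycles rather than the 6 a non-pipelined design would need.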
Example 2:
A design method for portable burn depth identification equipment based on a local semantic segmentation network and FPGA hardware acceleration. The flow of the design method is shown in Fig. 7. The method 100 comprises:
S110: constructing a semantic segmentation network for burn depth identification with PyTorch, training the network parameters, and compressing the network model by quantization and pruning;
S120: designing an FPGA hardware acceleration unit according to the constructed network and the available FPGA resources, generating a bitstream file, and programming the bitstream file into the hardware platform for localized network acceleration;
S130: designing and tailoring an operating system for the hardware platform, and writing drivers for the image acquisition and processing module, the data storage module, the network communication module and the user interaction module;
S140: writing a user-space program that performs system scheduling and invokes the hardware acceleration unit.
In the design of the FPGA hardware accelerator, the input image must first be preprocessed. Fig. 8 is a flow chart of the design method for the image preprocessing module of the FPGA hardware accelerator used in an embodiment of the present invention. The method 200 comprises:
S210: applying moving-average filtering to the input image to remove image noise;
S220: dividing the input image into blocks and applying a discrete cosine transform (DCT) to each sub-image block to obtain frequency-domain image data;
S230: low-pass filtering the frequency-domain image data, retaining the low-frequency components and removing the high-frequency components.
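Steps S210 to S230 can be sketched in software to show the data flow before committing it to hardware. This is a minimal pure-Python sketch under stated assumptions. The 3 x 3 box filter, the 8 x 8 block size, the orthonormal DCT-II, and the triangular low-pass mask (zeroing coefficients with `u + v >= keep`) are all choices made for illustration, because the patent does not specify them. The function names are hypothetical.

```python
import math


def moving_average(img, k=3):
    """S210: k x k moving-average (box) filter; border pixels average only
    the neighbours that exist (an assumed border policy)."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix C, so a block X transforms as C X C^T."""
    c = [[0.0] * n for _ in range(n)]
    for u in range(n):
        a = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        for x in range(n):
            c[u][x] = a * math.cos((2 * x + 1) * u * math.pi / (2 * n))
    return c


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(r) for r in zip(*m)]


def block_dct_lowpass(img, n=8, keep=4):
    """S220 + S230: per-block 2-D DCT, then zero every coefficient with
    u + v >= keep. Returns [((row, col) of block, coefficient matrix), ...]."""
    h, w = len(img), len(img[0])
    assert h % n == 0 and w % n == 0, "sketch assumes sizes divisible by n"
    c = dct_matrix(n)
    ct = transpose(c)
    blocks = []
    for by in range(0, h, n):
        for bx in range(0, w, n):
            block = [row[bx:bx + n] for row in img[by:by + n]]
            coeffs = matmul(matmul(c, block), ct)  # S220: frequency domain
            for u in range(n):
                for v in range(n):
                    if u + v >= keep:              # S230: drop high frequencies
                        coeffs[u][v] = 0.0
            blocks.append(((by, bx), coeffs))
    return blocks
```

For a flat 8 x 8 block of value 10, the filter leaves the block unchanged. The orthonormal DCT then puts all of the energy into the DC term, which equals 10 x 8 = 80. A hardware version would typically replace the floating-point DCT with a fixed-point one, but that is outside this sketch.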
The overall software design flow of an embodiment of the present invention is described in detail with reference to Fig. 9. After the system is powered on, hardware initialization is performed first. This covers the power management module, the memory module, the bus system, the external devices and the hardware acceleration unit. Once initialization succeeds, the operating system starts scheduling and launches a camera data-reading subprocess and a user interaction subprocess. The camera data-reading subprocess reads the images captured by the camera and displays them on the screen in real time, while waiting for burn classification commands that the user issues through the user interaction subprocess. The user interaction subprocess waits for the user to issue a burn classification command and passes that command to the camera data-reading subprocess. When the user issues a burn classification command, the system calls the hardware accelerator to run neural-network forward inference on the burn image currently captured by the camera. It then obtains the classified and segmented burn image and displays the burn classification result.
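The interaction between the two subprocesses can be illustrated with a small Python model. This is a hypothetical sketch, not the patent's user-layer program, and it makes several assumptions. Threads and a `queue.Queue` stand in for the operating-system subprocesses and their channel. The command strings `"classify"` and `"quit"` are invented. `accelerate_inference` is a placeholder for the hardware accelerator call that simply thresholds pixels. Hardware initialization is omitted.

```python
import itertools
import queue
import threading


def accelerate_inference(frame):
    """Hypothetical stand-in for the FPGA forward-inference call; it just
    thresholds pixel values to produce a fake segmentation mask."""
    return [[1 if p > 128 else 0 for p in row] for row in frame]


def camera_task(frame_source, commands, results):
    """Camera data-reading task: keep reading (and, on real hardware,
    displaying) frames while polling for forwarded user commands."""
    for frame in frame_source:          # each iteration = one captured frame
        try:
            cmd = commands.get_nowait()
        except queue.Empty:
            continue                    # no command yet: keep previewing
        if cmd == "classify":
            results.append(accelerate_inference(frame))
        elif cmd == "quit":
            return


def user_task(commands, n_requests):
    """User interaction task: forward burn classification commands."""
    for _ in range(n_requests):
        commands.put("classify")
    commands.put("quit")


def run_system(frames, n_requests):
    """Start both tasks and return the inference results in request order."""
    commands = queue.Queue()
    results = []
    cam = threading.Thread(target=camera_task,
                           args=(itertools.cycle(frames), commands, results))
    usr = threading.Thread(target=user_task, args=(commands, n_requests))
    cam.start()
    usr.start()
    usr.join()
    cam.join()
    return results
```

The queue is FIFO and the camera loop runs until it receives `"quit"`, so every classification request is served, even though which frame each request lands on depends on thread timing.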
The above embodiments merely illustrate the technical concept and features of the present invention. They are intended to enable those skilled in the art to understand and implement the invention, and they do not limit its scope. Equivalent changes and modifications shall fall within the scope of the present invention.

Claims (5)

1. A portable burn depth identification device based on a local semantic segmentation network and FPGA hardware acceleration, characterized in that it adopts a heterogeneous system-on-chip combining a general-purpose processor (CPU) and a field-programmable gate array (FPGA) and carries a burn image semantic segmentation network hardware accelerator; the device performs localized semantic-segmentation network acceleration for burn images, segments and classifies in real time the burn data on the patient's skin surface acquired by an image acquisition module, and uploads the data to a server through a network module; meanwhile, a user can operate the device through a graphical interaction subsystem; the heterogeneous system-on-chip uses the CPU as the system scheduling end, on which a dedicated embedded Linux operating system is built for the overall scheduling of each module and subsystem; the heterogeneous system-on-chip uses the FPGA as the data processing end, on which the burn image semantic segmentation network hardware accelerator is built to optimize the image data stream acquired by the camera module and to accelerate the forward inference of the neural network;
the burn image semantic segmentation network hardware accelerator comprises the following units:
the convolutional neural network inference unit, used for forward inference of the convolutional neural network and for outputting the segmented image; its input is the preprocessed burn image; it integrates a plurality of acceleration core units, and these acceleration core units can operate in parallel within one clock cycle;
the data flow instruction analysis unit, used for parsing data flow control instructions and outputting compute instructions; it parses the data flow control instructions with a finite state machine (FSM), and the parsed instruction types are: multiply-accumulate, maximum select, minimum select, load data, read data, temporarily store data, and output data;
the instructions output by the data flow instruction analysis unit control the operation of each acceleration core unit;
the data loading and controlling unit, used for reading and storing burn image data and controlling its internal data flow;
the acceleration core unit, provided with a plurality of calculation processing units, each of which Booth-encodes its operands and performs multiplication with a Wallace tree; within one clock cycle, the acceleration core unit can perform multiply-accumulate, maximum-select and minimum-select operations on multiple data and outputs multiple operation results.
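The FSM-based instruction parsing in claim 1 can be illustrated with a behavioural model. The patent gives no instruction encoding, so everything concrete here is an assumption made for illustration. This covers the 16-bit word format (bits [15:13] opcode, bits [12:0] operand), the numeric opcode values, and the three states `IDLE`, `DECODE` and `ISSUE`. Only the seven instruction types come from the claim.

```python
from enum import Enum


class Op(Enum):
    """Instruction types listed in the claim; the numeric codes are hypothetical."""
    MAC = 0         # multiply-accumulate
    MAX = 1         # maximum select
    MIN = 2         # minimum select
    LOAD = 3        # load data
    READ = 4        # read data
    STORE_TMP = 5   # temporarily store data
    OUTPUT = 6      # output data


class State(Enum):
    IDLE = 0
    DECODE = 1
    ISSUE = 2


class InstructionParser:
    """Three-state FSM, one transition per step() (one clock):
    IDLE -> DECODE -> ISSUE -> IDLE. Assumed word format:
    bits [15:13] = opcode, bits [12:0] = operand."""

    def __init__(self):
        self.state = State.IDLE
        self.word = None
        self.decoded = None

    def step(self, incoming=None):
        """Advance one clock; return an issued (Op, operand) pair or None."""
        if self.state is State.IDLE:
            if incoming is not None:
                self.word = incoming
                self.state = State.DECODE
            return None
        if self.state is State.DECODE:
            opcode = (self.word >> 13) & 0x7
            # Op(7) raises ValueError: opcode 7 is unused in this encoding.
            self.decoded = (Op(opcode), self.word & 0x1FFF)
            self.state = State.ISSUE
            return None
        # ISSUE: hand the compute instruction to the acceleration core units.
        issued, self.decoded = self.decoded, None
        self.state = State.IDLE
        return issued


def parse_stream(words):
    """Feed instruction words to the FSM and collect the issued instructions."""
    fsm, out = InstructionParser(), []
    pending = list(words)
    while pending or fsm.state is not State.IDLE:
        incoming = pending.pop(0) if fsm.state is State.IDLE and pending else None
        result = fsm.step(incoming)
        if result is not None:
            out.append(result)
    return out
```

This model is deliberately unpipelined and spends three steps per instruction. A hardware parser would normally overlap decode and issue so that it can accept one word per clock.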
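The multiplier structure named in claim 1 (Booth coding of the operands followed by Wallace-tree reduction) can be checked in software before it is written in RTL. The following is a minimal Python sketch of one standard variant, radix-4 Booth recoding. The claim does not state the radix, so that choice is an assumption. Partial-product rows are reduced with 3:2 carry-save adders until two rows remain, and a single carry-propagate addition produces the result. All arithmetic is kept modulo 2^(2n), as a 2n-bit datapath would be.

```python
def booth_radix4_digits(b, n):
    """Radix-4 Booth recoding of an n-bit (n even) two's-complement multiplier.
    Returns digits d_k in {-2, -1, 0, 1, 2} with b == sum(d_k * 4**k)."""
    ub = b & ((1 << n) - 1)          # n-bit two's-complement bit pattern
    digits = []
    prev = 0                         # implicit bit b_{-1} = 0
    for i in range(0, n, 2):
        b0 = (ub >> i) & 1
        b1 = (ub >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1                    # becomes b_{i+1} for the next group
    return digits


def csa(x, y, z, mask):
    """3:2 carry-save adder (a row of full adders): x+y+z == s+c mod 2**W."""
    s = (x ^ y ^ z) & mask
    c = (((x & y) | (x & z) | (y & z)) << 1) & mask
    return s, c


def wallace_reduce(rows, mask):
    """Compress partial-product rows three at a time until two rows remain."""
    rows = list(rows)
    while len(rows) > 2:
        nxt = []
        full = len(rows) - len(rows) % 3
        for i in range(0, full, 3):
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2], mask))
        nxt.extend(rows[full:])      # 1 or 2 leftover rows pass through
        rows = nxt
    return rows


def booth_wallace_multiply(a, b, n=16):
    """Signed n-bit x n-bit multiply: Booth partial products, Wallace-tree
    reduction, then one final carry-propagate addition."""
    w = 2 * n
    mask = (1 << w) - 1
    rows = [((d * a) << (2 * k)) & mask
            for k, d in enumerate(booth_radix4_digits(b, n))]
    total = sum(wallace_reduce(rows, mask)) & mask   # final adder
    return total - (1 << w) if total >> (w - 1) else total
```

Radix-4 recoding halves the number of partial-product rows (eight rows for a 16-bit multiplier), and the carry-save tree keeps carry propagation out of every stage except the last. These two properties are the usual reason for pairing the two techniques in a single-cycle multiply-accumulate unit.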
2. The portable burn depth identification device based on local semantic segmentation network and FPGA hardware acceleration according to claim 1, wherein the data loading and controlling unit comprises the following parts:
a direct memory access (DMA) controller, for reading input burn image data from DDR memory and writing the processed segmented image data back to DDR memory;
double buffer queues, for temporarily storing data and adjusting the input order;
the double buffer queues are implemented with partitioned BRAM, so that the output data bit width can be adjusted flexibly and the extra power consumed by BRAM reads is reduced;
an entropy coding/decoding module, for run-length encoding and decoding of the image data; the encoding reduces on-chip storage pressure;
a zero detection module, for detecting whether an image pixel value is 0 and, when it is, skipping that pixel and loading the next one, which reduces the computational load on the FPGA logic slices;
and a normalization module, for normalizing the data of each network layer.
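The run-length coding and zero-skipping behaviour described in claim 2 can be modelled in a few lines. This is a hypothetical Python sketch that operates on a flat pixel sequence. It does not reproduce the patent's actual stream format, and the function names are invented. The zero-skipping multiply-accumulate also counts how many multiplications were actually issued, which makes the saving visible.

```python
def rle_encode(pixels):
    """Run-length encode a flat pixel sequence into (value, run_length) pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return [(v, n) for v, n in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the flat pixel sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


def zero_skip_mac(pixels, weights):
    """Multiply-accumulate that skips zero pixels, as the zero detection
    module does. Returns (result, number of multiplies actually issued)."""
    acc, issued = 0, 0
    for p, w in zip(pixels, weights):
        if p == 0:
            continue                 # zero detected: load the next pixel
        acc += p * w
        issued += 1
    return acc, issued
```

Long zero runs, which are common in sparse feature maps and in pruned weights, compress to a single pair. Skipping them removes both the storage and the multiplier activity they would otherwise need.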
3. The portable burn depth identification device based on local semantic segmentation network and FPGA hardware acceleration according to claim 1, wherein the graphical interaction subsystem outputs the classification and identification results of burn images in real time, marks regions of different burn degrees with masks of different colors, and outputs burn depth prediction results together with an overall auxiliary diagnosis suggestion;
the network module communicates with the server over a wireless network and transmits both the original burn image and the identified burn image to the server, which facilitates later comparison.
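The color-mask display in claim 3 amounts to blending one color per predicted burn class over the camera image. This is a hypothetical Python sketch. The class IDs, their names, the RGB colors and the 50 % blend factor are all assumptions, because the patent does not specify them. The helper that reports the share of each class is likewise only an illustration of input that a summary display could use.

```python
# Hypothetical class IDs and colours; the patent does not define them.
BURN_COLOURS = {
    0: None,               # background / unburned skin: no overlay
    1: (255, 255, 0),      # hypothetical class 1: superficial burn
    2: (255, 128, 0),      # hypothetical class 2: partial-thickness burn
    3: (255, 0, 0),        # hypothetical class 3: full-thickness burn
}


def overlay_mask(image, labels, alpha=0.5):
    """Blend a per-class colour over an RGB image given as nested lists
    of (r, g, b) tuples; labels holds one class ID per pixel."""
    out = []
    for img_row, lab_row in zip(image, labels):
        row = []
        for (r, g, b), cls in zip(img_row, lab_row):
            colour = BURN_COLOURS.get(cls)
            if colour is None:
                row.append((r, g, b))
            else:
                row.append(tuple(round((1 - alpha) * c + alpha * m)
                                 for c, m in zip((r, g, b), colour)))
        out.append(row)
    return out


def class_fractions(labels):
    """Fraction of pixels assigned to each class ID."""
    counts, total = {}, 0
    for row in labels:
        for cls in row:
            counts[cls] = counts.get(cls, 0) + 1
            total += 1
    return {cls: n / total for cls, n in sorted(counts.items())}
```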
4. A design method for the portable burn depth identification device based on local semantic segmentation network and FPGA hardware acceleration, characterized by comprising the following steps:
S01: constructing a semantic segmentation network for burn depth identification with PyTorch, training the network parameters, and compressing the network model by quantization and pruning;
S02: designing an FPGA hardware acceleration unit according to the constructed network and the available FPGA resources, generating a bitstream file, and programming the bitstream file into the hardware platform for localized network acceleration;
S03: designing and tailoring an operating system for the hardware platform, and writing drivers for the image acquisition and processing module, the data storage module, the network communication module and the user interaction module;
S04: writing a user-space program that performs system scheduling and invokes the hardware acceleration unit;
the FPGA hardware acceleration unit in step S02 includes the following modules:
a bus interface and interconnect matrix, comprising a data bus and a control bus and responsible for transferring control signals and data between the acceleration unit and the system scheduling end;
an on-chip memory, for temporarily storing network data on the FPGA;
a DMA interface, for reading and writing DDR data;
a forward inference module, for forward inference of the convolutional neural network, which contains the functional units required for the forward inference operation: 1) a vector multiply-accumulate unit, for computing the matrix multiplication of the input vector and the weight matrix; 2) an accumulation unit, for accumulating the matrix multiplication results; 3) an activation function unit, for computing the activation values of each network layer; 4) an internal temporary storage unit, for temporarily storing the result of each matrix multiplication;
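The four functional units of the forward inference module map naturally onto a tiled matrix-vector product. This is a minimal Python sketch under stated assumptions. The tile width stands in for the width of the vector multiply-accumulate unit. ReLU is assumed as the activation because the patent names no activation function. No bias term is used because none is mentioned. The function names are hypothetical.

```python
def vector_mac(x_tile, w_tile):
    """Vector multiply-accumulate unit: one tile of the product W @ x."""
    return [sum(w * x for w, x in zip(w_row, x_tile)) for w_row in w_tile]


def relu(v):
    """Activation function unit; ReLU is an assumption for this sketch."""
    return [max(0, t) for t in v]


def layer_forward(weights, x, tile=4):
    """Compute act(W @ x) tile by tile, mirroring units 1) to 4) above."""
    acc = [0] * len(weights)                 # accumulation unit state
    for start in range(0, len(x), tile):
        x_tile = x[start:start + tile]
        w_tile = [w_row[start:start + tile] for w_row in weights]
        scratch = vector_mac(x_tile, w_tile)  # internal temporary storage
        acc = [a + s for a, s in zip(acc, scratch)]
    return relu(acc)
```

Because the tiles only partition the summation, the result does not depend on the tile width. Choosing the width is therefore a trade-off between FPGA DSP usage and the number of passes per layer.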
in step S02, the image preprocessing performed by the FPGA hardware acceleration unit comprises the following steps:
S01: applying moving-average filtering to the input image to remove image noise;
S02: dividing the input image into blocks and applying a discrete cosine transform to each sub-image block to obtain frequency-domain image data;
S03: low-pass filtering the frequency-domain image data, retaining the low-frequency components and removing the high-frequency components.
5. The design method for the portable burn depth identification device based on local semantic segmentation network and FPGA hardware acceleration according to claim 4, wherein step S01 uses a perceptual (quantization-aware) quantization method to convert floating-point data to fixed-point data; the number of network model parameters is reduced by combining static pruning with dynamic pruning, so that most network parameters are pruned to 0; and the pruned data are then entropy-encoded to further reduce the storage space.
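Two of the compression steps in claim 5 can be sketched on a flat list of weights. This is a hypothetical Python illustration, not the patent's procedure. Static pruning is modelled as one-shot magnitude pruning to a target sparsity. Quantization is modelled as symmetric per-tensor fixed-point conversion, plus the quantize-then-dequantize ("fake quantization") operation that quantization-aware training inserts into the forward pass. The 8-bit width, the per-tensor scale and all function names are assumptions. Dynamic pruning and the entropy coding stage are not shown.

```python
def magnitude_prune(weights, sparsity):
    """Static magnitude pruning: zero the smallest |w| so that at least
    `sparsity` of the entries become 0 (ties may zero a few more)."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]


def quantize_symmetric(weights, bits=8):
    """Float -> signed fixed-point integers with one per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(w) for w in weights)
    scale = peak / qmax if peak > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def fake_quantize(weights, bits=8):
    """Quantize then dequantize: the forward-pass op of quantization-aware
    training, so the float model is trained against fixed-point rounding."""
    q, scale = quantize_symmetric(weights, bits)
    return [v * scale for v in q]
```

In this sketch, pruning runs before quantization so that the zeroed weights map exactly to the integer 0. That leaves long zero runs for the entropy coder to exploit.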

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011629075.8A CN112435270B (en) 2020-12-31 2020-12-31 Portable burn depth identification equipment and design method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011629075.8A CN112435270B (en) 2020-12-31 2020-12-31 Portable burn depth identification equipment and design method thereof

Publications (2)

Publication Number Publication Date
CN112435270A CN112435270A (en) 2021-03-02
CN112435270B true CN112435270B (en) 2024-02-09

Family

ID=74697119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011629075.8A Active CN112435270B (en) 2020-12-31 2020-12-31 Portable burn depth identification equipment and design method thereof

Country Status (1)

Country Link
CN (1) CN112435270B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280514A (en) * 2018-01-05 2018-07-13 中国科学技术大学 Sparse neural network acceleration system based on FPGA and design method
CN109784489A (en) * 2019-01-16 2019-05-21 北京大学软件与微电子学院 Convolutional neural networks IP kernel based on FPGA

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11775313B2 (en) * 2017-05-26 2023-10-03 Purdue Research Foundation Hardware accelerator for convolutional neural networks and method of operation thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant