CN113139519A - Target detection system based on fully programmable system on chip - Google Patents

Target detection system based on fully programmable system on chip

Info

Publication number
CN113139519A
Authority
CN
China
Prior art keywords
module
neural network
deep neural
picture
terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110529675.5A
Other languages
Chinese (zh)
Other versions
CN113139519B (en)
Inventor
王明伟
时凯胜
陈凤兰
黄叶祺
闫瑞
王钊
王诗鹏
罗宇
迟青松
田甜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi University of Science and Technology
Original Assignee
Shaanxi University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi University of Science and Technology
Priority to CN202110529675.5A
Publication of CN113139519A
Application granted
Publication of CN113139519B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/94 Hardware or software architectures specially adapted for image or video understanding
    • G06V10/955 Hardware or software architectures specially adapted for image or video understanding using specific electronic processors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a target detection system based on a fully programmable system on chip. The system comprises a PL (programmable logic) terminal and a PS (processing system) terminal: the PS terminal acquires a video stream or picture and preprocesses it, and the PL terminal performs target detection on the video stream or picture using a deep neural network. The PS terminal is implemented on an ARM processor and the PL terminal on an FPGA. Combining the two technologies reduces the size and power consumption of the product while keeping performance high, so the system can meet the target detection requirements of mobile platforms.

Description

Target detection system based on fully programmable system on chip
Technical Field
The invention relates to the technical field of image data processing, in particular to a target detection system based on a fully programmable system on a chip.
Background
An artificial neural network, also called a neural network, is a core technology of artificial intelligence. It is an adaptive system whose topology is designed to imitate the operation of a biological neural network. The network is formed by connecting many artificial neurons, each of which is activated by an activation function; when the information presented to the network changes and neurons are activated, signals can flow along a new path, which is how the network adapts.
However, in image-based or video-based target detection, an artificial neural network consumes very large storage and computing resources, so processing a video stream or a large number of pictures usually relies on a GPU (graphics processing unit) or APU (accelerated processing unit) server. Such servers are large and power-hungry, so neural-network-based target detection is not well suited to deployment on mobile platforms.
Disclosure of Invention
The embodiment of the invention provides a target detection system based on a fully programmable system on a chip, which is used for solving the problems that in the prior art, an artificial neural network has high resource consumption and is not suitable for deploying a target detection technology on a mobile terminal.
In one aspect, an embodiment of the present invention provides a target detection system based on a fully programmable system on a chip, including: a PL terminal and a PS terminal, implemented on an FPGA and an ARM processor respectively;
the PS terminal is used for acquiring video streams or pictures and sending the video streams or pictures to the PL terminal;
the PL terminal includes: the system comprises a communication module, a data transfer module and a deep neural network module;
the communication module is used for sending the video stream or the picture acquired by the PS terminal to the data transfer module;
the data transfer module is used for storing the received video stream or picture in the storage unit and sending the video stream or picture stored in the storage unit to the deep neural network module;
the deep neural network module is used for carrying out target identification according to the video stream or the picture.
In one possible implementation, the deep neural network module may include: a neuron module; the neuron module is used for reading the network parameters stored in the storage unit and training the deep neural network in the deep neural network module.
In one possible implementation, the communication module may include: an AXI Stream slave station, an AXI Lite slave station, and an AXI Stream master station; the AXI Stream slave station is used for receiving a transmission command from the user logic and controlling the transfer operation of the data transfer module according to the transmission command; a lookup table is embedded in the AXI Lite slave station, a neuron module reads data in the lookup table, and a coefficient of a deep neural network is processed on a neuron; the AXI Stream master station is used for transmitting the data output by the deep neural network module to the data transfer module.
In one possible implementation manner, the PL end may further include: a BRAM module; the BRAM module is used for accelerating the speed of data passing through the deep neural network module by adopting a double-channel input and output port technology.
In one possible implementation, the deep neural network module may include: a data path module and a control path module; the data path module includes an input routing network, a functional module, and a result routing network, wherein the input routing network routes data input into the deep neural network module to the appropriate functional module, the functional module performs arithmetic, logical, and relational operations, and the result routing network routes the data output by the functional module to the storage unit for storage; the control path module manages the execution order of the parts of the data path module.
In one possible implementation manner, the PL end may further include: a control module; the control module is used for controlling the time sequence of each module in the deep neural network module.
In a possible implementation manner, the PS terminal obtains a video stream or a picture from the camera, performs preprocessing on the video stream or the picture, and sends a result after the preprocessing to the PL terminal.
In one possible implementation, the preprocessing of the video stream or picture by the PS terminal may include: converting the video stream into pictures, converting those pictures and any picture acquired from the camera into grayscale images, compressing the grayscale images, and sending the compressed grayscale images to the PL (programmable logic) terminal.
In one possible implementation, the storage unit may be a DDR memory.
In a possible implementation manner, the platform can further comprise a display, and the PL end can further comprise a display module; and after the data transfer module sends the target recognition result output by the deep neural network module to the display module, the display module controls the display to display the target recognition result.
The target detection system based on the fully programmable system on chip has the following advantages:
The system combines the characteristics and advantages of both an ARM CPU and an FPGA, which makes it well suited to hardware/software co-design. It is small, powerful, and low in power consumption, and is therefore well suited to deployment on mobile platforms such as cars, unmanned aerial vehicles, robots, and medical equipment.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a functional block diagram of a target detection system based on a fully programmable system on a chip according to an embodiment of the present invention;
FIG. 2 is a functional block diagram of the PL terminal;
FIG. 3 is a schematic diagram of a deep neural network module;
FIG. 4 is a schematic diagram of a communication module;
FIG. 5 is a block diagram of an FPGA.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the prior art, when the target detection technology is implemented by combining an artificial neural network, because the artificial neural network consumes relatively large hardware resources, a GPU or APU server needs to be used when target detection is performed by using a video stream or a large number of pictures, and the two servers have strong performance, large volume and high power consumption, so that the current target detection technology based on the artificial neural network cannot be applied to a complex environment, such as a mobile platform.
To solve these problems in the prior art, the invention provides a target detection system based on a fully programmable system on chip. The system comprises a PL terminal and a PS terminal: the PS terminal acquires a video stream or picture and preprocesses it, and the PL terminal performs target detection on the video stream or picture using a deep neural network. The PS terminal is implemented on an ARM processor and the PL terminal on an FPGA. Combining the two technologies reduces the size and power consumption of the product while keeping performance high, so the system can meet the target detection requirements of mobile platforms.
Fig. 1 is a functional block diagram of a target detection system based on a fully programmable system on a chip according to an embodiment of the present invention, fig. 2 is a functional block diagram of the PL terminal, fig. 3 is a schematic diagram of the deep neural network module, fig. 4 is a schematic diagram of the communication module, and fig. 5 is a block diagram of the FPGA. In an embodiment of the present invention, a target detection system based on a fully programmable system on a chip includes: a PL terminal and a PS terminal, implemented on an FPGA and an ARM processor respectively;
the PS terminal is used for acquiring video streams or pictures and sending the video streams or pictures to the PL terminal;
the PL terminal includes: the system comprises a communication module, a data transfer module and a deep neural network module;
the communication module is used for sending the video stream or the picture acquired by the PS terminal to the data transfer module;
the data transfer module is used for storing the received video stream or picture in the storage unit and sending the video stream or picture stored in the storage unit to the deep neural network module;
the deep neural network module is used for carrying out target identification according to the video stream or the picture.
Illustratively, PL stands for Programmable Logic and PS stands for Processing System. In addition to the functions described above, the PS terminal invokes the deep neural network module and enables and resets the whole system through instruction addresses.
In the embodiment of the present invention, Xilinx SDK (Software Development Kit) is used as Development Software for interaction between the PS side and the PL side. After the PS terminal based on ARM implementation acquires the video stream or the picture, the PL terminal based on FPGA implementation can be called to enter a working mode.
The data transfer module includes an MM2S (Memory Mapped to Stream) module and an S2MM (Stream to Memory Mapped) module. The MM2S module transfers data from the storage unit to the AXI Stream domain, and the S2MM module transfers data from the AXI Stream domain to the memory-mapped AXI domain; the data transfer module also has a reset block and error signals. The MM2S and S2MM modules operate independently in full-duplex mode. The data transfer module observes a 4 KB address limit, partitions and schedules transfers automatically, and can therefore use the full AXI4-Stream bandwidth and handle multiple transfer requests. It supports byte-level data transfers and allows a memory transfer to start at a specified address. The MM2S and S2MM modules each have a separate command interface on which a received command is loaded in a single clock cycle, and the command word width is optimized at design time to keep high-speed data transmission compatible across all parts. Specifically, if the system uses a 32-bit AXI address, the command word is 72 bits wide. If the system address space is wider than 32 bits, the command word is extended by the required number of whole bytes. For example, a 64-bit address system requires a 104-bit command word to accommodate the wider address field. Because the command interface is an AXI4-Stream interface, the command word width must be an integer multiple of 8 bits. If the address space is configured as 33 bits, the address field in the command is padded to 40 bits. The command format is the same for the MM2S and S2MM modules and allows a single transfer of 1 byte to 8,388,607 bytes to be specified.
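The command word widths quoted above can be reproduced with a short calculation. The following is a minimal sketch, not the authors' implementation: the function name and the constant of 40 non-address bits are assumptions inferred from the stated figures (72 bits for a 32-bit address, 104 bits for a 64-bit address), and the 23-bit length field is inferred from the stated maximum of 8,388,607 bytes.

```python
# Assumption: every command word carries 40 bits that are not address bits,
# inferred from the stated 72-bit word for a 32-bit address (72 - 32 = 40).
NON_ADDRESS_BITS = 40

# Assumption: the maximum transfer of 8,388,607 bytes implies a 23-bit length field.
MAX_TRANSFER_BYTES = (1 << 23) - 1


def command_word_bits(addr_bits: int) -> int:
    """Return the command word width for a given AXI address width.

    The address field is padded up to a whole number of bytes so that the
    command word stays a multiple of 8 bits.
    """
    if not 32 <= addr_bits <= 64:
        raise ValueError("this sketch only covers 32-bit to 64-bit addresses")
    padded_addr_bits = -(-addr_bits // 8) * 8  # round up to a multiple of 8
    return NON_ADDRESS_BITS + padded_addr_bits


if __name__ == "__main__":
    for width in (32, 33, 64):
        print(width, "->", command_word_bits(width))
```

Under these assumptions a 33-bit address is padded to 40 bits, which gives an 80-bit command word; that value follows from the padding rule and is not stated in the text.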
The communication module automatically breaks down the large amount of data that needs to be transmitted into sizes that comply with the requirements of the AXI4 protocol.
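The automatic partitioning of a large transfer can be sketched as follows. This is an illustrative sketch only: it assumes that the 4 KB limit mentioned above corresponds to the AXI4 rule that a burst must not cross a 4 KB address boundary, and the function name and the `max_burst_bytes` parameter are hypothetical.

```python
def split_transfer(start_addr: int, length: int,
                   max_burst_bytes: int = 4096, boundary: int = 4096):
    """Split one large transfer into (address, size) chunks.

    No chunk crosses a `boundary`-aligned address and no chunk is larger
    than `max_burst_bytes`. Both defaults are assumptions for illustration.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    chunks = []
    addr = start_addr
    remaining = length
    while remaining > 0:
        # Bytes left before the next boundary-aligned address.
        to_boundary = boundary - (addr % boundary)
        size = min(remaining, to_boundary, max_burst_bytes)
        chunks.append((addr, size))
        addr += size
        remaining -= size
    return chunks


if __name__ == "__main__":
    # A 48-byte transfer starting 16 bytes below a 4 KB boundary is split in two.
    print(split_transfer(0x0FF0, 0x30))
```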
In one possible embodiment, the deep neural network module may include: a neuron module; the neuron module is used for reading the network parameters stored in the storage unit and training the deep neural network in the deep neural network module.
Illustratively, the neuron module includes a multiplier, an accumulator, and a finite state machine, wherein the finite state machine serves as the activator. The neuron module trains the deep neural network using the network parameters to obtain the network weights and network biases, converts the obtained weights and biases into binary values, and stores them in the storage unit.
In an embodiment of the present invention, the deep neural network in the deep neural network module is a multilayer perceptron. The multilayer perceptron is designed in a fixed-point number system that includes positive and negative numbers, with negative numbers represented in two's complement. In the multilayer perceptron, each neuron is connected to the neurons of the previous layer: the input data is multiplied by the multiplier, the product is stored in the accumulator, and the result is passed to the next neuron, iterating and accumulating in sequence. In other embodiments, the deep neural network may also be a convolutional neural network, a YOLO (You Only Look Once) network, an SSD (Single Shot MultiBox Detector) network, and the like. The activation function used in the deep neural network is the sigmoid function.
The multilayer perceptron is composed of an input layer, hidden layers, a fully connected layer, and an output layer. To meet the actual resource and performance constraints of the development board, the number of neurons in each layer needs to be chosen appropriately.
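The multiply-accumulate neuron with a sigmoid activation described above can be modeled in software as follows. This is a minimal sketch under stated assumptions: the 16-bit word with 8 fractional bits (Q8.8) is a hypothetical choice, since the text does not give the fixed-point format, and the sigmoid is evaluated in floating point here purely for illustration rather than by the hardware activator.

```python
import math

FRAC_BITS = 8                      # assumed number of fractional bits (Q8.8)
WORD_BITS = 16                     # assumed word width
SCALE = 1 << FRAC_BITS
WORD_MIN = -(1 << (WORD_BITS - 1))
WORD_MAX = (1 << (WORD_BITS - 1)) - 1


def to_fixed(x: float) -> int:
    """Quantize a real number to a signed fixed-point integer, saturating."""
    return max(WORD_MIN, min(WORD_MAX, round(x * SCALE)))


def to_twos_complement(value: int) -> int:
    """Return the unsigned bit pattern that encodes a signed word."""
    return value & ((1 << WORD_BITS) - 1)


def neuron(inputs, weights, bias: int) -> int:
    """Multiply-accumulate over fixed-point inputs, then apply a sigmoid."""
    acc = bias << FRAC_BITS                    # align the bias with the products
    for x, w in zip(inputs, weights):
        acc += x * w                           # multiplier feeding the accumulator
    acc >>= FRAC_BITS                          # rescale back to the word format
    acc = max(WORD_MIN, min(WORD_MAX, acc))    # saturate before activation
    return to_fixed(1.0 / (1.0 + math.exp(-acc / SCALE)))


if __name__ == "__main__":
    out = neuron([to_fixed(1.0)], [to_fixed(2.0)], to_fixed(0.0))
    print(out / SCALE)                         # close to sigmoid(2.0)
    print(hex(to_twos_complement(to_fixed(-1.0))))
```

Saturating the accumulator before the activation keeps the model within the word range, which is one common design choice; the text does not say how overflow is handled.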
In one possible embodiment, the communication module may include: an AXI Stream slave station, an AXI Lite slave station, and an AXI Stream master station; the AXI Stream slave station is used for receiving a transmission command from the user logic and controlling the transfer operation of the data transfer module according to the transmission command; a lookup table is embedded in the AXI Lite slave station, a neuron module reads data in the lookup table, and a coefficient of a deep neural network is processed on a neuron; the AXI Stream master station is used for transmitting the data output by the deep neural network module to the data transfer module.
Illustratively, the AXI4 bus protocol is the most important part of the AMBA (Advanced Microcontroller Bus Architecture) 3.0 protocol proposed by ARM, and is an on-chip bus designed for high performance, high bandwidth, and low latency. Commonly used variants include AXI4-Lite and AXI4-Stream. AXI4-Lite is a lightweight memory-mapped interface for single-word transfers that occupies few logic resources. AXI4-Stream targets high-speed streaming data; because it has no address channel, it allows data bursts of unlimited size.
In the embodiment of the present invention, the AXI Lite slave station is a custom module created in the Vivado software, in which a new mapping interface is defined and a lookup table is embedded. To prevent part of the logic from being optimized away during synthesis, a (* dont_touch = "true" *) attribute is also added.
When the communication module is designed, the AXI4-Stream protocol and the AXI memory-mapped IP core are combined, so that data can be moved to and from the storage unit using DMA (Direct Memory Access) and AXI-protocol IP cores. Within the communication module, the AXI Stream master and slave stations are memory mapped. The IP cores attached to the Xilinx AXI interconnect include the AXI Stream master and slave stations, which can be used to exchange data between one or more AXI masters and slaves.
In one possible embodiment, the PL side further comprises: a BRAM module; the BRAM module is used for accelerating the speed of data passing through the deep neural network module by adopting a double-channel input and output port technology.
Illustratively, BRAM, i.e., Block RAM, is the RAM on the PL side of the ZYNQ device. The BRAM module is invoked by the PS terminal through an instruction address. Because dual-channel input and output ports are used, data passes through the IP core in the deep neural network module twice as fast as with a single-channel BRAM.
The deep neural network needs to be initialized before use. Initialization is controlled by the PS (processing system) terminal: Matlab and Python scripts are used to write the network biases and network weights into a C-language header file, so that the coefficients of the deep neural network can be conveniently initialized, called, and sent to the BRAM (Block RAM) module for loading.
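The step of exporting weights and biases to a C header file can be sketched as follows. This is a hypothetical illustration rather than the scripts referred to in the text: the function name, the array naming, and the Q8.8 fixed-point format are all assumptions.

```python
def to_c_header(name: str, values, frac_bits: int = 8, word_bits: int = 16) -> str:
    """Render real-valued coefficients as a C header of two's-complement words.

    `frac_bits` and `word_bits` describe an assumed fixed-point format;
    `word_bits` should be 8, 16, 32, or 64 so that uintN_t exists.
    """
    mask = (1 << word_bits) - 1
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    words = []
    for v in values:
        q = max(lo, min(hi, round(v * (1 << frac_bits))))   # quantize and saturate
        words.append(f"0x{q & mask:0{word_bits // 4}X}")    # two's-complement hex
    guard = f"{name.upper()}_H"
    lines = [
        f"#ifndef {guard}",
        f"#define {guard}",
        "",
        "#include <stdint.h>",
        "",
        f"#define {name.upper()}_LEN {len(words)}",
        f"static const uint{word_bits}_t {name}[{name.upper()}_LEN] = {{",
        "    " + ", ".join(words),
        "};",
        "",
        f"#endif /* {guard} */",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_c_header("layer0_weights", [1.0, -1.0, 0.5]))
```

The PS-side C code could then include the generated header and copy the array into BRAM during initialization; how that copy is performed is not described in the text.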
In one possible embodiment, the deep neural network module includes: a data path module and a control path module; the data path module includes an input routing network, a functional module, and a result routing network, wherein the input routing network routes data input into the deep neural network module to the appropriate functional module, the functional module performs arithmetic, logical, and relational operations, and the result routing network routes the data output by the functional module to the storage unit for storage; the control path module manages the execution order of the parts of the data path module.
Illustratively, the deep neural network module is designed by jointly designing the control unit and control path with the data unit and data path, which allows the system to implement more complex control behavior. Each state of the finite state machine determines a configuration of the data path, and conditions computed in the data path determine the next state of the state machine.
The data path module has four interfaces: an input data interface, which receives the data to be processed; an output data interface, which delivers the processing result; a control interface, which receives the control signals issued by the control path; and a state output interface, which reports the data path's current state to the control path.
Most of the computation consists of data transformation, and this process involves multiple processing steps, so the control path module executes these steps over multiple clock cycles in the hardware that implements the process. The control path module has a state input interface and a state output interface: the state input interface receives data from the control interface, and the state output interface reports the current state of the system to the off-chip environment.
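The interplay between the control path and the data path can be illustrated with a small software model. This is a sketch under assumptions: the four states, the control signal names, and the dot-product workload are hypothetical and chosen only to show a state selecting a data path operation while a data path condition selects the next state.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LOAD = auto()
    MAC = auto()
    DONE = auto()


class Datapath:
    """Holds the registers; obeys a control signal and reports a status flag."""

    def __init__(self, inputs, weights):
        self.inputs, self.weights = list(inputs), list(weights)
        self.index = 0
        self.acc = 0

    def step(self, control: str) -> bool:
        if control == "clear":
            self.index, self.acc = 0, 0
        elif control == "mac":
            self.acc += self.inputs[self.index] * self.weights[self.index]
            self.index += 1
        # Status fed back to the control path: all operands consumed.
        return self.index >= len(self.inputs)


# Each state of the state machine selects one data path operation.
CONTROL = {State.IDLE: "hold", State.LOAD: "clear",
           State.MAC: "mac", State.DONE: "hold"}


def next_state(state: State, start: bool, last: bool) -> State:
    """The condition computed in the data path (`last`) picks the next state."""
    if state is State.IDLE:
        return State.LOAD if start else State.IDLE
    if state in (State.LOAD, State.MAC):
        return State.DONE if last else State.MAC
    return State.IDLE


def run(inputs, weights):
    """Clock the machine until DONE; return the result and the state trace."""
    dp, state, trace = Datapath(inputs, weights), State.IDLE, []
    while True:
        trace.append(state)
        last = dp.step(CONTROL[state])
        if state is State.DONE:
            return dp.acc, trace
        state = next_state(state, start=True, last=last)


if __name__ == "__main__":
    result, trace = run([1, 2, 3], [4, 5, 6])
    print(result, [s.name for s in trace])
```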
In one possible embodiment, the PL side further comprises: a control module; the control module is used for controlling the time sequence of each module in the deep neural network module.
Illustratively, the control module includes a state machine with 12 states, implemented as 3 independent processes: 2 sequential and 1 combinational. Of the 12 states, some serve as idle states, some as synchronization states, and the rest as states in which data is processed.
In a possible embodiment, the PS terminal obtains a video stream or picture from the camera, performs preprocessing on the video stream or picture, and sends a result of the preprocessing to the PL terminal.
Illustratively, the preprocessing of the video stream or picture by the PS terminal includes: converting the video stream into pictures, converting those pictures and any picture acquired from the camera into grayscale images, compressing the grayscale images, and sending the compressed grayscale images to the PL (programmable logic) terminal.
Specifically, in the preprocessing process, the PS terminal converts the RGB format picture into a grayscale image of 0 to 255 grayscales, and scales the grayscale image at a scale value of 4.
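The preprocessing step can be sketched in a few lines. This is an illustration under assumptions, not the authors' code: the luma weights (ITU-R BT.601) are an assumed choice because the text does not name a conversion formula, and the "scale value of 4" is interpreted here as shrinking each dimension by a factor of 4 using block averaging.

```python
def rgb_to_gray(pixel) -> int:
    """Convert one (R, G, B) pixel with 0-255 channels to a 0-255 gray level.

    Uses integer BT.601 luma weights (an assumption) with rounding.
    """
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b + 500) // 1000


def preprocess(image, factor: int = 4):
    """Convert an RGB image (list of rows of pixels) to gray and shrink it.

    Each output pixel is the mean of a factor x factor block; rows and
    columns that do not fill a whole block are dropped.
    """
    gray = [[rgb_to_gray(p) for p in row] for row in image]
    out_h = len(gray) // factor
    out_w = len(gray[0]) // factor if gray else 0
    out = []
    for by in range(out_h):
        row = []
        for bx in range(out_w):
            block = [gray[by * factor + dy][bx * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


if __name__ == "__main__":
    white = [[(255, 255, 255)] * 8 for _ in range(8)]
    print(preprocess(white))
```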
In an embodiment of the present invention, the Xilinx SDK tool is used to write the predefined values, such as the input values, the number of neurons per hidden layer, the number of hidden layers, the output values, and the sizes of the network bias and network weight arrays, together with pointers to the previous layer, the current layer, the network biases, the network weights, and the hidden-layer storage, and the initial values of the preprocessed image.
In one possible embodiment, the storage unit is a DDR memory.
Illustratively, the DDR memory is DDR SDRAM, i.e., double data rate synchronous dynamic random access memory.
In a possible embodiment, the platform further comprises a display, and the PL terminal further comprises a display module; and after the data transfer module sends the target recognition result output by the deep neural network module to the display module, the display module controls the display to display the target recognition result.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A system for target detection based on a fully programmable system on a chip, comprising: a PL terminal and a PS terminal, implemented on an FPGA and an ARM processor respectively;
the PS terminal is used for acquiring video streams or pictures and sending the video streams or pictures to the PL terminal;
the PL terminal includes: the system comprises a communication module, a data transfer module and a deep neural network module;
the communication module is used for sending the video stream or the picture acquired by the PS terminal to the data transfer module;
the data transfer module is used for storing the received video stream or picture in a storage unit and sending the video stream or picture stored in the storage unit to the deep neural network module;
the deep neural network module is used for carrying out target identification according to the video stream or the picture.
2. The system for target detection based on a fully programmable system on a chip of claim 1, wherein the deep neural network module comprises: a neuron module;
the neuron module is used for reading the network parameters stored in the storage unit and training the deep neural network in the deep neural network module.
3. The system for target detection based on a fully programmable system on a chip of claim 2, wherein the communication module comprises: an AXI Stream slave station, an AXI Lite slave station, and an AXI Stream master station;
the AXI Stream slave station is used for receiving a transmission command from user logic and controlling the transfer operation of the data transfer module according to the transmission command;
a lookup table is embedded in the AXI Lite slave station, the neuron module reads data in the lookup table, and coefficients of the deep neural network are processed on neurons;
the AXI Stream master station is used for transmitting the data output by the deep neural network module to the data transfer module.
4. The system for target detection based on a fully programmable system on a chip of claim 1, wherein the PL terminal further comprises: a BRAM module;
the BRAM module is used for accelerating the speed of data passing through the deep neural network module by adopting a double-channel input and output port technology.
5. The system for target detection based on a fully programmable system on a chip of claim 1, wherein the deep neural network module comprises: a data path module and a control path module;
the datapath module includes: the input routing network is used for routing data input into the deep neural network module to the proper functional module, the functional module is used for solving arithmetic, logic and relational operation, and the result routing network is used for routing and storing the data output by the functional unit into the storage unit;
the control path module is used for managing the execution sequence of each part in the data path module.
6. The system for target detection based on a fully programmable system on a chip of claim 5, wherein the PL terminal further comprises: a control module;
the control module is used for controlling the time sequence of each module in the deep neural network module.
7. The target detection system based on a fully programmable system-on-chip according to claim 1, wherein the PS side obtains the video stream or picture from a camera, preprocesses the video stream or picture, and sends the preprocessed result to the PL side.
8. The target detection system based on a fully programmable system-on-chip according to claim 7, wherein the preprocessing of the video stream or picture by the PS side comprises: converting the video stream into pictures, converting those pictures and any picture acquired from the camera into grayscale images, compressing the grayscale images, and sending the compressed grayscale images to the PL side.
9. The target detection system based on a fully programmable system-on-chip according to claim 1, wherein the memory unit is a DDR memory.
10. The target detection system based on a fully programmable system-on-chip according to claim 1, further comprising a display, wherein the PL side further comprises a display module;
after the data transfer module sends the target recognition result output by the deep neural network module to the display module, the display module controls the display to show the target recognition result.
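For illustration only, the PS-side preprocessing recited in claim 8 (video stream to pictures, pictures to grayscale, grayscale to compressed data) might be sketched in Python as follows. This sketch is not part of the claims. The function names, the integer luma weights (299, 587, 114), and the use of zlib as the compression step are all assumptions; the claims do not specify a grayscale formula or a compression method.

```python
import zlib


def frame_to_gray(frame):
    """Convert an RGB frame (rows of (r, g, b) tuples) to 8-bit grayscale rows.

    The weights are an assumed integer approximation of a common luma
    formula; the claims do not name a specific conversion.
    """
    return [
        bytes((299 * r + 587 * g + 114 * b) // 1000 for r, g, b in row)
        for row in frame
    ]


def preprocess(frames):
    """Yield one compressed grayscale payload per frame.

    zlib stands in for the unspecified compression step (assumption).
    """
    for frame in frames:
        gray = b"".join(frame_to_gray(frame))
        yield zlib.compress(gray)


# Two hypothetical 2x2 frames standing in for pictures split from a video stream.
frames = [
    [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 0, 255)]],
    [[(0, 255, 0), (128, 128, 128)], [(10, 20, 30), (200, 100, 50)]],
]
payloads = list(preprocess(frames))
first_gray = zlib.decompress(payloads[0])
```

In this sketch each payload corresponds to the data that the PS side would hand to the PL side; how that handover is performed is outside the scope of the example.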
CN202110529675.5A 2021-05-14 2021-05-14 Target detection system based on fully programmable system-on-chip Active CN113139519B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110529675.5A CN113139519B (en) 2021-05-14 2021-05-14 Target detection system based on fully programmable system-on-chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110529675.5A CN113139519B (en) 2021-05-14 2021-05-14 Target detection system based on fully programmable system-on-chip

Publications (2)

Publication Number Publication Date
CN113139519A (en) 2021-07-20
CN113139519B (en) 2023-12-22

Family

ID=76817331

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110529675.5A Active CN113139519B (en) 2021-05-14 2021-05-14 Target detection system based on fully programmable system-on-chip

Country Status (1)

Country Link
CN (1) CN113139519B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104954795A (en) * 2015-07-02 2015-09-30 东南大学 Image acquisition and transmission system based on JPEG2000
CN107766812A (en) * 2017-10-12 2018-03-06 东南大学—无锡集成电路技术研究所 A kind of real-time face detection identifying system based on MiZ702N
CN109389120A (en) * 2018-10-29 2019-02-26 济南浪潮高新科技投资发展有限公司 A kind of object detecting device based on zynqMP
CN109820524A (en) * 2019-03-22 2019-05-31 电子科技大学 FPGA-based wearable system for acquiring and classifying autism eye movement characteristics
CN110175670A (en) * 2019-04-09 2019-08-27 华中科技大学 A kind of method and system for realizing YOLOv2 detection network based on FPGA
CN110390626A (en) * 2019-07-02 2019-10-29 深兰科技(上海)有限公司 A kind of image processing method and device of convolutional neural networks
CN110717852A (en) * 2019-06-13 2020-01-21 内蒙古大学 FPGA-based field video image real-time segmentation system and method
CN110851255A (en) * 2019-11-07 2020-02-28 之江实验室 Method for processing video stream based on cooperation of terminal equipment and edge server
KR20200049428A (en) * 2018-10-23 2020-05-08 정민 황 Driver fatigue monitoring and control system based on FPGA + ARM control
US20200151088A1 (en) * 2018-11-14 2020-05-14 The Mathworks, Inc. Systems and methods for configuring programmable logic devices for deep learning networks
CN111459877A (en) * 2020-04-02 2020-07-28 北京工商大学 FPGA (field programmable gate array) acceleration-based Winograd YOLOv2 target detection model method
CN111967468A (en) * 2020-08-10 2020-11-20 东南大学 FPGA-based lightweight target detection neural network implementation method

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104954795A (en) * 2015-07-02 2015-09-30 东南大学 Image acquisition and transmission system based on JPEG2000
CN107766812A (en) * 2017-10-12 2018-03-06 东南大学—无锡集成电路技术研究所 A kind of real-time face detection identifying system based on MiZ702N
KR20200049428A (en) * 2018-10-23 2020-05-08 정민 황 Driver fatigue monitoring and control system based on FPGA + ARM control
CN109389120A (en) * 2018-10-29 2019-02-26 济南浪潮高新科技投资发展有限公司 A kind of object detecting device based on zynqMP
US20200151088A1 (en) * 2018-11-14 2020-05-14 The Mathworks, Inc. Systems and methods for configuring programmable logic devices for deep learning networks
CN109820524A (en) * 2019-03-22 2019-05-31 电子科技大学 FPGA-based wearable system for acquiring and classifying autism eye movement characteristics
CN110175670A (en) * 2019-04-09 2019-08-27 华中科技大学 A kind of method and system for realizing YOLOv2 detection network based on FPGA
CN110717852A (en) * 2019-06-13 2020-01-21 内蒙古大学 FPGA-based field video image real-time segmentation system and method
CN110390626A (en) * 2019-07-02 2019-10-29 深兰科技(上海)有限公司 A kind of image processing method and device of convolutional neural networks
CN110851255A (en) * 2019-11-07 2020-02-28 之江实验室 Method for processing video stream based on cooperation of terminal equipment and edge server
CN111459877A (en) * 2020-04-02 2020-07-28 北京工商大学 FPGA (field programmable gate array) acceleration-based Winograd YOLOv2 target detection model method
CN111967468A (en) * 2020-08-10 2020-11-20 东南大学 FPGA-based lightweight target detection neural network implementation method

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
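Sun Zhihao (孙志豪) et al.: "Heterogeneous Sobel edge detection system based on ARM and FPGA", Computer Knowledge and Technology (《电脑知识与技术》) *
Sun Zhihao (孙志豪) et al.: "Heterogeneous Sobel edge detection system based on ARM and FPGA", Computer Knowledge and Technology (《电脑知识与技术》), vol. 12, no. 34, 31 December 2016 (2016-12-31), pages 219 - 221 *
Zhang Yuting (张玉婷): "Optimization and embedded implementation of a gesture recognition algorithm based on convolutional neural networks", China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) *
Zhang Yuting (张玉婷): "Optimization and embedded implementation of a gesture recognition algorithm based on convolutional neural networks", China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》), vol. 2018, no. 12, 15 December 2018 (2018-12-15), pages 1 - 5 *
Yang Yunuo (杨雨诺): "Research and implementation of image classification and recognition technology based on PYNQ", China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) *
Yang Yunuo (杨雨诺): "Research and implementation of image classification and recognition technology based on PYNQ", China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》), vol. 2021, no. 03, 15 March 2021 (2021-03-15) *
Wang Zhongzheng (王中正): "Research and implementation of moving target detection based on FPGA", China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) *
Wang Zhongzheng (王中正): "Research and implementation of moving target detection based on FPGA", China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》), vol. 2020, no. 02, 15 February 2020 (2020-02-15), pages 135 - 732 *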

Also Published As

Publication number Publication date
CN113139519B (en) 2023-12-22

Similar Documents

Publication Publication Date Title
CN110597559B (en) Computing device and computing method
CN109102065B (en) Convolutional neural network accelerator based on PSoC
CN109104876B (en) Arithmetic device and related product
EP3660628B1 (en) Dynamic voltage frequency scaling device and method
US11893424B2 (en) Training a neural network using a non-homogenous set of reconfigurable processors
US11392740B2 (en) Dataflow function offload to reconfigurable processors
CN109426574A (en) Distributed computing system, data transmission method and device in distributed computing system
US11886931B2 (en) Inter-node execution of configuration files on reconfigurable processors using network interface controller (NIC) buffers
CN104954795B (en) A kind of image acquisition transmission system based on JPEG2000
CN105518625A (en) Computation hardware with high-bandwidth memory interface
US20200117990A1 (en) High performance computing system for deep learning
CN113849293B (en) Data processing method, device, system and computer readable storage medium
US10747292B2 (en) Dynamic voltage frequency scaling device and method
US20220004873A1 (en) Techniques to manage training or trained models for deep learning applications
US11182264B1 (en) Intra-node buffer-based streaming for reconfigurable processor-as-a-service (RPaaS)
EP4044070A2 (en) Neural network processing unit, neural network processing method and device
US20220350598A1 (en) Instruction processing apparatus, acceleration unit, and server
US20220308935A1 (en) Interconnect-based resource allocation for reconfigurable processors
CN115456155A (en) Multi-core storage and calculation processor architecture
US20220318162A1 (en) Interpolation acceleration in a processor memory interface
US11082327B2 (en) System and method for computational transport network-on-chip (NoC)
CN113139519B (en) Target detection system based on fully programmable system-on-chip
US20220413804A1 (en) Efficient complex multiply and accumulate
CN112766475A (en) Processing unit and artificial intelligence processor
CN117632844A (en) Reconfigurable AI algorithm hardware accelerator

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant