WO2023121624A2 - Ensemble learning with parallel artificial neural networks in embedded and integrated systems
- Publication number: WO2023121624A2 (application PCT/TR2022/051533)
- Authority: WIPO (PCT)
Classifications
- G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
- G06F15/7807: System on chip, i.e. computer system on a single chip; system in package, i.e. computer system on one or more chips in a single package
- G06F15/7825: Globally asynchronous, locally synchronous, e.g. network on chip
- G06F15/7867: Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
- G06N3/045: Combinations of networks
- G06N3/08: Learning methods
Abstract
The present invention can be deployed in various configurations and can be realized using a hardware accelerator (201), and it is capable of operating multiple neural networks (202) in a single embedded/integrated system (200) simultaneously and in parallel, or optionally by instantaneously switching from one pre-trained model to another. The present invention is capable of operating neural network (202) models in a single embedded/integrated system (200), for example in one or more integrated circuits, in a plurality of single-board monolithic computers (205) connected to each other via a network, or in hybrid deployable structures. The data (101) received from a data source (100) specific to a data type can be directed to the related neural networks (202) in order to be classified or interpreted, the detection results (300) can be collected synchronously at the same time or asynchronously at different times, and the evaluated detection results (302) are obtained by processing the results with a processor (301) structure of the user's choice, thereby increasing the overall classification success and providing high efficiency in terms of power consumption compared to desktop-like computing environments comprising conventional GPUs.
Description
ENSEMBLE LEARNING WITH PARALLEL ARTIFICIAL NEURAL NETWORKS IN EMBEDDED AND INTEGRATED SYSTEMS
Technical Field
The present invention relates to an embedded or integrated system for executing one or more neural networks, in particular parallel neural networks for ensemble learning, and a method for operating the system.
The invention can be deployed in various configurations and can be realized using hardware accelerators (e.g. FPGA - field-programmable gate array, SoC - system on chip, ASIC - application-specific integrated circuit, VPU - vision processing unit), and it is capable of operating multiple neural networks in a single embedded/integrated system simultaneously and in parallel, or optionally by instantaneously switching from one pre-trained model to another. The present invention is capable of operating neural network models in a single embedded/integrated system, for example in one or more integrated circuits (FPGA, SoC, ASIC, VPU, etc.), in a plurality of single-board monolithic computers connected to each other via a network, or in hybrid deployable structures. The data received from a data source specific to a data type can be directed to the related neural networks in order to be classified or interpreted, the detection results can be collected synchronously at the same time or asynchronously at different times, and the evaluated detection results are obtained by processing the results with a processor structure of the user's choice, thereby increasing the overall classification success and providing high efficiency in terms of power consumption compared to desktop-like computing environments comprising conventional GPUs.
Background
Machine learning applications are used to classify or interpret data. In order to increase the effectiveness of these applications, a large number of neural networks, for example, parallel neural networks, can be utilized. In particular, ensemble learning applications, where a plurality of learning models is used together, promise high classification and interpretation performance. However, the combination of multiple neural networks requires high system resources and increases energy consumption.
First of all, it should be noted that an artificial neural network is defined as a set of interconnected layers, and an artificial neural network model is formed from the whole of those layers. The present invention does not refer to the parallelism of layers, but to the independent parallelism of models. It should also be noted that, within the scope of the present invention, models may be arranged purely in parallel, as well as in complex configurations that combine serial and parallel stages, one after another or in a blend.
In the document numbered WO2017003830A1, hardware comprising an accelerator component and a memory component is described. The utilization of distributed resources and communication with external components are also discussed. Parallel neural network engines operating on the accelerator component are described. However, the parallel neural network engines mentioned in WO2017003830A1 are not the separate, independently operating parallel neural network models of the present invention.
In the document numbered CN113627163A, an apparatus that may comprise a processor, an accelerator and a memory is described. A parallelization operation involving the layers of artificial neural networks is also described. However, the study described in CN113627163A does not take the form of parallel neural network models operating separately and independently of each other as described in the present invention.
In the chapter titled "FPGA based neural network accelerators" in the book "Advances in Computers" (Joo-Young Kim, Chapter Five - FPGA based neural network accelerators, in: Shiho Kim, Ganesh Chandra Deka (Eds.), Advances in Computers, Elsevier, Volume 122, 2021, Pages 135-165, https://doi.org/10.1016/bs.adcom.2020.11.002), the use of FPGA circuits as accelerators for neural network execution is described. Examples of FPGA circuits communicating with each other, with processors and with devices on the network are given, and various memory configurations are also described. However, parallel neural networks are not discussed in this study.
In the publication entitled "Cnnlab: a novel parallel framework for neural networks using gpu and fpga - a practical study with trade-off analysis" (Zhu, Maohua, et al., arXiv preprint arXiv:1606.06234, 2016), a framework developed in response to the challenges of parallel execution of neural network layers is described. Said study does not take the form of the separate and independent parallel neural network models described in the present invention.
Objects and Brief Description of the Invention
The object of the present invention is to develop solutions for ensemble learning utilizing parallel artificial neural networks in embedded or integrated systems. Accordingly, an embedded or integrated system suitable for the execution of parallel neural networks and a method for operating said system have been developed.
The system developed by the invention has an architecture comprising memory modules, at least one artificial neural network and at least one hardware accelerator and/or one or more computers. A system built according to this architecture receives data from a data source as input and the detection results generated as output are presented to a processor for evaluation.
Detailed Description of the Invention
The system realized for achieving the objects of the present invention is shown in the attached figures.
Fig. 1: A schematic view of a system according to the invention.
The parts in the figures are numbered one by one and the corresponding numbers are given below.
100. Data source
101. Data
200. System
201. Hardware accelerator
202. Artificial neural networks
203. Memory
204. Model coefficients
205. Computer
300. Detection results
301. Processor
302. Evaluated detection results
The system (200) subject to the invention for operating one or more artificial neural networks (202) in parallel, in series or in a combination thereof essentially comprises at least one hardware accelerator (201) and/or at least one computer (205), which receives data (101) from at least one data source (100) as input to the artificial neural networks (202). Said computer(s) (205) may be integrated with the hardware accelerator(s) (201), connected via a network, or a combination thereof in a hybrid structure. The detection results (300) obtained as output by the system by operating the artificial neural networks (202) are transmitted to a processor (301), and the processor (301) generates the evaluated detection results (302). In ensemble learning applications of the invention, the evaluated detection results (302) may be ensemble detection results corresponding to the combined outputs of multiple learning models.
The method subject to the invention essentially comprises the following steps executed by the system (200) comprising at least one hardware accelerator (201) and/or at least one computer (205):
a. receiving the data (101) from at least one data source (100),
b. processing the data (101) in the artificial neural networks (202), and
c. receiving detection results (300) from the artificial neural networks (202).
In a preferred application of the invention, the method further comprises the following steps applied after the above-mentioned steps:
d. transmitting the received detection results (300) to a processor (301),
e. performing operations of associating and inferring on the detection results (300),
f. generating the evaluated detection results (302).
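As an illustration only, steps a to c can be sketched in Python; the predict() interface, the thread-based concurrency and the dictionary of named models are assumptions standing in for the accelerator-level parallelism of the invention, not a prescribed API:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def collect_detections(models, data):
    """Feed the same input data (101) to every neural network (202)
    and yield (name, detection_result) pairs as each member finishes,
    i.e. asynchronous collection. Wrapping the call in dict(...)
    waits for all members, i.e. synchronous collection."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {pool.submit(model.predict, data): name
                   for name, model in models.items()}
        for future in as_completed(futures):
            yield futures[future], future.result()

# Synchronous use: detections = dict(collect_detections(models, frame))
```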
The data source (100) can be specific to any data type. In this context, data types can be, for example, an audio recording, an image recording or a video sequence, but also a time series data (101) or a spectral data (101) sequence. Within this invention, a data type should be considered as a structure that has a certain integrity within itself, expresses a certain phenomenon and contains meaning. For this reason, in addition to the data types given as examples, it can be any structure that meets these criteria.
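The directing of typed data (101) to the related neural networks (202) can be pictured as a simple dispatch table. A minimal sketch follows; the data-type names, network names and the predict() call are illustrative assumptions:

```python
# Hypothetical dispatch table: each data type maps to the subset of
# neural networks (202) trained for that modality.
ROUTING = {
    "audio":    ["audio_net_a", "audio_net_b"],
    "image":    ["image_net_a", "image_net_b"],
    "spectral": ["spectral_net"],
}

def route(data_type, data, networks):
    """Direct incoming data (101) to every network registered for its
    data type and collect their detection results (300)."""
    return {name: networks[name].predict(data)
            for name in ROUTING.get(data_type, [])}
```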
The first input point of the data (101) is the system (200). The embedded/integrated system (200) may be a network of computers (205) that are connected to each other via a network, as well as a structure with multiple hardware accelerators (201) working in coordination with each other, or a hybrid structure comprising said networks of connected computers (205) and hardware accelerators (201).
The most critical component of the system (200) is the artificial neural networks (202) and these may be located on the hardware accelerator (201) connected to the system (200), or defined on one or more memories (203) belonging to a computer (205) within the system (200) or to computers (205) connected via a network. Said computers (205) may also comprise hardware accelerators (201). The invention may be implemented with volatile or non-volatile memories (203) or combinations thereof. The memories (203) may also be in the form of an external module.
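A deployment of networks across accelerators, computers and memories, as described above, can be captured in a small configuration structure. The sketch below is one possible way to record such a layout; all field names and example values are assumptions:

```python
from dataclasses import dataclass

@dataclass
class NetworkPlacement:
    """Where one neural network (202) lives inside the system (200).
    All values here are illustrative; the invention admits
    accelerator-only, computer-only and hybrid deployments."""
    name: str
    device: str           # e.g. "fpga0", "vpu1", or "cpu" of a computer (205)
    memory: str           # "volatile", "non-volatile" or "external" memory (203)
    host: str = "local"   # "local" or the network address of a computer (205)

# One hypothetical hybrid layout: an on-board FPGA plus a networked
# single-board computer (205).
system_layout = [
    NetworkPlacement("detector_a", device="fpga0", memory="non-volatile"),
    NetworkPlacement("detector_b", device="cpu", memory="volatile",
                     host="10.0.0.2"),
]
```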
The said computers (205) can be single-board monolithic structures and can also include hardware accelerators (201) (FPGA, SoC, ASIC, VPU, etc.).
One or more artificial neural networks (202) are supported by the invention. The layer coefficient values (204) of the models defining the artificial neural networks (202) are stored in the memory (203) module(s) in the system (200). The system extracts the model coefficients (204) of the neural networks (202) from these memory (203) modules in order to perform the function of the neural networks (202). This can take place during the startup of the system (200), but also during its operation.
The model coefficients (204) can be loaded into the memory (203) modules before the system (200) is put into operation. The memory (203) modules may be standalone within the system (200), or they may be embedded in the hardware accelerator(s) (201) in a separate software, software-defined hardware, or purely hardware-based structure. In addition, the model coefficients (204) may also be defined in non-volatile memory (203) modules of the computer(s) (205). In order to reliably use model coefficients (204) that can be stored on different memories (203), the model coefficients (204) of specific neural networks (202) must be correctly addressed to the memory modules (203). This task can be performed by a single CPU in the embedded/integrated system (200), by a software-defined CPU connected to the hardware accelerator(s) (201), or by a full hardware CPU. The hardware CPU referred to herein may be a computer (205), or a CPU defined within an SoC and operating in coordination with the hardware accelerator (201) belonging to and connected to that SoC. It should further be noted that CPU-FPGA integrated systems, i.e. SoC integrations, are also within the scope of hardware accelerators (201) according to the present invention. Furthermore, in cases where CPUs and hardware accelerators (201) are used together, the CPUs and hardware accelerators (201) of one or more separate computers (205) included in the system (200) may also be used. Systems (200) containing only hardware accelerators (201) may be used, as well as systems containing CPUs and hardware accelerators (201) in a single computer (205). Single-board computers (205) containing a CPU and hardware accelerators (201) in one part, and hybrid structures with only hardware accelerators (201) in another part, can also be included in the system (200). Owing to this structure, the embedded/integrated system (200) can be in the form of a simple SoC or an NoC (network-on-chip).
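The addressing of model coefficients (204) to specific neural networks (202) can be pictured as a registry that maps each network identifier to the memory region holding its coefficients. In the sketch below, file paths stand in for memory-module addresses, and the identifiers and NumPy archive format are assumptions:

```python
import numpy as np

# Hypothetical address map: network identifier -> location of its model
# coefficients (204) in a memory (203) module. File paths stand in for
# memory addresses in this sketch.
COEFFICIENT_MAP = {
    "detector_a": "/weights/detector_a.npz",
    "detector_b": "/weights/detector_b.npz",
}

def load_coefficients(network_id):
    """Fetch the model coefficients (204) addressed to one specific
    neural network (202); callable at system (200) startup or during
    operation, matching the loading options described above."""
    path = COEFFICIENT_MAP[network_id]
    with np.load(path) as archive:  # one array per layer, keyed by name
        return {layer: archive[layer] for layer in archive.files}
```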
Artificial neural networks (202) are tasked with generating the detection results (300). Generating and directing the detection results (300) by operating the neural networks (202) can be realized by means of the hardware accelerator(s) (201) or the computer(s) (205). The detection results (300) may include a certain classification, interpretation, approximation or prediction performed on a certain region of the incoming raw data (101) or on the entire data structure. The artificial neural networks (202) may be multiple networks independent of each other, or they may be blocks of artificial neural networks (202) responsible for a specific task, connected to each other one after another in a serial structure. The serial neural networks (202) may also include artificial sub-neural networks (202) operating in parallel in addition to the serial structures. In other words, the neural networks (202) can also have complex embodiments consisting of multiple neural networks (202) connected both in series and in parallel.
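The serial, parallel and blended arrangements described above can be expressed as two small composable wrappers. This is a structural sketch only, assuming each network exposes a predict() method:

```python
class Serial:
    """Networks (202) connected one after another: each stage consumes
    the previous stage's output."""
    def __init__(self, *stages):
        self.stages = stages

    def predict(self, data):
        for stage in self.stages:
            data = stage.predict(data)
        return data

class Parallel:
    """Independent networks (202) fed the same input; all outputs are
    returned together."""
    def __init__(self, *branches):
        self.branches = branches

    def predict(self, data):
        return [branch.predict(data) for branch in self.branches]

# A complex embodiment: a serial chain whose middle stage is itself a
# parallel block of sub-networks (all stage objects are hypothetical):
# pipeline = Serial(preprocess_net, Parallel(branch_a, branch_b), fuse_net)
```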
The mechanism of the processor (301), which processes the outputs of the one or more neural networks (202) generating the detection results (300), can be set by an operator in accordance with the application field and objectives. The main task of the processor (301) is to correlate the different detection results (300) from different neural networks (202) to increase the overall detection result quality and consistency. For this reason, according to the invention, the functioning structure of the processor (301) is adjusted according to the choice of artificial neural networks (202) used in the embedded/integrated system (200). The results/outputs that are associated with each other or compiled collectively by the processor (301) are called the evaluated detection results (302).
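Two common processor (301) policies that an operator might select are majority voting over class labels and confidence-weighted averaging over scores. The sketch below illustrates both, under the assumption that the detection results (300) arrive as a dictionary keyed by network name; neither policy is mandated by the invention:

```python
from collections import Counter

def majority_vote(detections):
    """One possible processor (301) policy: the class label reported by
    the most ensemble members becomes the evaluated detection result (302)."""
    return Counter(detections.values()).most_common(1)[0][0]

def weighted_average(detections, weights):
    """Alternative policy for score-valued outputs: a confidence-weighted
    mean, with per-network weights chosen by the operator."""
    total = sum(weights[name] for name in detections)
    return sum(weights[name] * score
               for name, score in detections.items()) / total
```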
With the invention, single or multiple neural network (202) architectures can be operated and ensemble learning functions can be realized with multiple neural networks (202).
The evaluated detection results (302) obtained by the system and method subject to the invention can be used as input in other artificial intelligence systems, for example, for scene interpretation and motion/scene/situation prediction to increase situational awareness.
Experiments have been carried out with some example systems (200) according to the invention and the effectiveness of the invention has been demonstrated. Thanks to the implementation of parallel artificial neural networks (202) in embedded systems (200), for example, it has been observed that 2 artificial neural network (202) models trained in an embedded system (200) supported by an FPGA are 100-150 times more efficient in terms of power consumption than 2 GPU-supported computers (205), while being equivalent in terms of performance. In another example configuration within the scope of the invention, an embedded system structure that operates 2 models supported by an ASIC/VPU and comprises a single-board computer (205) has provided 300 times more efficiency in terms of power consumption compared to computers (205) containing GPUs, at half the performance. In another possible configuration, multiple neural network (202) models can be operated using multiple single-board computers (205) connected to each other over a multi-core, low-power-consumption network within the scope of the present invention.
The structures of various configurations realized according to the present invention pave the way for the application of artificial intelligence in low-power consumption mobile systems (200) (portable mobile devices, unmanned systems, wearable systems, satellite systems, etc.).
Claims
CLAIMS
1. A method for operating one or more artificial neural networks (202) in parallel, in series or in a combination thereof, executed by a system (200) comprising at least one hardware accelerator (201) and/or at least one computer (205), characterized in that it comprises the following steps:
a. receiving the data (101) from at least one data source (100),
b. processing the data (101) in artificial neural networks (202), and
c. receiving detection results (300) from artificial neural networks (202).
2. The method according to claim 1, characterized in that it comprises the following steps executed after the step of receiving detection results (300) from artificial neural networks (202):
d. transmitting the received detection results (300) to a processor (301),
e. performing operations of associating and inferring on the detection results (300),
f. generating the evaluated detection results (302).
3. The method according to claim 1, wherein the data source (100) is specific to a data type selected from an audio recording, an image recording, a video sequence, a time series data and a spectral data sequence.
4. The method according to claim 1, wherein a plurality of independent artificial neural networks (202) is used.
5. The method according to claim 1, wherein blocks of neural networks (202) are used which are connected to each other one after another in a serial structure and are responsible for a specific task.
6. The method according to claim 5, wherein the blocks of neural networks (202) in the serial structure comprise artificial sub-neural networks (202) operating in parallel.
7. A system (200) for operating one or more artificial neural networks (202) in parallel, in series or in a combination thereof, comprising at least one hardware accelerator (201) and/or at least one computer (205).
8. The system (200) according to claim 7, wherein the computer(s) (205) is/are in an integrated structure with the hardware accelerator(s) (201).
9. The system (200) according to claim 7, wherein the computer(s) (205) is/are connected to the hardware accelerator(s) (201) via a network.
10. The system (200) according to claim 7, wherein the computers (205) are of a single-board monolithic structure.
11. The system (200) according to claim 10, wherein the computers (205) comprise hardware accelerators (201).
12. The system (200) according to claim 7, wherein it comprises at least one memory (203), whether volatile or non-volatile, in which neural networks (202) are defined, located in the hardware accelerator (201), in a computer (205) or in computers (205) that are connected to each other via a network.
13. The system (200) according to claim 7, wherein it is in the form of an SoC (system-on-chip) or an NoC (network-on-chip).
Applications Claiming Priority (2)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TR2021/020689 (TR2021020689A2) | 2021-12-22 | | COMMUNITY LEARNING WITH PARALLEL NEURAL NETWORKS IN EMBEDDED AND INTEGRATED SYSTEMS |
| TR2021020689 | 2021-12-22 | | |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| WO2023121624A2 | 2023-06-29 |
| WO2023121624A3 | 2023-08-03 |
Family
ID=86903535
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/TR2022/051533 (WO2023121624A2) | Ensemble learning with parallel artificial neural networks in embedded and integrated systems | 2021-12-22 | 2022-12-20 |

Country Status (1)

| Country | Link |
|---|---|
| WO | WO2023121624A2 (en) |
Family Cites Families (3)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11544540B2 | 2019-05-10 | 2023-01-03 | Hewlett Packard Enterprise Development LP | Systems and methods for neural network training and deployment for hardware accelerators |
| US11250107B2 | 2019-07-15 | 2022-02-15 | International Business Machines Corporation | Method for interfacing with hardware accelerators |
| GB2588951A | 2019-11-15 | 2021-05-19 | Prevayl Ltd | Method and electronics arrangement for a wearable article |

2022-12-20: WO application PCT/TR2022/051533 filed (WO2023121624A2).
Also Published As

| Publication Number | Publication Date |
|---|---|
| WO2023121624A3 | 2023-08-03 |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22912154; Country of ref document: EP; Kind code of ref document: A2 |
| | NENP | Non-entry into the national phase | Ref country code: DE |