WO2023121624A2 - Ensemble learning with parallel artificial neural networks in embedded and integrated systems
- Publication number: WO2023121624A2 (application PCT/TR2022/051533)
- Authority: WIPO (PCT)
Classifications
- G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
- G06F15/7807: System on chip, i.e. computer system on a single chip; system in package, i.e. computer system on one or more chips in a single package
- G06F15/7825: Globally asynchronous, locally synchronous, e.g. network on chip
- G06F15/7867: Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
- G06N3/045: Combinations of networks
- G06N3/08: Learning methods
Abstract
The present invention can be deployed in various configurations and can be realized using a hardware accelerator (201), and it is capable of operating multiple neural networks (202) in a single embedded/integrated system (200) simultaneously and in parallel, or optionally by instantaneously switching from one pre-trained model to another. The present invention is capable of operating neural network (202) models in a single embedded/integrated system (200), for example in one or more integrated circuits, in a plurality of single-board monolithic computers (205) connected to each other via a network, or in hybrid deployable structures. The data (101) received from a data source (100) specific to a data type can be directed to the related neural networks (202) in order to be classified or interpreted, the detection results (300) can be collected synchronously at the same time or asynchronously at different times, and the evaluated detection results (302) are obtained by processing the results with a processor (301) structure of the user's choice, thereby increasing the overall classification success and providing high efficiency in terms of power consumption compared to desktop-like computing environments comprising conventional GPUs.
Description
ENSEMBLE LEARNING WITH PARALLEL ARTIFICIAL NEURAL NETWORKS IN EMBEDDED AND INTEGRATED SYSTEMS
Technical Field
The present invention relates to an embedded or integrated system for executing one or more neural networks, in particular parallel neural networks for ensemble learning, and a method for operating the system.
The invention can be deployed in various configurations and can be realized using hardware accelerators (e.g. FPGA - field-programmable gate array, SoC - system on chip, ASIC - application-specific integrated circuit, VPU - vision processing unit), and it is capable of operating multiple neural networks in a single embedded/integrated system simultaneously and in parallel, or optionally by instantaneously switching from one pre-trained model to another. The present invention is capable of operating neural network models in a single embedded/integrated system, for example in one or more integrated circuits (FPGA, SoC, ASIC, VPU, etc.), in a plurality of single-board monolithic computers connected to each other via a network, or in hybrid deployable structures. The data received from a data source specific to a data type can be directed to the related neural networks in order to be classified or interpreted, the detection results can be collected synchronously at the same time or asynchronously at different times, and the evaluated detection results are obtained by processing the results with a processor structure of the user's choice, thereby increasing the overall classification success and providing high efficiency in terms of power consumption compared to desktop-like computing environments comprising conventional GPUs.
Background
Machine learning applications are used to classify or interpret data. In order to increase the effectiveness of these applications, a large number of neural networks, for example, parallel neural networks, can be utilized. In particular, ensemble learning applications, where a plurality of learning models is used together, promise high classification and interpretation performance. However, the combination of multiple neural networks requires high system resources and increases energy consumption.
First of all, it should be noted that an artificial neural network is defined as a set of interconnected layers, and an artificial neural network model is formed from the whole of those layers. The present invention does not refer to the parallelism of layers, but to the independent parallelism of models. It should also be noted that, within the scope of the present invention, models may be arranged purely in parallel, as well as in complex configurations that combine serial and parallel stages, one after another or in a blend.
In the document numbered WO2017003830A1, hardware comprising an accelerator component and a memory component is described. The utilization of distributed resources and communication with external components are also discussed. Parallel neural network engines operating on the accelerator component are described. However, the parallel neural network engines mentioned in WO2017003830A1 are not the separate, independently operating parallel neural network models of the present invention.
In the document numbered CN113627163A, an apparatus that may comprise a processor, an accelerator and a memory is described. A parallelization operation involving the layers of artificial neural networks is also described. However, the study described in CN113627163A does not take the form of parallel neural network models operating separately and independently of each other as described in the present invention.
In the chapter titled "FPGA based neural network accelerators" in the book "Advances in Computers" (Joo-Young Kim, Chapter Five - FPGA based neural network accelerators, in: Shiho Kim, Ganesh Chandra Deka (Eds.), Advances in Computers, Elsevier, Volume 122, 2021, Pages 135-165, https://doi.org/10.1016/bs.adcom.2020.11.002), the use of FPGA circuits as accelerators for neural network execution is described. Examples of FPGA circuits communicating with each other, with processors and with devices on the network are given, and various memory configurations are also described. However, parallel neural networks are not discussed in this study.
In the publication entitled "Cnnlab: a novel parallel framework for neural networks using gpu and fpga - a practical study with trade-off analysis" (Zhu, Maohua, et al., arXiv preprint arXiv:1606.06234, 2016), a framework developed in response to the challenges of parallel execution of neural network layers is described. Said study does not take the form of the separate and independent parallel neural network models described in the present invention.
Objects and Brief Description of the Invention
The object of the present invention is to develop solutions for ensemble learning utilizing parallel artificial neural networks in embedded or integrated systems. Accordingly, an embedded or integrated system suitable for the execution of parallel neural networks and a method for operating said system have been developed.
The system developed by the invention has an architecture comprising memory modules, at least one artificial neural network and at least one hardware accelerator and/or one or more computers. A system built according to this architecture receives data from a data source as input and the detection results generated as output are presented to a processor for evaluation.
Detailed Description of the Invention
The system realized for achieving the objects of the present invention is shown in the attached figures.
Fig. 1: A schematic view of a system according to the invention.
The parts in the figures are numbered one by one and the corresponding numbers are given below.
100. Data source
101. Data
200. System
201. Hardware accelerator
202. Artificial neural networks
203. Memory
204. Model coefficients
205. Computer
300. Detection results
301. Processor
302. Evaluated detection results
The system (200) subject to the invention for operating one or more artificial neural networks (202) in parallel, in series or in a combination thereof essentially comprises at least one hardware accelerator (201) and/or at least one computer (205), which receives data (101) from at least one data source (100) as input to the artificial neural networks (202). Said computer(s) (205) may be integrated with the hardware accelerator(s) (201), connected via a network, or a combination thereof in a hybrid structure. The detection results (300) obtained as output by the system by operating the artificial neural networks (202) are transmitted to a processor (301), and the processor (301) generates the evaluated detection results (302). In ensemble learning applications of the invention, the evaluated detection results (302) may be ensemble detection results corresponding to the combined outputs of multiple learning models.
The method subject to the invention essentially comprises the following steps executed by the system (200) comprising at least one hardware accelerator (201) and/or at least one computer (205):
a. receiving the data (101) from at least one data source (100),
b. processing the data (101) in the artificial neural networks (202), and
c. receiving detection results (300) from the artificial neural networks (202).
In a preferred application of the invention, the method further comprises the following steps applied after the above-mentioned steps:
d. transmitting the received detection results (300) to a processor (301),
e. performing operations of associating and inferring on the detection results (300),
f. generating the evaluated detection results (302).
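As an illustration only, steps a to c can be sketched in Python; the predict() interface, the thread-based concurrency and the dictionary of named models are assumptions standing in for the accelerator-level parallelism of the invention, not a prescribed API:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def collect_detections(models, data):
    """Feed the same input data (101) to every neural network (202)
    and yield (name, detection_result) pairs as each member finishes,
    i.e. asynchronous collection. Wrapping the call in dict(...)
    waits for all members, i.e. synchronous collection."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {pool.submit(model.predict, data): name
                   for name, model in models.items()}
        for future in as_completed(futures):
            yield futures[future], future.result()

# Synchronous use: detections = dict(collect_detections(models, frame))
```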
The data source (100) can be specific to any data type. In this context, data types can be, for example, an audio recording, an image recording or a video sequence, but also a time series data (101) or a spectral data (101) sequence. Within this invention, a data type should be considered as a structure that has a certain integrity within itself, expresses a certain phenomenon and contains meaning. For this reason, in addition to the data types given as examples, it can be any structure that meets these criteria.
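The directing of typed data (101) to the related neural networks (202) can be pictured as a simple dispatch table. A minimal sketch follows; the data-type names, network names and the predict() call are illustrative assumptions:

```python
# Hypothetical dispatch table: each data type maps to the subset of
# neural networks (202) trained for that modality.
ROUTING = {
    "audio":    ["audio_net_a", "audio_net_b"],
    "image":    ["image_net_a", "image_net_b"],
    "spectral": ["spectral_net"],
}

def route(data_type, data, networks):
    """Direct incoming data (101) to every network registered for its
    data type and collect their detection results (300)."""
    return {name: networks[name].predict(data)
            for name in ROUTING.get(data_type, [])}
```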
The first input point of the data (101) is the system (200). The embedded/integrated system (200) may be a network of computers (205) that are connected to each other via a network, as well as a structure with multiple hardware accelerators (201) working in coordination with each other, or a hybrid structure comprising said networks of connected computers (205) and hardware accelerators (201).
The most critical component of the system (200) is the artificial neural networks (202) and these may be located on the hardware accelerator (201) connected to the system (200), or defined on one or more memories (203) belonging to a computer (205) within the system (200) or to computers (205) connected via a network. Said computers (205) may also comprise hardware accelerators (201). The invention may be implemented with volatile or non-volatile memories (203) or combinations thereof. The memories (203) may also be in the form of an external module.
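A deployment of networks across accelerators, computers and memories, as described above, can be captured in a small configuration structure. The sketch below is one possible way to record such a layout; all field names and example values are assumptions:

```python
from dataclasses import dataclass

@dataclass
class NetworkPlacement:
    """Where one neural network (202) lives inside the system (200).
    All values here are illustrative; the invention admits
    accelerator-only, computer-only and hybrid deployments."""
    name: str
    device: str           # e.g. "fpga0", "vpu1", or "cpu" of a computer (205)
    memory: str           # "volatile", "non-volatile" or "external" memory (203)
    host: str = "local"   # "local" or the network address of a computer (205)

# One hypothetical hybrid layout: an on-board FPGA plus a networked
# single-board computer (205).
system_layout = [
    NetworkPlacement("detector_a", device="fpga0", memory="non-volatile"),
    NetworkPlacement("detector_b", device="cpu", memory="volatile",
                     host="10.0.0.2"),
]
```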
The said computers (205) can be single-board monolithic structures and can also include hardware accelerators (201) (FPGA, SoC, ASIC, VPU, etc.).
One or more artificial neural networks (202) are supported by the invention. The layer coefficient values (204) of the models defining the artificial neural networks (202) are stored in the memory (203) module(s) in the system (200). The system extracts the model coefficients (204) of the neural networks (202) from these memory (203) modules in order to perform the function of the neural networks (202). This can take place during the startup of the system (200), but also during its operation.
The model coefficients (204) can be loaded into the memory (203) modules before the system (200) is put into operation. The memory (203) modules may be standalone within the system (200), or they may be embedded in the hardware accelerator(s) (201) in a separate software, software-defined hardware, or purely hardware-based structure. In addition, the model coefficients (204) may also be defined in non-volatile memory (203) modules of the computer(s) (205). In order to reliably use model coefficients (204) that can be stored on different memories (203), the model coefficients (204) of specific neural networks (202) must be correctly addressed to the memory modules (203). This task can be performed by a single CPU in the embedded/integrated system (200), by a software-defined CPU connected to the hardware accelerator(s) (201), or by a full hardware CPU. The hardware CPU referred to herein may be a computer (205), or a CPU defined within an SoC and operating in coordination with the hardware accelerator (201) belonging to and connected to that SoC. It should further be noted that CPU-FPGA integrated systems, i.e. SoC integrations, are also within the scope of hardware accelerators (201) according to the present invention. Furthermore, in cases where CPUs and hardware accelerators (201) are used together, the CPUs and hardware accelerators (201) of one or more separate computers (205) included in the system (200) may also be used. Systems (200) containing only hardware accelerators (201) may be used, as well as systems containing CPUs and hardware accelerators (201) in a single computer (205). Single-board computers (205) containing a CPU and hardware accelerators (201) in one part, and hybrid structures with only hardware accelerators (201) in another part, can also be included in the system (200). Owing to this structure, the embedded/integrated system (200) can be in the form of a simple SoC or an NoC (network-on-chip).
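The addressing of model coefficients (204) to specific neural networks (202) can be pictured as a registry that maps each network identifier to the memory region holding its coefficients. In the sketch below, file paths stand in for memory-module addresses, and the identifiers and NumPy archive format are assumptions:

```python
import numpy as np

# Hypothetical address map: network identifier -> location of its model
# coefficients (204) in a memory (203) module. File paths stand in for
# memory addresses in this sketch.
COEFFICIENT_MAP = {
    "detector_a": "/weights/detector_a.npz",
    "detector_b": "/weights/detector_b.npz",
}

def load_coefficients(network_id):
    """Fetch the model coefficients (204) addressed to one specific
    neural network (202); callable at system (200) startup or during
    operation, matching the loading options described above."""
    path = COEFFICIENT_MAP[network_id]
    with np.load(path) as archive:  # one array per layer, keyed by name
        return {layer: archive[layer] for layer in archive.files}
```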
Artificial neural networks (202) are tasked with generating the detection results (300). Generating and directing the detection results (300) by operating the neural networks (202) can be realized by means of the hardware accelerator(s) (201) or the computer(s) (205). The detection results (300) may include a certain classification, interpretation, approximation or prediction performed on a certain region of the incoming raw data (101) or on the entire data structure. The artificial neural networks (202) may be multiple networks independent of each other, or they may be blocks of artificial neural networks (202) responsible for a specific task, connected to each other one after another in a serial structure. The serial neural networks (202) may also include artificial sub-neural networks (202) operating in parallel in addition to the serial structures. In other words, the neural networks (202) can also have complex embodiments consisting of multiple neural networks (202) connected both in series and in parallel.
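The serial, parallel and blended arrangements described above can be expressed as two small composable wrappers. This is a structural sketch only, assuming each network exposes a predict() method:

```python
class Serial:
    """Networks (202) connected one after another: each stage consumes
    the previous stage's output."""
    def __init__(self, *stages):
        self.stages = stages

    def predict(self, data):
        for stage in self.stages:
            data = stage.predict(data)
        return data

class Parallel:
    """Independent networks (202) fed the same input; all outputs are
    returned together."""
    def __init__(self, *branches):
        self.branches = branches

    def predict(self, data):
        return [branch.predict(data) for branch in self.branches]

# A complex embodiment: a serial chain whose middle stage is itself a
# parallel block of sub-networks (all stage objects are hypothetical):
# pipeline = Serial(preprocess_net, Parallel(branch_a, branch_b), fuse_net)
```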
The mechanism of the processor (301), which processes the outputs of the one or more neural networks (202) generating the detection results (300), can be set by an operator in accordance with the application field and objectives. The main task of the processor (301) is to correlate the different detection results (300) from different neural networks (202) to increase the overall detection result quality and consistency. For this reason, according to the invention, the functioning structure of the processor (301) is adjusted according to the choice of artificial neural networks (202) used in the embedded/integrated system (200). The results/outputs that are associated with each other or compiled collectively by the processor (301) are called the evaluated detection results (302).
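Two common processor (301) policies that an operator might select are majority voting over class labels and confidence-weighted averaging over scores. The sketch below illustrates both, under the assumption that the detection results (300) arrive as a dictionary keyed by network name; neither policy is mandated by the invention:

```python
from collections import Counter

def majority_vote(detections):
    """One possible processor (301) policy: the class label reported by
    the most ensemble members becomes the evaluated detection result (302)."""
    return Counter(detections.values()).most_common(1)[0][0]

def weighted_average(detections, weights):
    """Alternative policy for score-valued outputs: a confidence-weighted
    mean, with per-network weights chosen by the operator."""
    total = sum(weights[name] for name in detections)
    return sum(weights[name] * score
               for name, score in detections.items()) / total
```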
With the invention, single or multiple neural network (202) architectures can be operated and ensemble learning functions can be realized with multiple neural networks (202).
The evaluated detection results (302) obtained by the system and method subject to the invention can be used as input in other artificial intelligence systems, for example, for scene interpretation and motion/scene/situation prediction to increase situational awareness.
Experiments have been carried out with some example systems (200) according to the invention and the effectiveness of the invention has been demonstrated. Thanks to the implementation of parallel artificial neural networks (202) in embedded systems (200), for example, it has been observed that 2 artificial neural network (202) models trained in an embedded system (200) supported by an FPGA are 100-150 times more efficient in terms of power consumption than 2 GPU-supported computers (205), while being equivalent in terms of performance. In another example configuration within the scope of the invention, an embedded system structure that operates 2 models supported by an ASIC/VPU and comprises a single-board computer (205) has provided 300 times more efficiency in terms of power consumption compared to computers (205) containing GPUs, at half the performance. In another possible configuration, multiple neural network (202) models can be operated using multiple single-board computers (205) connected to each other over a multi-core, low-power-consumption network within the scope of the present invention.
The structures of various configurations realized according to the present invention pave the way for the application of artificial intelligence in low-power consumption mobile systems (200) (portable mobile devices, unmanned systems, wearable systems, satellite systems, etc.).
Claims
CLAIMS
1. A method for operating one or more artificial neural networks (202) in parallel, in series or in a combination thereof, executed by a system (200) comprising at least one hardware accelerator (201) and/or at least one computer (205), characterized in that it comprises the following steps:
a. receiving the data (101) from at least one data source (100),
b. processing the data (101) in artificial neural networks (202), and
c. receiving detection results (300) from artificial neural networks (202).
2. The method according to claim 1, characterized in that it comprises the following steps executed after the step of receiving detection results (300) from artificial neural networks (202):
d. transmitting the received detection results (300) to a processor (301),
e. performing operations of associating and inferring on the detection results (300),
f. generating the evaluated detection results (302).
3. The method according to claim 1, wherein the data source (100) is specific to a data type selected from an audio recording, an image recording, a video sequence, a time series data and a spectral data sequence.
4. The method according to claim 1, wherein a plurality of independent artificial neural networks (202) is used.
5. The method according to claim 1, wherein blocks of neural networks (202) are used which are connected to each other one after another in a serial structure and are responsible for a specific task.
6. The method according to claim 5, wherein the blocks of neural networks (202) in the serial structure comprise artificial sub-neural networks (202) operating in parallel.
7. A system (200) for operating one or more artificial neural networks (202) in parallel, in series or in a combination thereof, comprising at least one hardware accelerator (201) and/or at least one computer (205).
8. The system (200) according to claim 7, wherein the computer(s) (205) is/are in an integrated structure with the hardware accelerator(s) (201).
9. The system (200) according to claim 7, wherein the computer(s) (205) is/are connected to the hardware accelerator(s) (201) via a network.
10. The system (200) according to claim 7, wherein the computers (205) are of a single-board monolithic structure.
11. The system (200) according to claim 10, wherein the computers (205) comprise hardware accelerators (201).
12. The system (200) according to claim 7, wherein it comprises at least one memory (203), whether volatile or non-volatile, in which neural networks (202) are defined, located in the hardware accelerator (201), in a computer (205) or in computers (205) that are connected to each other via a network.
13. The system (200) according to claim 7, wherein it is in the form of an SoC (system-on-chip) or an NoC (network-on-chip).
Applications Claiming Priority (2)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TR2021/020689 (TR2021020689A2) | 2021-12-22 | | COMMUNITY LEARNING WITH PARALLEL NEURAL NETWORKS IN EMBEDDED AND INTEGRATED SYSTEMS |
| TR2021020689 | 2021-12-22 | | |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| WO2023121624A2 | 2023-06-29 |
| WO2023121624A3 | 2023-08-03 |
Family
ID=86903535
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/TR2022/051533 (WO2023121624A2) | Ensemble learning with parallel artificial neural networks in embedded and integrated systems | 2021-12-22 | 2022-12-20 |

Country Status (1)

| Country | Link |
|---|---|
| WO | WO2023121624A2 (en) |
Family Cites Families (3)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11544540B2 | 2019-05-10 | 2023-01-03 | Hewlett Packard Enterprise Development LP | Systems and methods for neural network training and deployment for hardware accelerators |
| US11250107B2 | 2019-07-15 | 2022-02-15 | International Business Machines Corporation | Method for interfacing with hardware accelerators |
| GB2588951A | 2019-11-15 | 2021-05-19 | Prevayl Ltd | Method and electronics arrangement for a wearable article |

2022-12-20: WO application PCT/TR2022/051533 filed (WO2023121624A2).
Also Published As

| Publication Number | Publication Date |
|---|---|
| WO2023121624A3 | 2023-08-03 |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22912154; Country of ref document: EP; Kind code of ref document: A2 |
| | NENP | Non-entry into the national phase | Ref country code: DE |