WO2022131397A1

WO2022131397A1 - CNN-RNN architecture conversion type computational acceleration device design method

Info

Publication number: WO2022131397A1
Application number: PCT/KR2020/018462
Authority: WO
Inventors: 전해룡
Original assignee: 주식회사 모빌린트
Priority date: 2020-12-16
Filing date: 2020-12-16
Publication date: 2022-06-23

Abstract

Disclosed is a method for controlling a device equipped with an FPGA computational accelerator. According to an embodiment of the present specification, an FPGA board interface may be configured that comprises a common interface and respective interfaces for the operations of different deep learning algorithms. Respective hardware images for accelerating the computation of the different deep learning algorithms may be stored in a memory. When one of the respective hardware images is selected, a gate array of the FPGA computational accelerator may be reconfigured according to the selected hardware image. Accordingly, different deep learning algorithm computations can be performed faster and with higher power efficiency using the limited resources of the FPGA.

Description

CNN-RNN Architecture Convertible Computational Accelerator Design Method

This specification relates to a CNN-RNN architecture switchable computational accelerator design method.

An FPGA is a device whose hardware can be flexibly reconfigured into the semiconductor design a designer wants, using its internal programmable DSPs, LUTs, and memory. It is used in a wide range of semiconductor designs.

In addition, with the recent development of deep learning technology, image recognition using CNN algorithms and speech recognition using RNN algorithms have been widely researched and have produced many results. Since these deep learning algorithms require a large amount of computation, GPUs are mainly used as computational devices. However, due to factors such as price, performance, and power efficiency, NPUs (Neural Processing Units) optimized for deep learning algorithm calculations are being developed to replace GPUs.

An FPGA can replace these computational devices and serve as a deep learning computation accelerator once its hardware is reconfigured. However, whether a high-performance, low-power computation accelerator can be implemented depends on the resources (DSPs, LUTs, memory, etc.) of the FPGA. In general, an FPGA with many DSPs and LUTs and a large memory can implement a high-performance computational accelerator architecture, but such FPGAs are expensive.

Using the resources of the FPGA efficiently is a very important technical factor in increasing the performance and power efficiency of the computation accelerator. However, the CNN and RNN algorithms differ in many parts of their architectures, so it is generally difficult to implement both while using resources efficiently. In addition, even if both are implemented, the design is limited in that it uses more FPGA resources than an architecture implemented exclusively for CNN or RNN.

This specification is intended to solve the above-described technical problem. Focusing on the point that the CNN algorithm and the RNN algorithm are used independently, the present specification provides an FPGA board or a device equipped with an FPGA board.

The technical problems to be achieved by the present invention are not limited to the technical problems mentioned above, and other technical problems not mentioned will be clearly understood by those of ordinary skill in the art to which the present invention belongs from the detailed description of the invention below.

A control method of a device equipped with an FPGA computation accelerator according to an embodiment of the present specification includes: an interface setting step of configuring, on the FPGA board, respective interfaces for the operations of different deep learning algorithms and a common interface commonly applied to the operations of the different deep learning algorithms; storing respective hardware images for accelerating the operations of the different deep learning algorithms in a non-volatile memory of the FPGA board; receiving a selection signal for selecting any one of the respective hardware images; and reconfiguring the gate array of the FPGA computation accelerator according to the selected hardware image.

The different deep learning algorithms may include a CNN algorithm and an RNN algorithm.

Each of the interfaces may include a first interface for the operation of the CNN algorithm and a second interface for the operation of the RNN algorithm.

The first interface may have one end connected to a video device among external devices, and may receive data from the video device.

The video device may include at least one of a camera, a lidar (LiDAR), and a radar (RADAR).

The second interface may have one end connected to an audio device among external devices, and may receive data from the audio device.

The audio device may include at least one of a microphone and a speaker.

The common interface is connected to a common source used for data processing using the different deep learning algorithms, and the common source may include a communication module and a storage module.

In the step of storing in the non-volatile memory of the FPGA board, a first hardware image for the operation of the RNN algorithm, a second hardware image for the operation of the CNN algorithm, and a common source used for data processing with the RNN algorithm and the CNN algorithm may be loaded at different addresses.
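As a rough illustration of storing the images at different addresses, the sketch below models the non-volatile memory as a table of named regions and checks that no two regions overlap. The region names, base addresses, and sizes are hypothetical values chosen for illustration; the specification does not give a concrete layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlashRegion:
    name: str
    base: int   # start address in flash (hypothetical)
    size: int   # region length in bytes (hypothetical)

    @property
    def end(self) -> int:
        return self.base + self.size


# Hypothetical layout: one region per stored item, each at its own address.
FLASH_LAYOUT = [
    FlashRegion("common_source", 0x0000_0000, 0x0040_0000),
    FlashRegion("rnn_image", 0x0040_0000, 0x0080_0000),   # first hardware image
    FlashRegion("cnn_image", 0x00C0_0000, 0x0080_0000),   # second hardware image
]


def regions_overlap(layout):
    """Return True if any two regions share at least one address."""
    ordered = sorted(layout, key=lambda r: r.base)
    return any(a.end > b.base for a, b in zip(ordered, ordered[1:]))


def find_region(layout, name):
    """Look up a region by name so a loader knows where to read from."""
    for region in layout:
        if region.name == name:
            return region
    raise KeyError(name)
```

A loader could call `find_region(FLASH_LAYOUT, "cnn_image")` to obtain the address range to read when the CNN image is selected.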

The control method of the device equipped with the FPGA computation accelerator may further include storing a first weight for the operation of the RNN algorithm and a second weight for the operation of the CNN algorithm, and the first weight and the second weight may be stored in the non-volatile memory or in a storage device distinct from the non-volatile memory.

The FPGA computation accelerator may be divided into a reconfigurable region in which the first hardware image or the second hardware image is selectively loaded, and a static region in which the common source is not changed after being uploaded once.

An apparatus equipped with an FPGA computation accelerator according to another embodiment of the present specification includes: a memory storing a plurality of hardware images for the operations of different deep learning algorithms, respectively; and a neural processing unit that, when any one of the plurality of hardware images is selected, processes data based on the selected hardware image and outputs a result. When the neural processing unit receives a selection signal for selecting any one of the plurality of hardware images, the selected hardware image is loaded, and the FPGA computation accelerator reconfigures its gate array according to the selected hardware image.

The memory may further store different weight information used for different deep learning algorithm operations.

The memory may include a non-volatile memory for storing the plurality of hardware images and a volatile memory for storing the weight information.

All of the plurality of hardware images and the weight information may be stored in the non-volatile memory.

The FPGA computation accelerator may be divided into a dynamic region in which a selected hardware image among the plurality of hardware images can be selectively loaded, and a static region in which a common source commonly applicable to the execution of the plurality of different deep learning algorithms is loaded.

A computer program according to another embodiment of the present specification is stored in a computer-readable medium and comprises a plurality of instructions executed by one or more processors, the instructions including: an instruction to set the interface of the FPGA board, configuring respective interfaces for the operations of different deep learning algorithms and a common interface commonly applied to the operations of the different deep learning algorithms; an instruction to store respective hardware images for accelerating the operations of the different deep learning algorithms in the non-volatile memory of the FPGA board; and an instruction to reconfigure the gate array of the FPGA computation accelerator according to the selected hardware image when a selection signal for selecting any one of the respective hardware images is received. The FPGA computation accelerator may be divided into a reconfigurable region in which any one of the hardware images is selectively loaded, and a static region in which a common source commonly used for the execution of each hardware image is not changed after being uploaded once.

A CNN-RNN architecture conversion type AI computation accelerator board design method according to another embodiment of the present specification includes: an interface setting step of configuring, on the FPGA board, a first interface for the operation of the CNN algorithm, a second interface for the operation of the RNN algorithm, and a third interface commonly applied to the operations of the CNN algorithm and the RNN algorithm; configuring a first architecture for accelerating the computation of the CNN algorithm and a second architecture for accelerating the computation of the RNN algorithm, respectively, and storing them in a flash memory of the FPGA board; receiving a selection signal for selecting either the first architecture or the second architecture; and loading the selected architecture into the FPGA computation accelerator.

One end of the first interface may be connected to a video device among external devices.

One end of the second interface may be connected to an audio device among external devices.

The audio device may include at least one of a microphone and a speaker.

The third interface serves a common source used for driving the first architecture and the second architecture. One end of the third interface may be connected to the common source, and the other end may be connected to the common architecture stored in the flash memory of the FPGA board.

The common source may include a communication module and a storage module.

In the step of storing in the flash memory of the FPGA board, the first architecture and the second architecture may be stored at different addresses, and the architecture selected according to the selection signal may be placed and routed onto the FPGA computation accelerator.

The details of other embodiments are included in the detailed description and drawings.

According to an embodiment of the present specification, when a deep learning algorithm is implemented using the limited resources of the FPGA, various deep learning applications can be supported through reconfiguration.

In addition, the FPGA board according to an embodiment of the present specification may have faster performance and higher power efficiency compared to a structure in which two different algorithms are simultaneously operated.

Effects that can be obtained in the present specification are not limited to the effects mentioned above, and other effects not mentioned will be clearly understood by those of ordinary skill in the art to which the present invention belongs from the description below.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are included as a part of the detailed description to help the understanding of the present specification, provide embodiments of the present specification, and together with the detailed description, explain the technical features of the present specification.

FIGS. 1 and 2 are diagrams for explaining an interface of a deep learning algorithm conversion type computation accelerator board according to an embodiment of the present specification.

FIG. 3 is a block diagram of a device equipped with an FPGA computation accelerator according to an embodiment of the present specification.

FIGS. 4A and 4B are diagrams for explaining a design method of a CNN-RNN conversion type computation accelerator according to an embodiment of the present specification.

FIG. 5 is a flowchart of a control method of a device equipped with an FPGA computation accelerator according to an embodiment of the present specification.

FIG. 6 is a view for explaining another example of the configuration of the FPGA computation accelerator board according to an embodiment of the present specification.

Hereinafter, the embodiments disclosed in the present specification will be described in detail with reference to the accompanying drawings. The same or similar components are assigned the same reference numerals regardless of the figure in which they appear, and redundant description thereof will be omitted. The suffixes "module" and "part" for components used in the following description are given or used interchangeably only for ease of writing the specification, and do not have distinct meanings or roles by themselves. In addition, in describing the embodiments disclosed in the present specification, if it is determined that detailed descriptions of related known technologies may obscure the gist of the embodiments, the detailed description thereof will be omitted. In addition, the accompanying drawings are only for easy understanding of the embodiments disclosed in the present specification; the technical idea disclosed herein is not limited by the accompanying drawings and should be understood to include all changes, equivalents, and substitutes included in the spirit and scope of the present invention.

Terms including an ordinal number such as 1st, 2nd, etc. may be used to describe various elements, but the elements are not limited by the terms. The above terms are used only for the purpose of distinguishing one component from another.

When an element is referred to as being "connected" or "coupled" to another element, it should be understood that it may be directly connected or coupled to the other element, but that other elements may exist in between. On the other hand, when an element is said to be "directly connected" or "directly coupled" to another element, it should be understood that no other element exists in between.

The singular expression includes the plural expression unless the context clearly dictates otherwise.

In the present application, terms such as "comprises" or "have" are intended to designate that a feature, number, step, operation, component, part, or combination thereof described in the specification exists, and it should be understood that this does not preclude the existence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The FPGA computation accelerator board according to an embodiment of the present specification may be mounted in various electronic devices. The FPGA computation accelerator board may be designed to perform various functions in a semiconductor system. In addition, the electronic device may analyze input data in real time based on a neural network to extract valid information, make a situation determination based on the extracted information, or control the components of the electronic device. Illustratively, the electronic device may be a robot device such as a drone or an advanced driver assistance system (ADAS), a smart TV, a smartphone, a medical device, a mobile device, an image display device, a measurement device, an Internet of Things (IoT) device, or another computing device that performs various functions, and the board may be mounted on at least one of these various types of electronic devices.

Meanwhile, the electronic device may include various types of IPs. For example, the IPs may include a processing unit, a plurality of cores included in the processing unit, a Multi-Format Codec (MFC), a video module (e.g., a camera interface, a Joint Photographic Experts Group (JPEG) processor, a video processor, or a mixer), a 3D graphics core, an audio system, a driver, a display driver, a volatile memory, a non-volatile memory, a memory controller, an input and output interface block, a cache memory, and the like.

As a technology for connecting IPs, a connection method based on a system bus is mainly used. For example, as a standard bus specification, the Advanced Microcontroller Bus Architecture (AMBA) protocol of Advanced RISC Machine (ARM) may be applied. The bus types of the AMBA protocol may include Advanced High-Performance Bus (AHB), Advanced Peripheral Bus (APB), Advanced eXtensible Interface (AXI), AXI4, and AXI Coherency Extensions (ACE). Among the above-described bus types, AXI is an interface protocol between IPs, and may provide a multiple outstanding address function and a data interleaving function. In addition, other types of protocols such as uNetwork of Sonics Inc., CoreConnect of IBM, and Open Core Protocol of OCP-IP may be applied to the system bus.

Meanwhile, the FPGA computation accelerator according to an embodiment of the present specification may be implemented as a component inside the neural processing unit. The neural processing unit may generate a neural network, train or learn a neural network, perform an operation based on received input data, generate an information signal based on the result of the operation, or retrain the neural network. That is, the neural processing unit can perform the complex operations required for deep learning or machine learning.

According to an embodiment of the present specification, the neural processing unit may include an FPGA. Among the various operations performed by the neural processing unit, the FPGA may be designed to perform the additional operations that accompany the complex operations. The FPGA, as programmable logic, may reconfigure its gate array by loading a hardware image corresponding to the application, and may process the additional operations required for the application according to the reconfigured gate array. Here, the additional operations may include pre-processing and post-processing operations required to perform the complex operations of the neural processing unit.

The neural processing unit may receive various types of application data from IPs through a system bus, and load a hardware image adaptive to the application into the FPGA based on this.

The neural processing unit performs complex operations related to neural network creation, training, or retraining, while the FPGA processes the additional operations required for those operations independently, without the support of an application processor (AP).

Referring to FIG. 1, in the FPGA board according to an embodiment of the present specification, a CNN architecture for performing a CNN algorithm, an RNN architecture for performing an RNN algorithm, and a common architecture commonly applied to performing the CNN and RNN algorithms may be mounted on one FPGA board. However, the statement that the CNN architecture and the RNN architecture are mounted together on one FPGA board only means that the two architectures are stored together in the non-volatile memory; it does not mean that the two deep learning algorithms are actually executed simultaneously on one FPGA.

That is, the FPGA board according to an embodiment of the present specification may include input/output interfaces for the different deep learning algorithms in order to selectively perform any one of the plurality of deep learning algorithms described above. For example, the FPGA board may include a CNN input pin that receives the data required to apply the CNN algorithm, a CNN output pin that outputs the result after the CNN algorithm is performed, an RNN input pin that receives the data required to apply the RNN algorithm, an RNN output pin that outputs the result after the RNN algorithm is performed, a common input pin for receiving resources commonly required for CNN and RNN algorithm execution, and a common output pin.

The FPGA board interface of the present specification will be described in more detail with reference to FIG. 2. The FPGA interface 200 includes a CNN input/output pin 211, and for application of a CNN algorithm specialized in image processing, the CNN input/output pin 211 may be connected to a video interface 241 of an external video device. The external video device may include at least one of a camera, a lidar (LiDAR), and a radar (RADAR).

In addition, the FPGA interface 200 includes an RNN input/output pin 212, and for application of an RNN algorithm specialized for speech processing (speech recognition, speech synthesis, natural language processing, etc.), the RNN input/output pin 212 may be connected to an audio interface 242. The audio interface may include a connection to an audio device including at least one of a microphone and a speaker.

In addition, the FPGA interface 200 may be connected to a module commonly applied to the implementation of the RNN and CNN algorithms. For example, it may be connected to the interface 220 for memory and the interface 230 for communication. The memory interface 220 may include a USB, SD card, and the like, and the communication interface 230 may include an Ethernet communication interface and a PCIE communication interface.

In the FPGA interface setting step, the pins that must be connected independently for the CNN and RNN algorithms are each connected separately, while for the common interface the external input pins are connected as common pins to minimize the number of external input pins required.
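The pin-sharing idea can be sketched as a simple set computation: connections needed by only one algorithm get dedicated pins, and connections needed by both are assigned once as common pins. The pin names below are hypothetical and stand in for whatever signals an actual board would expose.

```python
# Hypothetical pin requirements per algorithm; the names are illustrative only.
CNN_PINS = {"video_in", "video_out", "storage", "ethernet", "pcie"}
RNN_PINS = {"audio_in", "audio_out", "storage", "ethernet", "pcie"}


def plan_pins(cnn_pins, rnn_pins):
    """Split pins into CNN-only, RNN-only, and common groups."""
    common = cnn_pins & rnn_pins
    return {
        "cnn_only": cnn_pins - common,
        "rnn_only": rnn_pins - common,
        "common": common,
    }


plan = plan_pins(CNN_PINS, RNN_PINS)

# Pins needed when common connections are shared versus duplicated.
shared_total = sum(len(group) for group in plan.values())
unshared_total = len(CNN_PINS) + len(RNN_PINS)
```

With these example sets, sharing the storage and communication connections reduces the count from `unshared_total` to `shared_total`, which is the effect the interface setting step aims for.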

In some cases, in the process of implementing the FPGA interface, a configurable logic block (CLB) and an input output block (IOB) and a connection circuit capable of configuring a connection between the two may be used as programmable logic.

Referring to FIG. 3 , the FPGA board according to an embodiment of the present specification may include a flash memory 310 and an operation accelerator 320 .

The flash memory may store a plurality of deep learning architectures implementing a plurality of different deep learning algorithms. Illustratively, the plurality of different deep learning architectures may include the CNN architecture 311 and the RNN architecture 315, but the scope of the present specification is not limited thereto, and the approach may be applied to all deep learning algorithms having different computational structures. In addition, the flash memory 310 further includes a common architecture 315 that can be commonly applied to the plurality of deep learning architectures.

According to an embodiment, any one architecture selected from among the plurality of different deep learning architectures stored in the flash memory 310 is loaded into the computation accelerator 320 implemented as an FPGA, and the computation accelerator 320 is dynamically reconfigured as the selected architecture is loaded so that it can perform specific operations. In addition, the common architecture 315 stored in the flash memory 310 may be loaded into the computation accelerator 320 implemented as an FPGA.

Meanwhile, an input for selecting a specific architecture among the plurality of architectures stored in the flash memory may include a user input, but the present specification is not limited thereto. For example, the FPGA board may determine the type of application that requires computation, and a specific architecture may be selected corresponding to the application. For example, when image data is input through a CNN input pin, the processor may control the CNN architecture to be loaded into the computation accelerator. Alternatively, when audio data is input through the RNN input pin, the processor may control the RNN architecture to be loaded into the computation accelerator.

When a specific deep learning architecture is loaded into the computation accelerator 320, the computation accelerator 320 may be programmed and reconfigured according to the loaded architecture, may perform deep learning acceleration computation using the common architecture 322, and may output the results.
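The selection logic described above can be sketched as a small lookup from the input that received data to the architecture to load. The pin identifiers are hypothetical, and giving an explicit user choice priority over the detected input is an assumption made for this sketch; the specification only states that both kinds of input are possible.

```python
from enum import Enum
from typing import Optional


class Architecture(Enum):
    CNN = "cnn"
    RNN = "rnn"


# Hypothetical mapping from the pin group that received data to an architecture.
PIN_TO_ARCHITECTURE = {
    "cnn_input": Architecture.CNN,   # image data, e.g. from a camera
    "rnn_input": Architecture.RNN,   # audio data, e.g. from a microphone
}


def select_architecture(active_pin: str,
                        user_choice: Optional[Architecture] = None) -> Architecture:
    """Pick the architecture to load; an explicit user choice takes priority."""
    if user_choice is not None:
        return user_choice
    try:
        return PIN_TO_ARCHITECTURE[active_pin]
    except KeyError:
        raise ValueError(f"no architecture registered for pin {active_pin!r}")
```

For example, `select_architecture("rnn_input")` yields `Architecture.RNN`, which the processor would then use to decide which hardware image to load.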

FIGS. 4A and 4B are diagrams for explaining a design method of a CNN-RNN conversion type computation accelerator according to an embodiment of the present specification. The FPGA board according to an embodiment of the present specification may include a processor capable of processing internal operations, or may be controlled as a whole by a processor that is separate from the FPGA board.

FIG. 4A illustrates an example in which a CNN architecture is loaded into a computational accelerator and a CNN algorithm operation is performed.

The memory 410 may store data necessary for accelerating a deep learning algorithm operation, as shown in FIG. 4A. For example, the memory may store reconstruction information, that is, the information necessary for the operations of an artificial intelligence function. The reconstruction information may include a hardware image for reconfiguring the FPGA. For example, the CNN architecture, the RNN architecture, the common source, and the like may be stored in the memory in the form of FPGA images. Here, the memory 410 may be a non-volatile memory.

Meanwhile, the reconstruction information for reconfiguring the FPGA may include weight information required for deep learning operation. For example, the weight information may include CNN weights and RNN weights. The weight information may be stored in a volatile memory (DRAM). Although FIG. 4A shows an example in which the FPGA images and the weights are stored independently, the present specification is not limited thereto, and the weight information may also be stored in association with the FPGA image. Also, for example, in the present specification, information on applications requiring deep learning operation, FPGA images, and weight information may be stored in the form of a mapping table.
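One way to picture such a mapping table is a dictionary keyed by application, with each entry naming the FPGA image and the weight set to use. All application names and file names below are placeholders invented for this sketch.

```python
# Hypothetical mapping table: application -> FPGA image and weight set.
# All application names and file names are placeholders.
MAPPING_TABLE = {
    "object_detection": {"image": "cnn_image.bit", "weights": "cnn_weights.bin"},
    "speech_recognition": {"image": "rnn_image.bit", "weights": "rnn_weights.bin"},
}


def lookup(application):
    """Return the (image, weights) pair registered for an application."""
    entry = MAPPING_TABLE[application]
    return entry["image"], entry["weights"]
```

Keeping the image and the weights in one entry reflects the option of storing weight information in association with the FPGA image, so a single lookup tells the processor everything it needs to load.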

Meanwhile, in FIG. 4A , the FPGA operation accelerator 420 may include a dynamic region (eg, region 421) and a static region (eg, region 422).

The dynamic region 421 may be designed so that a specific operation is performed according to a hardware image loaded in response to the selection of a specific architecture or application. The dynamic region 421 may be a programmable logic device (PLD) that is dynamically reconfigured through an FPGA hardware image and used to design a digital circuit that performs a specific operation.

The static region 422 may perform a specific operation without loading a hardware image. Simple operations, or operations to be performed by default regardless of the type of deep learning algorithm, may be designed to be performed in the FPGA of the static region 422. For example, nonlinear function operations, which are frequently performed in deep learning algorithm operations, may be designed to be performed in the static region 422. The nonlinear function operations may include a hyperbolic tangent (tanh) function, a sigmoid function, a GeLU function, an exponential function, a logarithm function, and the like.
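For reference, the nonlinear functions listed above can be written in software as follows. This is only a numerical reference using Python's standard `math` module; a hardware implementation in the static region would more likely use lookup tables or piecewise approximations, which the specification does not detail. The `STATIC_FUNCTIONS` table is a hypothetical name used for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gelu(x: float) -> float:
    """Exact GELU: x times the standard normal CDF evaluated at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


# Functions assumed to be shared by both the CNN and RNN architectures.
STATIC_FUNCTIONS = {
    "tanh": math.tanh,
    "sigmoid": sigmoid,
    "gelu": gelu,
    "exp": math.exp,
    "log": math.log,
}
```

Because both CNN and RNN workloads call these functions, placing them in a region that is never reloaded avoids reconfiguring them on every architecture switch.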

According to an embodiment of the present specification, when a CNN selection signal is input in a state in which no architecture is loaded in the FPGA computation accelerator 420, the processor determines that the FPGA computation accelerator 420 needs to be reconfigured and activates the dynamic region 421 of the FPGA computation accelerator 420. When the dynamic region 421 is activated, the processor loads the CNN hardware image among the architectures stored in the memory into the dynamic region 421. The processor also loads the common source into the static region 422 of the FPGA computation accelerator 420. Meanwhile, according to an embodiment, CNN weights may be added to the FPGA computation accelerator 420.

The FPGA computation accelerator 420 reconfigures the FPGA according to the loaded CNN hardware image, performs computation on the data received through the CNN input interface of the FPGA board interface, and outputs the operation result.

Meanwhile, referring to FIG. 4B, in the state in which the FPGA computation accelerator 420 has been reconfigured by the CNN architecture as in FIG. 4A, the loaded architecture must be switched when the RNN algorithm needs to be performed.

The processor loads the FPGA image (RNN hardware image) from the memory into the dynamic region 423 of the FPGA operation accelerator 420 upon receiving the RNN selection signal. The FPGA operation accelerator 420 reconfigures the FPGA according to the RNN hardware image loaded in the FPGA, calculates the data received through the RNN input interface of the FPGA board interface, and outputs the operation result. Meanwhile, according to an embodiment, an RNN weight may be added to the FPGA computation accelerator 420 .

However, the common architecture is already loaded in the static region 422 of the FPGA computation accelerator 420, so even if the deep learning algorithm is changed from CNN to RNN, the common architecture already stored in the static region 422 can be used in the same way to accelerate the RNN algorithm operation.
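The CNN-to-RNN switch of FIGS. 4A and 4B can be illustrated with a toy software model in which the static region is loaded once and only the dynamic region is swapped. The class, attribute names, and string placeholders for the images are hypothetical; real hardware images would be bitstreams loaded through a vendor-specific reconfiguration mechanism that is not modeled here.

```python
class FpgaAccelerator:
    """Toy model of an accelerator with a static and a dynamic region."""

    def __init__(self):
        self.static_region = None    # common source, loaded once
        self.dynamic_region = None   # CNN or RNN hardware image
        self.static_loads = 0
        self.dynamic_loads = 0

    def load(self, image_name, images):
        """Load the common source if needed, then swap the dynamic region."""
        if self.static_region is None:
            self.static_region = images["common"]
            self.static_loads += 1
        if self.dynamic_region != images[image_name]:
            self.dynamic_region = images[image_name]
            self.dynamic_loads += 1


# Placeholder image contents; real hardware images would be bitstreams.
IMAGES = {"common": "common-source", "cnn": "cnn-image", "rnn": "rnn-image"}

acc = FpgaAccelerator()
acc.load("cnn", IMAGES)   # FIG. 4A: first load into an empty accelerator
acc.load("rnn", IMAGES)   # FIG. 4B: switch from CNN to RNN
```

After the two calls, the dynamic region has been loaded twice while the static region has been loaded only once, mirroring the behavior described for the common architecture.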

FIG. 5 is a flowchart of a control method of a device equipped with an FPGA computation accelerator according to an embodiment of the present specification. The control method shown in FIG. 5 may be carried out by the processing of the FPGA board itself on which the FPGA computation accelerator is mounted, or by a processor separate from the FPGA board. Hereinafter, the control method of a device equipped with an FPGA computation accelerator according to an embodiment of the present specification is described as being performed by the above-described processor.

The processor may set the FPGA board interface (S500).

More specifically, the processor may perform an interface setting operation of the FPGA board that configures respective interfaces for the operations of the different deep learning algorithms and a common interface commonly applied to the operations of the different deep learning algorithms. The respective interfaces may include a first interface for the operation of the CNN algorithm and a second interface for the operation of the RNN algorithm.

The first interface may have one end connected to a video device among external devices, and may receive data from the video device. The video device may include at least one of a camera, a lidar (LiDAR), and a radar (RADAR). The second interface may have one end connected to an audio device among external devices, and may receive data from the audio device. The audio device may include at least one of a microphone and a speaker.

The processor may store hardware images respectively corresponding to a plurality of different deep learning algorithms in the memory (S510).

The memory may be a non-volatile memory implemented integrally with the FPGA board. As described above, the plurality of different deep learning algorithms may be classified in consideration of the computation involved or the FPGA resources required for that computation.

Meanwhile, according to an embodiment of the present specification, a first weight for the operation of the RNN algorithm and a second weight for the operation of the CNN algorithm may be further stored in a memory, and the first weight and the second weight may be stored in the non-volatile memory or in a storage device distinct from the non-volatile memory.

When the processor receives a selection signal for selecting one of the respective hardware images (S520), the processor may reconfigure the gate array of the FPGA operation accelerator according to the selected hardware image (S530).
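Steps S500 to S530 can be summarized in a short software sketch that records each step as it runs. The function name, the dictionary-based stand-ins for interfaces and memory, and the string placeholders for hardware images are all assumptions made for illustration; on real hardware, S530 would trigger a vendor-specific reconfiguration of the gate array.

```python
def run_control_method(selection, stored_images=None):
    """Walk through steps S500 to S530 and return a log of what happened."""
    log = []

    # S500: set up the dedicated and common interfaces of the FPGA board.
    interfaces = {"first": "cnn", "second": "rnn", "common": "shared"}
    log.append(("S500", sorted(interfaces)))

    # S510: store one hardware image per algorithm in memory.
    memory = dict(stored_images or {"cnn": "cnn-image", "rnn": "rnn-image"})
    log.append(("S510", sorted(memory)))

    # S520: receive the selection signal and validate it.
    if selection not in memory:
        raise ValueError(f"unknown hardware image: {selection!r}")
    log.append(("S520", selection))

    # S530: reconfigure the gate array according to the selected image.
    gate_array = memory[selection]
    log.append(("S530", gate_array))
    return log
```

Calling `run_control_method("rnn")` returns the four steps in order, ending with the RNN image placed in the modeled gate array.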

Above, according to an embodiment of the present specification, a memory for storing a plurality of different deep learning architectures and a configuration in which a specific deep learning architecture is selected from the memory and loaded into the FGPA computation accelerator has been described. However, as shown in FIG. 6 , the FPGA operation accelerator may be included as a component of the neural processing 620 . The neural processing unit (

When the neural processing unit 620 receives a selection signal for selecting any one of a plurality of different deep learning architectures, a hardware image corresponding to the selected architecture is loaded into the dynamic region 630 of the FPGA computation accelerator 630, and the FPGA computation accelerator reconfigures the gate array according to the loaded hardware image. In some cases, after the common architecture stored in the memory is loaded once into the static region 631 of the FPGA computation accelerator 630, the common architecture is maintained in a fixed state even if the design in the dynamic region 630 is changed, and can continue to be used for deep learning operations.
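The division into a static region and a dynamic region can be illustrated with the following sketch. The class `PartitionedAccelerator` and its method names are hypothetical, and the requirement that the common architecture be loaded before any hardware image is an assumption made for the example, not a statement from the specification.

```python
class PartitionedAccelerator:
    """Hypothetical model of an accelerator with static and dynamic regions."""

    def __init__(self):
        self.static_region = None   # holds the common architecture
        self.dynamic_region = None  # holds the selected hardware image
        self.static_loads = 0

    def load_common(self, common_architecture):
        # The static region is written once and then kept fixed.
        if self.static_region is None:
            self.static_region = common_architecture
            self.static_loads += 1

    def load_image(self, hardware_image):
        # Assumption: the common architecture must already be in place.
        if self.static_region is None:
            raise RuntimeError("common architecture must be loaded first")
        # Only the dynamic region changes when the architecture is switched.
        self.dynamic_region = hardware_image
```

Switching from a CNN image to an RNN image in this model replaces only `dynamic_region`; `static_region` keeps the common architecture that was loaded at the start.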

The present invention described above can be implemented as computer-readable code on a medium in which a program is recorded. The computer-readable medium includes all types of recording devices in which data readable by a computer system is stored. Examples of computer-readable media include a hard disk drive (HDD), solid state disk (SSD), silicon disk drive (SDD), ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage device, and also include implementations in the form of a carrier wave (e.g., transmission over the Internet). Accordingly, the above detailed description should not be construed as restrictive in any respect but should be considered exemplary. The scope of the present invention should be determined by a reasonable interpretation of the appended claims, and all modifications within the equivalent scope of the present invention are included in the scope of the present invention.

Embodiments disclosed herein relate to a deep learning computation accelerator, and in particular, to a CNN-RNN architecture switchable FPGA computation accelerator capable of selectively performing CNN and RNN computations using an FPGA.

Claims

1. A control method of a device equipped with an FPGA computation accelerator, the method comprising:

an interface setting step of configuring, on an FPGA board, a separate interface for the operation of each of different deep learning algorithms and a common interface commonly applied to the operations of the different deep learning algorithms;

storing respective hardware images for accelerating the operations of the different deep learning algorithms in a non-volatile memory of the FPGA board;

receiving a selection signal for selecting any one of the respective hardware images; and

reconfiguring a gate array of the FPGA computation accelerator according to the selected hardware image.
2. The method of claim 1,

wherein the different deep learning algorithms include a CNN algorithm and an RNN algorithm.
3. The method of claim 2,

wherein the respective interfaces include a first interface for the operation of the CNN algorithm and a second interface for the operation of the RNN algorithm.
4. The method of claim 3,

wherein the first interface has one end connected to an imaging device among external devices and receives data from the imaging device.
5. The method of claim 2,

wherein the second interface has one end connected to an audio device among external devices and receives data from the audio device.
6. The method of claim 1,

wherein the common interface is connected to a common source used for data processing using the different deep learning algorithms, and

the common source includes a communication module and a storage module.
7. The method of claim 2,

wherein, in the storing in the non-volatile memory of the FPGA board,

a first hardware image for the operation of the RNN algorithm, a second hardware image for the operation of the CNN algorithm, and a common source used for data processing using the RNN algorithm and the CNN algorithm are loaded at different addresses, respectively.
8. The method of claim 7, further comprising:

storing a first weight for the operation of the RNN algorithm and a second weight for the operation of the CNN algorithm,

wherein the first weight and the second weight are stored in the non-volatile memory or in a storage device separate from the non-volatile memory.
9. The method of claim 7,

wherein the FPGA computation accelerator is divided into:

a reconfigurable region into which the first hardware image or the second hardware image is selectively loaded; and

a static region in which the common source does not change after being uploaded once.
10. A device equipped with an FPGA computation accelerator, the device comprising:

a memory including a plurality of hardware images, each for the operation of a different deep learning algorithm; and

a neural processing unit that, when any one of the plurality of hardware images is selected, processes data based on the selected hardware image and outputs a result,

wherein the neural processing unit includes an FPGA computation accelerator into which the selected hardware image is loaded, and whose gate array is reconfigured according to the selected hardware image, when a selection signal for selecting any one of the plurality of hardware images is received.
11. A computer program stored on a computer-readable medium and comprising a plurality of instructions executed by one or more processors, the computer program comprising:

an instruction to set the interfaces of an FPGA board, configuring a separate interface for the operation of each of different deep learning algorithms and a common interface commonly applied to the operations of the different deep learning algorithms;

an instruction to store respective hardware images for accelerating the operations of the different deep learning algorithms in a non-volatile memory of the FPGA board; and

an instruction to reconfigure a gate array of an FPGA computation accelerator according to a selected hardware image upon receiving a selection signal for selecting any one of the respective hardware images,

wherein the FPGA computation accelerator is divided into a reconfigurable region into which any one of the respective hardware images is selectively loaded, and a static region that does not change after a common source commonly used for execution of each hardware image is uploaded once.