EP4052188B1 - Neural network instruction streaming - Google Patents

Neural network instruction streaming

Info

Publication number
EP4052188B1
Authority
EP
European Patent Office
Prior art keywords
instruction
address
neural network
data
instruction stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
EP20771400.7A
Other languages
German (de)
French (fr)
Other versions
EP4052188A1 (en)
Inventor
John E. Mixter
David R. Mucha
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Raytheon Co
Original Assignee
Raytheon Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Raytheon Co filed Critical Raytheon Co
Publication of EP4052188A1
Application granted
Publication of EP4052188B1
Legal status: Active (current)
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3851Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Definitions

  • Embodiments described herein generally relate to a system and method for implementing an artificial neural network via an instruction stream.
  • Artificial neural networks consist of many layers. These layers, regardless of type, can be thought of as just connections and weights. Each layer has an input from a previous layer or connection and a weight associated with that input. Layer types only differ in how the outputs of one layer are connected to the inputs of the next layer.
  • Artificial neural networks can be trained to implement artificially intelligent processes and functions that can infer and/or predict many things. Neural network training and inference can be distilled down to simple multiply and accumulation operations. During inference, also known as forward propagation, the sums of the multiply and accumulate operations are fed into activation functions that inject nonlinearity into the network. During training, also known as back propagation, the derivative of the activation functions along with the multiply and accumulate sums are used to determine the perceptron output error. It is this error that is used to adjust perceptron input weights allowing the network to be trained.
  • A difficulty in installing an artificial neural network on a hardware platform is the substantial amount of research required into the target hardware, because neural networks are large and consume significant resources.
  • US 2019/087708 A1 relates to a dynamically adaptive neural network processing system that includes memory to store instructions representing a neural network in contiguous blocks, hardware acceleration (HA) circuitry to execute the neural network, direct memory access (DMA) circuitry to transfer the instructions from the contiguous blocks of the memory to the HA circuitry, and a central processing unit (CPU) to dynamically modify a linked list representing the neural network during execution of the neural network by the HA circuitry to perform machine learning, and to generate the instructions in the contiguous blocks of the memory based on the linked list.
  • US 2019/147342 A1 relates to processing circuitry for a deep neural network that can include input/output ports and a plurality of neural network layers coupled in order from a first layer to a last layer, each of the plurality of neural network layers including a plurality of weighted computational units having circuitry to interleave forward propagation of computational unit input values from the first layer to the last layer and backward propagation of output error values from the last layer to the first layer.
  • WO 2019/085655 A1 relates to an information processing method and a terminal device.
  • the method comprises: acquiring first information, wherein the first information is information to be processed by a terminal device; calling an operation instruction in a calculation apparatus to calculate the first information so as to obtain second information; and outputting the second information.
  • a calculation apparatus of a terminal device can be used to call an operation instruction to process first information, so as to output second information of a target desired by a user, thereby improving the information processing efficiency.
  • US5325464A relates to a Pyramid Learning Architecture Neurocomputer (PLAN), which is a scalable stacked pyramid arrangement of processor arrays.
  • a processor having two or more execution lanes includes a data cache coupled to memory, a wide memory load circuit that concurrently loads two or more words from a cache line of the data cache, and a writeback circuit situated to send a respective word of the concurrently-loaded words to a selected execution lane of the processor, either into an operand buffer or bypassing the operand buffer.
  • a sharding circuit is provided that allows bitwise, byte-wise, and/or word-wise manipulation of memory operation data.
  • wide cache loads allows for concurrent execution of plural execution lanes of the processor.
  • the present disclosure provides a process for implementing an artificial neural network via an instruction stream comprising: defining a format for instructions in the instruction stream, the format comprising an opcode, an address, and data, wherein the opcode is followed by the address, and wherein the address is followed by the data; defining a header of the instruction stream, the header comprising a learning rate and activation function parameters; creating the instruction stream using the header, the opcode, the address, and the data; and implementing the artificial neural network by providing the instruction stream to a computer processor for execution of the instruction stream; wherein the opcode comprises one or more of a layer type, a load instruction, a store instruction, a map done instruction, a set bias instruction, and a set value instruction; and wherein the address comprises one or more of an address from which to retrieve input data for input into a perceptron of the artificial neural network and an address to transmit output data from the perceptron of the artificial neural network.
  • the present disclosure provides a non-transitory computer-readable medium comprising instructions that when executed by a processor execute a process for implementing an artificial neural network via an instruction stream, the process comprising: defining a format for instructions in the instruction stream, the format comprising an opcode, an address, and data, wherein the opcode is followed by the address, and wherein the address is followed by the data; defining a header of the instruction stream, the header comprising a learning rate and activation function parameters; creating the instruction stream using the header, the opcode, the address, and the data; and implementing the artificial neural network by providing the instruction stream to a computer processor for execution of the instruction stream; wherein the opcode comprises one or more of a layer type, a load instruction, a store instruction, a map done instruction, a set bias instruction, and a set value instruction; and wherein the address comprises one or more of an address from which to retrieve input data for input into a perceptron of the artificial neural network and an address to transmit output data from the perceptron of the artificial neural network.
  • the present disclosure provides a system comprising: a computer processor; and a computer memory coupled to the computer processor; wherein the computer processor is configured to implement an artificial neural network via an instruction stream by: defining a format for instructions in the instruction stream, the format comprising an opcode, an address, and data, wherein the opcode is followed by the address, and wherein the address is followed by the data; defining a header of the instruction stream, the header comprising a learning rate and activation function parameters; creating the instruction stream using the header, the opcode, the address, and the data; and implementing the artificial neural network by providing the instruction stream to the computer processor that executes the instruction stream; wherein the opcode comprises one or more of a layer type, a load instruction, a store instruction, a map done instruction, a set bias instruction, and a set value instruction; and wherein the address comprises one or more of an address from which to retrieve input data for input into a perceptron of the artificial neural network and an address to transmit output data from the perceptron of the artificial neural network.
  • an embodiment takes a different approach to address the difficulty posed by the vast amount of research, hardware, and other resources required to implement an artificial neural network.
  • an embodiment turns an artificial neural network into an instruction stream. Because there is no real processing difference between the neural network layer types, a very simple set of instruction opcodes can be created to execute any of the standard neural network architectures.
  • the hardware acts as a simple processor that takes in the instructions in the instruction stream and performs the multiplication and accumulation upon which a neural network is based. Simply put, the embodiment deconstructs a neural network into an instruction stream that can be executed in the hardware.
  • the instruction stream is divided into multiple streams and the multiple streams are executed in parallel. This exploitation of the natural parallelism of a neural network permits such neural networks to run in a constrained hardware environment.
  • a first step in creating the instruction stream is to analyze the neural network itself. Specifically, the neural network is separated into its layers. All the multiplies and accumulates that execute in a particular layer can all be computed at the same time. That is, the multiplies and accumulates are data independent. Therefore, once a layer receives its input from the previous layer, all the different perceptrons in that layer can execute in parallel. This parallelism is mirrored in the creation of the instruction streams. That is, the layers of the neural network are separated, and then the perceptrons that can be grouped together are identified. In other words, instead of executing one perceptron at a time, the perceptrons are separated into groups. This grouping depends on the available hardware.
  • FIG. 1 illustrates how a single instruction stream is assembled.
  • an embodiment includes a perceptron 110.
  • the perceptron 110 receives input 112 and these inputs are multiplied by weights 114.
  • the results of these multiplications are summed/accumulated and then applied to an activation function 120. This generates the output, and the output is converted to its instruction equivalent.
  • the instruction includes a header 121 that sets some activation values, a bias value 122 is set, and then one after another the weights 124 and inputs 126 are loaded.
  • the inputs 112 and weights 114 correspond to the loaded inputs 126 and weights 124. While only five weight and input pairs are illustrated in FIG. 1, in practice there are thousands in a real model.
  • a processing element executes the instruction stream by multiplying the inputs by the weights and accumulating the results, and as indicated at 128, the output of the processing element is stored in memory.
  • FIG. 2 illustrates an embodiment of an instruction 200, which can be 64 bits long.
  • a first part of the instruction is an op code 210, which determines what that instruction does. In a sense, the op code informs where in the neural network the execution is, which informs the processing engine what it needs to do. In an embodiment, the op code is eight bits long.
  • the opcode sets the layer types. The different types of layers include an input, a fully connected convolution, a single or multiple map convolution, a pooling layer, or a classifier. As is known to those of skill in the art, a two-dimensional convolution layer is a common layer within a network, and it has different connection patterns and it recycles weights.
  • a benefit of a convolution layer is that it uses very few weights; however, it does require a lot of calculations.
  • the instructions point to a different pattern of inputs from the previous layer.
  • the op code changes to tell the processing element what the function of the instruction is and for what purpose the data will be used.
  • the instruction could also relate to load, store, map done (indicates when a convolution is done), or set value (used to set register values with constants) functions to be executed by the processing element.
  • the next bits in the instruction 200 inform the processing element of address locations 220, that is, either where to find data or where to put data.
  • the address 220 is sixteen bits long. For example, if it is a LOAD instruction, these bits will inform from where to load the data. That is, the previous layer stored its data in memory, and the LOAD instruction tells the processing element where to get that data in order to start the multiplication process for the current perceptron.
  • the next value in the instruction 200 is the actual weight value 230 to be applied to the input. For example, if the instruction was a LOAD instruction, the instruction would cause the loading of the input, the weight would be obtained, and then the multiplication would be executed. The last 32 bits are also used during back propagation to inform from where to get information or where to store information. As illustrated in FIG. 3 , the instructions 200 are commingled into the instruction stream 300, which consists of a header 310 and a plurality of layers 320 and associated instructions 200. The layers 320 correspond to the multiple layers in the artificial neural network.
  • FIG. 4 illustrates an example of how an instruction stream can be divided up.
  • the system can have several processing elements within that hardware.
  • Each one of the divided-up instruction streams has a header, a layer identification, and then some instructions, followed by another layer and some more instructions.
  • one layer can be divided into four instructions 300A for two processing elements 310A and 310B, or two instructions 300B for four processing elements 310C, 310D, 310E, and 310F. This division of the instruction stream into many processing elements reduces the amount of time that it takes to execute that instruction stream.
  • Each layer 320 must finish before proceeding onto the next layer. If the instructions do not divide out equally, then the processing element that has fewer instructions waits for the other processing elements to finish their execution before it goes on to the next layer, because the next layer needs all the data from the previous layer.
  • the forward propagation instruction stream begins with the header 310.
  • the header contains global information needed to execute the artificial neural network. Global information includes learning rates, activation functions, and other hyper parameters.
  • input values, weight values, and destinations for the resulting sums are needed.
  • LOAD opcodes are used to retrieve input data and weight values
  • STORE opcodes are used to place sums in the proper memory destinations.
  • the input and output values are located in the memory of a field programmable gate array (FPGA).
  • the FPGA processing element must be provided with the FPGA memory address for each neuron value when the FPGA processing element receives the LOAD instruction. Because of FPGA internal memory limitations, the weights are stored in memory. Consequently, the value of the weight must be provided to the FPGA processing element on the same LOAD opcode.
  • STORE opcode the destination address for the resulting sum is transmitted to the FPGA processing element.
  • the STORE opcode occurs once for each perceptron and marks the end of a multiply and accumulate operation. In most embodiments, the LOAD and STORE opcodes make up the bulk of the instruction stream.
  • the instruction stream is reversed.
  • the system starts with the classifier layer, whose outputs are already available, and those outputs are used to calculate the error, adjust the weights, and then move to the previous layer to perform the same operations. That is, during back propagation, the error is determined, the weights are adjusted based on the error, and, using the instruction stream, the weights are sent out of the hardware back into memory. Thereafter, when forward propagating, all the new weights are available.
  • the outputs of every layer must be retained. This retention cannot be done in restricted hardware environments because there is not enough memory to store all the output data. Consequently, during forward propagation, the STORE command is used to transmit the output of a perceptron in a layer to memory.
  • the needed input data must be retrieved from memory for processing by the processing element. For this purpose, as illustrated in FIG. 5 , every instruction is preceded by data 510. Consequently, for back propagation, the processing element needs the input value, the weight value, and the previous output value, which are all included in the instruction 510. While this does increase the size of the instruction stream, and slightly slows processing, it does allow for on-chip training of the neural network in hardware.
  • the streamed instructions for forward propagation are located in a memory 610.
  • a direct memory access module 620 is used to access the instruction streams.
  • the microprocessor 630, in conjunction with the direct memory access module 620, accesses the instruction streams, and these instruction streams are transferred into the fabric 640 and processed by the processing elements 645.
  • the microprocessor instructs the DMA to transfer data from one location to another, and the DMA simply transfers the data as a big block of memory into the fabric. After processing by the processing elements, the data are stored in memory 650 via the STORE command.
  • the streamed instructions for backward propagation are located in memory 710. It is noted that unlike the forward propagation of FIG. 6 , the backward propagation has the needed data 710A.
  • the microprocessor 730, in conjunction with the direct memory access module 720, accesses the instruction streams, and these instruction streams are transferred into the fabric 740 and processed by the processing elements 745.
  • the microprocessor instructs the DMA to transfer data from one location to another, and the DMA simply transfers the data as a big block of memory into the fabric.
  • the error is applied to the weight to adjust the weight, and the weight is stored back into memory.
  • FIG. 8 illustrates how the updated weights are moved through the DMA 720 back to their original locations in the forward propagation instruction stream.
  • FIG. 9 is another diagram illustrating a system and process for implementing an artificial neural network via an instruction stream according to some aspects of the embodiments.
  • FIG. 9 includes process blocks 905-961. Though arranged substantially serially in the example of FIG. 9 , other examples may reorder the blocks, omit one or more blocks, and/or execute two or more blocks in parallel using multiple processors or a single processor organized as two or more virtual machines or sub-processors. Moreover, still other examples can implement the blocks as one or more specific interconnected hardware or integrated circuit modules with related control and data signals communicated between and through the modules. Thus, any process flow is applicable to software, firmware, hardware, and hybrid implementations.
  • a header of the instruction stream is defined.
  • the header includes a learning rate and activation function parameters.
  • a format is defined for instructions that implement an artificial neural network via an instruction stream.
  • the format includes an opcode, an address, and data.
  • the opcode can include a layer type, a load instruction, a store instruction, a map done instruction, a set bias instruction (which indicates the beginning of processor element calculations), and/or a set value instruction.
  • the instruction is structured such that the opcode is followed by the address, and the address is followed by the data.
  • the address is either an address from which input data are retrieved for input into a perceptron of the artificial neural network, or an address to which output data are transmitted from the perceptron of the artificial neural network.
  • the instruction stream can include a single header and sets of neural network layer identifications.
  • each neural network layer identification is associated with one or more instructions. ( See e.g., FIG. 3 ).
  • the instruction stream is created using the opcode, the address, and the data ( 920 ). Thereafter, at 930 , the artificial neural network is implemented by providing the instruction stream to a computer processor for execution of the instruction stream.
  • a data layer is positioned before or prior to each of the instructions. As disclosed in connection with FIG. 5 , this data layer is for use in connection with a backward propagation of the neural network.
  • the system includes a processing element.
  • the input data are received into the processing element via a LOAD instruction, and the LOAD instruction includes an address field that indicates the neuron in the current layer to which the instruction is applied.
  • output data are transmitted from the processing element to a memory.
  • the instruction stream is divided into several instruction streams prior to providing the instruction stream to the computer processor or processing element for execution. Then at 961, the several instruction streams are executed in parallel.
  • FIG. 10 is a block diagram illustrating a computing and communications platform 1000 in the example form of a general-purpose machine on which some or all of the system of FIG. 1 may be carried out according to various embodiments.
  • programming of the computing platform 1000 according to one or more particular algorithms produces a special-purpose machine upon execution of that programming.
  • the computing platform 1000 may operate in the capacity of either a server or a client machine in server-client network environments, or it may act as a peer machine in peer-to-peer (or distributed) network environments.
  • Example computing platform 1000 includes at least one processor 1002 (e.g., a central processing unit (CPU), a graphics processing unit (GPU) or both, processor cores, compute nodes, etc.), a main memory 1004 and a static memory 1006, which communicate with each other via a link 1008 (e.g., bus).
  • the computing platform 1000 may further include a video display unit 1010, input devices 1012 (e.g., a keyboard, camera, microphone), and a user interface (UI) navigation device 1014 (e.g., mouse, touchscreen).
  • the computing platform 1000 may additionally include a storage device 1016 (e.g., a drive unit), a signal generation device 1018 (e.g., a speaker), and a RF-environment interface device (RFEID) 1020.
  • the storage device 1016 includes a non-transitory machine-readable medium 1022 on which is stored one or more sets of data structures and instructions 1024 (e.g., software) embodying or utilized by any one or more of the methodologies or functions described herein.
  • the instructions 1024 may also reside, completely or at least partially, within the main memory 1004, static memory 1006, and/or within the processor 1002 during execution thereof by the computing platform 1000, with the main memory 1004, static memory 1006, and the processor 1002 also constituting machine-readable media.
  • machine-readable medium 1022 is illustrated in an example embodiment to be a single medium, the term “machine-readable medium” may include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more instructions 1024.
  • the term “machine-readable medium” shall also be taken to include any tangible medium that is capable of storing, encoding or carrying instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure or that is capable of storing, encoding or carrying data structures utilized by or associated with such instructions.
  • the term “machine-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media.
  • machine-readable media include nonvolatile memory, including but not limited to, by way of example, semiconductor memory devices (e.g., electrically programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM)) and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
  • RFEID 1020 includes radio receiver circuitry, along with analog-to-digital conversion circuitry, and interface circuitry to communicate via link 1008 according to various embodiments.
  • RFEID may be in the form of a wideband radio receiver, or scanning radio receiver, that interfaces with processor 1002 via link 1008.
  • link 1008 includes a PCI Express (PCIe) bus, including a slot into which the NIC form-factor may removably engage.
  • RFEID 1020 includes circuitry laid out on a motherboard together with local link circuitry, processor interface circuitry, other input/output circuitry, memory circuitry, storage device and peripheral controller circuitry, and the like.
  • RFEID 1020 is a peripheral that interfaces with link 1008 via a peripheral input/output port such as a universal serial bus (USB) port.
  • RFEID 1020 receives RF emissions over wireless transmission medium 1026.
  • RFEID 1020 may be constructed to receive RADAR signaling, radio communications signaling, unintentional emissions, or some combination of such emissions.
  • Examples, as described herein, may include, or may operate on, logic or a number of components, circuits, or engines, which for the sake of consistency are termed engines, although it will be understood that these terms may be used interchangeably.
  • Engines may be hardware, software, or firmware communicatively coupled to one or more processors in order to carry out the operations described herein.
  • Engines may be hardware engines, and as such engines may be considered tangible entities capable of performing specified operations and may be configured or arranged in a certain manner.
  • circuits may be arranged (e.g., internally or with respect to external entities such as other circuits) in a specified manner as an engine.
  • the whole or part of one or more computing platforms may be configured by firmware or software (e.g., instructions, an application portion, or an application) as an engine that operates to perform specified operations.
  • the software may reside on a machine-readable medium.
  • the software when executed by the underlying hardware of the engine, causes the hardware to perform the specified operations.
  • the term hardware engine is understood to encompass a tangible entity, be that an entity that is physically constructed, specifically configured (e.g., hardwired), or temporarily (e.g., transitorily) configured (e.g., programmed) to operate in a specified manner or to perform part or all of any operation described herein.
  • each of the engines need not be instantiated at any one moment in time.
  • the engines comprise a general-purpose hardware processor configured using software; the general-purpose hardware processor may be configured as respective different engines at different times.
  • Software may accordingly configure a hardware processor, for example, to constitute a particular engine at one instance of time and to constitute a different engine at a different instance of time.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Neurology (AREA)
  • Advance Control (AREA)

Description

    TECHNICAL FIELD
  • Embodiments described herein generally relate to a system and method for implementing an artificial neural network via an instruction stream.
  • BACKGROUND
  • Artificial neural networks consist of many layers. These layers, regardless of type, can be thought of as just connections and weights. Each layer has an input from a previous layer or connection and a weight associated with that input. Layer types only differ in how the outputs of one layer are connected to the inputs of the next layer.
  • Artificial neural networks can be trained to implement artificially intelligent processes and functions that can infer and/or predict many things. Neural network training and inference can be distilled down to simple multiply and accumulation operations. During inference, also known as forward propagation, the sums of the multiply and accumulate operations are fed into activation functions that inject nonlinearity into the network. During training, also known as back propagation, the derivative of the activation functions along with the multiply and accumulate sums are used to determine the perceptron output error. It is this error that is used to adjust perceptron input weights allowing the network to be trained.
  • Before neural networks can be used for predictions, the networks must be installed on a hardware platform. A difficulty in installing an artificial neural network on a hardware platform is the substantial amount of research required into the target hardware, because neural networks are large and consume significant resources.
  • US 2019/087708 A1 relates to a dynamically adaptive neural network processing system that includes memory to store instructions representing a neural network in contiguous blocks, hardware acceleration (HA) circuitry to execute the neural network, direct memory access (DMA) circuitry to transfer the instructions from the contiguous blocks of the memory to the HA circuitry, and a central processing unit (CPU) to dynamically modify a linked list representing the neural network during execution of the neural network by the HA circuitry to perform machine learning, and to generate the instructions in the contiguous blocks of the memory based on the linked list.
  • US 2019/147342 A1 relates to processing circuitry for a deep neural network that can include input/output ports and a plurality of neural network layers coupled in order from a first layer to a last layer, each of the plurality of neural network layers including a plurality of weighted computational units having circuitry to interleave forward propagation of computational unit input values from the first layer to the last layer and backward propagation of output error values from the last layer to the first layer.
  • WO 2019/085655 A1 relates to an information processing method and a terminal device. The method comprises: acquiring first information, wherein the first information is information to be processed by a terminal device; calling an operation instruction in a calculation apparatus to calculate the first information so as to obtain second information; and outputting the second information. By means of the embodiments in the present disclosure, a calculation apparatus of a terminal device can be used to call an operation instruction to process first information, so as to output second information of a target desired by a user, thereby improving the information processing efficiency.
  • US5325464A relates to a Pyramid Learning Architecture Neurocomputer (PLAN), which is a scalable stacked pyramid arrangement of processor arrays. There are six processing levels in PLAN, consisting of the pyramid base, Level 6, containing N² SYnapse Processors (SYPs), Level 5 containing multiple folded Communicating Adder Tree structures (SCATs), Level 4 made up of N completely connected Neuron Execution Processors (NEPs), Level 3 made up of multiple Programmable Communicating Alu Tree (PCATs) structures, similar to Level 5 SCATs but with programmable function capabilities in each tree node, Level 2 containing the Neuron Instruction Processor (NIP), and Level 1 comprising the Host and user interface.
  • US 2019/236009 A1 relates to systems and methods for performing wide memory operations for a wide data cache line. In some examples of the disclosed technology, a processor having two or more execution lanes includes a data cache coupled to memory, a wide memory load circuit that concurrently loads two or more words from a cache line of the data cache, and a writeback circuit situated to send a respective word of the concurrently-loaded words to a selected execution lane of the processor, either into an operand buffer or bypassing the operand buffer. In some examples, a sharding circuit is provided that allows bitwise, byte-wise, and/or word-wise manipulation of memory operation data. In some examples, wide cache loads allows for concurrent execution of plural execution lanes of the processor.
  • SUMMARY
  • In a first aspect, the present disclosure provides a process for implementing an artificial neural network via an instruction stream comprising: defining a format for instructions in the instruction stream, the format comprising an opcode, an address, and data, wherein the opcode is followed by the address, and wherein the address is followed by the data; defining a header of the instruction stream, the header comprising a learning rate and activation function parameters; creating the instruction stream using the header, the opcode, the address, and the data; and implementing the artificial neural network by providing the instruction stream to a computer processor for execution of the instruction stream; wherein the opcode comprises one or more of a layer type, a load instruction, a store instruction, a map done instruction, a set bias instruction, and a set value instruction; and wherein the address comprises one or more of an address from which to retrieve input data for input into a perceptron of the artificial neural network and an address to transmit output data from the perceptron of the artificial neural network.
  • In a second aspect, the present disclosure provides a non-transitory computer-readable medium comprising instructions that when executed by a processor execute a process for implementing an artificial neural network via an instruction stream, the process comprising: defining a format for instructions in the instruction stream, the format comprising an opcode, an address, and data, wherein the opcode is followed by the address, and wherein the address is followed by the data; defining a header of the instruction stream, the header comprising a learning rate and activation function parameters; creating the instruction stream using the header, the opcode, the address, and the data; and implementing the artificial neural network by providing the instruction stream to a computer processor for execution of the instruction stream; wherein the opcode comprises one or more of a layer type, a load instruction, a store instruction, a map done instruction, a set bias instruction, and a set value instruction; and wherein the address comprises one or more of an address from which to retrieve input data for input into a perceptron of the artificial neural network and an address to transmit output data from the perceptron of the artificial neural network.
  • In a third aspect, the present disclosure provides a system comprising: a computer processor; and a computer memory coupled to the computer processor; wherein the computer processor is configured to implement an artificial neural network via an instruction stream by: defining a format for instructions in the instruction stream, the format comprising an opcode, an address, and data, wherein the opcode is followed by the address, and wherein the address is followed by the data; defining a header of the instruction stream, the header comprising a learning rate and activation function parameters; creating the instruction stream using the header, the opcode, the address, and the data; and implementing the artificial neural network by providing the instruction stream to the computer processor that executes the instruction stream; wherein the opcode comprises one or more of a layer type, a load instruction, a store instruction, a map done instruction, a set bias instruction, and a set value instruction; and wherein the address comprises one or more of an address from which to retrieve input data for input into a perceptron of the artificial neural network and an address to transmit output data from the perceptron of the artificial neural network.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • In the drawings, which are not necessarily drawn to scale, like numerals may describe similar components in different views. Like numerals having different letter suffixes may represent different instances of similar components. Some embodiments are illustrated by way of example, and not limitation, in the figures of the accompanying drawings.
    • FIG. 1 is a block diagram illustrating an implementation of an artificial neural network via an instruction stream.
    • FIG. 2 is an illustration of an instruction format for implementing an artificial neural network.
    • FIG. 3 is an illustration of an instruction stream for implementing an artificial neural network.
    • FIG. 4 is an illustration of an instruction stream for implementing an artificial neural network that has been divided into several instruction streams.
    • FIG. 5 is an illustration of an instruction stream for implementing backward propagation in an artificial neural network.
    • FIG. 6 is a block diagram illustrating a computer architecture for implementing a forward propagation in an artificial neural network.
    • FIG. 7 is a block diagram illustrating a computer architecture for implementing a backward propagation in an artificial neural network.
    • FIG. 8 is another block diagram illustrating a computer architecture for implementing a backward propagation in an artificial neural network.
    • FIG. 9 is a block diagram illustrating operations and features of implementing an artificial neural network via an instruction stream.
    • FIG. 10 is a block diagram illustrating a general computer architecture upon which one or more of the embodiments disclosed herein can execute.
    DETAILED DESCRIPTION
  • The present disclosure relates to installing an artificial neural network into hardware, and an embodiment takes a different approach to address the difficulty posed by the vast amount of research, hardware, and other resources required to implement an artificial neural network. To this end, an embodiment turns an artificial neural network into an instruction stream. Because there is no real processing difference between the neural network layer types, a very simple set of instruction opcodes can be created to execute any of the standard neural network architectures. In this embodiment, the hardware acts as a simple processor that takes in the instructions in the instruction stream and performs the multiplication and accumulation upon which a neural network is based. Simply put, the embodiment deconstructs a neural network into an instruction stream that can be executed in the hardware. In a further embodiment, the instruction stream is divided into multiple streams and the multiple streams are executed in parallel. This exploitation of the natural parallelism of a neural network permits such neural networks to run in a constrained hardware environment.
  • A first step in creating the instruction stream is to analyze the neural network itself. Specifically, the neural network is separated into its layers. All the multiplies and accumulates that execute in a particular layer can all be computed at the same time. That is, the multiplies and accumulates are data independent. Therefore, once a layer receives its input from the previous layer, all the different perceptrons in that layer can execute in parallel. This parallelism is mirrored in the creation of the instruction streams. That is, the layers of the neural network are separated, and then the perceptrons that can be grouped together are identified. In other words, instead of executing one perceptron at a time, the perceptrons are separated into groups. This grouping depends on the available hardware.
  • FIG. 1 illustrates how a single instruction stream is assembled. Referring to FIG. 1 , an embodiment includes a perceptron 110. The perceptron 110 receives inputs 112, and these inputs are multiplied by weights 114. The results of these multiplications are summed/accumulated and then applied to an activation function 120. This generates the output, and the output is converted to its instruction equivalent. Specifically, the instruction stream includes a header 121 that sets some activation values, a bias value 122 is then set, and then the weights 124 and inputs 126 are loaded one after another. It is noted that the inputs 112 and weights 114 correspond to the loaded inputs 126 and weights 124. While only five weight and input pairs are illustrated in FIG. 1 , in practice there are thousands in a real model. Once all the weights and inputs have been turned into an instruction, the multiply and accumulate are executed, and the resulting output is stored in memory on the hardware device. It is noted that the activation function is stored in the header, because the activation function is fixed and does not change. In contrast, the weights and the addresses of the inputs are different values as one goes down the stream. A processing element executes the instruction stream by multiplying the inputs by the weights and accumulating the results, and as indicated at 128, the output of the processing element is stored in memory.
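  • As a concrete illustration of the operation that each group of instructions encodes, the following C sketch performs the multiply-and-accumulate and activation of a single perceptron. The sigmoid activation, the five weight/input pairs, and the function names are illustrative assumptions rather than part of the claimed instruction format.

```c
/*
 * Sketch of the perceptron operation the instruction stream encodes:
 * multiply each input by its weight, accumulate the products with a bias,
 * and apply an activation function. The sigmoid activation and the example
 * values are illustrative assumptions.
 */
#include <math.h>
#include <stdio.h>

static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

static double perceptron(const double *inputs, const double *weights,
                         double bias, int n)
{
    double sum = bias;                 /* the set-bias step seeds the accumulator */
    for (int i = 0; i < n; i++)        /* one load per weight/input pair          */
        sum += inputs[i] * weights[i]; /* multiply and accumulate                 */
    return sigmoid(sum);               /* activation applied before the store     */
}

int main(void)
{
    const double inputs[5]  = { 0.5, -1.0, 0.25, 0.0, 2.0 };
    const double weights[5] = { 0.1,  0.4, -0.3, 0.8, 0.05 };
    printf("perceptron output = %f\n", perceptron(inputs, weights, 0.2, 5));
    return 0;
}
```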
  • FIG. 2 illustrates an embodiment of an instruction 200, which can be 64 bits long. A first part of the instruction is an op code 210, which determines what that instruction does. In a sense, the op code informs where in the neural network the execution is, which informs the processing engine what it needs to do. In an embodiment, the op code is eight bits long. The opcode sets the layer types. The different types of layers include an input, a fully connected convolution, a single or multiple map convolution, a pooling layer, or a classifier. As is known to those of skill in the art, a two-dimensional convolution layer is a common layer within a network, and it has different connection patterns and it recycles weights. A benefit of a convolution layer is that it uses very few weights; however, it does require a lot of calculations. The instructions point to a different pattern of inputs from the previous layer. The op code changes to tell the processing element what the function of the instruction is and for what purpose the data will be used. The instruction could also relate to load, store, map done (which indicates when a convolution is done), or set value (used to set register values with constants) functions to be executed by the processing element.
  • The next bits in the instruction 200 inform the processing element of address locations 220, that is, either where to find data or where to put data. In an embodiment, the address 220 is sixteen bits long. For example, if it is a LOAD instruction, these bits will inform from where to load the data. That is, the previous layer stored its data in memory, and the LOAD instruction tells the processing element where to get that data in order to start the multiplication process for the current perceptron.
  • The next value in the instruction 200 is the actual weight value 230 to be applied to the input. For example, if the instruction was a LOAD instruction, the instruction would cause the loading of the input, the weight would be obtained, and then the multiplication would be executed. The last 32 bits are also used during back propagation to inform from where to get information or where to store information. As illustrated in FIG. 3 , the instructions 200 are commingled into the instruction stream 300, which consists of a header 310 and a plurality of layers 320 and associated instructions 200. The layers 320 correspond to the multiple layers in the artificial neural network.
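  • The field layout described above can be made concrete with a short C sketch: an 8-bit opcode in the top byte, a 16-bit address, and a 32-bit data/weight value in the low word. The opcode numbering and the treatment of the remaining 8 bits as reserved padding are assumptions made for illustration; the patent does not specify them.

```c
/*
 * Minimal sketch of the 64-bit instruction of FIG. 2: an 8-bit opcode, a
 * 16-bit address, and a 32-bit data/weight field. The remaining 8 bits are
 * treated as reserved here, and the opcode values are illustrative
 * assumptions, not the patent's actual encoding.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum {
    OP_LAYER    = 0x01, /* marks the start of a layer            */
    OP_SET_BIAS = 0x02, /* seeds the accumulator of a perceptron */
    OP_LOAD     = 0x03, /* load an input and multiply by weight  */
    OP_STORE    = 0x04, /* store the accumulated sum             */
    OP_MAP_DONE = 0x05, /* a convolution map has finished        */
    OP_SET_VAL  = 0x06  /* set a register to a constant          */
};

/* opcode in bits 63..56, address in bits 55..40, data/weight in bits 31..0 */
static uint64_t pack_instruction(uint8_t op, uint16_t addr, uint32_t data)
{
    return ((uint64_t)op << 56) | ((uint64_t)addr << 40) | (uint64_t)data;
}

static void unpack_instruction(uint64_t insn,
                               uint8_t *op, uint16_t *addr, uint32_t *data)
{
    *op   = (uint8_t)(insn >> 56);
    *addr = (uint16_t)(insn >> 40);
    *data = (uint32_t)insn;
}

int main(void)
{
    /* Encode "load the input at address 0x0010 and multiply it by 0.25". */
    float weight = 0.25f;
    uint32_t bits;
    memcpy(&bits, &weight, sizeof bits);

    uint64_t insn = pack_instruction(OP_LOAD, 0x0010, bits);

    uint8_t op; uint16_t addr; uint32_t data;
    unpack_instruction(insn, &op, &addr, &data);
    printf("op=0x%02x addr=0x%04x data=0x%08x\n",
           (unsigned)op, (unsigned)addr, (unsigned)data);
    return 0;
}
```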
  • FIG. 4 illustrates an example of how an instruction stream can be divided up. Given the size of the hardware on hand, the system can have several processing elements within that hardware. Each one of the divided-up instruction streams has a header, a layer identification, and then some instructions, followed by another layer and some more instructions. Specifically, as illustrated in FIG. 4 , there is a single instruction stream 300 with three layers 320, and many instructions 200 between the layers. As further illustrated in FIG. 4 , one layer can be divided into four instructions 300A for two processing elements 310A and 310B, or two instructions 300B for four processing elements 310C, 310D, 310E, and 310F. This division of the instruction stream into many processing elements reduces the amount of time that it takes to execute that instruction stream. Each layer 320 must finish before proceeding onto the next layer. If the instructions do not divide out equally, then the processing element that has fewer instructions waits for the other processing elements to finish their execution before it goes on to the next layer, because the next layer needs all the data from the previous layer.
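  • The division of a layer among processing elements can be sketched as an even split with a remainder, as below; processing elements that receive fewer instructions simply finish early and idle at the layer boundary. The function and variable names are assumptions for illustration.

```c
/*
 * Sketch of dividing one layer's instructions across processing elements.
 * Instructions are split as evenly as possible; elements that receive fewer
 * instructions finish early and wait at the layer boundary, because the next
 * layer needs all of the previous layer's outputs.
 */
#include <stdio.h>

static int instructions_for_element(int total, int elements, int pe)
{
    int base  = total / elements;
    int extra = total % elements;      /* the first `extra` elements get one more */
    return base + (pe < extra ? 1 : 0);
}

int main(void)
{
    int layer_instructions  = 10;      /* e.g. one layer with 10 instructions */
    int processing_elements = 4;

    for (int pe = 0; pe < processing_elements; pe++)
        printf("PE %d executes %d instructions, then waits at the layer boundary\n",
               pe, instructions_for_element(layer_instructions,
                                            processing_elements, pe));
    return 0;
}
```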
  • As noted above, the implementation and use of artificial neural networks involve forward and backward propagation. In an embodiment, the forward propagation instruction stream begins with the header 310. The header contains global information needed to execute the artificial neural network. Global information includes learning rates, activation functions, and other hyperparameters. Next, as alluded to above, to execute the function of a perceptron, input values, weight values, and destinations for the resulting sums are needed. LOAD opcodes are used to retrieve input data and weight values, and STORE opcodes are used to place sums in the proper memory destinations.
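  • One plausible in-memory layout for this global information is sketched below. The patent states only that the header carries the learning rate, activation function parameters, and other hyperparameters; the specific fields, their widths, and the activation choices are illustrative assumptions.

```c
/*
 * Hypothetical layout for the instruction-stream header. The exact fields
 * and widths are assumptions; the patent specifies only that the header
 * holds global information such as the learning rate and activation
 * function parameters.
 */
#include <stdint.h>
#include <stdio.h>

enum activation_fn { ACT_SIGMOID, ACT_TANH, ACT_RELU };

struct stream_header {
    float    learning_rate;    /* used during back propagation           */
    uint32_t activation;       /* enum activation_fn, fixed for the net  */
    float    activation_param; /* e.g. a slope or temperature parameter  */
    uint32_t layer_count;      /* number of layers that follow           */
};

int main(void)
{
    struct stream_header hdr = { 0.01f, ACT_SIGMOID, 1.0f, 3 };
    printf("learning rate %.3f, %u layers\n",
           hdr.learning_rate, (unsigned)hdr.layer_count);
    return 0;
}
```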
  • In one embodiment, the input and output values are located in the memory of a field programmable gate array (FPGA). The FPGA processing element must be provided with the FPGA memory address for each neuron value when the FPGA processing element receives the LOAD instruction. Because of FPGA internal memory limitations, the weights are stored in memory. Consequently, the value of the weight must be provided to the FPGA processing element on the same LOAD opcode. During the execution of a STORE opcode, the destination address for the resulting sum is transmitted to the FPGA processing element. The STORE opcode occurs once for each perceptron and marks the end of a multiply and accumulate operation. In most embodiments, the LOAD and STORE opcodes make up the bulk of the instruction stream.
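  • The following C sketch interprets a short forward-propagation fragment the way the processing element is described above: a set-bias seeds the accumulator, each LOAD fetches the neuron value at the instruction's address and multiplies it by the weight carried in the instruction, and the STORE applies the activation and writes the sum to the destination address. The opcode values, the sigmoid activation, and the toy neuron memory are assumptions, and the instruction is shown in an unpacked struct form for readability.

```c
/* Sketch of a processing element executing one perceptron's instructions. */
#include <math.h>
#include <stdint.h>
#include <stdio.h>

enum { OP_SET_BIAS = 0x02, OP_LOAD = 0x03, OP_STORE = 0x04 };

struct insn { uint8_t op; uint16_t addr; float data; }; /* unpacked form */

static float neuron_mem[16];   /* stands in for the FPGA neuron-value memory */

static void run_fragment(const struct insn *stream, int n)
{
    double acc = 0.0;
    for (int i = 0; i < n; i++) {
        const struct insn *ins = &stream[i];
        switch (ins->op) {
        case OP_SET_BIAS:
            acc = ins->data;                          /* seed the accumulator */
            break;
        case OP_LOAD:
            acc += neuron_mem[ins->addr] * ins->data; /* input * weight       */
            break;
        case OP_STORE:                                /* activate and store   */
            neuron_mem[ins->addr] = (float)(1.0 / (1.0 + exp(-acc)));
            break;
        }
    }
}

int main(void)
{
    neuron_mem[0] = 0.5f;                  /* outputs of the previous layer */
    neuron_mem[1] = -1.0f;

    const struct insn fragment[] = {       /* one perceptron's instructions */
        { OP_SET_BIAS, 0, 0.2f },
        { OP_LOAD,     0, 0.1f },
        { OP_LOAD,     1, 0.4f },
        { OP_STORE,    8, 0.0f },          /* destination for the result    */
    };
    run_fragment(fragment, 4);
    printf("stored output = %f\n", neuron_mem[8]);
    return 0;
}
```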
  • In back propagation, the instruction stream is reversed. Starting with the classifier layer, whose outputs were just calculated, those outputs are used to calculate the error, adjust the weights, and then the process moves to the previous layer and performs the same operations. That is, during back propagation, the error is determined, the weights are adjusted based on the error, and, using the instruction stream, the weights are sent out of the hardware back into memory. Thereafter, when forward propagating, all the new weights are available.
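  • The patent does not give the update formula itself; the standard gradient-descent adjustment below, scaled by the learning rate from the header, is shown only as one common possibility.

```c
/*
 * Illustration of a weight adjustment during back propagation. The standard
 * gradient-descent step shown here is an assumption about the form of the
 * update, not the patent's specific formula.
 */
#include <stdio.h>

static float adjust_weight(float weight, float learning_rate,
                           float error, float input)
{
    /* move the weight against the error gradient contributed by this input */
    return weight - learning_rate * error * input;
}

int main(void)
{
    float w = 0.40f;
    w = adjust_weight(w, 0.01f, /*error=*/0.25f, /*input=*/0.5f);
    printf("updated weight = %f\n", w);  /* streamed back to memory afterwards */
    return 0;
}
```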
  • In the case of training a neural network via back propagation, the outputs of every layer must be retained. This retention cannot be done in restricted hardware environments because there is not enough memory to store all the output data. Consequently, during forward propagation, the STORE command is used to transmit the output of a perceptron in a layer to memory. When the back propagation is executed, the needed input data must be retrieved from memory for processing by the processing element. For this purpose, as illustrated in FIG. 5 , every instruction is preceded by data 510. Consequently, for back propagation, the processing element needs the input value, the weight value, and the previous output value, which are all included in the instruction 510. While this does increase the size of the instruction stream, and slightly slows processing, it does allow for on-chip training of the neural network in hardware.
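  • One way the data of FIG. 5 might sit alongside each instruction is sketched below; pairing every 64-bit instruction with a 64-bit data word, and the choice of what that word holds, are assumptions for illustration.

```c
/*
 * Sketch of a back-propagation stream entry in which each instruction is
 * preceded by the data it needs. The 64-bit pairing and the contents of the
 * data word are illustrative assumptions.
 */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

struct bp_entry {
    uint64_t data;        /* e.g. the previous output retained via STORE      */
    uint64_t instruction; /* opcode | address | weight, laid out as in FIG. 2 */
};

int main(void)
{
    float previous_output = 0.62f;
    struct bp_entry e = { 0, ((uint64_t)0x03 << 56) | ((uint64_t)0x0010 << 40) };
    memcpy(&e.data, &previous_output, sizeof previous_output);

    printf("each back-propagation entry is %zu bytes\n", sizeof e);
    return 0;
}
```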
  • Referring to FIG. 6 , an embodiment of hardware architecture is illustrated. The streamed instructions for forward propagation are located in a memory 610. A direct memory access module 620 is used to access the instruction streams. The microprocessor 630, in conjunction with the direct memory access module 620, accesses the instruction streams, and these instruction streams are transferred into the fabric 640 and processed by the processing elements 645. The microprocessor instructs the DMA to transfer data from one location to another, and the DMA simply transfers the data as a big block of memory into the fabric. After processing by the processing elements, the data are stored in memory 650 via the STORE command.
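  • The control flow can be pictured roughly as below: the microprocessor points the DMA engine at a block of streamed instructions, and the block is moved into the fabric for the processing elements to consume. Here memcpy stands in for the hardware DMA block transfer, and all names are assumptions.

```c
/*
 * Rough sketch of the forward-propagation data movement: a block of streamed
 * instructions is moved from memory into the fabric in one DMA transfer.
 * memcpy stands in for the hardware DMA, and every name is an assumption.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define STREAM_WORDS 1024

static uint64_t instruction_memory[STREAM_WORDS]; /* e.g. memory 610          */
static uint64_t fabric_buffer[STREAM_WORDS];      /* buffer inside the fabric */

/* stands in for programming the DMA module with source, destination, length */
static void dma_block_transfer(uint64_t *dst, const uint64_t *src, size_t words)
{
    memcpy(dst, src, words * sizeof *src);
}

int main(void)
{
    instruction_memory[0] = 0x0300100000000000ull;  /* an example LOAD word */
    dma_block_transfer(fabric_buffer, instruction_memory, STREAM_WORDS);
    printf("first word in fabric: 0x%016llx\n",
           (unsigned long long)fabric_buffer[0]);
    return 0;
}
```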
  • Referring to FIG. 7 , the streamed instructions for backward propagation are located in memory 710. It is noted that unlike the forward propagation of FIG. 6 , the backward propagation has the needed data 710A. The microprocessor 730, in conjunction with the direct memory access module 720, accesses the instruction streams, and these instruction streams are transferred into the fabric 740 and processed by the processing elements 745. Once again, the microprocessor instructs the DMA to transfer data from one location to another, and the DMA simply transfers the data as a big block of memory into the fabric. After the calculation of the error, the error is applied to the weight to adjust the weight, and the weight is stored back into memory. FIG. 8 illustrates how the updated weights are moved through the DMA 720 back to their original locations in the forward propagation instruction stream.
  • FIG. 9 is another diagram illustrating a system and process for implementing an artificial neural network via an instruction stream according to some aspects of the embodiments. FIG. 9 includes process blocks 905-961. Though arranged substantially serially in the example of FIG. 9 , other examples may reorder the blocks, omit one or more blocks, and/or execute two or more blocks in parallel using multiple processors or a single processor organized as two or more virtual machines or sub-processors. Moreover, still other examples can implement the blocks as one or more specific interconnected hardware or integrated circuit modules with related control and data signals communicated between and through the modules. Thus, any process flow is applicable to software, firmware, hardware, and hybrid implementations.
  • Referring now to FIG. 9, at 905, a header of the instruction stream is defined. The header includes a learning rate and activation function parameters. At 910, a format is defined for instructions that implement an artificial neural network via an instruction stream. As claimed, the format includes an opcode, an address, and data. As indicated at 911, the opcode can include a layer type, a load instruction, a store instruction, a map done instruction, a set bias instruction (which indicates the beginning of processor element calculations), and/or a set value instruction. As indicated at 912, as claimed, the instruction is structured such that the opcode is followed by the address, and the address is followed by the data. As further indicated at 913, in an embodiment, the address is either an address from which input data are retrieved for input into a perceptron of the artificial neural network, or an address to which output data are transmitted from the perceptron of the artificial neural network.
  • As indicated at 915, the instruction stream can include a single header and sets of neural network layer identifications. In the neural network layer identifications, each neural network layer identification is associated with one or more instructions. (See e.g., FIG. 3 ).
  • After the format is defined at operation 910, the instruction stream is created using the opcode, the address, and the data (920). Thereafter, at 930, the artificial neural network is implemented by providing the instruction stream to a computer processor for execution of the instruction stream.
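  A toy sketch of operations 920 and 930 follows, under assumed helper names (build_stream, execute_stream): the stream is assembled from the header followed by per-layer instructions, then handed to an executor that walks it opcode by opcode. The string opcodes and the executor's arithmetic are illustrative only.

```python
def build_stream(learning_rate, activation_params, layers):
    """Assemble one stream: a single header, then each layer identification
    followed by that layer's instructions (opcode, address, data)."""
    stream = [("HEADER", learning_rate, activation_params)]
    for layer_id, instructions in layers:
        stream.append(("LAYER_TYPE", layer_id, None))
        stream.extend(instructions)
    return stream


def execute_stream(stream, memory):
    """Walk the stream opcode by opcode, accumulating one neuron's output."""
    acc = 0.0
    for opcode, address, data in stream:
        if opcode in ("HEADER", "LAYER_TYPE"):
            continue                       # configuration only, no arithmetic
        elif opcode == "SET_BIAS":
            acc = data                     # start of a neuron's calculation
        elif opcode == "LOAD":
            acc += memory[address] * data  # weighted input fetched from address
        elif opcode == "STORE":
            memory[address] = acc          # neuron output written back
    return memory


mem = {0: 1.0, 1: 0.5, 10: 0.0}
layer = (0, [("SET_BIAS", None, 0.1),
             ("LOAD", 0, 0.4), ("LOAD", 1, -0.2),
             ("STORE", 10, None)])
print(execute_stream(build_stream(0.01, {"type": "relu"}, [layer]), mem))
```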
  • In a particular embodiment, as illustrated in FIG. 5 and indicated at 940, a data layer is positioned prior to each of the instructions. As disclosed in connection with FIG. 5 , this data layer is for use in connection with a backward propagation of the neural network.
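  One way such a data layer could be interleaved is sketched below; the build_backward_stream helper and its layout are assumptions for illustration, since the exact encoding of FIG. 5 is not reproduced in this section.

```python
def build_backward_stream(forward_instructions, saved_activations):
    """Place a data entry immediately before each instruction so the backward
    pass already has the data it needs, as described for operation 940."""
    stream = []
    for insn, activation in zip(forward_instructions, saved_activations):
        stream.append(("DATA", activation))  # data layer precedes the instruction
        stream.append(insn)
    return stream


backward = build_backward_stream(
    forward_instructions=[("LOAD", 0, 0.4), ("LOAD", 1, -0.2)],
    saved_activations=[1.0, 0.5])
print(backward)
```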
  • As indicated at 950, the system includes a processing element. The input data are received into the processing element via a LOAD instruction, and the LOAD instruction includes an address field that indicates the neuron in the current layer to which the instruction is applied. As indicated at 951, output data are transmitted from the processing element to a memory.
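  The sketch below gives a software analogy of this processing-element view: a per-neuron accumulator seeded by SET_BIAS, fed by LOAD, and drained to memory by STORE. The ProcessingElement class and its method names are hypothetical.

```python
class ProcessingElement:
    """Toy model of one neuron's processing element."""
    def __init__(self):
        self.acc = 0.0

    def set_bias(self, bias):
        self.acc = bias              # start of this neuron's calculation

    def load(self, value, weight):
        self.acc += value * weight   # accumulate one weighted input

    def store(self, memory, address):
        memory[address] = self.acc   # 951: output transmitted to memory


layer = [ProcessingElement() for _ in range(3)]  # one element per neuron
layer[1].set_bias(0.05)
layer[1].load(value=1.0, weight=0.3)   # LOAD's address field would have
layer[1].load(value=2.0, weight=-0.1)  # selected neuron 1 in this layer
output_memory = {}
layer[1].store(output_memory, address=42)
print(output_memory)
```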
  • At 960, the instruction stream is divided into several instruction streams prior to providing the instruction stream to the computer processor or processing element for execution. Then at 961, the several instruction streams are executed in parallel.
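  A minimal sketch of operations 960 and 961 appears below, assuming a simple round-robin split and a thread pool as stand-ins for whatever partitioning policy and parallel hardware an implementation would actually use.

```python
from concurrent.futures import ThreadPoolExecutor


def split_stream(stream, n_parts):
    """Divide one instruction stream into several streams (operation 960)."""
    return [stream[i::n_parts] for i in range(n_parts)]


def execute(sub_stream):
    # Stand-in for handing a sub-stream to a processor or processing element.
    return sum(data for _opcode, _addr, data in sub_stream)


stream = [("LOAD", i, float(i)) for i in range(8)]
parts = split_stream(stream, n_parts=4)
with ThreadPoolExecutor(max_workers=4) as pool:   # operation 961: parallel run
    print(list(pool.map(execute, parts)))
```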
  • FIG. 10 is a block diagram illustrating a computing and communications platform 1000 in the example form of a general-purpose machine on which some or all of the system of FIG. 1 may be carried out according to various embodiments. In certain embodiments, programming of the computing platform 1000 according to one or more particular algorithms produces a special-purpose machine upon execution of that programming. In a networked deployment, the computing platform 1000 may operate in the capacity of either a server or a client machine in server-client network environments, or it may act as a peer machine in peer-to-peer (or distributed) network environments.
  • Example computing platform 1000 includes at least one processor 1002 (e.g., a central processing unit (CPU), a graphics processing unit (GPU) or both, processor cores, compute nodes, etc.), a main memory 1004 and a static memory 1006, which communicate with each other via a link 1008 (e.g., bus). The computing platform 1000 may further include a video display unit 1010, input devices 1012 (e.g., a keyboard, camera, microphone), and a user interface (UI) navigation device 1014 (e.g., mouse, touchscreen). The computing platform 1000 may additionally include a storage device 1016 (e.g., a drive unit), a signal generation device 1018 (e.g., a speaker), and an RF-environment interface device (RFEID) 1020.
  • The storage device 1016 includes a non-transitory machine-readable medium 1022 on which is stored one or more sets of data structures and instructions 1024 (e.g., software) embodying or utilized by any one or more of the methodologies or functions described herein. The instructions 1024 may also reside, completely or at least partially, within the main memory 1004, static memory 1006, and/or within the processor 1002 during execution thereof by the computing platform 1000, with the main memory 1004, static memory 1006, and the processor 1002 also constituting machine-readable media.
  • While the machine-readable medium 1022 is illustrated in an example embodiment to be a single medium, the term "machine-readable medium" may include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more instructions 1024. The term "machine-readable medium" shall also be taken to include any tangible medium that is capable of storing, encoding or carrying instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure or that is capable of storing, encoding or carrying data structures utilized by or associated with such instructions. The term "machine-readable medium" shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media. Specific examples of machine-readable media include nonvolatile memory, including but not limited to, by way of example, semiconductor memory devices (e.g., electrically programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM)) and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
  • RFEID 1020 includes radio receiver circuitry, along with analog-to-digital conversion circuitry, and interface circuitry to communicate via link 1008 according to various embodiments. Various form factors are contemplated for RFEID 1020. For instance, RFEID 1020 may be in the form of a wideband radio receiver, or a scanning radio receiver, that interfaces with processor 1002 via link 1008. In one example, link 1008 includes a PCI Express (PCIe) bus, including a slot into which the NIC form-factor may removably engage. In another embodiment, RFEID 1020 includes circuitry laid out on a motherboard together with local link circuitry, processor interface circuitry, other input/output circuitry, memory circuitry, storage device and peripheral controller circuitry, and the like. In another embodiment, RFEID 1020 is a peripheral that interfaces with link 1008 via a peripheral input/output port such as a universal serial bus (USB) port. RFEID 1020 receives RF emissions over wireless transmission medium 1026. RFEID 1020 may be constructed to receive RADAR signaling, radio communications signaling, unintentional emissions, or some combination of such emissions.
  • Examples, as described herein, may include, or may operate on, logic or a number of components, circuits, or engines, which for the sake of consistency are termed engines, although it will be understood that these terms may be used interchangeably. Engines may be hardware, software, or firmware communicatively coupled to one or more processors in order to carry out the operations described herein. Engines may be hardware engines, and as such engines may be considered tangible entities capable of performing specified operations and may be configured or arranged in a certain manner. In an example, circuits may be arranged (e.g., internally or with respect to external entities such as other circuits) in a specified manner as an engine. In an example, the whole or part of one or more computing platforms (e.g., a standalone, client or server computing platform) or one or more hardware processors may be configured by firmware or software (e.g., instructions, an application portion, or an application) as an engine that operates to perform specified operations. In an example, the software may reside on a machine-readable medium. In an example, the software, when executed by the underlying hardware of the engine, causes the hardware to perform the specified operations. Accordingly, the term hardware engine is understood to encompass a tangible entity, be that an entity that is physically constructed, specifically configured (e.g., hardwired), or temporarily (e.g., transitorily) configured (e.g., programmed) to operate in a specified manner or to perform part or all of any operation described herein.
  • Considering examples in which engines are temporarily configured, each of the engines need not be instantiated at any one moment in time. For example, where the engines comprise a general-purpose hardware processor configured using software, the general-purpose hardware processor may be configured as respective different engines at different times. Software may accordingly configure a hardware processor, for example, to constitute a particular engine at one instance of time and to constitute a different engine at a different instance of time.
  • The above detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show, by way of illustration, specific embodiments that may be practiced. These embodiments are also referred to herein as "examples." Such examples may include elements in addition to those shown or described. However, also contemplated are examples that include the elements shown or described. Moreover, also contemplated are examples using any combination or permutation of those elements shown or described (or one or more aspects thereof), either with respect to a particular example (or one or more aspects thereof), or with respect to other examples (or one or more aspects thereof) shown or described herein.
  • In the event of inconsistent usages between this document and those documents so incorporated by reference, the usage in the incorporated reference(s) is supplementary to that of this document; for irreconcilable inconsistencies, the usage in this document controls.
  • In this document, the terms "a" or "an" are used, as is common in patent documents, to include one or more than one, independent of any other instances or usages of "at least one" or "one or more." In this document, the term "or" is used to refer to a nonexclusive or, such that "A or B" includes "A but not B," "B but not A," and "A and B," unless otherwise indicated. In the appended claims, the terms "including" and "in which" are used as the plain-English equivalents of the respective terms "comprising" and "wherein." Also, in the following claims, the terms "including" and "comprising" are open-ended; that is, a system, device, article, or process that includes elements in addition to those listed after such a term in a claim is still deemed to fall within the scope of that claim. Moreover, in the following claims, the terms "first," "second," and "third," etc. are used merely as labels, and are not intended to suggest a numerical order for their objects.
  • The above description is intended to be illustrative, and not restrictive. For example, the above-described examples (or one or more aspects thereof) may be used in combination with others. Other embodiments may be used, such as by one of ordinary skill in the art upon reviewing the above description. The Abstract is to allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. Also, in the above Detailed Description, various features may be grouped together to streamline the disclosure. However, the claims may not set forth every feature disclosed herein as embodiments may feature a subset of said features. Further, embodiments may include fewer features than those disclosed in a particular example. Thus, the following claims are hereby incorporated into the Detailed Description, with a claim standing on its own as a separate embodiment. The scope of the embodiments disclosed herein is to be determined with reference to the appended claims.

Claims (15)

  1. A process for implementing an artificial neural network via an instruction stream (300) comprising:
    defining a format for instructions in the instruction stream (300), the format comprising an opcode (210), an address (220), and data, wherein the opcode (210) is followed by the address (220), and wherein the address (220) is followed by the data;
    defining a header (310) of the instruction stream (300), the header (310) comprising a learning rate and activation function parameters;
    creating the instruction stream (300) using the header (310), the opcode (210), the address (220), and the data; and
    implementing the artificial neural network by providing the instruction stream (300) to a computer processor for execution of the instruction stream (300);
    wherein the opcode (210) comprises one or more of a layer type, a load instruction, a store instruction, a map done instruction, a set bias instruction, and a set value instruction; and
    wherein the address (220) comprises one or more of an address from which to retrieve input data for input into a perceptron of the artificial neural network and an address to transmit output data from the perceptron of the artificial neural network.
  2. The process of claim 1, wherein the instruction stream (300) comprises a single header and one or more sets of neural network layer identifications; and wherein each neural network layer identification is associated with one or more instructions.
  3. The process of claim 1, comprising a data layer positioned prior to each of the instructions; wherein the data layer is for use in connection with a backward propagation.
  4. The process of claim 1, wherein the computer processor comprises a processing element; wherein the input data are received into the processing element via the load instruction; and wherein the load instruction comprises an address of the computer processor processing element.
  5. The process of claim 4, wherein the output data are transmitted from the processing element to a memory.
  6. The process of claim 1, comprising dividing the instruction stream (300) into a plurality of instruction streams prior to providing the instruction stream (300) to the computer processor.
  7. The process of claim 6, wherein the plurality of instruction streams is executed in parallel.
  8. A non-transitory computer-readable medium comprising instructions that when executed by a processor execute a process for implementing an artificial neural network via an instruction stream (300), the process comprising:
    defining a format for instructions in the instruction stream (300), the format comprising an opcode (210), an address (220), and data, wherein the opcode (210) is followed by the address (220), and wherein the address (220) is followed by the data;
    defining a header (310) of the instruction stream (300), the header (310) comprising a learning rate and activation function parameters;
    creating the instruction stream (300) using the header (310), the opcode (210), the address (220), and the data; and
    implementing the artificial neural network by providing the instruction stream (300) to a computer processor for execution of the instruction stream (300);
    wherein the opcode (210) comprises one or more of a layer type, a load instruction, a store instruction, a map done instruction, a set bias instruction, and a set value instruction; and
    wherein the address (220) comprises one or more of an address from which to retrieve input data for input into a perceptron of the artificial neural network and an address to transmit output data from the perceptron of the artificial neural network.
  9. The non-transitory computer-readable medium of claim 8, wherein the instruction stream (300) comprises a single header and one or more sets of neural network layer identifications; and wherein each neural network layer identification is associated with one or more instructions.
  10. The non-transitory computer-readable medium of claim 8, comprising a data layer positioned prior to each of the instructions; wherein the data layer is for use in connection with a backward propagation.
  11. The non-transitory computer-readable medium of claim 8, wherein the computer processor comprises a processing element; wherein the input data are received into the processing element via the load instruction; and wherein the load instruction comprises an address of the computer processor processing element.
  12. The non-transitory computer-readable medium of claim 11, wherein the output data are transmitted from the processing element to a memory.
  13. The non-transitory computer readable medium of claim 8, comprising instructions for dividing the instruction stream (300) into a plurality of instruction streams prior to providing the instruction stream (300) to the computer processor.
  14. The non-transitory computer-readable medium of claim 13, wherein the plurality of instruction streams is executed in parallel.
  15. A system comprising:
    a computer processor; and
    a computer memory coupled to the computer processor;
    wherein the computer processor is configured to implement an artificial neural network via an instruction stream (300) by:
    defining a format for instructions in the instruction stream (300), the format comprising an opcode (210), an address (220), and data, wherein the opcode (210) is followed by the address (220), and wherein the address (220) is followed by the data;
    defining a header (310) of the instruction stream (300), the header (310) comprising a learning rate and activation function parameters;
    creating the instruction stream (300) using the header (310), the opcode (210), the address (220), and the data; and
    implementing the artificial neural network by providing the instruction stream (300) to the computer processor that executes the instruction stream (300);
    wherein the opcode (210) comprises one or more of a layer type, a load instruction, a store instruction, a map done instruction, a set bias instruction, and a set value instruction; and
    wherein the address (220) comprises one or more of an address from which to retrieve input data for input into a perceptron of the artificial neural network and an address to transmit output data from the perceptron of the artificial neural network.
EP20771400.7A 2019-10-30 2020-08-28 Neural network instruction streaming Active EP4052188B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US16/668,957 US11475311B2 (en) 2019-10-30 2019-10-30 Neural network instruction streaming
PCT/US2020/048465 WO2021086486A1 (en) 2019-10-30 2020-08-28 Neural network instruction streaming

Publications (2)

Publication Number Publication Date
EP4052188A1 EP4052188A1 (en) 2022-09-07
EP4052188B1 true EP4052188B1 (en) 2024-07-31

Family

ID=72470605

Family Applications (1)

Application Number Title Priority Date Filing Date
EP20771400.7A Active EP4052188B1 (en) 2019-10-30 2020-08-28 Neural network instruction streaming

Country Status (3)

Country Link
US (1) US11475311B2 (en)
EP (1) EP4052188B1 (en)
WO (1) WO2021086486A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11475311B2 (en) 2019-10-30 2022-10-18 Raytheon Company Neural network instruction streaming
CN118626152A (en) * 2024-08-14 2024-09-10 北京开源芯片研究院 Method and device for generating instruction stream, electronic equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5325464A (en) 1990-05-22 1994-06-28 International Business Machines Corporation Pyramid learning architecture neurocomputer
US10872290B2 (en) 2017-09-21 2020-12-22 Raytheon Company Neural network processor with direct memory access and hardware acceleration circuits
CN108958801B (en) 2017-10-30 2021-06-25 上海寒武纪信息科技有限公司 Neural network processor and method for executing vector maximum value instruction by using same
US11468332B2 (en) 2017-11-13 2022-10-11 Raytheon Company Deep neural network processor with interleaved backpropagation
US10963379B2 (en) * 2018-01-30 2021-03-30 Microsoft Technology Licensing, Llc Coupling wide memory interface to wide write back paths
US11475311B2 (en) 2019-10-30 2022-10-18 Raytheon Company Neural network instruction streaming

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HADI ESMAEILZADEH ET AL: "Neural Acceleration for General-Purpose Approximate Programs", 2014 47TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE; [PROCEEDINGS OF THE ANNUAL ACM/IEEE INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE], IEEE COMPUTER SOCIETY, 1730 MASSACHUSETTS AVE., NW WASHINGTON, DC 20036-1992 USA, 1 December 2012 (2012-12-01), pages 449 - 460, XP058014446, ISSN: 1072-4451, ISBN: 978-0-7695-3047-5, DOI: 10.1109/MICRO.2012.48 *
MOREAU THIERRY ET AL: "SNNAP: Approximate computing on programmable SoCs via neural acceleration", 2015 IEEE 21ST INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA), IEEE, 7 February 2015 (2015-02-07), pages 603 - 614, XP032744078, DOI: 10.1109/HPCA.2015.7056066 *

Also Published As

Publication number Publication date
EP4052188A1 (en) 2022-09-07
US20210133579A1 (en) 2021-05-06
WO2021086486A1 (en) 2021-05-06
US11475311B2 (en) 2022-10-18

Similar Documents

Publication Publication Date Title
US10482380B2 (en) Conditional parallel processing in fully-connected neural networks
US11308398B2 (en) Computation method
JP6348561B2 (en) System and method for multi-core optimized recurrent neural networks
CN109102065B (en) Convolutional neural network accelerator based on PSoC
CN107578099B (en) Computing device and method
WO2019018375A1 (en) Neural architecture search for convolutional neural networks
EP3685319A1 (en) Direct access, hardware acceleration in neural network
US11775832B2 (en) Device and method for artificial neural network operation
CN113435682A (en) Gradient compression for distributed training
EP4052188B1 (en) Neural network instruction streaming
KR102163209B1 (en) Method and reconfigurable interconnect topology for multi-dimensional parallel training of convolutional neural network
US20160187861A1 (en) Systems and methods to adaptively select execution modes
CN111788585A (en) Deep learning model training method and system
EP3754503A1 (en) Allocation system, method and apparatus for machine learning, and computer device
CN113272854A (en) Method and system for accelerating AI training using advanced interconnection technology
CN110929854B (en) Data processing method and device and hardware accelerator
CN111343602B (en) Joint layout and task scheduling optimization method based on evolutionary algorithm
CN114430838A (en) Processing continuous inputs using neural network accelerators
JP2022102966A (en) Information processing device and information processing method
CN109272112B (en) Data reuse instruction mapping method, system and device for neural network
JP7073686B2 (en) Neural network coupling reduction
JP5907607B2 (en) Processing arrangement method and program
EP4036811A1 (en) Combining compression, partitioning and quantization of dl models for fitment in hardware processors
WO2020113459A1 (en) Intermediate representation transformation by slice operation hoist
WO2023023043A1 (en) Machine learning consolidation

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20220523

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: GRANT OF PATENT IS INTENDED

RIC1 Information provided on ipc code assigned before grant

Ipc: G06N 3/04 20060101ALI20231215BHEP

Ipc: G06N 3/08 20060101ALI20231215BHEP

Ipc: G06N 3/084 20230101ALI20231215BHEP

Ipc: G06N 3/063 20060101AFI20231215BHEP

RIC1 Information provided on ipc code assigned before grant

Ipc: G06N 3/045 20230101ALN20240109BHEP

Ipc: G06N 3/084 20230101ALI20240109BHEP

Ipc: G06N 3/063 20060101AFI20240109BHEP

INTG Intention to grant announced

Effective date: 20240122

GRAJ Information related to disapproval of communication of intention to grant by the applicant or resumption of examination proceedings by the epo deleted

Free format text: ORIGINAL CODE: EPIDOSDIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: GRANT OF PATENT IS INTENDED

INTC Intention to grant announced (deleted)
INTG Intention to grant announced

Effective date: 20240315

RIC1 Information provided on ipc code assigned before grant

Ipc: G06N 3/045 20230101ALN20240301BHEP

Ipc: G06N 3/084 20230101ALI20240301BHEP

Ipc: G06N 3/063 20060101AFI20240301BHEP

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE PATENT HAS BEEN GRANTED

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

Ref country code: GB

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602020034923

Country of ref document: DE

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20240723

Year of fee payment: 5

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20240822

Year of fee payment: 5

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20240820

Year of fee payment: 5