WO2023007258A1

WO2023007258A1 - Hybrid 3-dimensional optical computing accelerator engine apparatus and method

Info

Publication number: WO2023007258A1
Application number: PCT/IB2022/054268
Authority: WO
Inventors: Raghavendra SWAMY H; Iven Jose
Original assignee: Swamy H Raghavendra; Iven Jose
Priority date: 2021-07-29
Filing date: 2022-05-09
Publication date: 2023-02-02

Abstract

The present invention relates to a reconfigurable multi-dimensional hybrid optical computing accelerator apparatus. The apparatus consists of liquid crystal units (LCUs) between which vertical or horizontal polarizer planes face each other and are aligned so that light passes through; alternatively, each liquid crystal unit additionally incorporates a color filter unit together with the vertical or horizontal polarizer planes facing each other and aligned so that light passes through. Each liquid crystal unit performs a combinational computation on the optical input (Io) and the electronic color transformation function control line (Qe) to provide an optical output (Zo). The function of the liquid crystal unit is reconfigurable: changing the stepping value of the electronic color transformation function control line (Qe) changes the functional behavior of the LCU without redesigning the existing circuitry.

Description

FIELD OF INVENTION

The present invention generally relates to an architectural model of a computation accelerator or processor accelerator, and more particularly to a multi-dimensional hybrid optical computing accelerator.

BACKGROUND

The current generation of computing chips designed and fabricated by the computing industry uses silicon-based electronic circuits, which have inherent limitations: a limit on the number of transistors that can be packed per unit area, increased power and heat dissipation, CPU frequency scaling limits, and so on. These limitations have led to the need for other computing architectures based on different technologies, to break free of the above limitations and meet the ever-growing need for computation speed in both scientific and business applications.

Presently, scientific and business class applications depend heavily on processor computing capabilities. However, any increase in frequency to speed up computation can lead to a thermal trip, which requires an expensive cooling mechanism. If instead the number of cores is increased to speed up computation, the die area grows, resulting in a larger form factor and a higher cost of production.

Furthermore, the current generation silicon-based chips which have inherent limitations would ideally be replaced by the next-generation hybrid optical-based processor chips which work at the speed of light.

US5510665 describes the unique construction of an optoelectronic circuit element unit that can be used as a basic circuit building block like a transistor or diode. The unit optoelectronic element has sandwich layers of photodetector - light modulator - light source - light modulator - photodetector, in that order of construction. Here the light source is placed at the center of the device and the two photodetectors are placed at the outer layers. The two light modulator layers are housed between the photodetector and the light source on either side. This light modulator layer is used for modulating the light.

US6804412 describes an optical correlator method for measuring the correlation of images with a collimated light source, using the Fourier transformation as the mathematical model. It is targeted at specialized computing and not general-purpose computation.

US7747102B2 also describes an optical correlator method for measuring the correlation of images with a collimated light source, using the Fourier transformation as the mathematical model. It uses image production and image capture devices in the same plane to reduce the size of the computing block. It is targeted at specialized computing and not general-purpose computation.

US8610839B2 describes an optical processing system using a twisted collimated spatial light modulator, with the Fourier full and partial derivative transformations as the mathematical model. It is targeted at specialized computing and not general-purpose computation.

WO2014087126-PAMPH-041 describes an optical processing system using a multilayered twisted collimated spatial light modulator, with the Fourier transform as the mathematical model.

US20170248734 describes methods and systems using an optical receiver and electro-optic methods to transmit data from integrated computational elements. It uses Time Division Multiplication and Wave Division Multiplication in the wavelength regions of the O-Band, S-Band, -Band, L-band, U-band, and near-infrared region. The optical computing target is specific to remote monitoring of fluids in the oil and gas industry.

WO2021083348 describes the construction of an optical computing device based on passive optical waveguides, which are used to transmit monochromatic optical signals. Here phase modulation and amplitude modulation have been incorporated to realize multiplication and addition operations.

In WO 02/25395 A2, the optical switching elements in an array of controllable optical switching elements are controlled in accordance with an adaptive computation to be performed on inputs to produce outputs. Light that carries the inputs and outputs is emitted to and collected from the optical switching elements. Here each of the pixels on the film can be considered a light modulating or optical switching element, and the film can be considered an SLM array of the pixels. The film together with the lenses, the optical fibers, and the controller can be considered an optical switch.

In US10768659B2, an optical neural network is constructed based on photonic integrated circuits to perform neuromorphic computing. In the optical neural network, matrix multiplication is implemented using one or more optical interference units, which can apply an arbitrary weighting matrix multiplication to an array of input optical signals. Nonlinear activation is realized by an optical nonlinearity unit, which can be based on nonlinear optical effects, such as saturable absorption. These calculations are implemented optically, thereby resulting in high calculation speeds and low power consumption in the optical neural network.

The article “On-chip CMOS-compatible optical signal processor” by Lin Yang et al. describes an optical signal processor performing matrix-vector multiplication, which is composed of a laser modulator array, multiplexer, splitter, micro ring modulator matrix, and photodetector array. 8 * 10⁷ multiplications and accumulations (MACs) per second are implemented at a clock frequency of 10 MHz. All functional units can ultimately be monolithically integrated on a chip with the development of silicon photonics, and an efficient high-performance computing system is expected in the future.

There is therefore a need for a method and apparatus for designing a computing accelerator that works based on light instead of purely silicon transistors. It is also desirable to harness the inherently parallel properties of light to perform parallel computation. It is further desirable to have custom parallel programming software interface support, so that the optical computing accelerated hardware can be used for user-level programming.

SUMMARY

A method and apparatus are provided for designing a computing accelerator that works based on light instead of purely silicon transistors. The liquid crystal unit is used as the basic compute element in the design of the hybrid optical computing accelerator engine. The engine works based on a mathematical model of light behavior; this algebra is a hybrid model of electronic and optical computing systems. Different architectural designs of the hybrid compute engines are proposed, along with their integration with existing silicon CPU and GPU processors through a glue chip interface. Finally, the software stack of the application programming interface OxAPI can be used to harness the capability of the hybrid optical computing accelerator engine.

DETAILED DESCRIPTION OF THE DRAWINGS

The detailed explanation can be found under the detailed explanation section, wherein:

Figure 1 is an existing silicon-based computing block;

Figure 2 is an existing silicon-based Non-hUMA APU [Non-heterogeneous Unified Memory Access Accelerated Processing Unit] computing block;

Figure 3 is an existing silicon-based hUMA APU [heterogeneous Unified Memory Access Accelerated Processing Unit] computing block;

Figure 4 is an existing artifact Liquid Crystal Unit showing the on and off light control behavior using voltage support;

Figure 5 is an existing artifact Liquid Crystal Unit with the color filter block;

Figure 6 is an existing artifact of single-pixel using RGB filter [RED, GREEN, BLUE] block liquid crystal units;

Figure 7 is a proposed artifact generic mathematical model of a single Liquid Crystal Unit with electronic gate control;

Figure 8 is a proposed Model-1 generic artifact mathematical model of a single Liquid Crystal Unit with electronic gate control, horizontal and vertical polarizers blocks;

Figure 8A is a proposed artifact Model-1 Subtype-1 of one-dimensional LCU compute Blocks with electronic gate control, horizontal and vertical polarizers blocks;

Figure 8B is a proposed artifact Model-1 Subtype-2 of two-dimensional LCU compute Blocks with electronic gate control, horizontal and vertical polarizers blocks;

Figure 8C is a proposed artifact Model-1 Subtype-3 of three-dimensional LCU compute Blocks with electronic gate control, horizontal and vertical polarizers blocks;

Figure 9 is a proposed Model-2 generic artifact mathematical model of a single Liquid Crystal Unit with electronic gate control, horizontal, vertical polarizers, and RGB color filters blocks;

Figure 9A is a proposed artifact Model-2 Subtype-1 of one-dimensional LCU compute Blocks with electronic gate control, horizontal, vertical polarizers, and RGB color filters blocks;

Figure 9B is a proposed artifact Model-2 Subtype-2 of two-dimensional LCU compute Blocks with electronic gate control, horizontal, vertical polarizers, and RGB color filters blocks;

Figure 9C is a proposed artifact Model-2 Subtype-3 of three-dimensional LCU compute Blocks with electronic gate control, horizontal, vertical polarizers, and RGB color filters blocks;

Figure 10: Generic model of 3D optical compute accelerator unit integration to CPU based system using high-speed bus interface;

Figure 11: Generic model of 3D optical compute accelerator unit integration to GPU using high-speed bus interface;

Figure 12: Model A: Subtype 1, 3D-OCE as co-processor or co-accelerator to CPU based system using high-speed bus interface with electronic gate control, and horizontal, vertical polarizers blocks;

Figure 13: Model A: Subtype 2, 3D-OCE as co-processor or co-accelerator to GPU using high-speed bus interface with electronic gate control, and horizontal, vertical polarizers blocks;

Figure 14: Model B: Subtype 1, 3D-OCE as co-processor or co-accelerator to CPU based system using high-speed bus interface with electronic gate control, horizontal, vertical polarizers, and RGB color filters blocks;

Figure 15: Model B: Subtype 2, 3D-OCE as co-processor or co-accelerator to GPU using high-speed bus interface with electronic gate control, horizontal, vertical polarizers, and RGB color filters blocks

Figure 16: software stack block diagram for the 3D-HOCA;

Figure 17: Compilation model of software stack block diagram for the 3D-HOCA.

Figure 18: OxAPI Master program dispatch on CPU and Co-processor program/Slave program dispatch on the 3D-HOCA

Figure 19: CPU with 4 ALU ( Arithmetic Logic Unit ) computational block

Figure 20: GPU with multi-ALU SIMD ( Single Instruction Multiple Data ) computational blocks

Figure 21: 3D-HOCA as MIMD ( Multiple Instruction Multiple Data ) computational blocks

Figure 22: 3D-HOCA with high-speed RAM hosting the Instruction and Data panel pipeline.

Figure 23: The output of 3D LCU optical computing unit #1 is split into two parallel inputs using a 50:50 beam splitter for parallel computation.

Figure 24: Hybrid optical NOT gate Symbol

Figure 25: LCU implementation of hybrid optical NOT gate.

Figure 26: Hybrid optical AND gate Symbol

Figure 27: LCU implementation of hybrid optical AND gate.

Figure 28: Hybrid optical OR gate Symbol

Figure 29: LCU implementation of hybrid optical OR gate.

DETAILED DESCRIPTION OF THE PRESENT INVENTION

FIG. 1 shows a system that could include a processor such as a single central processing unit (CPU), a multi-core processor, a network processor (NP), an audio accelerator (AA), a digital signal processor (DSP), a graphics processing unit (GPU), or an Accelerated Processing Unit (APU) that integrates both a CPU and a GPU. These are based on silicon technology and target general-purpose or specialized computation. These devices are connected to input/output (I/O) console units such as a keyboard, monitor, scanner, joystick, or network connection using the input/output (I/O) bus, and they connect to memory devices such as Random Access Memory (RAM), Dynamic RAM (DRAM), Static RAM (SRAM), or an electrically erasable programmable ROM (EEPROM) using the system bus. The power source provides electric power to the circuitry of the system.

FIG. 2 shows a block diagram of a non-heterogeneous unified memory access (non-hUMA) accelerated processing unit (APU) computing block. It has the GPU and CPU fabricated on the same silicon die. Here the GPU can be programmed to perform general-purpose computations, resulting in a general-purpose graphics processing unit (GPGPU) accelerator that speeds up the computations. The GPU and CPU each have their own RAM address space, and the two are disjoint; hence accessing the other unit's RAM block incurs an additional hop via the CPU or GPU block.

FIG. 3 shows a block diagram of a heterogeneous unified memory access (hUMA) APU computing block. Here also the GPU and CPU are fabricated on the same silicon die, but they share a common RAM address space, making the GPU an efficient GPGPU coprocessor accelerator for general-purpose computing.

FIG. 4 and FIG. 5 show the existing prior art of a liquid crystal unit (LCU). The liquid crystal is sandwiched between two glass plates coated with a conducting transparent electrode material such as indium tin oxide (ITO). The horizontal polarizing filter is layered below as the base of this sandwich construction and the vertical polarizing filter is layered above it. The transparent electrodes are connected to the voltage source. An unpolarized white light source unit is placed below the horizontal polarizer filter. The working principle of the LCU is as follows. When the switch S1 is open and the light is on, the horizontal polarizer allows only horizontal planar waves of the light to pass (FIG. 4). Without an applied voltage, the rods in the liquid crystal rotate the horizontally polarized light to vertically polarized light, which passes through the vertical polarizer filter at the top. When the switch S1 is closed, the voltage is applied to the conducting electrodes, making the rods in the liquid crystal change their alignment so that the horizontally polarized light passing through is left unchanged. This horizontally polarized light is blocked by the vertical polarizer filter at the top. This simple construction of the LCU with the application of voltage depicts the on and off light control behavior. The white light source can be a white light-emitting diode (LED), a Red Green Blue (RGB) LED, phosphor-coated light units like CFL, or any known light generating device or unit.

FIG. 5 shows the LCU with the Red (R), Green (G), Blue (B), Cyan (C), Magenta (M), and Yellow (Y) filter blocks, which can be used to select the specific wavelength allowed through as the LCU output.

FIG. 6 shows 3 LCU units with the RGB filters as a single unit, with the optical input as (RoGoBo), the electronic gate RGB inputs (ReGeBe) controlling the 3 LCUs, and the optical output as (RoGoBo).

FIG. 7 and FIG. 8 show the proposed generic mathematical model of a single liquid crystal unit, where Z_o is the optical output of the LCU, I_o is the optical input of the LCU, and Q_e is the electronic gate control, which is the light or color filter transformation function, using equation 1:

Z_o = I_o * Q_e    (1)

Furthermore, the mathematical model of the LCU is as follows by using “Hybrid Checker Board Eclipse Algebra” (HCBEA) or “Multi-level Color Algebra” (MCA), and the Conventions used in the HCBEA are listed below,

Constants of HCBEA algebra

Suffix “o”: Optical [light flow / photons]
Suffix “e”: Electronic [electron flow]

R = RED wavelength with step values from 1 to 255
G = GREEN wavelength with step values from 1 to 255
B = BLUE wavelength with step values from 1 to 255
M = MAGENTA with step values from 1 to 255
Y = YELLOW with step values from 1 to 255
C = CYAN with step values from 1 to 255
W = WHITE with value ( R255 + G255 + B255 )
K = BLACK / NULL wavelength with value 0
IR = INFRA RED wavelength with step values from 1 to 255
A, B, X are HCBEA variables

1_e = Electronic control filter “OFF” signal, allowing the light to pass through the LCU
0_e = Electronic control filter “ON” signal, blocking the light from passing through the LCU
X_e = Electronic control filter input value { W_e, R_e, G_e, B_e, C_e, M_e, Y_e, K_e }
X_o = Optical output value { W_o, R_o, G_o, B_o, C_o, M_o, Y_o, K_o, IR_o }
X = { R, G, B, C, M, Y, IR }
A_o = { R, G, B, C, M, Y }
B_e = { R, G, B, C, M, Y }

Operators of HCBEA algebra

+ = CBEA addition operator
* = CBEA multiplication operator
bar = complementary operation ( example: X̄ is the complement of X )

Identities of HCBEA algebra

Black “+” addition Identity

X_o = X_o + K_o    For all X = {R, G, B, C, M, Y, K, IR}

Note 1: When an optical data wave X_o is added to the NULL wavelength / Black / K_o, the output is X_o.

Note 2: Corner case: when X_o = K_o, the equation reduces to

X_o = X_o + K_o
    = K_o + K_o
X_o = K_o (which is a NULL wavelength)

Black “*” Identity

X_o * K_e = K_e    For all X = {R, G, B, C, M, Y, K, IR}

White Identity

W_o = R_o + G_o + B_o
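The identities above can be checked with a small numerical sketch. The source does not define the operators numerically, so the model here is an assumption made only for illustration: a color is an (R, G, B) tuple of step values, "+" is a channel-wise sum that saturates at 255, and "*" scales each optical channel by the transmission selected on the control line. The names `add`, `mul`, and `Color` are hypothetical.

```python
from typing import Tuple

Color = Tuple[int, int, int]  # (R, G, B) step values, each 0..255 (assumed encoding)

K: Color = (0, 0, 0)          # black / NULL wavelength
W: Color = (255, 255, 255)    # white
R: Color = (255, 0, 0)
G: Color = (0, 255, 0)
B: Color = (0, 0, 255)


def add(x: Color, y: Color) -> Color:
    """Assumed '+' operator: channel-wise sum, saturating at 255."""
    return tuple(min(255, a + b) for a, b in zip(x, y))


def mul(i_o: Color, q_e: Color) -> Color:
    """Assumed '*' operator (equation 1): Z_o = I_o * Q_e.

    Each optical channel is scaled by the control-line step value,
    where 255 passes the channel fully and 0 blocks it.
    """
    return tuple(a * q // 255 for a, q in zip(i_o, q_e))


print(add(R, K))               # black "+" identity
print(mul(R, K))               # black "*" identity
print(add(add(R, G), B) == W)  # white identity
```

Under these assumptions, a control value of W passes the input unchanged and a control value of K blocks it, which matches the 1_e and 0_e signals listed in the constants.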

The sample color algebra or HCBEA algebra truth table for 1_e and 0_e identity.

It is an important embodiment of the invention that the HCBEA algebra is a multi-level logic color algebra executed based on the behavior of the LCU with the electronic color filter transformation control and the optical input. It uses both electronic control and optical discrete step values to support multi-level compute logic.

TABLE 1

Example 1

Both FIG. 8 Model-I and FIG. 9 Model-II support the HCBEA algebra model. In FIG. 8 Model-I, the LCU has the horizontal and vertical polarizers but no RGB filter blocks. In FIG. 9 Model-II, the LCU has RGB filters along with the horizontal and vertical polarizers, which increases the number of supported step values compared to FIG. 8 Model-I.

In FIG. 8 Model-I, a single LCU has the RGB LED source and electronic step gate control; hence the following step combinations are supported:

K_o = 1 step value
GRAY_o = 254 step values
W_o = 1 step value
R_o = G_o = B_o = C_o = M_o = Y_o = 255 step values each

Total step values = K_o (1) + GRAY_o (254) + W_o (1) + R_o (255) + G_o (255) + B_o (255) + C_o (255) + M_o (255) + Y_o (255) + IR_o (255). A single FIG. 8 Model-I LCU supports 1786 step values without infrared and 2041 step values with infrared support included.

In FIG. 9 Model-II, a single LCU has the RGB LED source, and the electronic step gate control acts as the color filter transformation function on each optical input; hence the following step combinations are supported:

K_o = 1 step value; W_o = 1 step value;

GRAY_o = 254 step values;

R_o = G_o = B_o = C_o = M_o = Y_o = IR_o = 255 step values each;

R_e = G_e = B_e = C_e = M_e = Y_e = IR_e = 255 step values each on application of the color filter transformation function (CFTF);

Step values using R_e CFTF = ( K_o (1) + GRAY_o (254) + W_o (1) + R_o (255) + G_o (255) + B_o (255) + C_o (255) + M_o (255) + Y_o (255) ) * R_e (255) = ( 1786 * 255 ) = 455430 step values.

Hence the total step values for the FIG. 9 Model-II single LCU = R_e CFTF + G_e CFTF + B_e CFTF + C_e CFTF + M_e CFTF + Y_e CFTF + R_e CFTF + GRAY_e CFTF = 3643440 step combinations without IR_e, and 4098870 with IR_e.
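The step-value arithmetic for the two models can be reproduced with a few lines of Python. This is only a restatement of the sums quoted in the text; the variable names are hypothetical, and the count of eight CFTF terms (nine with IR_e) is taken directly from the totals given above.

```python
# Optical step values of a single LCU (figures quoted in the text).
BLACK_STEPS = 1
GRAY_STEPS = 254
WHITE_STEPS = 1
STEPS_PER_COLOR = 255
COLORS = ["R", "G", "B", "C", "M", "Y"]

# Model-I: black + gray + white + six colors, optionally plus infrared.
model1_steps = BLACK_STEPS + GRAY_STEPS + WHITE_STEPS + len(COLORS) * STEPS_PER_COLOR
model1_steps_with_ir = model1_steps + STEPS_PER_COLOR

# Model-II: every optical step combined with 255 control-line steps per CFTF.
steps_per_cftf = model1_steps * STEPS_PER_COLOR

# The text sums eight CFTF terms without IR_e and nine with IR_e.
CFTF_TERMS = 8
model2_steps = CFTF_TERMS * steps_per_cftf
model2_steps_with_ir = (CFTF_TERMS + 1) * steps_per_cftf

print(model1_steps, model1_steps_with_ir)   # 1786 2041
print(steps_per_cftf)                       # 455430
print(model2_steps, model2_steps_with_ir)   # 3643440 4098870
```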

Existing transistors operating on binary logic would require 1,821,720 transistors to simulate the steps of a single FIG. 9 Model-II LCU. Two parallel side-by-side LCUs can simulate 13.27465503 Tera step combinations, and a compute cube of 4 X 4 X 4 LCUs can simulate 9.642417942847 X 10¹⁰⁴ step combinations.
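As a rough numerical check of the figures in this paragraph, the sketch below relates them to the single-LCU count of 3643440. The relationships used (half the step count for the transistor figure, the square for two LCUs, and the 16th power for the cube) are observations that happen to reproduce the quoted numbers, not formulas stated in the text.

```python
from decimal import Decimal

SINGLE_LCU_STEPS = 3_643_440  # FIG. 9 Model-II single LCU, without IR_e

# The quoted transistor count equals half of the single-LCU step count.
transistors = SINGLE_LCU_STEPS // 2

# Two side-by-side LCUs: every step of one combined with every step of the other.
two_lcu_combinations = SINGLE_LCU_STEPS ** 2

# The quoted cube figure matches the single-LCU count raised to the 16th power.
cube_combinations = SINGLE_LCU_STEPS ** 16

print(f"{transistors:,}")
print(f"{two_lcu_combinations / 1e12:.8f} Tera")
print(f"{Decimal(cube_combinations):.12E}")
```

Python integers have arbitrary precision, so the 16th power is computed exactly before being formatted in scientific notation.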

The FIG. 8A Model-I, Subtype-1, describes the one-dimensional hybrid optical computing block with 4 rows X 1 column of vertically stacked LCUs, in which the vertical polarizer plane slits of two adjacent layers are aligned so as to let light pass through them, and similarly the horizontal polarizer plane slits of two adjacent planes are aligned to let light pass through them. This creates 4 X 1 computing LCU blocks which can be electronically controlled individually by using the Q_e1, Q_e2, Q_e3, and Q_e4 lines to produce the collective optical computational sub-outputs Z_o1, Z_o2, Z_o3, and the final result Z_o4 from the optical input I_o1, based on the mathematical model HCBEA. The single RGB light source unit at the bottom drives the input source I_o1, and the RGB color sensor unit above the Z_o4 output is used to detect the optical result Z_o4.
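The 4 X 1 stack can be sketched as a cascade in which each LCU applies equation 1 to the output of the layer below it. This is one possible reading of the description, under the same assumed numerical model as a channel-wise filter on (R, G, B) step values; `lcu` and `stack_outputs` are hypothetical helper names.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]  # (R, G, B) step values, each 0..255 (assumed encoding)


def lcu(i_o: Color, q_e: Color) -> Color:
    """Assumed single-LCU model of equation 1: Z_o = I_o * Q_e, per channel."""
    return tuple(a * q // 255 for a, q in zip(i_o, q_e))


def stack_outputs(i_o1: Color, controls: Sequence[Color]) -> List[Color]:
    """Return the sub-outputs Z_o1..Z_oN of a vertical stack of LCUs.

    accumulate() feeds each layer's output into the next layer; the
    initial value is the optical input itself, so it is dropped.
    """
    return list(accumulate(controls, lcu, initial=i_o1))[1:]


# White input, four control lines Q_e1..Q_e4 (illustrative values only).
outputs = stack_outputs(
    (255, 255, 255),
    [(255, 255, 255), (255, 255, 0), (255, 0, 0), (128, 0, 0)],
)
for n, z in enumerate(outputs, start=1):
    print(f"Z_o{n} = {z}")
```

Changing any entry of the control list changes the function the stack computes without changing the structure, which is the reconfigurability the description emphasizes.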

The FIG. 8B Model-1, Subtype-2, describes the two-dimensional hybrid optical computing block with 4 rows X 4 columns of vertically stacked LCUs, in which the vertical polarizer plane slits of two adjacent layers are aligned so as to let light pass through them, and similarly the horizontal polarizer plane slits of two adjacent planes are aligned to let light pass through them. This creates 4 X 4 computing LCU grid blocks which can be electronically controlled individually by using the Q_e11, Q_e12, Q_e13, Q_e14, Q_e21, Q_e22, Q_e23, Q_e24, Q_e31, Q_e32, Q_e33, Q_e34, Q_e41, Q_e42, Q_e43, and Q_e44 lines to produce the collective optical computational outputs Z_o14, Z_o24, Z_o34, and Z_o44 from the optical inputs I_o1, I_o2, I_o3, and I_o4, based on the mathematical model HCBEA. The 4 individual RGB light sources below the first layer of LCUs drive the input sources I_o1, I_o2, I_o3, and I_o4, and the 4 individual RGB color sensors above the Z_o14, Z_o24, Z_o34, and Z_o44 outputs are used to detect the computed optical results.

The FIG. 8C Model-I, Subtype-3, describes the three-dimensional hybrid optical computing block with 4 X 4 X 4 stacked LCUs, in which the vertical polarizer plane slits of two adjacent layers are aligned so as to let light pass through them, and similarly the horizontal polarizer plane slits of two adjacent planes are aligned to let light pass through them. This creates 4 X 4 X 4 computing LCU grid blocks which can be electronically controlled individually by using the Q_e000, Q_e001, Q_e002, Q_e003, Q_e100, Q_e101, Q_e102, Q_e103, Q_e200, Q_e201, Q_e202, Q_e203, Q_e300, Q_e301, Q_e302, Q_e303, Q_e010, Q_e011, Q_e012, Q_e013, Q_e110, Q_e111, Q_e112, Q_e113, Q_e210, Q_e211, Q_e212, Q_e213, Q_e310, Q_e311, Q_e312, Q_e313, Q_e020, Q_e021, Q_e022, Q_e023, Q_e120, Q_e121, Q_e122, Q_e123, Q_e220, Q_e221, Q_e222, Q_e223, Q_e320, Q_e321, Q_e322, Q_e323, Q_e030, Q_e031, Q_e032, Q_e033, Q_e130, Q_e131, Q_e132, Q_e133, Q_e230, Q_e231, Q_e232, Q_e233, Q_e330, Q_e331, Q_e332, and Q_e333 lines to produce the collective optical computational outputs Z_o030, Z_o031, Z_o032, Z_o033, Z_o130, Z_o131, Z_o132, Z_o133, Z_o230, Z_o231, Z_o232, Z_o233, Z_o330, Z_o331, Z_o332, and Z_o333 from the optical inputs I_o000, I_o001, I_o002, I_o003, I_o100, I_o101, I_o102, I_o103, I_o200, I_o201, I_o202, I_o203, I_o300, I_o301, I_o302, and I_o303, based on the mathematical model HCBEA. The 4 X 4 individual RGB light sources below the first layer of LCUs drive the input sources I_o000, I_o001, I_o002, I_o003, I_o100, I_o101, I_o102, I_o103, I_o200, I_o201, I_o202, I_o203, I_o300, I_o301, I_o302, and I_o303. The 4 X 4 individual RGB color sensors above the Z_o030, Z_o031, Z_o032, Z_o033, Z_o130, Z_o131, Z_o132, Z_o133, Z_o230, Z_o231, Z_o232, Z_o233, Z_o330, Z_o331, Z_o332, and Z_o333 outputs are used to detect the computed optical results.

Furthermore, by using FIG. 8 Model-I as the basic building block, it can be further scaled up to a multi-dimensional grid hybrid optical computational unit.
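The three-digit naming of the control lines, inputs, and outputs follows a regular pattern that can be enumerated programmatically. The sketch below assumes, from the listed names only, that the middle digit is the layer along the light path (0 at the light sources, 3 at the sensors) and that the outer digits locate the column in the 4 X 4 plane; the variable names are hypothetical.

```python
from itertools import product

N = 4  # LCUs per edge of the cube

# Control lines, one per LCU, listed layer by layer as in the text.
control_lines = [
    f"Q_e{x}{layer}{z}" for layer, x, z in product(range(N), repeat=3)
]

# Optical inputs enter at layer 0; outputs leave at the last layer (N - 1).
inputs = [f"I_o{x}0{z}" for x, z in product(range(N), repeat=2)]
outputs = [f"Z_o{x}{N - 1}{z}" for x, z in product(range(N), repeat=2)]

print(len(control_lines), len(inputs), len(outputs))
print(control_lines[:5])
print(inputs[:4], outputs[:4])
```

The same enumeration with a larger N gives the naming for the scaled-up grids mentioned at the end of the paragraph.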

The FIG. 9A Model-2, Subtype-1, describes the one-dimensional hybrid optical computing block with 4 rows X 1 column of vertically stacked FIG. 9 Model-2 LCUs, which include the RGB filter block. Here the vertical polarizer plane slits of two adjacent layers are aligned so as to let light pass through them, and similarly the horizontal polarizer plane slits of two adjacent planes are aligned to let light pass through them. This creates 4 X 1 computing LCU blocks which can be electronically controlled individually by using the Q_e1, Q_e2, Q_e3, and Q_e4 lines to produce the collective optical computational sub-outputs Z_o1, Z_o2, Z_o3, and the final result Z_o4 from the optical input I_o1, based on the mathematical model HCBEA. The single RGB light source at the bottom drives the input source I_o1, and the RGB color sensor above the Z_o4 output is used to detect the optical result Z_o4.

The FIG. 9B Model-2, Subtype-2, describes the two-dimensional hybrid optical computing block with 4 rows X 4 columns of vertically stacked FIG. 9 Model-2 LCUs, which include the RGB filter block. Here the vertical polarizer plane slits of two adjacent layers are aligned so as to let light pass through them, and similarly the horizontal polarizer plane slits of two adjacent planes are aligned so as to let light pass through them. This creates 4 X 4 computing LCU grid blocks which can be electronically controlled individually by using the Q_e11, Q_e12, Q_e13, Q_e14, Q_e21, Q_e22, Q_e23, Q_e24, Q_e31, Q_e32, Q_e33, Q_e34, Q_e41, Q_e42, Q_e43, and Q_e44 lines to produce the collective optical computational outputs Z_o14, Z_o24, Z_o34, and Z_o44 from the optical inputs I_o1, I_o2, I_o3, and I_o4, based on the mathematical model HCBEA. The 4 individual RGB light sources below the first layer of LCUs drive the input sources I_o1, I_o2, I_o3, and I_o4, and the 4 individual RGB color sensors above the Z_o14, Z_o24, Z_o34, and Z_o44 outputs are used to detect the computed optical results.

The FIG. 9C Model-2, Subtype-3, describes the three-dimensional hybrid optical computing block with 4 X 4 X 4 vertically stacked FIG. 9 Model-2 LCUs, which include the RGB filter block. Here the vertical polarizer plane slits of two adjacent layers are aligned so as to let light pass through them, and similarly the horizontal polarizer plane slits of two adjacent planes are aligned to let light pass through them. This creates 4 X 4 X 4 computing LCU grid blocks which can be electronically controlled individually by using the Q_e000, Q_e001, Q_e002, Q_e003, Q_e100, Q_e101, Q_e102, Q_e103, Q_e200, Q_e201, Q_e202, Q_e203, Q_e300, Q_e301, Q_e302, Q_e303, Q_e010, Q_e011, Q_e012, Q_e013, Q_e110, Q_e111, Q_e112, Q_e113, Q_e210, Q_e211, Q_e212, Q_e213, Q_e310, Q_e311, Q_e312, Q_e313, Q_e020, Q_e021, Q_e022, Q_e023, Q_e120, Q_e121, Q_e122, Q_e123, Q_e220, Q_e221, Q_e222, Q_e223, Q_e320, Q_e321, Q_e322, Q_e323, Q_e030, Q_e031, Q_e032, Q_e033, Q_e130, Q_e131, Q_e132, Q_e133, Q_e230, Q_e231, Q_e232, Q_e233, Q_e330, Q_e331, Q_e332, and Q_e333 lines to produce the collective optical computational outputs Z_o030, Z_o031, Z_o032, Z_o033, Z_o130, Z_o131, Z_o132, Z_o133, Z_o230, Z_o231, Z_o232, Z_o233, Z_o330, Z_o331, Z_o332, and Z_o333 from the optical inputs I_o000, I_o001, I_o002, I_o003, I_o100, I_o101, I_o102, I_o103, I_o200, I_o201, I_o202, I_o203, I_o300, I_o301, I_o302, and I_o303, based on the mathematical model HCBEA. The 4 X 4 individual RGB light sources below the first layer of LCUs drive the input sources I_o000, I_o001, I_o002, I_o003, I_o100, I_o101, I_o102, I_o103, I_o200, I_o201, I_o202, I_o203, I_o300, I_o301, I_o302, and I_o303. The 4 X 4 individual RGB color sensors above the Z_o030, Z_o031, Z_o032, Z_o033, Z_o130, Z_o131, Z_o132, Z_o133, Z_o230, Z_o231, Z_o232, Z_o233, Z_o330, Z_o331, Z_o332, and Z_o333 outputs are used to detect the computed optical results.

Furthermore, by using FIG. 9 Model-2 as the basic building block, it can be further scaled up to a multi-dimensional grid hybrid optical computational unit.

The FIG. 8 Model-1 and FIG. 9 Model-2 LCUs, which work based on the HCBEA mathematical model, can be used to design basic logical blocks like AND, OR, NOT, NOR, NAND, EXOR, and EX-NOR, which in turn can be used to construct complex computational blocks. These in turn can be used to build complex integer and floating-point arithmetic operators +, -, *, and /.

The main advantage of a computational processor designed using FIG. 8 Model-1 and FIG. 9 Model-2 type LCUs is that it can be easily reconfigured by changing the color filter transformation line Q_e(x,x,x) to perform different types of operations. This leads to a reconfigurable processor, which is a unique feature when compared to existing silicon-based processors.

FIG. 10 shows the generic model of 3D hybrid optical Compute Accelerator Unit (3D- HOCA) integration to CPU-based system using a high-speed bus interface. The silicon processor would act as the master and the 3D-HOCA unit acts as the co-accelerator. Here the silicon processor can offload the computation workload of arithmetic, logical, relational, string processing to a reconfigurable multi-dimensional hybrid optical computing LCU or 3D-HOCA unit to increase the speed of computation. The high-speed bus interface unit can be integrated with industry-standard bus architectures like Hyper transport, InfiniBand, and compute express link for high-speed communication between the silicon processor and the 3D-HOCA unit for the data and instruction transfer. The 3D-HOCA unit has three blocks, the first block is the 3D hybrid optical Compute based on either FIG.8 Model-1 or FIG.9 Model-2 or it can be based on the mix of both FIG.8 Model-1 and FIG.9 Model-2 type LCUs which is the heart of the optical computation engine. The RGB LED array would act as the optical input source and the color sensor array acts to interface light and the electronic signal conversion of the computed result. The second block is a hybrid optical processor control unit, it is responsible for the complete control of functioning of 3D-HOCA by using command instructions and it is a silicon-based block. The third block being the hybrid optical processor glue chip, which helps in the seamless conversion of optical to electronic signals and vice versa, it is also an edge-triggered module driving three optoelectronic sub-components of 3D-HOCA. The three sub-components are the 2D dimensional LED Data/Control grid panel, the 3Dimesional LCD Data/Control grid panel, and finally, a 2D color sensor array like CCD (charge-coupled device) which can convert light into electrical signals. 
The glue chip also incorporates a lookup table or a base (2) to base (256) and base (256) to base (2) converter for seamless integration. The other units are the industry-standard subunits as explained in FIG. 1.

FIG. 11 shows a generic model of 3D hybrid optical Compute Accelerator Unit (3D-HOCA) integration with a GPU using a high-speed bus interface. The silicon-based Graphics Processing Unit (GPU) acts as the master and the 3D-HOCA unit acts as the co-accelerator. The GPU can offload computation work to the 3D-HOCA unit to increase the speed of computation. The high-speed bus interface unit can be integrated with industry-standard bus architectures such as HyperTransport, InfiniBand, and Compute Express Link for data and instruction transfer between the GPU and the 3D-HOCA unit. The 3D-HOCA unit has three blocks. The first block is the 3D hybrid optical compute block, the heart of the optical computation engine, which is based on FIG. 8 Model-1 LCUs, FIG. 9 Model-2 LCUs, or a mix of both. The RGB LED array acts as the optical input source, and the color sensor array converts the computed optical result into electronic signals. The second block is the hybrid optical processor control unit, a silicon-based block responsible for the complete control of the 3D-HOCA through command instructions. The third block is the hybrid optical processor glue chip, also a silicon-based block, which handles the conversion of optical signals to electronic signals and vice versa. The interesting aspect of the GPU interface is its use of GPU memory, which stores the frame buffer that will be shown on a visual display unit such as a monitor; the same RAM content, in the form of the frame buffer, can be used as the data input source for 3D-HOCA computations.
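The base (2) to base (256) conversion attributed to the glue chip can be illustrated with a minimal C sketch. It assumes that one base-256 digit corresponds to eight binary digits taken most-significant-bit first; the function names ox_bits_to_base256 and ox_base256_to_bits are hypothetical and are not part of the proposed OxAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack 8 * n_digits binary digits (each 0 or 1,
 * most significant bit first) into n_digits base-256 digits. */
void ox_bits_to_base256(const uint8_t *bits, size_t n_digits, uint8_t *digits)
{
    for (size_t d = 0; d < n_digits; ++d) {
        uint8_t value = 0;
        for (size_t b = 0; b < 8; ++b) {
            assert(bits[d * 8 + b] <= 1);  /* input must be binary */
            value = (uint8_t)((value << 1) | bits[d * 8 + b]);
        }
        digits[d] = value;
    }
}

/* Hypothetical inverse: expand each base-256 digit into 8 binary digits. */
void ox_base256_to_bits(const uint8_t *digits, size_t n_digits, uint8_t *bits)
{
    for (size_t d = 0; d < n_digits; ++d) {
        for (size_t b = 0; b < 8; ++b) {
            bits[d * 8 + b] = (uint8_t)((digits[d] >> (7 - b)) & 1u);
        }
    }
}
```

Under this assumption the two directions are exact inverses, which is the property a lookup-table implementation would also have to preserve.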

FIG. 12 Model A subtype-1 uses FIG. 8 Model-1 type LCUs, which are the heart of the optical computation engine. The RGB LED array acts as the optical input source, and the color sensor array converts the computed optical result into electronic signals. The second block is the hybrid optical processor control unit, a silicon-based block responsible for the complete control of the 3D-HOCA through command instructions. The third block is the hybrid optical processor glue chip, also a silicon-based block, which handles the conversion of optical signals to electronic signals and vice versa.

FIG. 13 Model A subtype-2 uses FIG. 8 Model-1 type LCUs, which are the heart of the optical computation engine. The RGB LED array acts as the optical input source, and the color sensor array converts the computed optical result into electronic signals. The second block is the hybrid optical processor control unit, a silicon-based block responsible for the complete control of the 3D-HOCA through command instructions. The third block is the hybrid optical processor glue chip, also a silicon-based block, which handles the conversion of optical signals to electronic signals and vice versa.

FIG. 14 Model B subtype-1 uses FIG. 9 Model-2 type LCUs, which are the heart of the optical computation engine. The RGB LED array acts as the optical input source, and the color sensor array converts the computed optical result into electronic signals. The second block is the hybrid optical processor control unit, a silicon-based block responsible for the complete control of the 3D-HOCA through command instructions. The third block is the hybrid optical processor glue chip, also a silicon-based block, which handles the conversion of optical signals to electronic signals and vice versa.

Furthermore, a mix of both FIG. 8 Model-1 and FIG. 9 Model-2 type LCUs can form the heart of the optical computation engine.

FIG. 15 Model B subtype-2 uses FIG. 9 Model-2 type LCUs, which are the heart of the optical computation engine. The RGB LED array acts as the optical input source, and the color sensor array converts the computed optical result into electronic signals. The second block is the hybrid optical processor control unit, a silicon-based block responsible for the complete control of the 3D-HOCA through command instructions. The third block is the hybrid optical processor glue chip, also a silicon-based block, which handles the conversion of optical signals to electronic signals and vice versa.

Programming the 3D-HOCA model is a challenge for existing developers, who need a uniform interface to program and use the new hardware feature. Hence, new software API extensions are added to the existing programming model of the software stack. The software development kits (SDKs) and tools can be enhanced with the additional Ox software API support for ease of programming. This is similar to APU acceleration using Open Computing Language (OpenCL) work items or Heterogeneous System Architecture (HSA) work items, which use the master and slave programming model. The master program runs on the CPU to control the co-accelerator, and the slave program is generally a data-intensive program run on the co-accelerator and initiated by the master program. FIG. 16 explains the current model and the software programming stack together with the newly proposed model of Optical extensions APIs [OxAPIs]. An OxAPI-based application program runs on the CPU and is called a “master program”; it invokes the OxAPIs support library to dispatch control commands and to send data to and receive data from the 3D-HOCA co-processor through the software driver for the 3D-HOCA hardware. The OxAPIs library entry points are written in a generic programming language like C/C++, and they in turn invoke low-level system calls to control the hardware driver functionalities.

Software APIs for Ox Programming: the following are the proposed Optical Programming extension Application Programmer Interfaces [OxAPI] that can be used for programming the 3D-HOCA unit from the master program.

Therefore, lines 175 to 555 discuss, in short, the following: a computing accelerator architecture design that uses the generic LCU as the basic switching element, based on the Hybrid Checker Board Eclipse Algebra mathematical model, with a custom reconfigurable switching compute block in 1-dimensional, 2-dimensional, 3-dimensional, and multi-dimensional designs, which can interface to an existing CPU and GPU over the high-speed bus. Different models based on the type of LCU are proposed, and the accelerator can be programmed by using the software application programmer interface 'OxAPI'.

From the above descriptions, it is clear that the following embodiments are addressed. It is an important embodiment that a reconfigurable multi-dimensional hybrid optical computing accelerator consists of a set of liquid crystal units whose vertical polarizer plane slits and horizontal polarizer plane slits are aligned so that light passes through, and a set of Red, Green, Blue filters or RGB filter panels for each liquid crystal unit. Here a set of hybrid optical processor glue chips may be used. The liquid crystal unit is incorporated with a set of random access memory (RAM) to store the data for the optical input unit (I₀) and the electronic color transformation function control line (Q_e), and to store the converted optical output signal (Z₀).

The liquid crystal unit is characterized by combinational computation execution of the optical input (I₀) and the electronic color transformation function control line (Q_e) to provide an optical output (Z₀). The function of the liquid crystal unit is reconfigurable by changing the stepping value of the electronic color transformation function control line (Q_e), so that the functional behavior of the LCU can be changed or reconfigured without redesigning the existing circuitry. It is another important embodiment of the invention that the hybrid optical processor glue chip is integrated to perform conversion of optical signals to electronic signals and vice versa.

Yet another embodiment of the invention is that the reconfigurable multi-dimensional hybrid optical computing accelerator is supported by Optical extensions software APIs [OxAPIs], wherein the glue chip is activated by a common clock signal to synchronize the conversion of optical signals to electronic signals and vice versa during the execution of an instruction.

And the reconfigurable multi-dimensional hybrid optical computing accelerator may be integrated with a beam splitter to split the single optical signal into multiple optical signals.

Using the hybrid optical logic gates based on the HCBEA algebra, complex combinatorial circuits such as adders and subtractors can be formed. Such circuits require periodic repeaters or optical amplifiers to overcome the light attenuation of the LCUs.

The OxAPI software API list,

OxDiscoverSystemBuffer()

OxMarkReserveSystemBuffer()

OxInitializeSystemBuffer()

OxDiscoverDevices()

OxMapDiscoveredDevicesToSystemBuffer()

OxMapOnlySpecificDevicesToSystemBuffer()

OxGetDeviceUsingSystemBuffer()

OxSetDeviceUsingSystemBuffer()

OxGetDeviceControlWordUsingSystemBuffer()

OxGetDeviceDataWordUsingSystemBuffer()

OxGetDeviceStatusWordUsingSystemBuffer()

OxSetDeviceControlWordUsingSystemBuffer()

OxSetDeviceDataWordUsingSystemBuffer()

OxResetDevice()

OxStopDevice()

OxUnmapDeviceFromSystemBuffer()

OxUnmapAllDeviceFromSystemBuffer()

OxUnmapDeviceRangeFromSystemBuffer()

OxReleaseSystemBuffer()

The details of the API description are listed below,

OxDiscoverSystemBuffer()

This API is used to discover the System Buffer available in the machine.

General Syntax: status=OxDiscoverSystemBuffer(HardwareFeaturePtr)

Where,

Status - API status

HardwareFeaturePtr - Receives the hardware feature details of the System buffer, else NULL.

OxMarkReserveSystemBuffer():

This API is used to reserve the SystemBuffer available in the machine.

General Syntax: status=OxMarkReserveSystemBuffer(device, size, HardwareFeaturePtr, Oxenable, SystemBufferPtr)

Where,

Status - API status

Device - 3D-HOCA

Size - SystemBufferSize [integer]

HardwareFeaturePtr - Pointer which holds the device capabilities of “Device”; it receives the hardware feature details of the System buffer, else NULL.

OxEnable - Enable / Disable 3D-HOCA feature [ Boolean type ]

SystemBufferPtr - Pointer to System buffer.

OxInitializeSystemBuffer():

This API is used to initialize the SystemBuffer available in the machine.

General Syntax: status=OxInitializeSystemBuffer(SystemBufferPtr, Value)

Where,

Status - API status

SystemBufferPtr - Pointer to System buffer.

Value - Value used to initialize the System buffer.

OxDiscoverDevices()

This API is used to discover all the [Ox] devices that can be mapped to the System buffer.

General Syntax: status= OxDiscoverDevices(^*list)

Where,

Status - API status

^*list - All the Ox enabled devices discovered.

OxMapDiscoveredDevicesToSystemBuffer()

This API is used to map the discovered 3D-HOCA devices to System buffer.

General Syntax: status=OxMapDiscoveredDevicesToSystemBuffer(^*list, SystemBufferPtr)

Where,

Status - API status

^*list - All the Ox enabled devices discovered.

SystemBufferPtr - Pointer to System buffer.

OxMapOnlySpecificDevicesToSystemBuffer()

This API is used to map the selected 3D-HOCA devices to the System buffer.

General Syntax: status=OxMapOnlySpecificDevicesToSystemBuffer(^*list, SystemBufferPtr, DeviceNumber, X, Y, Z)

Where,

Status - API status

^*list - All the Ox enabled devices discovered.

SystemBufferPtr - Pointer to System buffer.

DeviceNumber - Corresponds to direct cell address memory location in System buffer, it is of integer type. [ 0..N]

X, Y, Z -X, Y, and Z location mapping to System buffer to map a corresponding device to System buffer.

Note :

If X, Y, and Z are provided, the device number is marked [1]

If device number is provided then X, Y, Z is [-1]
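The two addressing modes in the Note above (a direct DeviceNumber, or an X, Y, Z cell location, with -1 marking the unused form) can be related by a short C sketch. The description does not state how a device number maps to a cell location, so the row-major layout over an N x N x N grid used here is purely an assumption, and the names ox_xyz_to_device, ox_device_to_xyz, and OX_UNUSED are hypothetical.

```c
#include <assert.h>

#define OX_UNUSED (-1)  /* assumed marker for "this addressing form is not used" */

/* Hypothetical: linearize a cell location in an n x n x n grid (row-major).
 * Returns OX_UNUSED when the location is outside the grid. */
int ox_xyz_to_device(int x, int y, int z, int n)
{
    assert(n > 0);
    if (x < 0 || y < 0 || z < 0 || x >= n || y >= n || z >= n) {
        return OX_UNUSED;
    }
    return x + n * (y + n * z);
}

/* Hypothetical inverse: recover the cell location from a device number.
 * Returns 1 on success; on failure returns 0 and marks X, Y, Z as unused. */
int ox_device_to_xyz(int device, int n, int *x, int *y, int *z)
{
    assert(n > 0);
    if (device < 0 || device >= n * n * n) {
        *x = *y = *z = OX_UNUSED;
        return 0;
    }
    *x = device % n;
    *y = (device / n) % n;
    *z = device / (n * n);
    return 1;
}
```

Any other bijection between device numbers and cell locations would serve equally well; the point is only that the two forms can address the same cell.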

OxGetDeviceUsingSystemBuffer()

This API is used to get the selected 3D-HOCA device's info into the System buffer.

General Syntax: status=OxGetDeviceUsingSystemBuffer(SystemBufferPtr, DeviceNumber, X, Y, Z)

Where,

Status - API status

SystemBufferPtr - Pointer to System buffer.

DeviceNumber - Corresponds to direct cell address memory location in System buffer, it is of integer type. [ 0..N]

X, Y, Z - X, Y, and Z location mapping to System buffer to map a corresponding device to System buffer.

Note :

If X, Y, and Z are provided, the device number is marked [1]

If device number is provided then X, Y, and Z is [-1]

OxSetDeviceUsingSystemBuffer()

This API is used to set the info packet into Systembuffer and it will be transferred to the 3D-HOCA device. The info packet consists of the control and the data register file.

General Syntax: status=OxSetDeviceUsingSystemBuffer(SystemBufferPtr, DeviceNumber, X, Y, Z, Infopacket)

Where,

Status - API status

SystemBufferPtr - Pointer to System buffer.

DeviceNumber- Corresponds to direct cell address memory location in System buffer, it is of integer type. [ 0..N]

Info packet - The device status, control, and the data register file of the target device.

Note :

If X, Y, and Z are provided, the device number is marked [1]

If device number is provided then X, Y, Z is [-1]

OxGetDeviceControlWordUsingSystemBuffer()

This API is used to get the control word info packet into Systembuffer. It will be transferred from the 3D-HOCA device to Systembuffer.

General Syntax: status=OxGetDeviceControlWordUsingSystemBuffer(SystemBufferPtr, DeviceNumber, X, Y, Z, ControlWord)

Where,

Status - API status

SystemBufferPtr - Pointer to System buffer.

DeviceNumber - Corresponds to direct cell address memory location in System buffer, it is of integer type. [ 0..N]

ControlWord - The control word of the target device.

Note :

If X, Y, and Z are provided, the device number is marked [1]

If device number is provided then X, Y, Z is [-1]

OxGetDeviceDataWordUsingSystemBuffer()

This API is used to get the data word info packet into Systembuffer. It will be transferred from the 3D-HOCA device to Systembuffer.

General Syntax: status=OxGetDeviceDataWordUsingSystemBuffer(SystemBufferPtr, DeviceNumber, X, Y, Z, DataWord)

Where,

Status - API status

SystemBufferPtr - Pointer to System buffer.

DeviceNumber - Corresponds to direct cell address memory location in System buffer, it is of integer type. [ 0..N]

DataWord - The data of the target device.

Note :

If X, Y, and Z are provided, the device number is marked [1]

If device number is provided then X, Y and Z is [-1]

OxGetDeviceStatusWordUsingSystemBuffer()

This API is used to get the status word info packet into Systembuffer. It will be transferred from the 3D-HOCA device to Systembuffer.

General Syntax: status=OxGetDeviceStatusWordUsingSystemBuffer(SystemBufferPtr, DeviceNumber, X, Y, Z, StatusWord)

Where,

Status - API status

SystemBufferPtr - Pointer to System buffer.

DeviceNumber - Corresponds to direct cell address memory location in System buffer, it is of integer type. [ 0..N]

StatusWord - The status word of the target device.

X, Y, Z - X, Y, and Z location mapping to System buffer to map a corresponding device to System buffer.

Note :

If X, Y, and Z are provided, the device number is marked [1]

If device number is provided then X, Y, Z is [-1]

OxSetDeviceControlWordUsingSystemBuffer()

This API is used to set the control word info packet into Systembuffer. It will be transferred from Systembuffer into the 3D-HOCA device.

General Syntax: status=OxSetDeviceControlWordUsingSystemBuffer(SystemBufferPtr, DeviceNumber, X, Y, Z, ControlWord)

Where,

Status - API status

SystemBufferPtr - Pointer to System buffer.

ControlWord - The control word package or “eop” slave program code file of the target device.

X, Y, Z - X, Y, and Z location mapping to System buffer to map a corresponding device to System buffer.

Note :

If X, Y, Z are provided, the device number is marked [1]

If device number is provided then X, Y, Z is [-1]

OxSetDeviceDataWordUsingSystemBuffer()

This API is used to set the data word info packet into Systembuffer. It will be transferred from Systembuffer into the 3D-HOCA device.

General Syntax: status=OxSetDeviceDataWordUsingSystemBuffer(SystemBufferPtr, DeviceNumber, X, Y, Z, DataWord)

Where,

Status - API status

SystemBufferPtr - Pointer to System buffer.

DeviceNumber - Corresponds to direct cell address memory location in System buffer, it is of integer type. [ 0..N]

DataWord - The data word of the target device.

Note :

If X, Y, Z are provided, the device number is marked [1]

If device number is provided then X, Y, Z is [-1]

OxResetDevice()

This API is used to reset the target device. This is a subset of the “OxSetDeviceControlWordUsingSystemBuffer” API.

General Syntax: status=OxResetDevice(SystemBufferPtr, DeviceNumber, X, Y, Z, ControlWord)

Where,

Status - API status

SystemBufferPtr - Pointer to System buffer.

ControlWord - The control word of the target device.

X, Y, Z - X, Y, Z location mapping to System buffer to map a corresponding device to System buffer.

Note :

If X, Y, Z are provided the device number is marked [1]

If device number is provided then X, Y, Z is [-1]

OxStopDevice()

This API is used to stop the target device. This is a subset of the “OxSetDeviceControlWordUsingSystemBuffer” API.

General Syntax: status=OxStopDevice(SystemBufferPtr, DeviceNumber, X, Y, Z, ControlWord)

Where,

Status - API status

SystemBufferPtr - Pointer to System buffer.

ControlWord - The control word of the target device.

X, Y, Z - X, Y, Z location mapping to System buffer to map a corresponding device to System buffer.

Note :

If X, Y, Z are provided the device number is marked [1]

If device number is provided then X, Y, Z is [-1]

OxUnmapDeviceFromSystemBuffer()

This API is used to unmap the OxDevices from the Systembuffer.

General Syntax: status=OxUnmapDeviceFromSystemBuffer(SystemBufferPtr, DeviceNumber, X, Y, Z)

Where,

Status - API status

SystemBufferPtr - Pointer to System buffer.

DeviceNumber - Corresponds to direct cell address memory location in System buffer, it is of integer type. [ 0..N]

X, Y, Z - X, Y and Z location mapping to System buffer to map a corresponding device to System buffer.

Note :

If X, Y, Z are provided the device number is marked [1]

If device number is provided then X, Y, Z is [-1]

OxUnmapAllDeviceFromSystemBuffer()

This API is used to unmap all the OxDevices from the Systembuffer.

General Syntax: status=OxUnmapAllDeviceFromSystemBuffer(SystemBufferPtr)

Where,

Status - API status

SystemBufferPtr - Pointer to System buffer.

OxUnmapDeviceRangeFromSystemBuffer()

This API is used to unmap a range of OxDevices from the Systembuffer.

General Syntax: status=OxUnmapDeviceRangeFromSystemBuffer(SystemBufferPtr, DeviceListRange, X, Y, Z, TotalDevices)

Where,

Status - API status

SystemBufferPtr - Pointer to System buffer.

DeviceListRange - Range of devices [ StartRange to EndRange ]

X[], Y[], Z[] - Integer arrays of X, Y, Z cell addresses.

TotalDevices - Count of devices.

OxReleaseSystemBuffer()

This API is used to release the Systembuffer.

General Syntax: status=OxReleaseSystemBuffer(SystemBufferPtr)

Where,

Status - API status

SystemBufferPtr - Pointer to System buffer.

Example:2

FIG. 17 describes the proposed software architecture for compiling the slave program, or co-processor program, which runs on the 3D-HOCA accelerator. It is a two-step approach. First, the source program, written in a generic programming language like C or built from 3D-HOCA GUI components, is translated by the “e-Opto” compiler into an intermediate representation (IR) such as LLVM IR, Heterogeneous System Architecture (HSA) IR, GCC’s GIMPLE IR, or 3D-HOCA IR. At runtime, these IRs are translated to 3D-HOCA native “e-op” (Electronic-Optical hybrid) instructions and dispatched to the 3D-HOCA for computation. Here the e-opto compiler translates high-level sources to IRs, and the e-opto assembler assembles the “eop” instructions into machine instructions to be deployed on the 3D-HOCA.

The electronic-Optical “eop” Instruction Set Architecture [ISA] is proposed, that can be used for computation on 3D-HOCA unit. The “eop” list,

Control Operation “eop” instruction:

1. eop-RESET < 3D-HOCA-ID >

2. eop-STOP < 3D-HOCA-ID >

Arithmetic Operation “eop” instructions:

1. eop-ADD < 3D-HOCA-ID > <Data Unit #1> <Data Unit #2>

2. eop-SUB < 3D-HOCA-ID > <Data Unit #1> <Data Unit #2>

3. eop-MUL < 3D-HOCA-ID > <Data Unit #1> <Data Unit #2>

4. eop-DIV < 3D-HOCA-ID > <Data Unit #1> <Data Unit #2>

Logical Operation “eop” instructions:

1. eop-AND < 3D-HOCA-ID > <Data Unit #1> <Data Unit #2>

2. eop-OR < 3D-HOCA-ID > <Data Unit #1> <Data Unit #2>

3. eop-NOT < 3D-HOCA-ID > <Data Unit #1>

4. eop-EXOR < 3D-HOCA-ID > <Data Unit #1> <Data Unit #2>

Data Transfer Operation “eop” instructions:

1. eop-LOAD < 3D-HOCA-ID : Address > <Data Unit #1>

2. eop-STORE < 3D-HOCA-ID : Address > <Data Unit #1>

The details of the eop instruction description are listed below,

Control Operation “eop” instruction:

1. eop-RESET < 3D-HOCA-ID >

This eop code is used to reset the target 3D-HOCA device with the given device ID. It will reinitialize the LCU

Where,

< 3D-HOCA-ID > - Is the target 3D-HOCA device ID.

2. eop-STOP < 3D-HOCA-ID >

This eop code changes the target 3D-HOCA with the given device ID to the stop state.

Where,

< 3D-HOCA-ID > - Is the target 3D-HOCA device ID.

Arithmetic Operation “eop” instructions:

eop-ADD < 3D-HOCA-ID > <Data Unit #1> <Data Unit #2>

This eop code is to perform arithmetic addition on the target 3D-HOCA with the given device ID.

Where,

< 3D-HOCA-ID > - Is the target 3D-HOCA device ID.

< Data Unit #1 > - The data unit can be a single operand or 1-Dimension, 2-Dimension, or 3-Dimension operands.

< Data Unit #2 > - The data unit can be a single operand or 1-Dimension, 2-Dimension, or 3-Dimension operands.

eop-SUB < 3D-HOCA-ID > <Data Unit #1> <Data Unit #2>

This eop code is to perform arithmetic subtraction on the target 3D-HOCA with the given device ID.

Where,

< 3D-HOCA-ID > - Is the target 3D-HOCA device ID.

< Data Unit #2 > - The data unit can be a single operand or 1-Dimension, 2-Dimension, or 3-Dimension operands.

eop-MUL < 3D-HOCA-ID > <Data Unit #1> <Data Unit #2>

This eop code is to perform arithmetic multiplication on the target 3D-HOCA with the given device ID.

Where,

< 3D-HOCA-ID > - Is the target 3D-HOCA device ID.

< Data Unit #2 > - The data unit can be a single operand or 1-Dimension, 2-Dimension, or 3-Dimension operands.

eop-DIV < 3D-HOCA-ID > <Data Unit #1> <Data Unit #2>

This eop code is to perform arithmetic division on the target 3D-HOCA with the given device ID.

Where,

< 3D-HOCA-ID > - Is the target 3D-HOCA device ID.

< Data Unit #2 > - The data unit can be a single operand or 1-Dimension, 2-Dimension, or 3-Dimension operands.

Logical Operation “eop” instructions:

eop-AND < 3D-HOCA-ID > <Data Unit #1> <Data Unit #2>

This eop code is to perform logical AND on the target 3D-HOCA with the given device ID.

Where,

< 3D-HOCA-ID > - Is the target 3D-HOCA device ID.

eop-OR < 3D-HOCA-ID > <Data Unit #1> <Data Unit #2>

This eop code is to perform logical OR on the target 3D-HOCA with the given device ID.

Where,

< 3D-HOCA-ID > - Is the target 3D-HOCA device ID.

< Data Unit #2 > - The data unit can be a single operand or 1-Dimension, 2-Dimension, or 3-Dimension operands.

eop-NOT < 3D-HOCA-ID > <Data Unit #1>

This eop code is to perform logical NOT on the target 3D-HOCA with the given device ID.

Where,

< 3D-HOCA-ID > - Is the target 3D-HOCA device ID.

< Data Unit #1 > - The data unit can be a single operand or 1-Dimension, 2-Dimension, or 3-Dimension operands.

eop-EXOR < 3D-HOCA-ID > <Data Unit #1> <Data Unit #2>

This eop code is to perform logical EXOR on the target 3D-HOCA with the given device ID.

Where,

< 3D-HOCA-ID > - Is the target 3D-HOCA device ID.

< Data Unit #2 > - The data unit can be a single operand or 1-Dimension, 2-Dimension, or 3-Dimension operands.

Data Transfer Operation “eop” instructions:

eop-LOAD < 3D-HOCA-ID : Address > <Data Unit #1>

This eop code is to perform LOADING data from memory into the target 3D-HOCA with the given device ID < 3D-HOCA-ID : Address >.

Where,

< 3D-HOCA-ID > - Is the target 3D-HOCA device ID.

< Data Unit #1 > - The data unit can be single data or 1-Dimension, 2-Dimension, or 3-Dimension data.

eop-STORE < 3D-HOCA-ID : Address > <Data Unit #1>

This eop code is to perform STORING data from 3D-HOCA into the target memory marked < 3D-HOCA-ID : Address >.

Where,

< 3D-HOCA-ID : Address > - Is the target 3D-HOCA device ID with the address.

As an example, the master program and the associated slave program for a simple addition are described below.

The software template of the master program using OxAPIs

Pseudo Code Program: The master program, which runs on the CPU to control the 3D-HOCA and to issue commands for data manipulation operations and data transfer, is described as follows,

#include <OxCompute.h>

main ()

{

// Identify system memory.

status=OxDiscoverSystemBuffer(HardwareFeaturePtr);

// Setup system memory.

status=OxMarkReserveSystemBuffer(device,size,HardwareFeaturePtr,Oxenable,SystemBufferPtr);

status=OxInitializeSystemBuffer(SystemBufferPtr,Value);

// Identify 3D-HOCA.

status=OxDiscoverDevices(^*list);

// Setup 3D-HOCA.

status=OxMapDiscoveredDevicesToSystemBuffer(^*list,SystemBufferPtr);

status=OxMapOnlySpecificDevicesToSystemBuffer(^*list,SystemBufferPtr,DeviceNumber,X,Y,Z);

// Issue the following

// Store the Data#1 package : A ( can be unit data, 1-Dimension, 2-Dimension and 3-Dimension ) in to 3D-HOCA

status=OxSetDeviceDataWordUsingSystemBuffer(SystemBufferPtr,DeviceNumber,X,Y,Z,A);

// Store the Data#2 package : B ( can be unit data, 1-Dimension, 2-Dimension and 3-Dimension ) in to 3D-HOCA

status=OxSetDeviceDataWordUsingSystemBuffer(SystemBufferPtr,DeviceNumber,X,Y,Z,B);

// program list contains Result = A + B

status=OxSetDeviceControlWordUsingSystemBuffer(SystemBufferPtr,DeviceNumber,X,Y,Z,program.lst);

// Get the status of the computed operation.

status=OxGetDeviceStatusWordUsingSystemBuffer(SystemBufferPtr,DeviceNumber,X,Y,Z,StatusWord);

// Get the computed result in DataWord back to CPU RAM.

status=OxGetDeviceDataWordUsingSystemBuffer(SystemBufferPtr,DeviceNumber,X,Y,Z,Result);

// Free 3D-HOCA

status=OxUnmapDeviceFromSystemBuffer(SystemBufferPtr,DeviceNumber,X,Y,Z);

// Free ALL 3D-HOCA

status=OxUnmapDeviceRangeFromSystemBuffer(SystemBufferPtr,DeviceListRange,X,Y,Z,TotalDevices);

// Free Memory

status=OxReleaseSystemBuffer(SystemBufferPtr);

}

Pseudo Code Program for Slave or Co-Accelerator: The program running on the 3D-HOCA for data computation operations is described as follows,

// Pseudo Code of Program : program.lst

add ()

{

result = A + B; // A, B and result can be a single data item, 1-Dimension, 2-Dimension or 3-Dimension. “+” is the “eop-ADD”

}

eop Assembly code of the ADDER Program : program.lst

eop-RESET < 3D-HOCA-ID >

eop-LOAD < 3D-HOCA-ID : Address > A

eop-LOAD < 3D-HOCA-ID : Address > B

eop-ADD < 3D-HOCA-ID : Address > A, B

eop-STORE < 3D-HOCA-ID : Address > RESULT

eop-STOP < 3D-HOCA-ID >
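The control flow of the ADDER program above can be followed with a small software model in C. This is only a sketch under stated assumptions: the EopOpcode, EopInstr, and EopDevice types, the fixed data-unit length EOP_UNIT_LEN, and the rule that the first LOAD fills operand A and the second fills operand B are all hypothetical, and the model says nothing about how the optical hardware encodes or executes the instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EOP_UNIT_LEN 4  /* assumed length of a 1-Dimension data unit */

typedef enum { EOP_RESET, EOP_LOAD, EOP_ADD, EOP_STORE, EOP_STOP } EopOpcode;

typedef struct {
    EopOpcode op;
    const int *src;  /* host data for EOP_LOAD, otherwise NULL */
    int *dst;        /* host destination for EOP_STORE, otherwise NULL */
} EopInstr;

typedef struct {
    int a[EOP_UNIT_LEN];
    int b[EOP_UNIT_LEN];
    int result[EOP_UNIT_LEN];
    int loads;    /* operands loaded since the last reset */
    int stopped;  /* set by EOP_STOP */
} EopDevice;

/* Runs the program until EOP_STOP or the end of the list.
 * Returns the number of instructions executed. */
size_t eop_run(EopDevice *dev, const EopInstr *prog, size_t n)
{
    size_t executed = 0;
    for (size_t i = 0; i < n && !dev->stopped; ++i, ++executed) {
        const EopInstr *in = &prog[i];
        switch (in->op) {
        case EOP_RESET:
            memset(dev, 0, sizeof *dev);
            break;
        case EOP_LOAD:
            assert(in->src != NULL && dev->loads < 2);
            memcpy(dev->loads == 0 ? dev->a : dev->b, in->src, sizeof dev->a);
            dev->loads++;
            break;
        case EOP_ADD:
            /* Element-wise addition over the whole data unit. */
            for (int k = 0; k < EOP_UNIT_LEN; ++k) {
                dev->result[k] = dev->a[k] + dev->b[k];
            }
            break;
        case EOP_STORE:
            assert(in->dst != NULL);
            memcpy(in->dst, dev->result, sizeof dev->result);
            break;
        case EOP_STOP:
            dev->stopped = 1;
            break;
        }
    }
    return executed;
}
```

Feeding this model the six-instruction sequence RESET, LOAD A, LOAD B, ADD, STORE, STOP reproduces result = A + B for each element of the data unit.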

FIG. 18 shows the deployment of the master program running on the CPU as the master device and the co-processor program running on the 3D-HOCA accelerator.

FIG. 19 shows the CPU architecture with ALU blocks. CPU workloads generally lean towards control-flow-intensive, i.e., task-parallel, programs. The CPU also supports advanced vector engines for vector operations. However, when the data set is huge, a CPU with few ALU blocks becomes a bottleneck and experiences lag during computations.

FIG. 20 shows the GPU architecture with SIMD (Single Instruction Multiple Data) blocks. Here a single operation is performed on multiple data items. Because of this architecture, GPU workloads generally lean towards data-parallel-intensive programs: in a single clock cycle, a GPU can produce more data-parallel compute results than a CPU for the same input dataset. General-purpose GPU compute can increase performance by 2X to 300X, mainly because of this architectural difference.

FIG. 21 shows the 3D-HOCA architecture with a stack of LCU blocks. Here multiple operations are performed on multiple data items in parallel, so that in a single clock cycle the 3D-HOCA can produce more data-parallel compute results than CPU or GPU compute for the same input dataset. The main reason is the architectural difference: a massive MIMD (Multiple Instruction Multiple Data) block design, in which each LCU unit in the 3D-HOCA engine acts as a Compute Element (COXEL) cell and can independently perform computation. In the proposed MIMD model, each LCU cell performs 1 unit of independent operation, so a 3D-HOCA engine of N x N x N LCU compute units can perform N x N x N operations in a single clock cycle, even though a silicon chip is driving the clock. This architecture is therefore well suited to both task-parallel and data-parallel intensive workloads.
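The MIMD claim above, that an N x N x N engine completes N x N x N independent operations per clock cycle, can be made concrete with a short C model. It is a sketch only: the Coxel structure, the CellOp set, and the function hoca_cycle are hypothetical, and the sequential loop merely stands in for cells that the description says operate in parallel.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CELL_ADD, CELL_SUB, CELL_AND, CELL_OR } CellOp;

/* One compute element: its own instruction and its own data (MIMD). */
typedef struct {
    CellOp op;
    int a;
    int b;
    int out;
} Coxel;

/* Models one clock cycle over an n x n x n engine stored as a flat array.
 * Returns the number of independent operations completed in the cycle. */
size_t hoca_cycle(Coxel *cells, size_t n)
{
    assert(cells != NULL);
    size_t total = n * n * n;
    for (size_t i = 0; i < total; ++i) {
        Coxel *c = &cells[i];
        switch (c->op) {
        case CELL_ADD: c->out = c->a + c->b; break;
        case CELL_SUB: c->out = c->a - c->b; break;
        case CELL_AND: c->out = c->a & c->b; break;
        case CELL_OR:  c->out = c->a | c->b; break;
        }
    }
    return total;
}
```

In contrast to a SIMD block, nothing here requires neighbouring cells to share an opcode, which is the distinction the description draws between FIG. 20 and FIG. 21.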

Example: 3

FIG. 22 shows the 3D-HOCA architecture with high-speed RAM hosting the instruction and data panel pipelines. [Q_e]_1 - [Q_e]_n is the array of electronic pipeline inputs, with an N x N grid size for each panel. Each cell in the panel can host, in parallel, data and/or instructions intended for the target computation within the 3D-HOCA; such a cell is referred to as a “voxel” (Compute Element). Along the same lines, [X₀]_1 - [X₀]_n is the array of optical pipeline inputs, with an N x N grid size for each panel. Each cell in the panel can host, in parallel, data and/or instructions intended for the target computation within the 3D-HOCA voxel. Individual dedicated high-speed bus lanes #1, #2 and #3 are proposed, directly linking the 3D-HOCA driver blocks so that the memory bottleneck is mitigated. The high-speed RAM interfaces with the 3D-HOCA through a high-speed LED driver unit, a high-speed LCD driver unit, and a high-speed CCD driver unit. The same high-speed bus interface also connects the glue chip and the master unit. Since all these components are opto-electronic units, they are connected and synchronized by a common clock; to mitigate the difference in switching time between them, appropriate buffers and wait cycles are added to maintain synchronization with the high-speed bus architecture. This kind of pipelined MIMD architecture would accelerate the speed of computation.

FIG. 23 shows the parallel extension of a reconfigurable multi-dimensional hybrid optical computing accelerator, or 3D-HOCA architecture model, for high-speed parallel computing. The reconfigurable multi-dimensional LCU, or 3D LCU optical computing unit #1, does not have a CCD sensor array to capture the computed optical output; instead, the output is fed into an X:X or X:Y beam splitter, such as a 50:50 beam splitter, which splits it into two parallel inputs for 3D LCU optical computing unit #2 and 3D LCU optical computing unit #3. Units #2 and #3 do not have the 2D LED array unit; they directly use the two parallel inputs from the 50:50 beam splitter to compute their respective outputs. Both units #2 and #3 have their own unique set of data/instruction panel inputs for further computation in parallel. This model can be extended by using an array of beam splitters to map the outputs to an array of 3D LCU optical computing units #N for computation in parallel. To address signal attenuation, periodic optical repeaters and optical amplifiers are added. This design acts as a data/instruction LCU pipeline. Stripped of the CCD sensor array and the 2D LED array, the LCU panels act as true optical register files and optical memory files.

FIG. 24 represents the symbol of the hybrid optical NOT gate. FIG. 25 describes the implementation of a hybrid optical NOT gate using an LCU. The optical output Z_o is given by the HCBEA algebra,

A, B, X are HCBEA Variables

1_e = Electronic control filter “OFF” signal, allowing the light to pass through the LCU

0_e = Electronic control filter “ON” signal, blocking the light from passing through the LCU

The LCU implementation of the hybrid optical NOT gate is given by the equation,

Z_o = (X_o X_e)

Where, X_e = { ON / OFF }

If X_e is “OFF” then Z_o = X_o / Light passes through the LCU / 1_o

If X_e is “ON” then Z_o = K_o / Light blocked / 0_o

X_o = { R_o, G_o, B_o, C_o, M_o, Y_o, K_o }

EXAMPLE

Truth Table: Hybrid Optical NOT gate
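The NOT gate behaviour stated above (light passes unchanged when the control filter is “OFF”, and the output is black, K_o, when it is “ON”) can be written as a small C function. The enum names and the function hybrid_not_gate are hypothetical; the sketch models only the stated input/output relation, not the liquid crystal physics.

```c
#include <assert.h>

/* Optical colors named in the description: R, G, B, C, M, Y and K (black). */
typedef enum {
    COLOR_K, COLOR_R, COLOR_G, COLOR_B, COLOR_C, COLOR_M, COLOR_Y
} OpticalColor;

/* Electronic control line: 0_e = filter ON (blocks), 1_e = filter OFF (passes). */
typedef enum { XE_ON = 0, XE_OFF = 1 } ControlLine;

/* Z_o = X_o when X_e is OFF; Z_o = K_o when X_e is ON. */
OpticalColor hybrid_not_gate(OpticalColor x_o, ControlLine x_e)
{
    assert(x_e == XE_ON || x_e == XE_OFF);
    return (x_e == XE_OFF) ? x_o : COLOR_K;
}
```

Read as a function of the control line, switching the filter “ON” turns the optical output off, which is the inversion the gate is named for.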

EXAMPLE: 4

1160 FIG. 26 represents the symbol of hybrid optical AND gate. FIG. 27 describes the implementation of a hybrid optical AND gate using LCU. The optical output Z₀ is given by the HCBEA algebra,

A, B, X are HCBEA Variables

1_e = Electronic control filter OFF” signal, allowing the light to pass through the LCU 1165 0_e = Electronic control filter ON” signal, blocking the light to pass through the LCU The LCU implementation of hybrid optical AND gate is given by the equation,

Zo = (XoXel) (Xe2)

Where,

X_o = { R_o, G_o, B_o }

X_e = { R_e, G_e, B_e }

K_e = { R_e, G_e, B_e } for all X_o ≠ X_e

Truth Table: Hybrid Optical AND gate
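One possible reading of the AND equation is sketched below. It assumes that each electronic color filter passes the light only when its color matches the optical input and gives black otherwise, and that the two filters X_e1 and X_e2 act in cascade. The function names and the color encoding are hypothetical.

```python
K = "K"  # black: light blocked


def lcu_filter(x_o: str, x_e: str) -> str:
    """Assumed single filter stage: pass x_o only if the filter color x_e matches it."""
    return x_o if x_o == x_e else K


def hybrid_and(x_o: str, x_e1: str, x_e2: str) -> str:
    """Z_o = (X_o X_e1)(X_e2): the light must pass both filter stages in turn."""
    return lcu_filter(lcu_filter(x_o, x_e1), x_e2)


def truth_table(x_o: str = "R"):
    """List (X_e1, X_e2, Z_o) rows for one input color over the R, G, B filters."""
    colors = ("R", "G", "B")
    return [(e1, e2, hybrid_and(x_o, e1, e2)) for e1 in colors for e2 in colors]


if __name__ == "__main__":
    for e1, e2, z_o in truth_table("R"):
        print(f"X_o=R  X_e1={e1}  X_e2={e2}  Z_o={z_o}")
```

Under these assumptions the output carries the input color only when both filters match it, which is the AND behavior; any mismatch on either stage gives black.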

EXAMPLE: 5

FIG. 28 represents the symbol of a hybrid optical OR gate. FIG. 29 describes the implementation of a hybrid optical OR gate using the LCU. The optical output Z_o is given by the HCBEA algebra,

A, B, X are HCBEA Variables

1_e = Electronic control filter "OFF" signal, allowing the light to pass through the LCU

0_e = Electronic control filter "ON" signal, blocking the light from passing through the LCU

The LCU implementation of the hybrid optical OR gate is given by the equation,

Z_o = A_o + B_o

Where,

A_e = Pass through LCU

B_e = Pass through LCU

C_e = Pass through LCU

A_o = B_o = X_o = { R_o, G_o, B_o, W_o, C_o, M_o, Y_o, K_o }

A_o, B_o and X_o should be of the same filter type

Truth Table: Hybrid Optical OR gate
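The OR equation Z_o = A_o + B_o can be illustrated with a minimal sketch. It assumes that "+" denotes the optical superposition of two beams of the same filter type, with black (K) acting as the absence of light. The function name and the error raised for mixed colors are choices made for this example.

```python
K = "K"  # black: no light on this input


def hybrid_or(a_o: str, b_o: str) -> str:
    """Z_o = A_o + B_o: the output is lit if either same-colored input is lit."""
    lit = {c for c in (a_o, b_o) if c != K}
    if len(lit) > 1:
        # The text requires A_o and B_o to be of the same filter type.
        raise ValueError("A_o and B_o must be of the same filter type")
    return lit.pop() if lit else K


def truth_table(color: str = "R"):
    """List (A_o, B_o, Z_o) rows for one filter type, with K as the dark state."""
    states = (K, color)
    return [(a, b, hybrid_or(a, b)) for a in states for b in states]


if __name__ == "__main__":
    for a_o, b_o, z_o in truth_table("R"):
        print(f"A_o={a_o}  B_o={b_o}  Z_o={z_o}")
```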

The description of the present invention from Example 2, line numbers 840 to 1150, expresses the best method of performing the invention, in that:

- The software framework, which uses a master and slave architecture, and the OxAPI used to program the 3D-HOCA accelerator are described in detail.

- Here the application program which runs on the 3D-HOCA accelerator uses the e-opto compiler, assembler or 4th generation GUI objects for e-opto code generation.

- Example: 1. Hybrid optical NOT Gate

2. Hybrid optical AND Gate

3. Hybrid optical OR Gate

- The comparative architecture of existing multicore, GPU SIMD blocks and the proposed 3D-HOCA MIMD model pipeline architecture is summarized.

While the present subject matter has been described in detail with respect to specific embodiments thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing may readily produce alterations to, variations of, and equivalents to such embodiments. Accordingly, the scope of the present disclosure is by way of example rather than by way of limitation, and the subject disclosure does not preclude inclusion of such modifications, variations and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art.

Claims

1. A reconfigurable multi-dimensional hybrid optical computing accelerating apparatus consisting of

-a set of liquid crystal units with vertical polarizer plane slits and horizontal polarizer plane slits are aligned to light pass through and

- a set of silicon-based hybrid optical processor glue chip units activated by a common clock signal to drive LED driver, LCD driver, COLOR sensor driver, and buffer units to provide appropriate synchronization.

-a set of silicon-based hybrid optical control units for controlling the hybrid optical computing accelerating apparatus

-a set of RGB LED sources /White light sources/infrared source

-a set of CCD sensors/Infra-red sensors

-a set of color sensor array or Infra-red array or both used to convert the computed light output result into the electronic signal

- wherein the said liquid crystal unit is incorporated with a vertical polarizer or horizontal polarizer plane facing each other between the Liquid crystal units aligned to light pass through or/and

- wherein each liquid crystal unit is incorporated with a set of the color filter unit and vertical polarizer or horizontal polarizer plane facing each other between the Liquid crystal units aligned to light pass-through,

-wherein the liquid crystal unit is incorporated with a set of random access memory (RAM) to store the data for optical input unit (I_o), electronic color transformation function control line (Q_e) to provide an optical output signal (Z_o) converted and stored,

-wherein, a function of the liquid crystal unit is characterized with combinational computation execution of the optical Input (I_o), and the electronic color transformation function control line (Q_e) to provide an optical output (Z_o)

-wherein the function of the liquid crystal unit is capable of being reconfigurable by changing the stepping value of the electronic color transformation function control line (Q_e) thereby functional behavior of the LCU can be changed or reconfigured without redesigning the existing circuitry.

2. The reconfigurable multi-dimensional hybrid optical computing accelerating apparatus as claimed in claim 1, wherein the said silicon-based hybrid optical processor glue chip is integrated to perform conversion of optical signals to electronic signals and vice versa

3. The reconfigurable multi-dimensional hybrid optical computing accelerating apparatus as claimed in claim 1, wherein the said reconfigurable multi-dimensional hybrid optical computing accelerating apparatus is further characterized to interface Optical extensions API platform for driving the said accelerator [OxAPIs]

4. The reconfigurable multi-dimensional hybrid optical computing accelerator as claimed in claim 1, wherein the said reconfigurable multi-dimensional hybrid optical computing accelerator is integrated with an X: X or X: Y based beam splitter to split the single input optical signal into multiple optical signals.