WO2022093118A1

WO2022093118A1 - Network, method and memory cell array for region proposal

Info

Publication number: WO2022093118A1
Application number: PCT/SG2021/050649
Authority: WO
Inventors: Sumon Kumar BOSE; Arindam Basu
Original assignee: Nanyang Technological University
Priority date: 2020-10-28
Filing date: 2021-10-26
Publication date: 2022-05-05

Abstract

A region proposal network comprises: an array of memory cells configured to store a binary image, each memory cell of the array of memory cells comprising a state storage circuit comprising a plurality of transistors configured to store a binary state and a projection circuit comprising a plurality of transistors configured to allow horizontal and vertical read out of binary states of multiple cells along columns and rows of the array of memory cells, the array of memory cells further comprising: a plurality of vertical read control lines, each vertical read control line being coupled to the projection circuit of each memory cell of a respective row of the array of memory cells; a plurality of horizontal read lines, each horizontal read line being coupled to the projection circuit of each memory cell of a respective row of the array of memory cells; a plurality of horizontal read control lines, each horizontal read control line being coupled to the projection circuit of each memory cell of a respective column of the array of memory cells; and a plurality of vertical read lines, each vertical read line being coupled to the projection circuit of each memory cell of a respective column of the array of memory cells; and a controller configured to apply signals to the vertical read control lines and to determine a projection of objects in the binary image onto a horizontal axis from the resulting signals on the vertical read lines, and to apply signals to the horizontal read control lines and to determine a projection of objects in the binary image onto a vertical axis from the resulting signals on the horizontal read lines, and to generate an indication of at least one region of interest in the binary image from the projection of objects in the binary image onto the horizontal axis and the projection of objects in the binary image onto the vertical axis.

Description

NETWORK, METHOD AND MEMORY CELL ARRAY FOR REGION PROPOSAL

TECHNICAL FIELD

The present disclosure relates to in-memory and near-memory image processing and in particular to region proposal.

BACKGROUND

In recent years, we have witnessed a rapid proliferation of the internet of things (IoT) accelerated by the emerging 5G communication technology. Among all sensors, the camera contributes the most to the enormous data generation due to its ubiquity and the richness of the information it captures. As a consequence, the processing of data from the camera presents unique challenges and opportunities. Moreover, there is a trend of offloading the processing task from the cloud to the edge of the network to lower the energy consumption by avoiding frequent data transmission. Nevertheless, processing the data on the edge demands higher computing energy. Hence, a paradigm shift is necessary for the architecture of sensors and sense-makers for energy-efficient sensing and processing.

As an alternative to the traditional camera, retinal-inspired neuromorphic vision sensors (NVS) have gained popularity due to their lower energy and bandwidth requirements and their ability to separate the foreground moving objects from the less informative, redundant background. The vision sensor outputs the coordinates of a pixel following the address event representation (AER) protocol if the intensity change of that particular pixel crosses a predefined threshold. Even though the NVS finds application in gesture recognition, robotics, traffic surveillance, and object tracking, the processor needs to be always on due to the asynchronous, continuous event stream. To address this challenge, one option is to aggregate the events into the sensor memory and sample the events at a regular interval to form an event-based binary image (EBBI), which allows the processor to operate in a more energy-efficient duty-cycled manner. End applications such as gesture recognition, object detection, and tracking involve energy-intensive, dedicated deep neural network (DNN) hardware due to its state-of-the-art performance. Furthermore, an EBBI very often contains noisy pixels without a valid object, or the valid object occupies a minimal area compared to the full-frame size. Consequently, there is a need for energy-efficient, dedicated hardware to detect the regions of interest (ROIs), which triggers the DNN once a valid region is detected and reduces the computation in the subsequent stages by confining computing to the ROIs.

Region proposal is one of the most important image processing steps in applications such as traffic surveillance and people monitoring. Processing tasks like object detection and tracking on edge devices demand faster execution owing to the live video streaming and lower energy consumption to extend the lifetime of the battery-operated system.

A region proposal algorithm finds a bounding box (bbox) encapsulating an object. Since an NVS completely separates the moving object from the stationary background, it allows us to deploy simple algorithms for the region proposal without carrying out costly CNN operations. For a binary image, the region proposal can be developed using a variant of the connected component labelling (CCL) algorithm. However, CCL scans the image in a row-by-row (raster scan) fashion to calculate the ROIs. Hence, longer execution time and higher computing energy are still a bottleneck due to the Von Neumann architecture of the conventional processor, which involves enormous data movement between the storage and computing unit. To reduce the cost, a histogram-based approach for region proposal may be implemented.

Nevertheless, the histogram-based approach also suffers from false bounding boxes due to multiple regions on both horizontal and vertical axes. Besides, it does not provide exact coordinates of the boxes due to overlap among the projections of the objects.

SUMMARY

According to a first aspect of the present disclosure, a region proposal network is provided. The region proposal network comprises: an array of memory cells configured to store a binary image, each memory cell of the array of memory cells comprising a state storage circuit comprising a plurality of transistors configured to store a binary state and a projection circuit comprising a plurality of transistors configured to allow horizontal and vertical read out of binary states of multiple cells along columns and rows of the array of memory cells, the array of memory cells further comprising: a plurality of vertical read control lines, each vertical read control line being coupled to the projection circuit of each memory cell of a respective row of the array of memory cells; a plurality of horizontal read lines, each horizontal read line being coupled to the projection circuit of each memory cell of a respective row of the array of memory cells; a plurality of horizontal read control lines, each horizontal read control line being coupled to the projection circuit of each memory cell of a respective column of the array of memory cells; and a plurality of vertical read lines, each vertical read line being coupled to the projection circuit of each memory cell of a respective column of the array of memory cells; and a controller configured to apply signals to the vertical read control lines and to determine a projection of objects in the binary image onto a horizontal axis from the resulting signals on the vertical read lines, and to apply signals to the horizontal read control lines and to determine a projection of objects in the binary image onto a vertical axis from the resulting signals on the horizontal read lines, and to generate an indication of at least one region of interest in the binary image from the projection of objects in the binary image onto the horizontal axis and the projection of objects in the binary image onto the vertical axis.

In an embodiment, the controller is further configured to iteratively conduct a search for objects in the binary image by identifying projections of objects onto the vertical axis and then determining the projection of each identified projection onto the horizontal axis.

In an embodiment, the controller comprises an x-sense module configured to identify projections of objects onto the horizontal axis and a y-sense module configured to identify projections of objects onto the vertical axis.

In an embodiment, the controller is configured to apply signals to the horizontal read control lines based on the projections of objects onto the horizontal axis and to apply signals to the vertical read control lines based on the projections of objects onto the vertical axis.

In an embodiment, the x-sense module comprises an x-sense edge detector and the y-sense module comprises a y-sense edge detector.

In an embodiment, the x-sense module comprises a plurality of x-sensing amplifiers provided before the x-sense edge detector and the y-sense module comprises a plurality of y-sensing amplifiers provided before the y-sense edge detector.

In an embodiment, each of the x-sense module and the y-sense module comprises a plurality of transistors between the outputs of consecutive x-sensing amplifiers and between the outputs of consecutive y-sensing amplifiers, the transistors being configured to fill gaps caused by fragmentation of objects in the binary image.

In an embodiment, each memory cell of the array of memory cells is configured as a static random-access memory cell.

In an embodiment, each cell comprises at least 9 transistors.

In an embodiment, each cell is configured as a 9-transistor static random-access memory cell.

In an embodiment, the state storage circuit is configured as a pair of inverters in a cross coupled configuration and the projection circuit is connected to one inverter of the pair of inverters.

According to a second aspect of the present disclosure, a region proposal method is provided. The region proposal method comprises: applying signals to vertical read control lines of a memory array storing a binary image; determining a projection of objects in the binary image onto a horizontal axis of the binary image from the resulting signals on the vertical read lines of the memory array; applying signals to horizontal read control lines of the memory array; determining a projection of objects in the binary image onto a vertical axis from resulting signals on the horizontal read lines of the memory array; and generating an indication of at least one region of interest in the binary image from the projection of objects in the binary image onto the horizontal axis and the projection of objects in the binary image onto the vertical axis.

In an embodiment, the method further comprises iteratively conducting a search for objects in the binary image by identifying projections of objects onto the vertical axis and then determining the projection of each identified projection onto the horizontal axis.

In an embodiment, iteratively conducting a search for objects in the binary image by identifying projections of objects onto the vertical axis and then determining the projection of each identified projection onto the horizontal axis comprises applying signals to the horizontal read control lines of the memory array based on the projections of objects onto the horizontal axis and applying signals to the vertical read control lines of the memory array based on the projections of objects onto the vertical axis.

In an embodiment, determining projections of objects in the binary image onto a horizontal axis of the binary image from the resulting signals on the vertical read lines of the memory array comprises detecting edges in the resulting signals on the vertical read lines of the memory array and determining a projection of objects in the binary image onto a vertical axis from resulting signals on the horizontal read lines of the memory array comprises detecting edges in the resulting signals on the horizontal read lines of the memory array.

In an embodiment, the method further comprises filling gaps due to fragmentation of objects in the binary image in the resulting signals on the vertical read lines of the memory array prior to detecting edges in the resulting signals on the vertical read lines of the memory array and/or filling gaps due to fragmentation of objects in the binary image in the resulting signals on the horizontal read lines of the memory array prior to detecting edges in the resulting signals on the horizontal read lines of the memory array.

According to a third aspect of the present disclosure, a memory array configured to store a binary image is provided. The memory array comprises a plurality of memory cells, each memory cell comprising a state storage circuit comprising a plurality of transistors configured to store a binary state and a projection circuit comprising a plurality of transistors configured to allow horizontal and vertical read out of binary states of multiple cells along columns and rows of the memory array, the memory array further comprising: a plurality of vertical read control lines, each vertical read control line being coupled to the projection circuit of each memory cell of a respective row of the memory array; a plurality of horizontal read lines, each horizontal read line being coupled to the projection circuit of each memory cell of a respective row of the memory array; a plurality of horizontal read control lines, each horizontal read control line being coupled to the projection circuit of each memory cell of a respective column of the memory array; and a plurality of vertical read lines, each vertical read line being coupled to the projection circuit of each memory cell of a respective column of the memory array.

In an embodiment, each memory cell of the memory array is configured as a static random-access memory cell.

In an embodiment, each cell comprises at least 9 transistors.

In an embodiment, each cell is configured as a 9-transistor static random-access memory cell.

In an embodiment, the state storage circuit is configured as a pair of inverters in a cross coupled configuration and the projection circuit is connected to one inverter of the pair of inverters.

BRIEF DESCRIPTION OF THE DRAWINGS

In the following, embodiments of the present invention will be described as non-limiting examples with reference to the accompanying drawings in which:

FIG.1 is a block diagram showing an image processing workflow incorporating a region proposal network according to an embodiment of the present invention;

FIG.2 is a block diagram showing a region proposal network according to an embodiment of the present invention;

FIG.3A is a circuit diagram showing a plurality of memory cells of an array of memory cells according to an embodiment of the present invention;

FIG.3B is a circuit diagram showing a memory cell of an array of memory cells according to an embodiment of the present invention;

FIG.4A is a circuit diagram showing an 11T-SRAM memory cell according to an embodiment of the present invention;

FIG.4B is a circuit diagram showing a 10T-SRAM memory cell according to an embodiment of the present invention;

FIG.5 is a binary image illustrating fragmentation of an object;

FIG.6A shows a sense circuit for edge detection and region filling according to an embodiment of the present invention;

FIG.6B shows image region filling by the sense circuit shown in FIG.6A;

FIG.7A shows a rising edge detection circuit used in an embodiment of the present invention;

FIG.7B shows edge detection on an image by the rising edge detection circuit shown in FIG.7A;

FIG.8 is a flow chart showing a method of searching for objects in a binary image according to an embodiment of the present invention;

FIG.9A to 9F illustrate an example of the application of the method shown in FIG.8 to an image having overlapping objects;

FIG.10 is a graph showing energy consumption of a memory array according to an embodiment of the present invention for different numbers of objects;

FIG.11 is a graph showing energy consumption of a memory array according to an embodiment of the present invention for single objects of varying sizes; and

FIG.12 shows a simulation of a region proposal network according to an embodiment of the present invention.

DETAILED DESCRIPTION

The present disclosure relates to in-memory and near-memory computing-based region proposal networks. FIG.1 is a block diagram showing an image processing workflow incorporating a region proposal network according to an embodiment of the present invention.

As shown in FIG.1, the image processing workflow 10 comprises a sensor 20 which is coupled to a region proposal network 100. The region proposal network 100 is coupled to an end application 30. The sensor 20 is an image sensor and may be a neuromorphic vision sensor (NVS) which generates binary images such as event-based binary images (EBBIs). The region proposal network 100 comprises a memory for storing the binary images and a controller which accesses the memory and detects regions of interest (ROIs) in the binary images. The region proposal network 100 outputs indications of the ROIs, for example as bounding boxes indicating locations of objects in the binary images, to the end application 30. The end application 30 then performs image processing tasks such as gesture recognition, object detection and tracking using the indications of ROIs received from the region proposal network 100. The end application 30 may be implemented as a dedicated deep neural network (DNN).

FIG.2 is a block diagram showing a region proposal network according to an embodiment of the present invention. The region proposal network 100 comprises an array 110 of memory cells 120 and a controller 150. The memory cells 120 in the array 110 are configured to store binary image data received from the sensor 20 such that each memory cell 120 stores a binary value corresponding to a pixel of the binary image. The controller 150 is configured to determine projections of objects in the binary image and determine indications of regions of interest in the binary image corresponding to objects. The indications of regions of interest which may comprise bounding boxes are output by the controller 150 to the end application 30.

The array 110 of memory cells comprises W columns and H rows of cells. Each cell 120 of the array 110 is a static random access memory (SRAM) cell. Each column is coupled to a bit line (BL) and an inverse bit line (BLB) and each row is coupled to a word line (WL). The word line (WL) together with the bit line (BL) and inverse bit line (BLB) allow reading and writing of individual memory cells 120 of the array 110. Each row of the array 110 is coupled to a vertical read control (VRDC) line and each column of the array 110 is coupled to a vertical read (VRD) line. Each column of the array 110 is coupled to a horizontal read control (HRDC) line and each row of the array 110 is coupled to a horizontal read (HRD) line.

The controller 150 comprises an X-sense module 152 which is connected to the vertical read (VRD) lines of the array 110. A VRDC driver module 154 controls the signals applied to the vertical read control (VRDC) lines. The X-sense module 152 is configured to read discharge patterns of the VRD lines and determine projections of objects on a horizontal axis 112. These projections, which comprise detection of a rising edge xl and a falling edge xh, are stored in an x-coordinates register 156. The controller 150 further comprises a Y-sense module 162 which is connected to the horizontal read (HRD) lines of the array 110. An HRDC driver module 164 controls the signals applied to the horizontal read control (HRDC) lines. The Y-sense module 162 is configured to read discharge patterns of the HRD lines and determine projections of objects on a vertical axis 114. These projections, which comprise detection of a rising edge yl and a falling edge yh, are stored in a y-coordinates register 166. The x-coordinates register 156 is coupled to the HRDC driver module 164 and feeds the x-coordinate values to the HRDC driver module 164 as part of a search procedure to identify overlapping objects in the binary image, which is described in more detail below. Similarly, the y-coordinates register 166 is coupled to the VRDC driver module 154 and feeds the y-coordinate values to the VRDC driver module 154 as part of the search procedure. A temporary memory 170 stores x-coordinate values, y-coordinate values and pointers during the search procedure.

While proposing regions of interest, the X-sense module 152 charges the VRD lines to VDD in a first cycle and keeps them floating once charged. In the next period, the VRDC-driver module 154 enables the VRDC lines, and the VRD lines get discharged based on the presence of the objects. The discharge pattern of the VRD lines yields a projection of the objects of an image on the horizontal axis 112, which is subsequently sensed by the X-sense module 152. Furthermore, the X-sense module 152 detects the rising and falling edges of the projected objects to determine the coordinates. Likewise, the Y-sense module 162 and HRDC-driver module 164 generate a vertical projection of the objects. The projection on the horizontal axis 112, projectionX, and the projection on the vertical axis 114, projectionY, can be expressed as:

$$\mathrm{projectionX}(i) = \bigvee_{j=0}^{H-1} I(i,j), \qquad \mathrm{projectionY}(j) = \bigvee_{i=0}^{W-1} I(i,j)$$

where I(i,j) represents a pixel value at the (i,j)-th location.
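The read-out described above can be modelled in software as a logical OR along each column and each row, gated by the read control lines. The following is a minimal behavioural sketch, not the circuit itself. It assumes that a stored 1 discharges the read line when the corresponding control line is enabled; the function name `project` and the `image[row][col]` layout are illustrative choices that do not appear in the disclosure.

```python
def project(image, vrdc=None, hrdc=None):
    """Behavioural model of the horizontal and vertical projections.

    image: list of H rows, each a list of W bits (image[row][col]).
    vrdc:  H bits, one per row; 1 enables that row's VRDC line (default: all).
    hrdc:  W bits, one per column; 1 enables that column's HRDC line (default: all).
    Returns (projection_x, projection_y) as lists of 0/1.
    """
    h, w = len(image), len(image[0])
    vrdc = [1] * h if vrdc is None else vrdc
    hrdc = [1] * w if hrdc is None else hrdc
    # A VRD (column) line is taken as discharged if any cell in that column
    # stores 1 and the VRDC line of that cell's row is enabled.
    projection_x = [int(any(image[r][c] and vrdc[r] for r in range(h)))
                    for c in range(w)]
    # An HRD (row) line is taken as discharged if any cell in that row
    # stores 1 and the HRDC line of that cell's column is enabled.
    projection_y = [int(any(image[r][c] and hrdc[c] for c in range(w)))
                    for r in range(h)]
    return projection_x, projection_y


# Hypothetical 3 x 4 image used only for illustration.
image = [[0, 1, 1, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 1]]
print(project(image))                   # ([0, 1, 1, 1], [1, 0, 1])
print(project(image, vrdc=[1, 0, 0]))   # only row 0 contributes to projection_x
```

Passing a partial `vrdc` or `hrdc` mask mimics enabling only a subset of read control lines, which is the mechanism the search procedure relies on.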

FIG.3A is a circuit diagram showing a plurality of memory cells of an array of memory cells according to an embodiment of the present invention. As shown in FIG.3A, each cell of the array 110 comprises a state storage circuit 122 and a projection circuit 124. The state storage circuit 122 stores a binary state corresponding to a pixel value of the binary image. The projection circuit 124 couples the state storage circuit 122 to the vertical read control (VRDC) line and horizontal read (HRD) line corresponding to the row of the array and to the horizontal read control (HRDC) line and the vertical read (VRD) line corresponding to the column of the array.

FIG.3B is a circuit diagram showing a memory cell of an array of memory cells according to an embodiment of the present invention. As shown in FIG.3B, the memory cell comprises a state storage circuit 122 and a projection circuit 124. The projection circuit 124 comprises three transistors: a first transistor N1 which is coupled to the state storage circuit 122, a second transistor N2 coupled to the vertical read control (VRDC) line and the vertical read (VRD) line, and a third transistor N3 coupled to the horizontal read control (HRDC) line and the horizontal read (HRD) line. The state storage circuit 122 comprises a first inverter U1 and a second inverter U2 which are connected in a cross coupled arrangement that allows storage of two stable states which are used to denote 0 and 1. Two access transistors N4 and N5 couple the stored state to the word line (WL) and the bit line (BL) and the word line (WL) and the inverse bit line (BLB) respectively. Since the first inverter U1 and the second inverter U2 are each formed from two transistors, the storage circuit 122 may be referred to as a six transistor (6T) SRAM cell. Thus, the memory cell 120 may be referred to as a nine transistor (9T) SRAM cell.

It is noted that the projection circuit 124 is coupled to only one side of the storage cell formed by the inverters U1 and U2. Thus, the transistor N1 is connected to the side of the storage cell coupled to the bit line (BL) via the transistor N4, but not to the side of the storage cell coupled to the inverse bit line (BLB) via the transistor N5.

As described above, the memory cell 120 may be considered as a 9T-SRAM for vertical and horizontal read (projection of objects). It can be thought of as an extension of the standard 6T-SRAM with three extra transistors (N1, N2, N3) added to enable readout along rows and columns. Transistors N1 and N2 enable horizontal read along the columns, and N1 and N3 enable vertical read along the rows. In general, any SRAM cell with nine or more transistors that is capable of horizontal and vertical read can be used to calculate region proposals using the methods of the present disclosure. The cell shown in FIG.3B is the most efficient way of doing this since it uses the smallest number of transistors. Examples of other possible memory cells capable of vertical and horizontal read are shown in FIG.4A for 11T-SRAM and FIG.4B for 10T-SRAM, to demonstrate that other SRAM cells can be modified to work with the region proposal algorithm. The memory cells shown in FIG.4A and FIG.4B are extensions of conventional 9T-SRAM and 8T-SRAM cells respectively.

FIG.4A is a circuit diagram showing an 11T-SRAM memory cell according to an embodiment of the present invention. As shown in FIG.4A, the memory cell 410 comprises a circuit for vertical read 412 comprising two transistors N1 and N2, and a circuit for horizontal read 414 comprising three transistors N3, N4 and N5. The circuit for vertical read 412 is connected to the side of the storage cell coupled to the bit line (BLH), but not to the side of the storage cell coupled to the inverse bit line (BLBH). The circuit for horizontal read 414 is coupled to both sides of the storage cell, by the transistors N3 and N4 respectively.

FIG.4B is a circuit diagram showing a 10T-SRAM memory cell according to an embodiment of the present invention. As shown in FIG.4B, the memory cell 420 comprises a circuit for vertical read 422 comprising two transistors N1 and N2, and a circuit for horizontal read 424 comprising two transistors N3 and N4. The circuit for vertical read 422 is connected to the side of the storage cell coupled to the inverse bit line (BLB). The circuit for horizontal read 424 is coupled to the side of the storage cell coupled to the bit line (BL). The connections of the circuits 422 and 424 to the storage nodes can be interchanged.

One of the challenges with the neuromorphic vision sensor is object fragmentation. The fragmentation can happen due to the plane surface of an object, which does not generate many events due to a lack of contrast.

FIG.5 is a binary image illustrating fragmentation of an object. As shown in FIG.5, fragmentation 510 may occur in images 520 of cars due to the plane surface. The fragmentation increases the number of objects to be processed in the image frame and leads to incorrect tracking. To address the object fragmentation, an in-memory compute (IMC) based region filling is proposed, to be executed while sensing the horizontal and vertical projections.

FIG.6A shows a sense circuit for edge detection and region filling according to an embodiment of the present invention. The circuit 600 forms the X-sense module 152 or the Y-sense module 162 of the controller 150. As shown in FIG.6A, the circuit 600 comprises sense amplifiers 610, the outputs of which are connected via PMOS transistors 612 to buffers 614. The buffers 614 are connected to an edge detector 620. In addition to detecting falling and rising edges of a projection and searching for the object in a specified zone, the sense circuit fills fragmented objects. The fragmentation can happen due to the plane surface of an object, which does not generate many events. A prchrg signal is used to pre-charge the VRD lines to VDD during horizontal projection and the HRD lines to VDD during vertical projection. When the prchrg signal goes low, the VRD or HRD lines get charged to VDD via the transistor 614 in the sense amp 610, and the output nodes of the sense amps get discharged to 0 potential. In the next clock cycle, the VRDC signals are made high. Although node A gets charged to VDD due to the discharge of the VRD line, node B remains at 0. To fill the fragmented regions, a PMOS transistor 612 is introduced between the outputs of two consecutive sense amps. If enabled, node B gets charged to VDD. The PMOS transistor 612 shorts the outputs of the sense amps instead of the VRD lines to lessen the filling time and reduce the discharged energy, since the VRD lines are associated with larger parasitic capacitors. The number of consecutive sense amps to be connected can be programmed.

A fill signal is used to short the outputs of the consecutive sense amplifiers during the horizontal or vertical search. When fill goes low, it shorts the outputs of the consecutive sense amplifiers.

FIG.6B shows image region filling by the sense circuit shown in FIG.6A. As shown in FIG.6B, the image 630 prior to the region filling includes a fragmentation; the operation of the sense circuit fills the fragmentation to give the non-fragmented image 640.
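The effect of region filling on a sensed projection can be approximated in software by closing short gaps between neighbouring runs of 1s. The sketch below is only a behavioural approximation under the assumption that a gap of at most `max_gap` positions lying between two detected positions is filled; `fill_gaps` and `max_gap` are hypothetical names standing in for the programmable number of connected sense amplifiers, and the sketch does not model the charge sharing in the circuit.

```python
def fill_gaps(projection, max_gap):
    """Fill runs of 0s no longer than max_gap that lie between two 1s.

    projection: list of 0/1 values (one per sense amplifier output).
    max_gap:    assumed programmable limit on the gap length to be filled.
    """
    out = list(projection)
    last_one = None  # index of the most recent 1 seen so far
    for idx, bit in enumerate(projection):
        if bit:
            gap = idx - last_one - 1 if last_one is not None else 0
            if 0 < gap <= max_gap:
                # Bridge the gap between the previous 1 and this one.
                for k in range(last_one + 1, idx):
                    out[k] = 1
            last_one = idx
    return out


# Hypothetical fragmented projection: a one-wide gap and a three-wide gap.
fragmented = [0, 1, 1, 0, 1, 0, 0, 0, 1, 0]
print(fill_gaps(fragmented, max_gap=1))  # [0, 1, 1, 1, 1, 0, 0, 0, 1, 0]
print(fill_gaps(fragmented, max_gap=3))  # [0, 1, 1, 1, 1, 1, 1, 1, 1, 0]
```

A larger `max_gap` merges more fragments but also risks merging distinct objects that are close together, which is one reason a programmable setting is useful.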

FIG.7A shows a rising edge detection circuit used in an embodiment of the present invention. The rising edge detector 620 shown in FIG.7A corresponds to the edge detector 620 shown in FIG.6A and receives projections pj[0:W-1] of a binary image as inputs, which are the outputs of the buffers 614 shown in FIG.6A. Searching zone signals (sz[0:W-1]) are stored in the temporary memory 170 and indicate searching zones of projections. The projections pj[0:W-1] are combined with the searching zone signals (sz[0:W-1]) by AND gates 710, which also combine inverted and right-shifted versions of the projections and searching zone signals. On the rising edge of the enXsense signal, rising edges of the projection get captured using D flip flops (DFFs) 720. A priority encoder 730 generates the lower address, xl, of the object, which is stored in a register file, x-coordinates. Once the first address is read, the DFF pointing to that address is reset in the next clock cycle to calculate the following addresses. A decoder 740 generates the reset signal for that particular DFF. Similarly, the controller calculates the other lower addresses of the objects. The circuit-level realization of a falling edge (higher coordinate, xh) detector is the same. However, the projection pj[0:W-1] is combined using AND gates with its inverted and left-shifted version. Since the rising edge (xl) and falling edge (xh) detectors work in parallel, the whole process takes only a few clock cycles to complete. Likewise, the lower and higher coordinates (yl, yh) along the vertical axis are determined.

FIG.7B shows edge detection on an image by the rising edge detection circuit shown in FIG.7A. A space (horizontal axis) time representation of different signals is presented in FIG.7B to detect the rising edges of the cars. Signals Q[1] and Q[13] capture the lower coordinates (xl) of the cars. At the first iteration, the priority encoder generates xl=1. Once the first DFF gets reset in the next cycle, it outputs xl=13. This process goes on until all the addresses are read.
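The edge detection and sequential address read-out described above can be sketched as follows. This is a behavioural model only: the shift-and-AND logic follows the description of the AND gates, and the read-out loop stands in for the priority encoder and DFF reset. The function names and the object extents in the example (columns 1 to 5 and 13 to 17) are assumptions chosen so that the lower coordinates match the xl=1 and xl=13 values mentioned for FIG.7B; the upper extents are not taken from the figure.

```python
def detect_edges(pj, sz=None):
    """Return (rising, falling) flag lists for a projection pj.

    pj: list of 0/1 projection bits, pj[0:W-1].
    sz: list of 0/1 searching-zone bits; defaults to the whole axis.
    """
    w = len(pj)
    sz = [1] * w if sz is None else sz
    m = [p & s for p, s in zip(pj, sz)]  # projection gated by searching zone
    # Rising edge: bit is 1 and its left neighbour (right-shifted copy) is 0.
    rising = [m[i] & (1 - (m[i - 1] if i > 0 else 0)) for i in range(w)]
    # Falling edge: bit is 1 and its right neighbour (left-shifted copy) is 0.
    falling = [m[i] & (1 - (m[i + 1] if i < w - 1 else 0)) for i in range(w)]
    return rising, falling


def read_addresses(flags):
    """Model of priority-encoder read-out: emit the lowest set flag, reset it, repeat."""
    flags = list(flags)
    addresses = []
    while any(flags):
        addr = flags.index(1)  # lowest index with a captured edge
        addresses.append(addr)
        flags[addr] = 0        # reset the flip flop that was just read
    return addresses


# Hypothetical projection with two objects on a 20-wide axis.
pj = [1 if 1 <= i <= 5 or 13 <= i <= 17 else 0 for i in range(20)]
rising, falling = detect_edges(pj)
print(read_addresses(rising))   # [1, 13]
print(read_addresses(falling))  # [5, 17]
```

Restricting `sz` to a sub-range of the axis limits the detected edges to that zone, which corresponds to the narrowing of searching zones used during the iterative search.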

Simple horizontal and vertical projections do not always yield exact coordinates due to the overlaps of the projections. Furthermore, they do not provide a precise number of objects if the projections are overlapped. For instance, two objects having overlapped projections generate a single rising and falling edge on the horizontal axis and two rising and falling edges on the vertical axis or vice versa. On the contrary, rising and falling edge counts of the projections on both axes are two if there is no overlap between the projections of the objects. To address this issue an iterative and selective search of objects is proposed.
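The mismatch in edge counts described above can be reproduced with a short software sketch. The code counts the runs of 1s (one rising and one falling edge per run) in each projection of a small image. The helper names and the two example images are hypothetical and serve only to illustrate the overlapping and non-overlapping cases.

```python
def count_runs(bits):
    """Count runs of consecutive 1s, i.e. the number of rising edges."""
    return sum(1 for i, b in enumerate(bits) if b and (i == 0 or not bits[i - 1]))


def projection_counts(image):
    """Return (objects seen on the horizontal axis, objects seen on the vertical axis)."""
    h, w = len(image), len(image[0])
    proj_x = [any(image[r][c] for r in range(h)) for c in range(w)]
    proj_y = [any(image[r][c] for c in range(w)) for r in range(h)]
    return count_runs(proj_x), count_runs(proj_y)


def blank(h, w):
    return [[0] * w for _ in range(h)]


def draw(image, xl, xh, yl, yh):
    """Set a filled rectangle spanning columns xl..xh and rows yl..yh."""
    for r in range(yl, yh + 1):
        for c in range(xl, xh + 1):
            image[r][c] = 1


# Two objects whose horizontal projections overlap (columns 3..4 are shared).
overlapping = blank(6, 8)
draw(overlapping, 1, 4, 0, 1)
draw(overlapping, 3, 6, 3, 5)
print(projection_counts(overlapping))  # (1, 2)

# Two objects with no overlap on either axis.
separate = blank(6, 8)
draw(separate, 0, 1, 0, 1)
draw(separate, 4, 6, 3, 5)
print(projection_counts(separate))     # (2, 2)
```

In the overlapping case a single pass would report one object on the horizontal axis and two on the vertical axis, which is the condition the iterative search is designed to resolve.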

FIG.8 is a flowchart illustrating a method of searching for objects in a binary image according to an embodiment of the present invention. The method 800 shown in FIG.8 is carried out by the controller 150 shown in FIG.2.

In step 802, all of the VRD lines are pre-charged by the X-sense module 152. Then in step 804, the VRDC-driver 154 enables all VRDC lines. In step 806, the projection of objects onto the horizontal axis 112 is identified by the X-sense module 152. The lower and higher (xl, xh) coordinates of each object are found based on the rising and falling edges of the projection. In step 808, the coordinates are stored in the x-coordinates register 156. Subsequently, the HRD lines get pre-charged in step 810 by the Y-sense module 162 and in step 812, the HRDC-driver 164 enables the HRDC lines corresponding to each projected object stored in the x-coordinates register 156.

In step 814, the Y-sense module 162 identifies the coordinates of each object projected on the vertical axis 114. The lower and higher (yl, yh) vertical coordinates of the objects are determined using rising and falling edge detectors. In step 816 the coordinates are stored in the y-coordinates register 166.

In step 818, it is determined whether HRDC lines for all of the objects projected onto the horizontal axis have been enabled. If not, the method returns to step 810 and steps 810-816 are repeated for any remaining objects. If the HRDC lines for all objects have been enabled, the method moves to step 820 in which a horizontal search is started.

In step 820, VRD lines are charged by the Y-sense module 162, and in step 822, the VRDC-driver 154 reads out the content of the y-coordinates register 166 and enables VRDC lines corresponding to each projected object on the vertical axis 114. Subsequently, in step 824 the X-sense module 152 searches the coordinates of each object projected on the horizontal axis 112. In step 826, the coordinates are stored in the x-coordinates register 156.

In step 828, it is determined whether VRDC lines for all of the objects projected onto the vertical axis have been enabled. If not, the method returns to step 820 and steps 820-826 are repeated for any remaining objects. If the VRDC lines for all objects have been enabled, the method moves to step 830.

Previous values of x-coordinates are loaded into temp-memory when a horizontal projection is searched, and these values indicate the searching zones on the horizontal axis. The same applies to the vertical projection. At each search cycle, the X-sense and Y-sense modules narrow down their searching zones (sz) based on the values stored in the temp-memory. In step 830, it is determined whether the number of projected objects is the same on both axes. If the numbers do not match, the method returns to step 810 and the process repeats for the next iteration. If the number of projected objects on the horizontal axis matches the number of projected objects on the vertical axis, the method moves to step 832, which is the end of the region proposal, and the coordinates of the objects are outputted. The number of iterations depends on how deeply the projections of the objects overlap one another.
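The iterative search can be modelled in software as alternating horizontal and vertical searches over candidate regions until the number of regions stops changing. The sketch below is an assumption-laden illustration, not the controller 150: all function names are hypothetical, each candidate region carries its own searching zone, and the stopping rule (a search that does not increase the region count) stands in for the count comparison of step 830. The 13x24 frame with three objects loosely follows the arrangement described for FIG.9A, with made-up coordinates.

```python
def find_runs(bits, offset):
    """Return (low, high) pairs for each run of 1s, shifted by offset."""
    runs, start = [], None
    for i, bit in enumerate(list(bits) + [0]):  # trailing 0 closes a final run
        if bit and start is None:
            start = i
        elif not bit and start is not None:
            runs.append((start + offset, i - 1 + offset))
            start = None
    return runs


def horizontal_search(image, boxes):
    """Enable the rows of each box and split its column projection into runs."""
    out = []
    for xl, xh, yl, yh in boxes:
        proj = [int(any(image[r][c] for r in range(yl, yh + 1)))
                for c in range(xl, xh + 1)]
        out.extend((a, b, yl, yh) for a, b in find_runs(proj, xl))
    return out


def vertical_search(image, boxes):
    """Enable the columns of each box and split its row projection into runs."""
    out = []
    for xl, xh, yl, yh in boxes:
        proj = [int(any(image[r][c] for c in range(xl, xh + 1)))
                for r in range(yl, yh + 1)]
        out.extend((xl, xh, a, b) for a, b in find_runs(proj, yl))
    return out


def propose_regions(image):
    """Alternate searches until a search leaves the number of regions unchanged."""
    height, width = len(image), len(image[0])
    # Start from the full frame, as with the initial register values.
    boxes = horizontal_search(image, [(0, width - 1, 0, height - 1)])
    search = vertical_search
    while True:
        refined = search(image, boxes)
        if len(refined) == len(boxes):
            return refined
        boxes = refined
        search = horizontal_search if search is vertical_search else vertical_search


# Hypothetical frame: boxes are (xl, xh, yl, yh), origin at the top-left corner.
cars = [(2, 8, 8, 11), (14, 20, 1, 4), (15, 21, 8, 11)]
image = [[0] * 24 for _ in range(13)]
for xl, xh, yl, yh in cars:
    for r in range(yl, yh + 1):
        for c in range(xl, xh + 1):
            image[r][c] = 1

regions = propose_regions(image)
print(regions)
```

In this example the first horizontal search finds two regions, the vertical searches raise the count to three, and the following horizontal searches confirm three, at which point the loop stops.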

Pseudocode for an algorithm for horizontal search is set out below:

Algorithm 1 Horizontal Search

1: procedure HORIZONTAL SEARCH
2: Inputs: x-coordinates, y-coordinates, x-ptr;
3: Outputs: x-coordinates, y-ptr;
...
40: else
41: indexX ← indexX + 1;
42: firstIteration ← 1;
43: goto state 4;
44: state 6
45: if y-coordinates_h[indexY] = 0 then
46: if indexX = indexY - 1 then
47: rpDone ← 1;
48: else
49: xSearchDone ← 1;
50: goto Vertical Search;
51: else
52: goto state 7;
53: End

The pseudocode in Algorithm 1 outlines a state machine for the horizontal search of the objects. At state0, the algorithm initializes all the variables used in the horizontal search to 0, except prchrg. Also at state0, the value of x-coordinates is copied into temp-memory, which indicates the searching zones for the horizontal search. The pointer x-ptr indicates the searching location along the horizontal axis. In state1, prchrg is set to 0, which precharges the VRD lines to VDD, and the pointer to one of the searching zones is copied into indexT.

In state2, the VRDC lines corresponding to one of the projected objects on the vertical axis are enabled and the searching zone pointed to by indexT is stored in sz. In state3, EnXsense is set high to capture the projection on the horizontal axis in the DFFs 720.

In state4, the lower and higher x-coordinates of a single object are calculated and stored. Also, y-ptr is updated for the next vertical search. The DFFs 720 for the object calculated in state4 are reset in state5. The no-object condition is also checked in state5. If projections of other objects exist, the algorithm returns to state4.

In state6, the algorithm compares the number of objects projected on both axes. If indexY-1 equals indexX, the rpDone signal goes high, indicating the end of region proposal. If not, the algorithm checks whether other VRDC lines need to be enabled. If so, it moves to state1. Otherwise, the state machine for the vertical search of the objects begins. The algorithm initializes the first address of y-coordinates and x-coordinates to (0, H-1) and (0, W-1), respectively, before entering the region proposal mode. The initial values enable all the VRDC lines along the vertical axis and search all the VRD lines along the horizontal axis for the first horizontal projection. The state machine ends when the priority encoder generates Xh = 0 (state 5) and there is no further object for which to enable VRDC lines along the vertical axis, that is, y-coordinates_h[indexY] = 0 (state 6).

Furthermore, Xh = 0 at the very first iteration (state 5) indicates no object. Once the horizontal search is over, it compares the number of objects projected on both axes. If indexY-1 equates with indexX, the rpDone signal goes high, indicating the end of region proposal (state 6). Otherwise, the state machine for the vertical search of the objects begins. The algorithm for vertical search is similar to the horizontal search, as described in Algorithm 1. However, noObject detection can be ignored since the horizontal search happens first.

An example of the application of the method shown in FIG.8 and the algorithm described above will now be described with reference to FIG.9A to FIG.9F.

FIG.9A to 9F illustrate an example of the application of the method shown in FIG.8 to an image having overlapping objects. As shown in FIG.9A, the image includes three objects which are, in this case, cars. A first car 910 is located in the lower half of the left-hand side of the frame, a second car 920 is located in the upper half of the right-hand side of the frame and a third car 930 is located in the lower half of the right-hand side of the frame.

The controller 150 initializes the x-coordinates register 156 and the y-coordinates register 166 with the minimum and maximum addressable locations of the image, (0, 23) and (0, 12), along the horizontal and vertical axes, respectively. We consider the top-left position as the origin (0, 0). At the very beginning of a horizontal or vertical search, the controller 150 copies the values of the x-coordinates register 156 or the y-coordinates register 166, respectively, into the temporary memory 170. During a horizontal search, x-ptr points to a location in the temporary memory 170 while y-ptr gets updated. Likewise, the same applies to y-ptr and x-ptr in a vertical search. The values of the temporary memory 170 determine the searching zone (sz) of a projection. FIG.9A illustrates a horizontal search at the beginning of the region proposal process. The VRDC lines corresponding to the values stored in the y-coordinates register 166 are enabled. Since this is the start of the process, all of the VRDC lines are enabled. Subsequently, the projection of the objects is searched along all the VRD lines indicated by the temporary memory and x-ptr. To get an overview of the locations of the objects along the horizontal axis and start the searching procedure, we initialize the x-coordinates and y-coordinates with the minimum and maximum addressable locations along the horizontal and vertical axes. At the end of the first horizontal search, the controller updates the x-coordinates with the horizontal positions of two objects. One of the cars is lost because two of the cars, stacked along the vertical axis, have overlapping projections. Nevertheless, the location of that car is found in the subsequent searches.

Since the first horizontal search indicates two objects in the image, the algorithm requires two vertical searches to determine the y-coordinates of the objects. These are shown in FIG.9B and FIG.9C. The HRDC lines corresponding to one of the objects are enabled in each search, and the y-coordinates are registered. Along with the y-coordinates, x-ptr is updated to point to the searching zone for the next horizontal searches. At the end of the two vertical searches, y-coordinates registers the locations of three objects.

In the next iteration, three horizontal searches are required. These are shown in FIG.9D, FIG.9E and FIG.9F. The VRDC lines corresponding to the three objects are enabled separately, and the x-coordinates of the objects are calculated. At the end of the horizontal search of FIG.9F, the numbers of objects in x-coordinates and y-coordinates converge to the same value, indicating the end of the region proposal, and the registers hold the final coordinates of the three cars.

Table 1 below shows the memory required for storing the addresses of the objects and pointers while executing the region proposal algorithm (N = number of objects, image dimension = W x H).

Max{.} calculates the maximum value of its arguments. The ceiling operator ⌈.⌉ rounds a number up to the nearest integer. Since we need to store the lower and upper addresses of the objects, the memory requirements for x-coordinates and y-coordinates are 2N⌈log2 W⌉ and 2N⌈log2 H⌉ bits, respectively. The total memory requirement for the proposed method can be written as:

Since the CCL algorithm labels each pixel, including the background pixels in the frame, it requires ⌈log2(N + 1)⌉ bits to store the label of each pixel. However, the modified CCL algorithm for calculating the coordinates of the objects only needs to keep the labels for two consecutive rows (present row and previous row). Therefore, the memory requirement for the CCL algorithm is expressed as:

Memories for x-coordinates and y-coordinates, temp-memory, x-ptr, y-ptr, and labels are implemented using register files. It can be seen from the above equations that the memory requirements for both approaches are almost equal.

A SPICE simulation shows that the execution time of the proposed approach is T_e = 11N + 10 cycles for N objects having non-overlapping projections. In contrast, the traditional CCL-based region proposal requires T_e-ccl ≈ 2WH + 6N<W_obj><H_obj> clock cycles to calculate the regions of interest, where <W_obj> and <H_obj> represent the average object size along the horizontal and vertical directions, respectively. The 2WH clock cycles in the T_e-ccl calculation cover reading and checking the value of each pixel. If the pixel belongs to an object, the state machine goes on to read the 4-neighborhood labels (left, top-left, top, and top-right) and to update the bounding box of the object and the label of the pixel. This requires approximately 6N<W_obj><H_obj> clock cycles. On the other hand, if the pixel belongs to the background, the CCL controller reads the next pixel.

At W = 320, H = 240, N = 15, <W_obj> = 0.1W and <H_obj> = 0.1H, the proposed approach takes 175 clock cycles for region proposal, whereas the CCL controller takes 222720 clock cycles. Therefore, the proposed approach is ~1272 times faster than the CCL implementation.
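The cycle counts quoted above follow directly from the two expressions T_e = 11N + 10 and T_e-ccl ≈ 2WH + 6N<W_obj><H_obj>. The short sketch below reproduces the arithmetic; the function names are hypothetical, and the average object sizes are taken as exact integers (32 and 24 pixels) for the stated 0.1W and 0.1H.

```python
def proposed_cycles(n_objects):
    """Cycles for the proposed approach with non-overlapping projections."""
    return 11 * n_objects + 10


def ccl_cycles(width, height, n_objects, obj_width, obj_height):
    """Approximate cycles for CCL: pixel scan plus per-object-pixel labelling."""
    scan = 2 * width * height
    labelling = 6 * n_objects * obj_width * obj_height
    return scan + labelling


W, H, N = 320, 240, 15
t_e = proposed_cycles(N)
t_e_ccl = ccl_cycles(W, H, N, W // 10, H // 10)
speedup = t_e_ccl / t_e
print(t_e, t_e_ccl, int(speedup))
```

The pixel scan term dominates the CCL estimate (153600 of the 222720 cycles), which is why the gap grows with the frame size rather than with the number of objects.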

Table 2 below shows the number of different lines that must be charged while proposing the regions (N = number of objects, image dimension = W x H, and <W_obj> and <H_obj> denote the average object size along the horizontal and vertical directions).

Note that if there is no overlap among the projections of the objects in the frame, the first horizontal search and the first vertical search are enough to calculate the bounding boxes. The same applies to objects whose projections overlap only on the vertical axis. In the first horizontal search, W VRD lines are precharged to VDD. In the next clock cycle, H VRDC lines are enabled to get the projection of the objects on the horizontal axis, and their locations are sensed after that. Similarly, in the first vertical search, H HRD lines are precharged initially, and the <W_obj> HRDC lines corresponding to the first object are enabled. This generates the projection of the first object on the vertical axis. Since asserting the selected <W_obj> HRDC lines discharges the <H_obj> HRD lines corresponding to the first object size, in the next object search the controller needs to precharge <H_obj> HRD lines. The l-RPN controller enables the N objects separately to get their projections on the vertical axis. The total energy consumed by the memory to calculate the regions of interest can be estimated as:

where C represents the unit (per SRAM cell) capacitance associated with VRD, VRDC, HRD, and HRDC lines. V is the operating supply voltage of the system.

Table 3 below shows the number of different lines that must be charged while proposing the regions when the projections of the objects overlap on the horizontal axis.

In this scenario, the projections of the objects overlap on the horizontal axis, and the l-RPN controller needs a second horizontal search to get the coordinates of the objects. In this case, the total energy consumed by the memory to calculate the regions of interest can be calculated as:

FIG.10 is a graph showing energy consumption of a memory array according to an embodiment of the present invention for different numbers of objects. The graph shows the results of a SPICE simulation of the energy consumption of a 20x32 SRAM array while proposing regions for different numbers of objects. Each object in this simulation is 4x4. As expected, energy consumption increases with more objects in the image frame, and it is higher for objects whose projections overlap on the horizontal axis. The energy consumption is the same for objects whose projections overlap on the vertical axis and for objects whose projections do not overlap.

FIG.11 is a graph showing energy consumption of a memory array according to an embodiment of the present invention for single objects of varying sizes. The graph shows a SPICE simulation and an estimation, at C=2.25fF and V=1.2V, of the energy consumption of a 20x32 SRAM memory array while proposing regions for a single object of various sizes. It can be seen from the figure that the energy consumption increases with the object size.

Considering the energy consumption of the l-RPN controller, the overall energy consumption can be estimated as:

E_t = E_cT_e + E_mem

The CCL algorithm-based controller reads the binary pixel and assigns a label to that pixel. Since the controller needs to store the labels of two consecutive rows, the controller needs only 2W⌈log2(N + 1)⌉ bits of memory, which can be implemented using register files. Hence, in the total energy calculation of the CCL controller, we consider only the SRAM read energy. Therefore, the total energy of the controller can be estimated as:

E_t-ccl ≈ E_c-ccl T_e-ccl + (WH)C(W + H)V²

where E_c-ccl and T_e-ccl denote the energy per cycle of the CCL controller and the number of clock cycles required to find the coordinates of the objects, respectively. C is the unit capacitance associated with an SRAM cell bit line and wordline. Therefore, to read a bit, the total capacitance that needs to be charged is C(W + H).

For a fair comparison and simplicity, we assume E_c-ccl and E_c are equal. The microcontroller core, along with 2KB memory, consumes 33pJ/cycle and the energy number with 0.5KB memory is 14pJ/cycle at 0.45V. This translates to 7.6pJ/cycle consumed by the microcontroller core. The energy is measured at 0.45V and a 180nm process node. At 1.2V and a 65nm process node, the estimated energy number is 7.04pJ/cycle (energy ∝ V² and ∝ process node²).

At W=320, H=240, N=15, <W_obj>=0.1W, <H_obj>=0.1H, C=2.25fF, E_c-ccl = E_c = 7.04pJ/cycle, the proposed approach dissipates 2.501nJ, whereas the CCL-based controller requires 1707nJ for region proposal. Therefore, the proposed in-memory region proposal network achieves ~682 times energy savings for calculating the coordinates of 15 objects. The estimated clock cycles, T_e and T_e-ccl, are 175 and 222720, respectively.
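The per-cycle energy figures can be checked with a few lines of arithmetic. The sketch below rests on two assumptions that are not stated in the text: that the memory energy scales linearly with memory size, so that the core energy is the zero-memory intercept of the two measured points, and that the scaling uses the quoted 7.6pJ/cycle core figure. The function names are hypothetical.

```python
def core_energy_pj(e_large, kb_large, e_small, kb_small):
    """Extrapolate to zero memory, assuming energy is linear in memory size."""
    per_kb = (e_large - e_small) / (kb_large - kb_small)
    return e_small - per_kb * kb_small


def scale_energy(e_pj, v_from, v_to, node_from_nm, node_to_nm):
    """Scale energy with the square of voltage and the square of process node."""
    return e_pj * (v_to / v_from) ** 2 * (node_to_nm / node_from_nm) ** 2


# Measured points from the text: 33 pJ/cycle with 2KB, 14 pJ/cycle with 0.5KB.
core_estimate = core_energy_pj(33.0, 2.0, 14.0, 0.5)

# Scaling the quoted 7.6 pJ/cycle from 0.45V, 180nm to 1.2V, 65nm.
scaled = scale_energy(7.6, 0.45, 1.2, 180.0, 65.0)
print(round(core_estimate, 2), round(scaled, 2))
```

Under these assumptions the extrapolated core energy is roughly 7.67pJ/cycle, close to the quoted 7.6pJ/cycle, and the scaled value is roughly 7.05pJ/cycle, in line with the quoted 7.04pJ/cycle.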
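The CCL total and the savings ratio can be reproduced from the expression E_t-ccl ≈ E_c-ccl T_e-ccl + (WH)C(W + H)V² and the quoted parameter values. In the sketch below, the 2.501nJ total for the proposed approach is taken from the text as given, and the memory share derived from it by subtracting the controller energy is an inference for illustration rather than a value stated in the source; the variable names are hypothetical.

```python
W, H = 320, 240
C_FF = 2.25                # unit capacitance in fF
V = 1.2                    # supply voltage in volts
E_C_PJ = 7.04              # controller energy per cycle in pJ
T_E = 175                  # cycles, proposed approach
T_E_CCL = 222720           # cycles, CCL controller
E_T_PROPOSED_NJ = 2.501    # total for the proposed approach, as quoted

# Controller energy of the proposed approach: E_c * T_e (pJ -> nJ).
controller_nj = E_C_PJ * T_E / 1000.0

# SRAM read energy for CCL: (WH) * C * (W + H) * V^2; fF * V^2 gives fJ (fJ -> nJ).
sram_read_nj = (W * H) * C_FF * (W + H) * V ** 2 / 1e6

ccl_total_nj = E_C_PJ * T_E_CCL / 1000.0 + sram_read_nj

# Inferred, not stated in the text: memory share of the proposed total.
implied_memory_nj = E_T_PROPOSED_NJ - controller_nj

savings = ccl_total_nj / E_T_PROPOSED_NJ
print(round(ccl_total_nj), round(implied_memory_nj, 3), int(savings))
```

Most of the CCL total (about 1568nJ of 1707nJ) comes from the controller cycles rather than from the SRAM reads, so the cycle count reduction accounts for the bulk of the savings.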

FIG.12 shows a simulation of a region proposal network according to an embodiment of the present invention. A mixed-signal SPICE simulation of the proposed region proposal network was carried out for a 20x32 SRAM memory array. The SRAM memory array is designed in a 65nm CMOS process, and the digital controller is implemented in Verilog. Initially, an image frame having two objects at {(xl,yl),(xh,yh)}={(1,2),(4,5)} and {(xl,yl),(xh,yh)}={(6,7),(9,10)} is written into the SRAM memory. The size of each object is 4x4. At the first horizontal search, the X_SENSE signal goes high, which captures the projection of the objects, and subsequently, the x-coordinates of the objects are calculated. Once the horizontal search is over, the l-RPN controller moves to the vertical search, where the y-coordinates of the objects are calculated. In the vertical search, the y-coordinates of each object are calculated separately, as can be seen from the Y_SENSE signal going high twice. Once the region proposal is over, the REGION_DONE signal goes high, and NUM_OBJECT<3:0> indicates the number of objects in the frame.

As described above, the proposed network exhibits the benefits of near-memory and in-memory computing due to its highly parallel processing capability. Firstly, the proposed in-memory computing achieves ~682 times energy savings. Secondly, the proposed approach is ~1272 times faster than the CCL algorithm-based implementation for calculating the regions of interest of 15 objects.
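The simulated frame can be mirrored in a small software model to show why one horizontal search followed by one vertical search per object suffices here. This is a sketch under assumptions: the array is taken as 20 rows by 32 columns (the text does not state the orientation), and the helper `runs` and the counter `y_sense_pulses` are hypothetical stand-ins for the sensing logic and the Y_SENSE signal.

```python
def runs(bits):
    """Return (low, high) pairs for each run of 1s in a projection."""
    found, start = [], None
    for i, bit in enumerate(list(bits) + [0]):  # trailing 0 closes a final run
        if bit and start is None:
            start = i
        elif not bit and start is not None:
            found.append((start, i - 1))
            start = None
    return found


ROWS, COLS = 20, 32  # assumed orientation of the 20x32 array
objects = [((1, 2), (4, 5)), ((6, 7), (9, 10))]  # {(xl, yl), (xh, yh)} from the text

image = [[0] * COLS for _ in range(ROWS)]
for (xl, yl), (xh, yh) in objects:
    for y in range(yl, yh + 1):
        for x in range(xl, xh + 1):
            image[y][x] = 1

# Horizontal search: all rows enabled, one sensing step.
x_runs = runs(int(any(image[y][x] for y in range(ROWS))) for x in range(COLS))

# Vertical search: one sensing step per object found on the horizontal axis.
boxes, y_sense_pulses = [], 0
for xl, xh in x_runs:
    y_projection = [int(any(image[y][x] for x in range(xl, xh + 1)))
                    for y in range(ROWS)]
    y_sense_pulses += 1
    for yl, yh in runs(y_projection):
        boxes.append(((xl, yl), (xh, yh)))

print(boxes, y_sense_pulses)
```

Because the two objects do not overlap on either axis, the number of regions after the vertical searches equals the number found in the horizontal search, so no further iteration is needed.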

Whilst the foregoing description has described exemplary embodiments, it will be understood by those skilled in the art that many variations of the embodiments can be made within the scope and spirit of the present invention.

Claims

1. A region proposal network comprising: an array of memory cells configured to store a binary image, each memory cell of the array of memory cells comprising a state storage circuit comprising a plurality of transistors configured to store a binary state and a projection circuit comprising a plurality of transistors configured to allow horizontal and vertical read out of binary states of multiple cells along columns and rows of the array of memory cells, the array of memory cells further comprising: a plurality of vertical read control lines, each vertical read control line being coupled to the projection circuit of each memory cell of a respective row of the array of memory cells; a plurality of horizontal read lines, each horizontal read line being coupled to the projection circuit of each memory cell of a respective row of the array of memory cells; a plurality of horizontal read control lines, each horizontal read control line being coupled to the projection circuit of each memory cell of a respective column of the array of memory cells; and a plurality of vertical read lines, each vertical read line being coupled to the projection circuit of each memory cell of a respective column of the array of memory cells; and a controller configured to apply signals to the vertical read control lines and to determine a projection of objects in the binary image onto a horizontal axis from the resulting signals on the vertical read lines, and to apply signals to the horizontal read control lines and to determine a projection of objects in the binary image onto a vertical axis from the resulting signals on the horizontal read lines, and to generate an indication of at least one region of interest in the binary image from the projection of objects in the binary image onto the horizontal axis and the projection of objects in the binary image onto the vertical axis.

2. A region proposal network according to claim 1, wherein the controller is further configured to iteratively conduct a search for objects in the binary image by identifying projections of objects onto the vertical axis and then determining the projection of each identified projection onto the horizontal axis.

3. A region proposal network according to any preceding claim wherein the controller comprises an x-sense module configured to identify projections of objects onto the horizontal axis and a y-sense module configured to identify projections of objects onto the vertical axis.

4. A region proposal network according to claim 3, wherein the controller is configured to apply signals to the horizontal read control lines based on the projections of objects onto the horizontal axis and to apply signals to the vertical read control lines based on the projections of objects onto the vertical axis.

5. A region proposal network according to claim 3 or claim 4 wherein the x-sense module comprises an x-sense edge detector and the y-sense module comprises a y-sense edge detector.

6. A region proposal network according to claim 5, wherein the x-sense module comprises a plurality of x-sensing amplifiers provided before the x-sense edge detector and the y-sense module comprises a plurality of y-sensing amplifiers provided before the y-sense edge detector.

7. A region proposal network according to claim 6, wherein each of the x-sense module and the y-sense module comprise a plurality of transistors between the output of consecutive x-sensing amplifiers and between the output of consecutive pairs of y-sensing amplifiers, the transistors configured to fill gaps caused by fragmentation of objects in the binary image.

8. A region proposal network according to any preceding claim, wherein each memory cell of the array of memory cells is configured as a static random-access memory cell.

9. A region proposal network according to claim 8, wherein each cell comprises at least 9 transistors.

10. A region proposal network according to claim 8, wherein each cell is configured as a 9-transistor static random-access memory cell.

11. A region proposal network according to any preceding claim, wherein the state storage circuit is configured as a pair of inverters in a cross coupled configuration and the projection circuit is connected to one inverter of the pair of inverters.

12. A region proposal method comprising: applying signals to vertical read control lines of a memory array storing a binary image; determining a projection of objects in the binary image onto a horizontal axis of the binary image from the resulting signals on the vertical read lines of the memory array; applying signals to horizontal read control lines of the memory array; determining a projection of objects in the binary image onto a vertical axis from resulting signals on the horizontal read lines of the memory array; and generating an indication of at least one region of interest in the binary image from the projection of objects in the binary image onto the horizontal axis and the projection of objects in the binary image onto the vertical axis.

13. A method according to claim 12, further comprising iteratively conducting a search for objects in the binary image by identifying projections of objects onto the vertical axis and then determining the projection of each identified projection onto the horizontal axis.

14. A method according to claim 13, wherein iteratively conducting a search for objects in the binary image by identifying projections of objects onto the vertical axis and then determining the projection of each identified projection onto the horizontal axis comprises applying signals to the horizontal read control lines of the memory array based on the projections of objects onto the horizontal axis and applying signals to the vertical read control lines of the memory array based on the projections of objects onto the vertical axis.

15. A method according to any one of claims 12 to 14, wherein determining projections of objects in the binary image onto a horizontal axis of the binary image from the resulting signals on the vertical read lines of the memory array comprises detecting edges in the resulting signals on the vertical read lines of the memory array and determining a projection of objects in the binary image onto a vertical axis from resulting signals on the horizontal read lines of the memory array comprises detecting edges in the resulting signals on the horizontal read lines of the memory array.

16. A method according to claim 15, further comprising filling gaps due to fragmentation of objects in the binary image in the resulting signals on the vertical read lines of the memory array prior to detecting edges in the resulting signals on the vertical read lines of the memory array and/or filling gaps due to fragmentation of objects in the binary image in the resulting signals on the horizontal read lines of the memory array prior to detecting edges in the resulting signals on the horizontal read lines of the memory array.

17. A memory array configured to store a binary image, the memory array comprising a plurality of memory cells, each memory cell comprising a state storage circuit comprising a plurality of transistors configured to store a binary state and a projection circuit comprising a plurality of transistors configured to allow horizontal and vertical read out of binary states of multiple cells along columns and rows of the memory array, the memory array further comprising: a plurality of vertical read control lines, each vertical read control line being coupled to the projection circuit of each memory cell of a respective row of the memory array; a plurality of horizontal read lines, each horizontal read line being coupled to the projection circuit of each memory cell of a respective row of the memory array; a plurality of horizontal read control lines, each horizontal read control line being coupled to the projection circuit of each memory cell of a respective column of the memory array; and a plurality of vertical read lines, each vertical read line being coupled to the projection circuit of each memory cell of a respective column of the memory array.

18. A memory array according to claim 17, wherein each memory cell of the memory array cells is configured as a static random-access memory cell.

19. A memory array according to claim 18, wherein each cell comprises at least 9 transistors.

20. A memory array according to claim 18, wherein each cell is configured as a 9-transistor static random-access memory cell.

21. A memory array according to any one of claims 17 to 20, wherein the state storage circuit is configured as a pair of inverters in a cross coupled configuration and the projection circuit is connected to one inverter of the pair of inverters.
