WO2019227322A1 - Pooling device and pooling method - Google Patents

Pooling device and pooling method Download PDF

Info

Publication number
WO2019227322A1
Authority
WO
WIPO (PCT)
Prior art keywords
pooling
temporary
input image
pixels
image
Prior art date
Application number
PCT/CN2018/088959
Other languages
English (en)
French (fr)
Inventor
高明明
谷骞
杨康
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to CN201880011430.XA priority Critical patent/CN110383330A/zh
Priority to PCT/CN2018/088959 priority patent/WO2019227322A1/zh
Publication of WO2019227322A1 publication Critical patent/WO2019227322A1/zh
Priority to US16/952,911 priority patent/US20210073569A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0813Multiuser, multiprocessor or multiprocessing cache systems with a network or matrix configuration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • G06T5/92Dynamic range modification of images or parts thereof based on global image properties
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/12Edge-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/94Hardware or software architectures specially adapted for image or video understanding
    • G06V10/955Hardware or software architectures specially adapted for image or video understanding using specific electronic processors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0875Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10Providing a specific technical effect
    • G06F2212/1016Performance improvement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/45Caching of specific data in cache memory
    • G06F2212/454Vector or matrix data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/45Caching of specific data in cache memory
    • G06F2212/455Image or video data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20092Interactive image processing based on input by user
    • G06T2207/20104Interactive definition of region of interest [ROI]

Definitions

  • This application relates to the field of artificial intelligence (AI), and more specifically, to a pooling device and a pooling method.
  • AI artificial intelligence
  • CNN convolutional neural networks
  • CNNs usually include neural network layers such as convolutional layers and pooling layers.
  • the pooling layer can be used to perform pooling operations.
  • the pooling process may be general pooling or region of interest (ROI) pooling, and the pooling operation itself may be maximum pooling or average pooling. Different pooling processes and/or pooling operations place different requirements on the hardware, which makes the hardware design complex.
  • ROI region of interest
  • the application provides a pooling device and a pooling method, which can simplify the hardware design of the pooling process.
  • a pooling device configured to perform a pooling operation on an input image to generate a pooled output image.
  • the pooling device includes: one or more first processing circuits for calculating temporary pooling results of the input image in a row direction or a column direction; and one or more second processing circuits for generating the output image according to the temporary pooling results of the input image in the row direction or the column direction.
  • a pooling method is provided.
  • the pooling method is used to perform a pooling operation on an input image to generate a pooled output image.
  • the pooling method includes: calculating temporary pooling results of the input image in a row direction or a column direction; and generating the output image according to the temporary pooling results of the input image in the row direction or the column direction.
  • This application first performs the pooling operation along the row direction (or column direction) of the input image, and then generates the final pooling result of the input image (that is, the pixels of the output image) from the calculated temporary pooling results.
  • this pooling scheme is universal, which keeps the hardware design of the pooling process simple.
  • FIG. 1 is a schematic structural diagram of a pooling device according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a method for calculating an input image by a first processing circuit according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of another calculation manner of the input image by the first processing circuit according to the embodiment of the present application.
  • FIG. 4 is a diagram illustrating an example of a connection relationship between a first processing circuit and an on-chip cache provided in an embodiment of the present application.
  • FIG. 5 is an exemplary diagram of a structure of an on-chip cache provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a neural network processor according to an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of a pooling method according to an embodiment of the present application.
  • the CNN may include one or more of the following neural network layers: a pre-processing layer, a convolutional layer, an activation layer, a pooling layer, and a fully connected layer.
  • the pooling layer is mainly used to perform pooling operations.
  • the pooling layer usually performs a pooling operation on the input feature image in units of a pooling window.
  • the width of the pooling window can be used to identify the number of columns of pixels contained in a pooling window. Accordingly, the height of the pooling window can be used to identify the number of rows of pixels contained in a pooling window.
  • the width and height of the pooling window can be the same or different. The specific values can be selected according to actual needs, which is not limited in the embodiments of the present application. Pooling windows are also sometimes referred to as sliding windows or pooling cores for pooling operations.
  • Average pooling can be used to calculate the average of the pixels contained in the pooling window; maximum pooling can be used to calculate the maximum of the pixels contained in the pooling window.
  • taking average pooling as an example, the pixel values of the pixels in the pooling window can be accumulated first, and the average value of these pixels can then be calculated.
  • taking maximum pooling as an example, the pixel values of the pixels in the pooling window can be compared pair by pair, and the final comparison result is the maximum value of the pixels in the pooling window.
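  • As a concrete illustration of these two operations, the following is a minimal sketch (plain Python, not from the patent; the helper names are ours) that applies average and maximum pooling to a single pooling window held as a list of lists.

```python
def average_pool_window(window):
    # Accumulate all pixel values in the window, then divide by the pixel count.
    pixels = [p for row in window for p in row]
    return sum(pixels) / len(pixels)

def max_pool_window(window):
    # Compare pixel values pair by pair; the final comparison result is the maximum.
    pixels = [p for row in window for p in row]
    result = pixels[0]
    for p in pixels[1:]:
        result = result if result >= p else p
    return result

window = [[1, 5], [3, 2]]           # a 2x2 pooling window
print(average_pool_window(window))  # 2.75
print(max_pool_window(window))      # 5
```
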
  • the pooling operation needs to process each pixel in the pooling window in sequence, and after all pixels in the pooling window have been processed, the final pooling result can be generated. Before the final pooling result is obtained, the pooling operation generally produces temporary pooling results.
  • the temporary pooling result in the row direction refers to the temporary pooling result obtained by processing the row pixels of the input image.
  • the number of temporary pooling results corresponding to one row of pixels of the input image is equal to the number of columns of the output image that need to be obtained after the input image passes through the pooling layer.
  • the temporary pooling result in the column direction refers to the temporary pooling result obtained by processing the column pixels of the input image.
  • the number of temporary pooling results corresponding to one column of pixels of the input image is equal to the number of rows of the output image that need to be obtained after the input image passes the pooling layer.
  • taking average pooling as an example, the temporary pooling result in the row direction of the input image can refer to the accumulated pixel value of the pixels in a row of the input image that belong to one pooling window, and the temporary pooling result in the column direction of the input image can refer to the accumulated pixel value of the pixels in a column of the input image that belong to one pooling window.
  • taking maximum pooling as an example, the temporary pooling result in the row direction of the input image can refer to the maximum pixel value of the pixels in a row of the input image that belong to one pooling window, and the temporary pooling result in the column direction of the input image can refer to the maximum pixel value of the pixels in a column of the input image that belong to one pooling window.
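  • To make the definition concrete, the following is a hedged sketch (plain Python; the names are ours, not from the patent) that computes the row-direction temporary pooling results for one row given the pooling window width and stride; for average pooling the temporaries are accumulated sums, for maximum pooling they are running maxima, and one temporary result is produced per output column.

```python
def row_direction_temporaries(row, window_width, stride, mode="max"):
    # One temporary pooling result per output column, i.e. per horizontal
    # position of the pooling window along this row.
    out_cols = (len(row) - window_width) // stride + 1
    temps = []
    for c in range(out_cols):
        segment = row[c * stride : c * stride + window_width]
        temps.append(max(segment) if mode == "max" else sum(segment))
    return temps

row = [4, 1, 7, 2, 9, 3]
print(row_direction_temporaries(row, window_width=2, stride=2, mode="max"))  # [4, 7, 9]
print(row_direction_temporaries(row, window_width=2, stride=2, mode="sum"))  # [5, 9, 12]
```
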
  • the pooling process corresponding to the pooling layer can be divided into general pooling and ROI pooling.
  • for general pooling, the pooling operation is usually performed on the entire input feature image.
  • for ROI pooling, one or more image blocks in the entire input feature image are pooled, and the one or more image blocks may be referred to as ROIs.
  • for ROI pooling, it is usually necessary to first analyze the position of the ROI in the input feature image (such as the row and column coordinates of the ROI in the input feature image), and then extract the image data of the ROI from the input feature image according to the analyzed position; this image data serves as the input image to be pooled.
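  • The extraction step amounts to slicing the ROI's rows and columns out of the feature image; the following is a minimal sketch under the assumption that the ROI position is given as top/left coordinates plus a height and width (the function name and parameters are ours, not from the patent).

```python
def extract_roi(feature_image, top, left, height, width):
    # Slice the ROI's rows and columns out of the feature image; the result
    # becomes the input image handed to the pooling operation.
    return [row[left:left + width] for row in feature_image[top:top + height]]

feature_image = [[r * 10 + c for c in range(8)] for r in range(6)]
roi = extract_roi(feature_image, top=1, left=2, height=3, width=4)
print(roi)  # rows 1..3 and columns 2..5 of the feature image
```
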
  • different ROIs are located at different positions of the feature image, and the length and/or width of different ROIs usually varies. Therefore, for ROI pooling, the size of the images to be processed usually varies, which makes the hardware design difficult. For this reason, in the traditional technology, ROI pooling is usually implemented in software.
  • the embodiment of the present application provides a universal pooling device.
  • the pooling device can be used to realize general pooling and ROI pooling.
  • the pooling operation in a CNN is taken as an example for illustration, but the application scenarios of the pooling device provided in the embodiments of the present application are not limited to this; the device can be applied to any other occasion where a pooling operation needs to be performed.
  • the pooling device provided in the embodiment of the present application is described in detail below with reference to FIG. 1.
  • the pooling device 10 may be configured to perform a pooling operation on an input image to generate a pooled output image.
  • the pooling device 10 may be a hardware circuit (or a chip), for example, a field programmable gate array (FPGA), or an application-specific integrated circuit (ASIC).
  • FPGA field programmable gate array
  • ASIC application-specific integrated circuit
  • the input image may be part or all of the feature image input by the convolution layer.
  • the input image may be a part or all of a certain ROI of the feature image input by the convolution layer. For example, when the size of an image in a ROI is large, the image in the ROI may be further divided into many small images as the input image.
  • the pooling device 10 may include one or more first processing circuits 12 and one or more second processing circuits 14.
  • the one or more first processing circuits 12 may be used to calculate a temporary pooling result of the input image in a row direction or a column direction.
  • the first processing circuit 12 may also be referred to as a row processing circuit.
  • the first processing circuit 12 may also be referred to as a column processing circuit.
  • the one or more second processing circuits 14 may be configured to generate an output image according to a temporary pooling result of the input image in the row direction or the column direction.
  • the one or more second processing circuits 14 may be configured to process the temporary pooling result output by the first processing circuit 12 in a direction perpendicular to the processing direction of the first processing circuit 12 to obtain an output image.
  • the traditional pooling process usually computes window by window, that is, the final pooling result of the current pooling window is calculated before the calculation of the next pooling window starts.
  • the embodiment of the present application departs from the above calculation order of the traditional pooling process: first, the input image is pooled along the row direction (or column direction) of the input image, and then the final pooling result of the input image (that is, the pixels of the output image) is generated based on the calculated temporary pooling results.
  • This pooling method is universal and can make the hardware design of the pooling process simple.
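  • The following is a minimal end-to-end sketch of this two-stage scheme (row-direction temporaries first, then a reduction in the perpendicular direction), written in plain Python under the simplifying assumption of non-overlapping windows; the function names are ours and the code is illustrative, not the patented circuit.

```python
def pool_rows(row, win_w, mode):
    # Stage 1: one temporary pooling result per output column of this row.
    temps = []
    for c in range(0, len(row) - win_w + 1, win_w):
        seg = row[c:c + win_w]
        temps.append(max(seg) if mode == "max" else sum(seg))
    return temps

def pool_2d(image, win_h, win_w, mode="max"):
    row_temps = [pool_rows(row, win_w, mode) for row in image]
    # Stage 2: reduce the temporaries of every win_h consecutive rows, i.e.
    # process them in the direction perpendicular to stage 1.
    out = []
    for r in range(0, len(row_temps) - win_h + 1, win_h):
        block = row_temps[r:r + win_h]
        if mode == "max":
            out.append([max(col) for col in zip(*block)])
        else:  # accumulated sums divided by the window area give the average
            out.append([sum(col) / (win_h * win_w) for col in zip(*block)])
    return out

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
print(pool_2d(image, 2, 2, mode="max"))      # [[6, 8], [14, 16]]
print(pool_2d(image, 2, 2, mode="average"))  # [[3.5, 5.5], [11.5, 13.5]]
```
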
  • the first processing circuit 12 and the second processing circuit 14 may be independent hardware circuits, or they may share the same circuit. Alternatively, the second processing circuit 14 may multiplex the first processing circuit 12. The first processing circuit 12 and the second processing circuit 14 sharing the same circuit can simplify the structure of the pooling device 10 and reduce the cost of the pooling device 10.
  • the first processing circuit 12 can process an operation corresponding to one pixel (that is, a single-point operation) per clock cycle, and can also process an operation corresponding to multiple pixels.
  • the type of operation corresponding to a pixel is related to factors such as the type of the pooling operation and the position of the pixel in the image, which is not specifically limited in this embodiment of the present application.
  • the operation corresponding to a pixel may include a comparison of pixel values between the pixel and an adjacent pixel, an accumulation of the pixel values of the pixel and an adjacent pixel, a boundary division operation when the pixel is located at the boundary of an image block, storage of the temporary pooling result corresponding to the pixel, and so on.
  • if the first processing circuit 12 processes operations corresponding to multiple pixels every clock cycle, multiple operation instructions corresponding to those pixels need to be input to the first processing circuit 12, which is more complicated to implement. In contrast, if the first processing circuit 12 is controlled to perform a single-point operation every clock cycle, the logic control of the pooling device 10 becomes simple.
  • the number of the first processing circuits 12 included in the pooling device 10 is not specifically limited.
  • the pooling device 10 may include only one first processing circuit 12.
  • the first processing circuit 12 may perform row-by-row or column-by-column processing on the input image.
  • the pooling device 10 may include a plurality of first processing circuits 12.
  • the plurality of first processing circuits 12 can calculate the temporary pooling results corresponding to multiple rows of pixels or multiple columns of pixels of the input image in parallel. Parallel computing of multiple rows of pixels or multiple columns of pixels can improve the computing efficiency of the pooling device.
  • the number of the first processing circuits 12 included in the pooling device 10 can be matched with the number of clock cycles required for one first processing circuit 12 to process the target pixel.
  • the target pixel is a pixel to be processed received by a first processing circuit 12 within one clock cycle.
  • assuming that one first processing circuit 12 needs N clock cycles to process its target pixels, the number of first processing circuits 12 included in the pooling device 10 may be set to N. Suppose the pooling device 10 transmits target pixels to the 1st to Nth first processing circuits 12 in the kth to (k+N)th clock cycles, respectively. Since one first processing circuit 12 needs N clock cycles to process its target pixels, when the (k+N+1)th clock cycle arrives, the 1st first processing circuit 12, which received its target pixels first, has just finished processing them and can therefore receive new target pixels in the (k+N+1)th clock cycle.
  • configuring the number of first processing circuits 12 included in the pooling device 10 to match the number of clock cycles required for one first processing circuit 12 to process its target pixels therefore allows the processing of each first processing circuit to form a tight pipeline, improving the parallelism and computing efficiency of the pooling device.
  • for ease of understanding, the following uses FIG. 2 as an example, taking the case where the first processing circuit 12 is a row processing circuit and the pixels of the input image are input to the pooling device along the row direction.
  • in hardware design, a trade-off is usually made among the system clock frequency, the bus bit width, and the system cost. Assuming that the system to which the pooling device 10 of the embodiment of the present application belongs has a main frequency of 1 GHz, a bus bit width of 128 bits, and 8-bit pixel data per pixel, the system can input 16 pixels that are consecutive in the row direction (corresponding to the above-mentioned target pixels) to one row processing circuit of the pooling device 10 in one clock cycle.
  • assuming that a row processing circuit performs a single-point operation on one pixel per clock cycle, it needs 16 clock cycles to process the 16 pixels. In this case, the number of row processing circuits in the pooling device 10 may be set to sixteen.
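  • The sizing in this example reduces to the following back-of-the-envelope calculation (the variable names below are ours, added only for illustration): a 128-bit bus delivering 8-bit pixels supplies 16 pixels per clock cycle, a single-point row processing circuit needs 16 cycles to consume them, and 16 row processing circuits therefore keep the pipeline tightly filled.

```python
bus_width_bits = 128        # bus bit width assumed above
bits_per_pixel = 8          # 8-bit pixel data
pixels_per_cycle = bus_width_bits // bits_per_pixel    # 16 pixels arrive per clock cycle
cycles_to_consume = pixels_per_cycle                   # single-point operation: 1 pixel per cycle
row_circuits_for_tight_pipeline = cycles_to_consume    # 16 row processing circuits
print(pixels_per_cycle, row_circuits_for_tight_pipeline)  # 16 16
```
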
  • FIG. 2 illustrates that the pixels of the input image are input to the pooling device along the row direction as an example, but the embodiment of the present application is not limited thereto, and the pixels of the input image may also be input to the pooling device along the column direction.
  • in that case, the 16 pixels input in one clock cycle belong to 16 different rows of the input image; therefore, as shown in FIG. 3, the 16 pixels can be input to the 16 row processing circuits respectively in each clock cycle, so that each row processing circuit receives 8 bits of pixel data.
  • the temporary pooling result calculated by the first processing circuit 12 may be stored in an on-chip cache, or may be stored in an external memory through a system bus, which is not limited in this embodiment of the present application.
  • An optional storage method of the temporary pooling result is given below in conjunction with FIG. 4.
  • the pooling device 10 may further include a plurality of on-chip caches 16.
  • the plurality of on-chip caches 16 may correspond to the plurality of first processing circuits 12 in a one-to-one manner, and each of the on-chip caches 16 may be specifically used to store temporary pooling results calculated by the corresponding first processing circuit 12.
  • a dedicated on-chip cache 16 is provided for each first processing circuit 12, so that the calculation of each temporary pooling result of each first processing circuit 12 can be completed on chip as far as possible, reducing the data interaction between the pooling device and external memory during the pooling process and thereby improving the computing efficiency of the pooling device.
  • the capacity of the on-chip cache 16 may be configured so that it can accommodate the temporary pooling results corresponding to one row or one column of pixels of the input image.
  • one storage address 161 of the on-chip cache 16 may be used to store one temporary pooling result among the temporary pooling results corresponding to one row or one column of pixels of the input image.
  • the temporary pooling results stored at the same storage address of the multiple on-chip caches 16 may correspond to the same column direction or the same row direction of the input image.
  • specifically, when the first processing circuit 12 calculates the temporary pooling results of the input image along the row direction, the temporary pooling results stored at the same storage address of the multiple on-chip caches 16 correspond to the same column direction of the input image; when the first processing circuit 12 calculates the temporary pooling results of the input image along the column direction, the temporary pooling results stored at the same storage address of the multiple on-chip caches 16 correspond to the same row direction of the input image.
  • the input data of the second processing circuit 14 may be formed by splicing the temporary pooling results stored in the same storage address of multiple on-chip caches 16.
  • the foregoing configuration manner of the storage address of the on-chip cache 16 enables the second processing circuit 14 to obtain input data through a simple data splicing operation, without performing a complicated addressing operation, thereby simplifying the implementation of the pooling device.
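  • The following is a hedged sketch of this address layout (plain Python lists standing in for the caches; the names and toy sizes are ours): each first processing circuit owns one cache, address j of every cache holds the j-th temporary result of that circuit's row, and the second processing circuit's input is formed simply by reading the same address from all caches and concatenating (splicing) the values.

```python
NUM_CIRCUITS = 4   # number of first processing circuits / on-chip caches (toy value)
CACHE_DEPTH = 8    # temporary results one cache can hold (one per output column)

# One dedicated cache per first processing circuit; address j holds the
# temporary pooling result of output column j for that circuit's row.
caches = [[None] * CACHE_DEPTH for _ in range(NUM_CIRCUITS)]

def store_temporary(circuit_id, address, value):
    caches[circuit_id][address] = value

def splice_for_second_stage(address):
    # Reading the same storage address from every cache yields the temporaries
    # that belong to the same column of the input image; simple concatenation
    # replaces any complicated addressing.
    return [cache[address] for cache in caches]

for circuit in range(NUM_CIRCUITS):
    for col in range(CACHE_DEPTH):
        store_temporary(circuit, col, value=circuit * 100 + col)

print(splice_for_second_stage(3))  # [3, 103, 203, 303]
```
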
  • suppose the depth of the on-chip cache 16 is 64. If the number of temporary pooling results corresponding to one row or one column of pixels of the input image is more than 64, one approach is to increase the depth of the on-chip cache 16 so that it can accommodate the temporary pooling results corresponding to one row or one column of pixels (for example, increasing the depth of the on-chip cache to 512) in order to cover most applications; another approach is to split the input image into multiple smaller input images, and then use the pooling device to perform the pooling operation on each of these input images separately.
  • the second processing circuit 14 generates an output image based on the temporary pooling result output by the first processing circuit 12.
  • the second processing circuit 14 may wait for the first processing circuit 12 to process all rows or columns of the input image, and then generate an output image based on the temporary pooling result output by the first processing circuit 12.
  • as another possible implementation, each time the first processing circuit 12 finishes processing the pixels of some rows or some columns of the input image, it may trigger the second processing circuit 14 to start processing, that is, the processing of the first processing circuit 12 and the second processing circuit 14 is performed alternately.
  • the advantage of this processing method is that there is no need to store all temporary pooling results of the input image at the same time, and the requirement for the buffer capacity will be lower.
  • the pooling device 10 may include N first processing circuits 12 (N is a positive integer greater than 1).
  • the pooling device 10 may further include a control circuit.
  • the control circuit can be used to perform the following operations: if the height or width of the pooling window is less than or equal to N, then whenever the N first processing circuits have stored the temporary pooling results corresponding to N rows or N columns of pixels into the N on-chip caches, the second processing circuit 14 is controlled to generate part of the pixels of the output image according to the temporary pooling results stored in the N on-chip caches.
  • the control circuit may be further configured to: if the height or width of the pooling window is greater than N, store at least a part of the temporary pooling results held in the N on-chip caches 16 into other on-chip caches or an external memory, and control the second processing circuit 14 to generate some or all pixels of the output image according to the temporary pooling results corresponding to M rows or M columns of pixels, where M is a positive integer greater than or equal to the height or width of the pooling window, and the temporary pooling results corresponding to the M rows or M columns of pixels include the temporary pooling results stored in the other on-chip caches or external memory.
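  • As a hedged sketch of this control decision (the function and argument names are ours, not from the patent), the choice between consuming the on-chip temporaries directly and spilling them first depends only on how the pooling-window height compares with N:

```python
def schedule_second_stage(window_height, n_circuits, rows_buffered_on_chip):
    # window_height: height of the pooling window
    # n_circuits: number of first processing circuits (= on-chip caches), N
    # rows_buffered_on_chip: rows of temporaries currently held in the N caches
    if window_height <= n_circuits:
        # Every time N rows of temporaries sit in the N caches, the second
        # processing circuit can already emit part of the output image.
        return "consume_on_chip" if rows_buffered_on_chip == n_circuits else "wait"
    # window_height > N: spill at least part of the temporaries to a larger
    # on-chip buffer or external memory until M >= window_height rows exist.
    return "spill_then_consume"

print(schedule_second_stage(window_height=2, n_circuits=16, rows_buffered_on_chip=16))
print(schedule_second_stage(window_height=32, n_circuits=16, rows_buffered_on_chip=16))
```
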
  • taking the case where the first processing circuit is a row processing circuit and the pooling device 10 includes 16 row processing circuits as an example, the pooling device 10 can control the calculation mode of the row processing circuits and the way the temporary pooling results output by the row processing circuits are stored according to the size of the pooling window.
  • taking pooling≤16 (pooling≤16 means the width and height of the pooling window are less than or equal to 16, such as pooling=2 or pooling=16) as an example, whenever the 16 row processing circuits have processed 16 rows of pixels of the input image, the column processing circuit (corresponding to the second processing circuit above; the column processing circuit can reuse the row processing circuit, that is, share the same circuit with the row processing circuit) can be controlled to serially process the temporary pooling results corresponding to those 16 rows of pixels, so as to obtain the final pooling results corresponding to the 16 rows of pixels.
  • taking pooling>16 (such as pooling=32) as an example, since the temporary pooling results corresponding to 16 rows of pixels cannot complete a full pooling operation, the data buffered in the on-chip caches can first be spliced and the spliced data stored into other on-chip caches (such as a larger on-chip temporary buffer) or an external memory (such as off-chip double data rate (DDR) memory); once the temporary pooling results output by the row processing circuits are enough to complete a full pooling operation, the data is read back from the other on-chip caches or external memory and processed by the column processing unit.
  • of course, when pooling≤16, the processing can also be carried out in a manner similar to the pooling>16 case. The advantage of doing so is that the processing method of the pooling device 10 remains the same regardless of the size of the pooling window, so only one set of universal circuitry needs to be designed.
  • the input image may be an image in the ROI
  • the pooling device may be used to perform ROI pooling.
  • the parsing of the ROI can be configured into the pooling device 10 by software, or the pooling device 10 can perform the parsing itself.
  • the pooling device 10 may further include an analysis circuit 19.
  • the analysis circuit 19 may be used to receive the feature image and ROI parameters output by the convolution layer, determine the position of the ROI in the feature image according to the ROI parameters, and transmit the image in the ROI, as the input image, to the one or more first processing circuits 12.
  • for the method of analyzing the position of the ROI in the feature image, refer to the conventional technology, which is not described in detail here.
  • the neural network processor 60 may include a convolution device 62 and a pooling device 10.
  • the pooling device 10 may be used to perform a pooling operation on the feature images output by the convolution device 62.
  • FIG. 7 is a schematic flowchart of a pooling method according to an embodiment of the present application.
  • the pooling method shown in FIG. 7 may be used to perform a pooling operation on an input image to generate a pooled output image.
  • the method in FIG. 7 may include steps 710 and 720.
  • step 710 a temporary pooling result of the input image in a row direction or a column direction is calculated.
  • step 720 the output image is generated according to a temporary pooling result of the input image in a row direction or a column direction.
  • step 710 may include: using a plurality of first processing circuits to calculate a temporary pooling result of multiple rows or multiple columns of the input image in parallel.
  • the number of the first processing circuits matches the number of clock cycles required by one first processing circuit to process target pixels, and the target pixels are the to-be-processed pixels received by one first processing circuit within one clock cycle.
  • the method of FIG. 7 may further include: storing temporary pooling results calculated by a plurality of the first processing circuits into a plurality of on-chip caches corresponding to the plurality of the first processing circuits, respectively.
  • the capacity of the on-chip cache can accommodate temporary pooling results corresponding to one row or column of pixels of the input image.
  • a storage address of the on-chip cache is used to store a temporary pooling result in a temporary pooling result corresponding to a row or a column of pixels of the input image.
  • a plurality of temporary pooling results stored at the same storage address of the on-chip caches correspond to the same column direction or the same row direction of the input image.
  • the method of FIG. 7 may further include: splicing the temporary pooling results stored by the same storage address of the plurality of on-chip caches.
  • step 720 may include: if the height or width of the pooling window is less than or equal to N, then whenever the N first processing circuits have stored the temporary pooling results corresponding to N rows or N columns of pixels into the N on-chip caches, generating part of the pixels of the output image according to the temporary pooling results stored in the N on-chip caches, where N represents the number of the first processing circuits and N is a positive integer greater than 1.
  • the method of FIG. 7 may further include: if the height or width of the pooling window is greater than N, storing at least a part of the temporary pooling results held in the N on-chip caches into other on-chip caches or an external memory other than the plurality of on-chip caches; step 720 may include: generating some or all pixels of the output image according to the temporary pooling results corresponding to M rows or M columns of pixels, where M is a positive integer greater than or equal to the height or width of the pooling window, and the temporary pooling results corresponding to the M rows or M columns of pixels include the temporary pooling results stored in the other on-chip caches or external memory.
  • the output image is calculated based on one or more second processing circuits, and at least one of the first processing circuits and at least one of the second processing circuits share the same circuit.
  • the first processing circuit processes an operation corresponding to one pixel every clock cycle.
  • the pooling device is a field programmable gate array or an application-specific integrated circuit.
  • the input image is an image in a region of interest (ROI).
  • ROI region of interest
  • the method of FIG. 7 may further include: receiving a feature image and ROI parameters output by the convolution layer; determining the position of the ROI in the feature image according to the ROI parameters; and using the image in the ROI as the input image.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media.
  • the available medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)).
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of units is only a division by logical function; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, which may be electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Neurology (AREA)
  • Image Processing (AREA)

Abstract

A pooling device and a pooling method are provided. The pooling device includes a first processing circuit and a second processing circuit. The first processing circuit is configured to calculate temporary pooling results of an input image in a row direction or a column direction; the second processing circuit is configured to generate an output image according to the temporary pooling results of the input image in the row direction or the column direction. The input image is first pooled along one direction of the input image, and the final pooling result of the input image is then generated from the calculated temporary pooling results. This pooling scheme is universal and keeps the hardware design of the pooling process simple.

Description

池化装置和池化方法
版权申明
本专利文件披露的内容包含受版权保护的材料。该版权为版权所有人所有。版权所有人不反对任何人复制专利与商标局的官方记录和档案中所存在的该专利文件或者该专利披露。
技术领域
本申请涉及人工智能(artificial intelligence,AI)领域,并且更为具体地,涉及一种池化装置和池化方法。
背景技术
随着AI的发展,卷积神经网络(convolutional neural networks,CNN)在图像分类、图像分割取得了不错的成绩。
目前,各大厂商开始对CNN的运算过程进行硬件化,希望可以以芯片的形式实现CNN的片上运算。
CNN通常包含卷积层、池化(pooling)层等神经网络层,池化层可用于执行池化运算。池化运算可以包括一般池化以及感兴趣区域(region of interest,ROI)池化,池化操作包括最大池化和平均池化。不同池化运算和/或池化操作对硬件的要求并不完全相同,导致硬件的设计复杂。
发明内容
本申请提供一种池化装置和池化方法,能够简化池化过程的硬件设计。
第一方面,提供一种池化装置,所述池化装置用于对输入图像进行池化操作以生成池化后的输出图像。所述池化装置包括:一个或多个第一处理电路,用于计算所述输入图像沿行方向或列方向的临时池化结果;一个或多个第二处理电路,用于根据所述输入图像沿行方向或列方向的临时池化结果,生成所述输出图像。
第二方面,提供一种池化方法,所述池化方法用于对输入图像进行池化操作以生成池化后的输出图像,所述池化方法包括:计算所述输入图像沿行方向或列方向的临时池化结果;根据所述输入图像沿行方向或列方向的临时池化结果,生成所述输出图像。
本申请先沿输入图像的行方向(或列方向)对输入图像进行池化运算,再根据计算出的临时池化结果生成计算输入图像的最终池化结果(即输出图像的像素),这种池化方式具有通用性,可以使得池化过程的硬件设计变得简单。
附图说明
图1是本申请实施例提供的池化装置的示意性结构图。
图2是本申请实施例提供的第一处理电路对输入图像的一种计算方式的示意图。
图3是本申请实施例提供的第一处理电路对输入图像的另一计算方式的示意图。
图4是本申请实施例提供的第一处理电路和片上缓存的连接关系示例图。
图5是本申请实施例提供的片上缓存的结构的示例图。
图6是本申请实施例提供的神经网络处理器的示意性结构图。
图7是本申请实施例提供的池化方法的示意性流程图。
具体实施方式
CNN可以包括以下神经网络层中的一种或多种:预处理层,卷积层,激活层,池化层,以及全连接层。
池化层主要用于执行池化操作。池化层通常会以池化窗口为单位对输入的特征图像进行池化操作。池化窗口的宽度可用于标识一个池化窗口所包含的像素的列数,相应地,池化窗口的高度可用于标识一个池化窗口所包含的像素的行数。池化窗口的宽度和高度可以相同,也可以不同,其具体数值可以根据实际需要选择,本申请实施例对此并不限定。池化窗口有时也可称为池化操作的滑动窗口或池化核。
池化操作的种类可以有多种,如平均池化(average pooling)和最大值池化(max pooling)。平均池化可用于计算池化窗口所包含的像素的平均值;最大值池化可用于计算池化窗口所包含的像素的最大值。以平均池化为例,可以先将池化窗口中的像素的像素值累加,然后再计算这些像素的平均值。以最大值池化为例,可以将池化窗口中的像素的像素值两两进行比较,最终 的比较结果即为池化窗口中的像素的最大值。
池化操作需要对池化窗口中的各像素依次进行处理,当池化窗口中的各像素均处理完毕之后即可产生最终的池化结果。在得到最终的池化结果之前,池化操作一般会产生临时池化结果。行方向的临时池化结果指的是对输入图像的行像素处理得到的临时池化结果。输入图像的一行像素对应的临时池化结果的数量与该输入图像经过池化层后需要得到的输出图像的列数相等。同理,列方向的临时池化结果指的是对输入图像的列像素处理得到的临时池化结果。输入图像的一列像素对应的临时池化结果的数量与该输入图像经过池化层后需要得到的输出图像的行数相等。以平均池化为例,输入图像的行方向的临时池化结果可以指输入图像的行像素中的属于一个池化窗口的像素的像素累加值,输入图像的列方向的临时池化结果可以指输入图像的列像素中的属于一个池化窗口的像素的像素累加值;以最大值池化为例,输入图像的行方向的临时池化结果可以指输入图像的行像素中的属于一个池化窗口的像素的像素最大值,输入图像的列方向的临时池化结果可以指输入图像的列像素中的属于一个池化窗口的像素的像素最大值。
按照池化层的池化对象的不同,池化层对应的池化过程可以分为一般池化和ROI池化。对于一般池化而言,其通常对输入的整个特征图像进行池化操作。对于ROI池化而言,其主要对输入的整个特征图像中的一个或多个图像块(block)进行池化,该一个或多个图像块可以称为ROIs。在进行ROI池化之前,通常需要先对ROI在输入的特征图像中的位置(如ROI在输入特征图像中的行列坐标)进行解析,并根据解析出的ROI的位置从输入特征图像中取出ROI中的图像数据,作为待池化的输入图像。不同ROI位于特征图像的不同位置,且不同ROI的长度和/或宽度通常也是变化的,因此,对于ROI池化而言,其针对的图像的尺寸通常是变化的,硬件设计难度较大。因此,传统技术中,ROI池化通常采用软件的方式实现。
本申请实施例提供一种通用的池化装置。该池化装置既可用于实现一般池化,也可用于实现ROI池化。
需要说明的是,上文是以CNN中的池化操作为例进行举例说明的,但本申请实施例提供的池化装置的应用场合不限于此,可应用于需要执行池化操作的任意其他场合。下面结合图1,对本申请实施例提供的池化装置进行详细说明。
如图1所示,本申请实施例提供的池化装置10可用于对输入图像进行池化操作以生成池化后的输出图像。池化装置10可以为硬件电路(或芯片),例如可以是现场可编程门阵列(field programmable gate array,FPGA),也可以是特定用途集成电路(application specific integrated circuits,ASIC)。以池化装置10用于执行一般池化为例,该输入图像可以是卷积层输入的特征图像的部分或全部图像。以池化装置10用于执行ROI池化为例,该输入图像可以是卷积层输入的特征图像的某个ROI中的部分或全部图像。例如,当某个ROI中的图像的尺寸较大,可以将该ROI中的图像进一步分割成许多小的图像,作为上述输入图像。
池化装置10可以包括一个或多个第一处理电路12以及一个或多个第二处理电路14。
该一个或多个第一处理电路12可用于计算输入图像沿行方向或列方向的临时池化结果。当该一个或多个第一处理电路12用于计算输入图像沿行方向的临时池化结果时,该第一处理电路12也可称为行处理电路。同理,当该一个或多个第一处理电路12用于计算输入图像沿列方向的临时池化结果时,该第一处理电路12也可称为列处理电路。
该一个或多个第二处理电路14可用于根据输入图像沿行方向或列方向的临时池化结果,生成输出图像。
例如,该一个或多个第二处理电路14可用于沿与第一处理电路12的处理方向相垂直的方向对第一处理电路12输出的临时池化结果进行处理,得到输出图像。
传统池化过程通常需要逐池化窗口计算,即先计算出当前池化窗口的最终池化结果,再对下一池化窗口进行计算。本申请实施例打破了传统池化过程的上述计算方式,先沿输入图像的行方向(或列方向)对输入图像进行池化运算,再根据计算出的临时池化结果生成计算输入图像的最终池化结果(即输出图像的像素),这种池化方式具有通用性,可以使得池化过程的硬件设计变得简单。
第一处理电路12和第二处理电路14可以是相互独立的硬件电路,也可以共用同一电路。或者,第二处理电路14可以复用第一处理电路12。第一处理电路12和第二处理电路14共用同一电路可以简化池化装置10的结构,降低池化装置10的成本。
第一处理电路12每个时钟周期可以处理一个像素对应的运算(即单点运算),也可以处理多个像素对应的运算。像素对应的运算的类型与池化操作的类型、像素在图像中的位置等因素有关,本申请实施例对此不做具体限定。例如,一个像素对应的运算可以包括该像素与相邻像素之间的像素值比较、该像素与相邻像素的像素值的累加、该像素位于图像块边界时的边界划分操作,像素对应的临时池化结果的存储等。
如果第一处理电路12每个时钟周期处理多个像素对应的运算,则需要向第一处理电路12输入该多个像素对应的多条运算指令,这样实现起来比较复杂。相比而言,如果控制第一处理电路12每个时钟周期进行单点运算,则会使得池化装置10的逻辑控制变得简单。
本申请实施例对池化装置10包含的第一处理电路12的数量不做具体限定。可选地,在一些实施例中,池化装置10可以仅包括一个第一处理电路12。在这种情况下,该第一处理电路12可以对输入图像进行逐行或逐列处理。
可选地,在另一些实施例中,池化装置10可以包括多个第一处理电路12。该多个第一处理电路12可以并行地计算输入图像的多行像素或多列像素对应的临时池化结果,多行像素或多列像素的并行计算可以提高池化装置的计算效率。
进一步地,可以将池化装置10所包括的第一处理电路12的数量与一个第一处理电路12处理目标像素所需的时钟周期的数量相匹配。其中,目标像素为一个第一处理电路12在一个时钟周期内接收到的待处理像素。
假设一个处理电路12处理目标像素需要N个时钟周期,则可以将池化装置10包括的第一处理电路12的数量设置为N。假设池化装置10在第k至第k+N个时钟周期分别向第1至第N个第一处理电路12传输目标像素,由于一个第一处理电路12处理目标像素需要N个时钟周期,则当第k+N+1个时钟周期来临时,最先接收到目标像素的第1个第一处理电路12刚好将之前接收到的目标像素处理完毕,进而可以在第k+N+1个时钟周期接收新的目标像素。因此,将池化装置10所包括的第一处理电路12的数量配置成与一个第一处理电路12处理目标像素所需的时钟周期的数量相匹配,可以使得每个第一处理电路的处理过程实现紧密流水,提高池化装置的并行度和计算效率。
为了便于理解,下面结合图2,以第一处理电路12为行处理电路,输入图像的像素沿行方向输入至池化装置为例进行更为详细的举例说明。首先,在硬件设计时,通常会在系统的时钟频率、总线位宽以及系统的成本等因素之间进行权衡。假设本申请实施例提供的池化装置10所属的系统的主频为1GHz,总线位宽为128比特,每个像素包含8比特的像素数据,则系统在一个时钟周期可以向池化装置10的一个行处理电路输入沿行方向连续的16个像素(对应于上述目标像素)。假设一个行处理电路一个时钟周期针对一个像素进行单点运算,则一个行处理电路处理完16个像素需要16个时钟周期。在这种情况下,可以将池化装置10中的行处理电路的数量设置为16。
经过上述设置,假设系统满带宽运行,则对于每个行处理电路而言,经过16个周期可以处理完128比特的像素数据,等128比特的像素数据处理完成之后的下一时钟周期恰好有新的16个像素被输入至该行处理电路,从而可以实现每个行处理电路的紧密流水,提高了系统的并行度。
图2是以输入图像的像素沿行方向输入至池化装置为例进行说明的,但本申请实施例不限于此,输入图像的像素也可以沿列方向输入至池化装置。在这种情况下,一个时钟周期输入的16个像素分别属于输入图像的16行,因此,如图3所示,可以在每个时钟周期将该16个像素分别输入至16个行处理电路,使得每个行处理得到8比特的像素数据。
第一处理电路12计算得到的临时池化结果可以存入片上缓存,也可以通过系统总线存入外部的存储器,本申请实施例对此并不限定。下面结合图4,给出临时池化结果的一种可选的存储方式。
如图4所示,池化装置10还可包括多个片上缓存16。该多个片上缓存16可以与多个第一处理电路12一一对应,其中每个片上缓存16可专门用于存储相应第一处理电路12计算得到的临时池化结果。
本申请实施例为各第一处理电路12设置了专门的片上缓存16,可以使得每个第一行处理电路12的每个临时池化结果的计算过程尽可能在片上完成,降低池化过程中池化装置与外部存储器之间的数据交互,这样可以提高池化装置的计算效率。
可选地,可以对片上缓存16的容量进行配置,使得片上缓存16的容量能够容纳输入图像的一行或一列像素对应的临时池化结果。
可选地,如图5所示,片上缓存16的一个存储地址161可用于存储输 入图像的一行或一列像素对应的临时池化结果中的一个临时池化结果。多个片上缓存16的同一存储地址存储的临时池化结果可对应输入图像的相同列方向或相同行方向。具体地,当第一处理电路12计算输入图像沿行方向的临时池化结果时,多个片上缓存16的同一存储地址存储的临时池化结果对应输入图像的相同列方向;当第一处理电路12计算输入图像沿行方向的临时池化结果时,多个片上缓存16的同一存储地址存储的临时池化结果对应输入图像的相同行方向。在本实施例中,第二处理电路14的输入数据可以由多个片上缓存16的同一存储地址存储的临时池化结果拼接而成。
片上缓存16的存储地址的上述配置方式使得第二处理电路14通过简单的数据拼接操作即可获得输入数据,无需进行复杂的寻址操作,从而简化了池化装置的实现。
假设片上缓存16的深度为64,如果输入图像的一行或一列像素对应的临时池化结果的数量多于64,一种处理方式是增大片上缓存16的深度,使其能够容纳一行或一列像素对应的临时池化结果(如将片上缓存的深度增加至512),以满足绝大多数应用;另一种处理方式是将输入图像进行拆分,得到尺寸较小的多个输入图像,然后利用池化装置对该多个输入图像分别进行池化运算。
第二处理电路14基于第一处理电路12输出的临时池化结果生成输出图像。作为一种可能的实现方式,第二处理电路14可以等第一处理电路12将输入图像的所有行或列处理完毕之后,再基于第一处理电路12输出的临时池化结果生成输出图像。作为另一种可能的实现方式,第一处理电路12每处理完输入图像的部分行或部分列的像素,即可控制第二处理电路14开始处理,即第一处理电路12与第二处理电路14的处理过程交替进行,这种处理方式的优点在于无需同时存储输入图像的所有临时池化结果,对缓存容量的要求会低一些。
可选地,池化装置10可以包括N个第一处理电路12(N为大于1的正整数)。池化装置10还可包括控制电路。控制电路可用于执行如下操作:如果池化窗口的高度或宽度小于或等于N,则每当N个第一处理电路将N行或N列像素对应的临时池化结果存入N个片上缓存之后,控制第二处理电路14可以根据N个片上缓存存储的临时池化结果生成输出图像的部分像素。
可选地,控制电路还可用于如果池化窗口的高度或宽度大于N,将N个 片上缓存16存储的至少部分临时池化结果存入除多个片上缓存之外的其他片上缓存或外部存储器,并控制第二处理电路14根据M行或M列像素对应的临时池化结果,生成输出图像的部分或全部像素,其中M为大于或等于池化窗口的高度或宽度的正整数,M行或M列像素对应的临时池化结果包括其他片上缓存或外部存储器存储的临时池化结果。
以第一处理电路为行处理电路,池化装置10包括16个行处理电路为例,池化装置10可以根据池化窗口的尺寸对行处理电路的计算方式以及行处理电路输出的临时池化结果的存储方式进行控制。
以pooling≤16(pooling≤16表示池化窗口的宽度和高度小于或等于16,如pooling=2或pooling=16)为例,每当16个行处理电路处理完输入图像的16行像素,可以控制列处理电路(对应于上文的第二处理电路,列处理电路可以复用行处理电路,即与行处理电路共用同一电路)对该16行像素对应的临时池化结果进行串行处理,以获取该16行像素对应的最终池化结果。
以pooling>16(如pooling=32)为例,由于16行像素对应的临时池化结果不能完成完整的池化操作,则可以先将片上缓存中缓存的数据拼接,并将拼接后的输入存储到其他片上缓存(如片上的更大的临时缓存)或外部存储器中(如片外的双倍速率(double data rate,DDR)中),待行处理电路输出的临时池化结果能够完成完整的池化操作之后,再从其他片上缓存或外部存储器中读取数据,并采用列处理单元对这些数据进行处理。
当然,当pooling≤16时,也可以采用与pooling>16的处理方式类似的方式进行处理,这样做的优点在于无论池化窗口的尺寸是多少,池化装置10的处理方式保持一致,仅需要设计一套通用电路即可。
上文指出,输入图像可以是ROI中的图像,池化装置可用于执行ROI池化。ROI的解析可以通过软件配置给池化装置10,也可以由池化装置10进行自解析。
例如,池化装置10还可包括解析电路19。解析电路19可用于接收卷积层输出的特征图像和ROI参数;根据ROI参数确定ROI在特征图像中的位置;并将ROI中的图像作为输入图像,传输至一个或多个第一处理电路16。ROI在特征图像中的位置的解析方式可以参见传统技术,此处不再详述。
本申请实施例还提供一种神经网络处理器。如图6所示,该神经网络处理器60可以包括卷积装置62和池化装置10。池化装置10可用于对卷积装 置62输出的特征图像进行池化操作。
上文结合图1至图6,详细描述了本申请的装置实施例,下面结合图7,详细描述本申请的方法实施例。应理解,方法实施例的描述与装置实施例的描述相互对应,因此,未详细描述的部分可以参见前面装置实施例。
图7是本申请实施例提供的池化方法的示意性流程图。图7所示的池化方法可用于对输入图像进行池化操作以生成池化后的输出图像,图7的方法可包括步骤710和步骤720。
在步骤710中,计算所述输入图像沿行方向或列方向的临时池化结果。
在步骤720中,根据所述输入图像沿行方向或列方向的临时池化结果,生成所述输出图像。
可选地,步骤710可包括:利用多个第一处理电路并行地计算所述输入图像的多行或多列像素的临时池化结果。
可选地,所述第一处理电路的数量与一个所述第一处理电路处理目标像素所需的时钟周期的数量相匹配,所述目标像素为一个所述第一处理电路在一个时钟周期内接收到的待处理的像素。
可选地,图7的方法还可包括:将多个所述第一处理电路计算得到的临时池化结果分别存入与多个所述第一处理电路一一对应的多个片上缓存。
可选地,所述片上缓存的容量能够容纳所述输入图像的一行或一列像素对应的临时池化结果。
可选地,所述片上缓存的一个存储地址用于存储所述输入图像的一行或一列像素对应的临时池化结果中的一个临时池化结果。多个所述片上缓存的同一存储地址存储的临时池化结果对应所述输入图像的相同列方向或相同行方向。在步骤720之前,图7的方法还可包括:对多个所述片上缓存的同一存储地址存储的临时池化结果进行拼接。
可选地,步骤720可包括:如果池化窗口的高度或宽度小于或等于N,则每当N个所述第一处理电路将N行或N列像素对应的临时池化结果存入N个所述片上缓存之后,根据N个所述片上缓存存储的临时池化结果生成所述输出图像的部分像素,其中N表示所述第一处理电路的数量,N为大于1的正整数。
可选地,在步骤720之前,图7的方法还可包括:如果池化窗口的高度或宽度大于N,将N个所述片上缓存存储的至少部分临时池化结果存入除多 个所述片上缓存之外的其他片上缓存或外部存储器;步骤720可包括:根据M行或M列像素对应的临时池化结果,生成所述输出图像的部分或全部像素,其中M为大于或等于所述池化窗口的高度或宽度的正整数,M行或M列所述像素对应的临时池化结果包括所述其他片上缓存或外部存储器存储的临时池化结果。
可选地,所述输出图像是基于一个或多个第二处理电路计算得到的,且至少一个所述第一处理电路和至少一个所述第二处理电路共同同一电路。
可选地,所述第一处理电路每个时钟周期处理一个像素对应的运算。
可选地,所述池化装置为现场可编程门阵列或特定用途集成电路。
可选地,所述输入图像为感兴趣区域ROI中的图像。
可选地,图7的方法还可包括:接收卷积层输出的特征图像和ROI参数;根据所述ROI参数,确定ROI在所述特征图像中的位置;将所述ROI中的图像作为所述输入图像。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其他任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如数字视频光盘(digital video disc,DVD))、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。
需要说明的是,在不冲突的前提下,本申请描述的各个实施例和/或各个实施例中的技术特征可以任意的相互组合,组合之后得到的技术方案也应落入本申请的保护范围。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (27)

  1. 一种池化装置,其特征在于,所述池化装置用于对输入图像进行池化操作以生成池化后的输出图像,
    所述池化装置包括:
    一个或多个第一处理电路,用于计算所述输入图像沿行方向或列方向的临时池化结果;
    一个或多个第二处理电路,用于根据所述输入图像沿行方向或列方向的临时池化结果,生成所述输出图像。
  2. 根据权利要求1所述的池化装置,其特征在于,所述池化装置包括多个所述第一处理电路,多个所述第一处理电路用于并行地计算所述输入图像的多行或多列像素的临时池化结果。
  3. 根据权利要求2所述的池化装置,其特征在于,所述池化装置包括的第一处理电路的数量与一个所述第一处理电路处理目标像素所需的时钟周期的数量相匹配,所述目标像素为一个所述第一处理电路在一个时钟周期内接收到的待处理的像素。
  4. 根据权利要求2或3所述的池化装置,其特征在于,所述池化装置还包括:
    多个片上缓存,与多个所述第一处理电路一一对应,其中每个所述片上缓存专门用于存储相应第一处理电路计算得到的临时池化结果。
  5. 根据权利要求4所述的池化装置,其特征在于,所述片上缓存的容量能够容纳所述输入图像的一行或一列像素对应的临时池化结果。
  6. 根据权利要求4或5所述的池化装置,其特征在于,所述片上缓存的一个存储地址用于存储所述输入图像的一行或一列像素对应的临时池化结果中的一个临时池化结果,多个所述片上缓存的同一存储地址存储的临时池化结果对应所述输入图像的相同列方向或相同行方向,所述第二处理电路的输入数据由多个所述片上缓存的同一存储地址存储的临时池化结果拼接而成。
  7. 根据权利要求4-6中任一项所述的池化装置,其特征在于,所述池化装置包括N个所述第一处理电路,N为大于1的正整数,
    所述池化装置还包括:
    控制电路,用于:
    如果池化窗口的高度或宽度小于或等于N,则每当N个所述第一处理电路将N行或N列像素对应的临时池化结果存入N个所述片上缓存之后,控制所述第二处理电路根据N个所述片上缓存存储的临时池化结果生成所述输出图像的部分像素。
  8. 根据权利要求7所述的池化装置,其特征在于,所述控制电路还用于:
    如果池化窗口的高度或宽度大于N,将N个所述片上缓存存储的至少部分临时池化结果存入除多个所述片上缓存之外的其他片上缓存或外部存储器,并控制所述第二处理电路根据M行或M列像素对应的临时池化结果,生成所述输出图像的部分或全部像素,其中M为大于或等于所述池化窗口的高度或宽度的正整数,M行或M列所述像素对应的临时池化结果包括所述其他片上缓存或外部存储器存储的临时池化结果。
  9. 根据权利要求1-8中任一项所述的池化装置,其特征在于,至少一个所述第一处理电路和至少一个所述第二处理电路共用同一电路。
  10. 根据权利要求1-9中任一项所述的池化装置,其特征在于,所述输入图像为感兴趣区域ROI中的图像。
  11. 根据权利要求10所述的池化装置,其特征在于,所述池化装置还包括:
    解析电路,用于接收卷积层输出的特征图像和ROI参数;根据所述ROI参数,确定ROI在所述特征图像中的位置;将所述ROI中的图像作为所述输入图像,传输至一个或多个所述第一处理电路。
  12. 根据权利要求1-11中任一项所述的池化装置,其特征在于,所述第一处理电路每个时钟周期处理一个像素对应的运算。
  13. 根据权利要求1-12中任一项所述的池化装置,其特征在于,所述池化装置为现场可编程门阵列或特定用途集成电路。
  14. 一种神经网络处理器,其特征在于,包括:
    卷积装置;以及
    如权利要求1-13中任一项所述的池化装置,用于对所述卷积装置输出的特征图像进行池化操作。
  15. 一种池化方法,其特征在于,所述池化方法用于对输入图像进行池化操作以生成池化后的输出图像,
    所述池化方法包括:
    计算所述输入图像沿行方向或列方向的临时池化结果;
    根据所述输入图像沿行方向或列方向的临时池化结果,生成所述输出图像。
  16. 根据权利要求15所述的池化方法,其特征在于,所述计算所述输入图像沿行方向或列方向的临时池化结果,包括:
    利用多个第一处理电路并行地计算所述输入图像的多行或多列像素的临时池化结果。
  17. 根据权利要求16所述的池化方法,其特征在于,所述第一处理电路的数量与一个所述第一处理电路处理目标像素所需的时钟周期的数量相匹配,所述目标像素为一个所述第一处理电路在一个时钟周期内接收到的待处理的像素。
  18. 根据权利要求16或17所述的池化方法,其特征在于,所述池化方法还包括:
    将多个所述第一处理电路计算得到的临时池化结果分别存入与多个所述第一处理电路一一对应的多个片上缓存。
  19. 根据权利要求18所述的池化方法,其特征在于,所述片上缓存的容量能够容纳所述输入图像的一行或一列像素对应的临时池化结果。
  20. 根据权利要求18或19所述的池化方法,其特征在于,所述片上缓存的一个存储地址用于存储所述输入图像的一行或一列像素对应的临时池化结果中的一个临时池化结果,多个所述片上缓存的同一存储地址存储的临时池化结果对应所述输入图像的相同列方向或相同行方向;
    在所述根据所述输入图像沿行方向或列方向的临时池化结果生成所述输出图像之前,所述池化方法还包括:
    对多个所述片上缓存的同一存储地址存储的临时池化结果进行拼接。
  21. 根据权利要求18-20中任一项所述的池化方法,其特征在于,所述根据所述输入图像沿行方向或列方向的临时池化结果,生成所述输出图像,包括:
    如果池化窗口的高度或宽度小于或等于N,则每当N个所述第一处理电路将N行或N列像素对应的临时池化结果存入N个所述片上缓存之后,根据N个所述片上缓存存储的临时池化结果生成所述输出图像的部分像素,其 中N表示所述第一处理电路的数量,N为大于1的正整数。
  22. 根据权利要求21所述的池化方法,其特征在于,在所述根据所述输入图像沿行方向或列方向的临时池化结果生成所述输出图像之前,所述池化方法还包括:
    如果池化窗口的高度或宽度大于N,将N个所述片上缓存存储的至少部分临时池化结果存入除多个所述片上缓存之外的其他片上缓存或外部存储器;
    所述根据所述输入图像沿行方向或列方向的临时池化结果,生成所述输出图像,包括:
    根据M行或M列像素对应的临时池化结果,生成所述输出图像的部分或全部像素,其中M为大于或等于所述池化窗口的高度或宽度的正整数,M行或M列所述像素对应的临时池化结果包括所述其他片上缓存或外部存储器存储的临时池化结果。
  23. 根据权利要求16-22中任一项所述的池化方法,其特征在于,所述输出图像是基于一个或多个第二处理电路计算得到的,且至少一个所述第一处理电路和至少一个所述第二处理电路共用同一电路。
  24. 根据权利要求16-23中任一项所述的池化方法,其特征在于,所述第一处理电路每个时钟周期处理一个像素对应的运算。
  25. 根据权利要求15-24中任一项所述的池化方法,其特征在于,所述池化装置为现场可编程门阵列或特定用途集成电路。
  26. 根据权利要求15-25中任一项所述的池化方法,其特征在于,所述输入图像为感兴趣区域ROI中的图像。
  27. 根据权利要求26所述的池化方法,其特征在于,所述池化方法还包括:
    接收卷积层输出的特征图像和ROI参数;
    根据所述ROI参数,确定ROI在所述特征图像中的位置;
    将所述ROI中的图像作为所述输入图像。
PCT/CN2018/088959 2018-05-30 2018-05-30 池化装置和池化方法 WO2019227322A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201880011430.XA CN110383330A (zh) 2018-05-30 2018-05-30 池化装置和池化方法
PCT/CN2018/088959 WO2019227322A1 (zh) 2018-05-30 2018-05-30 池化装置和池化方法
US16/952,911 US20210073569A1 (en) 2018-05-30 2020-11-19 Pooling device and pooling method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/088959 WO2019227322A1 (zh) 2018-05-30 2018-05-30 池化装置和池化方法

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/952,911 Continuation US20210073569A1 (en) 2018-05-30 2020-11-19 Pooling device and pooling method

Publications (1)

Publication Number Publication Date
WO2019227322A1 true WO2019227322A1 (zh) 2019-12-05

Family

ID=68248358

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/088959 WO2019227322A1 (zh) 2018-05-30 2018-05-30 池化装置和池化方法

Country Status (3)

Country Link
US (1) US20210073569A1 (zh)
CN (1) CN110383330A (zh)
WO (1) WO2019227322A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3869413A1 (en) * 2020-02-24 2021-08-25 STMicroelectronics S.r.l. Pooling unit for deep learning acceleration background
US11586907B2 (en) 2018-02-27 2023-02-21 Stmicroelectronics S.R.L. Arithmetic unit for deep learning acceleration
US11610362B2 (en) 2018-02-27 2023-03-21 Stmicroelectronics S.R.L. Data volume sculptor for deep learning acceleration
US11687762B2 (en) 2018-02-27 2023-06-27 Stmicroelectronics S.R.L. Acceleration unit for a deep learning engine

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3089664A1 (fr) * 2018-12-05 2020-06-12 Stmicroelectronics (Rousset) Sas Procédé et dispositif pour réduire la charge de calcul d’un microprocesseur destiné à traiter des données par un réseau de neurones à convolution
CN112313673A (zh) * 2019-11-15 2021-02-02 深圳市大疆创新科技有限公司 感兴趣区域-池化层的计算方法与装置、以及神经网络系统
CN111429334A (zh) * 2020-03-26 2020-07-17 光子算数(北京)科技有限责任公司 一种数据处理方法、装置、存储介质及电子设备
KR102368075B1 (ko) * 2021-06-04 2022-02-25 오픈엣지테크놀로지 주식회사 고효율 풀링 방법 및 이를 위한 장치
CN113255897B (zh) * 2021-06-11 2023-07-07 西安微电子技术研究所 一种卷积神经网络的池化计算单元
KR102395743B1 (ko) * 2021-11-09 2022-05-09 오픈엣지테크놀로지 주식회사 1차원 어레이 풀링 방법 및 이를 위한 장치
KR102403277B1 (ko) * 2021-12-24 2022-05-30 오픈엣지테크놀로지 주식회사 어레이 풀링 방법 및 이를 위한 장치

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080131001A1 (en) * 2004-07-06 2008-06-05 Yoram Hofman Multi-level neural network based characters identification method and system
CN106855944A (zh) * 2016-12-22 2017-06-16 浙江宇视科技有限公司 行人标志物识别方法及装置
CN107784322A (zh) * 2017-09-30 2018-03-09 东软集团股份有限公司 异常数据检测方法、装置、存储介质以及程序产品
CN107862650A (zh) * 2017-11-29 2018-03-30 中科亿海微电子科技(苏州)有限公司 加速计算二维图像cnn卷积的方法
CN107918794A (zh) * 2017-11-15 2018-04-17 中国科学院计算技术研究所 基于计算阵列的神经网络处理器

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04295980A (ja) * 1991-03-25 1992-10-20 Eastman Kodak Japan Kk 画像読み取り装置
US6157751A (en) * 1997-12-30 2000-12-05 Cognex Corporation Method and apparatus for interleaving a parallel image processing memory
JP4219887B2 (ja) * 2004-12-28 2009-02-04 富士通マイクロエレクトロニクス株式会社 画像処理装置及び画像処理方法
US8929601B2 (en) * 2007-12-05 2015-01-06 John Caulfield Imaging detecting with automated sensing of an object or characteristic of that object
US20170076195A1 (en) * 2015-09-10 2017-03-16 Intel Corporation Distributed neural networks for scalable real-time analytics
JP2018005389A (ja) * 2016-06-29 2018-01-11 株式会社リコー 画像変形回路、画像処理装置、及び画像変形方法
US10510146B2 (en) * 2016-10-06 2019-12-17 Qualcomm Incorporated Neural network for image processing
CN107729986B (zh) * 2017-09-19 2020-11-03 平安科技(深圳)有限公司 驾驶模型训练方法、驾驶人识别方法、装置、设备及介质
CN107749044A (zh) * 2017-10-19 2018-03-02 珠海格力电器股份有限公司 图像信息的池化方法及装置
CN107832844A (zh) * 2017-10-30 2018-03-23 上海寒武纪信息科技有限公司 一种信息处理方法及相关产品

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080131001A1 (en) * 2004-07-06 2008-06-05 Yoram Hofman Multi-level neural network based characters identification method and system
CN106855944A (zh) * 2016-12-22 2017-06-16 浙江宇视科技有限公司 行人标志物识别方法及装置
CN107784322A (zh) * 2017-09-30 2018-03-09 东软集团股份有限公司 异常数据检测方法、装置、存储介质以及程序产品
CN107918794A (zh) * 2017-11-15 2018-04-17 中国科学院计算技术研究所 基于计算阵列的神经网络处理器
CN107862650A (zh) * 2017-11-29 2018-03-30 中科亿海微电子科技(苏州)有限公司 加速计算二维图像cnn卷积的方法

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11586907B2 (en) 2018-02-27 2023-02-21 Stmicroelectronics S.R.L. Arithmetic unit for deep learning acceleration
US11610362B2 (en) 2018-02-27 2023-03-21 Stmicroelectronics S.R.L. Data volume sculptor for deep learning acceleration
US11687762B2 (en) 2018-02-27 2023-06-27 Stmicroelectronics S.R.L. Acceleration unit for a deep learning engine
US11977971B2 (en) 2018-02-27 2024-05-07 Stmicroelectronics International N.V. Data volume sculptor for deep learning acceleration
EP3869413A1 (en) * 2020-02-24 2021-08-25 STMicroelectronics S.r.l. Pooling unit for deep learning acceleration background
US11507831B2 (en) 2020-02-24 2022-11-22 Stmicroelectronics International N.V. Pooling unit for deep learning acceleration
US11710032B2 (en) 2020-02-24 2023-07-25 Stmicroelectronics International N.V. Pooling unit for deep learning acceleration

Also Published As

Publication number Publication date
US20210073569A1 (en) 2021-03-11
CN110383330A (zh) 2019-10-25

Similar Documents

Publication Publication Date Title
WO2019227322A1 (zh) 池化装置和池化方法
US20200134435A1 (en) Computation apparatus, circuit and relevant method for neural network
US11734554B2 (en) Pooling processing method and system applied to convolutional neural network
KR102147356B1 (ko) 캐시 메모리 시스템 및 그 동작방법
JP2019071056A (ja) 映像イメージをセグメンテーションする方法及びこれを利用した装置
CN110622214B (zh) 基于超体素的时空视频分割的快速渐进式方法
US10070134B2 (en) Analytics assisted encoding
CN108304925B (zh) 一种池化计算装置及方法
US20210011860A1 (en) Data storage device, data processing system, and acceleration device therefor
US11494646B2 (en) Neural network system for performing learning, learning method thereof, and transfer learning method of neural network processor
US8520147B1 (en) System for segmented video data processing
CN116934573A (zh) 数据读写方法、存储介质及电子设备
CN101930593B (zh) 单一物体影像萃取系统及方法
US20230009202A1 (en) Image processing method and device, electronic apparatus and readable storage medium
TWI586144B (zh) 用於視頻分析與編碼之多重串流處理技術
WO2022068551A1 (zh) 裁剪视频的方法、装置、设备以及存储介质
CN104776919B (zh) 基于fpga的红外焦平面阵列条带状非均匀性校正系统和方法
US9679222B2 (en) Apparatus and method for detecting a feature in an image
US20210182656A1 (en) Arithmetic processing device
CN110996005A (zh) 一种实时数字图像增强方法及系统
CN113222831B (zh) 一种图像条带噪声去除的特征记忆遗忘单元、网络及系统
RU2820172C1 (ru) Способ обработки данных посредством нейронной сети, подвергнутой декомпозиции с учетом объема памяти вычислительного устройства (варианты), и компьютерно-читаемый носитель
CN112099737B (zh) 存储数据的方法、装置、设备和存储介质
CN111199268B (zh) 一种全连接层的实现方法、装置、电子设备及计算机可读存储介质
TWI692739B (zh) 影像深度解碼器及計算機裝置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18920319

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18920319

Country of ref document: EP

Kind code of ref document: A1