WO2021037042A1 - Pooling processing method and apparatus, and storage medium - Google Patents

Pooling processing method and apparatus, and storage medium

Info

Publication number
WO2021037042A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
pooling
memory addresses
pooled
memory
Prior art date
Application number
PCT/CN2020/111277
Other languages
French (fr)
Chinese (zh)
Inventor
蒋燚
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Publication of WO2021037042A1 publication Critical patent/WO2021037042A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023Free address space management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Definitions

  • This application relates to the field of image processing, and in particular to a pooling processing method, device, and storage medium.
  • the pooling method can include calculating the average or maximum value of a feature in an image area. Using this pooling method can retain useful information and remove redundant information, which is conducive to subsequent extraction of effective information.
  • in neural networks, pictures are commonly laid out in memory in the NCHW format or the NHWC format.
  • for an RGB picture, the exemplary memory layout of the NCHW format is (RRRR GGGG BBBB).
  • the exemplary memory layout of the NHWC format is (RGB RGB RGB RGB).
  • pooling an NHWC picture therefore requires reading memory at intervals, which in turn lowers the hit rate of the memory cache and reduces the pooling speed.
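  • The difference between the two layouts can be illustrated with a short sketch. The function names `nchw_offset` and `nhwc_offset` and the tiny 1×4 picture are assumptions made for illustration only; the index arithmetic is the standard row-major formula for each layout, not something taken from the application.

```python
def nchw_offset(n, c, h, w, C, H, W):
    """Flat element offset of (n, c, h, w) in an NCHW buffer."""
    return ((n * C + c) * H + h) * W + w


def nhwc_offset(n, h, w, c, C, H, W):
    """Flat element offset of (n, h, w, c) in an NHWC buffer."""
    return ((n * H + h) * W + w) * C + c


# Hypothetical picture: 3 channels (RGB), 1 row, 4 columns.
C, H, W = 3, 1, 4

# Offsets of the four R values (channel 0) in the single row.
nchw_row = [nchw_offset(0, 0, 0, w, C, H, W) for w in range(W)]
nhwc_row = [nhwc_offset(0, 0, w, 0, C, H, W) for w in range(W)]

print(nchw_row)  # contiguous: RRRR GGGG BBBB
print(nhwc_row)  # strided by the channel count: RGB RGB RGB RGB
```

  • In NCHW the R values of a row are adjacent, while in NHWC consecutive R values are separated by the channel count, which is the interval access described above.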
  • the embodiments of the present application provide a pooling processing method, device, and storage medium, which can increase the pooling speed.
  • the embodiment of the present application provides a pooling processing method, the method includes:
  • N is the number of pictures
  • C is the number of channels
  • H is the height of the picture
  • W is the width of the picture
  • the multiple memory addresses are sequentially interleaved to determine multiple sets of memory addresses corresponding to the multiple memory addresses, and multiple sets of data are obtained from the multiple sets of memory addresses;
  • one memory address of the multiple memory addresses corresponds to one set of memory addresses, and the number of data items in one set of the multiple sets of data is determined by the bit width of the storage unit;
  • the multiple channels include the target channel, and one channel of the multiple channels corresponds to a set of data to be pooled;
  • corresponding pooling processing is performed on a group of data to be pooled corresponding to any one of the multiple channels.
  • the step of sequentially interleaving the multiple memory addresses to determine multiple groups of memory addresses corresponding to the multiple memory addresses includes:
  • according to the bit width of the storage unit and the bit width of the data in the memory address, sequentially determining the first number of a group of memory addresses corresponding to the first memory address among the plurality of memory addresses;
  • the multiple sets of memory addresses are determined according to multiple sets of first quantity data corresponding to the multiple memory addresses.
  • the method further includes:
  • the multiple sets of data to be pooled corresponding to the multiple channels are respectively stored in multiple storage units, and one storage unit of the multiple storage units stores a set of data to be pooled corresponding to one channel.
  • the performing corresponding pooling processing on a group of data to be pooled corresponding to any one of the multiple channels according to the pooling window includes:
  • the size of the pooling window is smaller than or equal to the storage capacity of the memory address of the storage unit.
  • the storage unit is a register, and sequentially determining, according to the bit width of the storage unit and the bit width of the data in the memory address, the first number of a group of memory addresses corresponding to the first memory address among the plurality of memory addresses includes:
  • determining, according to the bit width of the register and the bit width of the data in the memory address, the first number of data items that the register stores at a time
  • performing interleaved access to the first memory address using the single instruction, multiple data (SIMD) extension NEON instruction to obtain a set of memory addresses of the first number includes:
  • the method further includes:
  • the storage unit is a register; the register stores a memory address or stores specific data in the memory address.
  • the pooling processing includes: maximum pooling processing and average pooling processing; and performing corresponding pooling processing on the target pooling data includes:
  • Maximum pooling or average pooling is performed on the target pooled data, and data other than the target pooled data in a set of to-be-pooled data is eliminated.
  • An embodiment of the application provides a pooling processing device, and the device includes:
  • the acquiring part is configured to acquire multiple memory addresses in the target channel of the picture to be pooled, and the number of the multiple memory addresses is the same as the side length of the pooling window, wherein the picture to be pooled is laid out in memory according to the NHWC layout type, N is the number of pictures, C is the number of channels, H is the height of the picture, and W is the width of the picture;
  • the interleaving part is configured to sequentially interleave the multiple memory addresses, determine multiple sets of memory addresses corresponding to the multiple memory addresses, and obtain multiple sets of data from the multiple sets of memory addresses, wherein one memory address in the multiple memory addresses corresponds to a group of memory addresses, and the number of data items in one group of the multiple groups of data is determined by the bit width of the storage unit;
  • the dividing part is configured to divide the multiple sets of data into multiple sets of data to be pooled corresponding to multiple channels, the multiple channels include the target channel, and one channel of the multiple channels corresponds to a group of data to be pooled;
  • the pooling part is configured to perform corresponding pooling processing on a group of data to be pooled corresponding to any one of the multiple channels according to the pooling window.
  • the device further includes: a determining part
  • the determining part is configured to sequentially determine the first number of a group of memory addresses corresponding to the first memory address among the plurality of memory addresses according to the bit width of the storage unit and the bit width of the data in the memory address, and to determine the multiple sets of memory addresses according to the multiple sets of first-number data corresponding to the multiple memory addresses;
  • the interleaving part is also configured to use a single instruction, multiple data (SIMD) NEON instruction to perform interleaved access to the first memory address to obtain a set of memory addresses of the first number.
  • the device further includes: a storage part;
  • the storage part is configured to store the multiple sets of data to be pooled corresponding to the multiple channels in multiple storage units, and one storage unit of the multiple storage units stores a set of data to be pooled corresponding to one channel.
  • the determining part is further configured to determine the target pooling data from the set of data to be pooled according to the size of the pooling window;
  • the pooling part is further configured to perform corresponding pooling processing on the target pooling data.
  • the size of the pooling window is smaller than or equal to the storage capacity of the memory address of the storage unit.
  • the determining part is further configured to determine, according to the bit width of the register and the bit width of the data in the memory address, the first number of data items that the register stores at a time;
  • the storage part is also configured to store into a register the group of data of one channel that is interleave-read starting from the first memory address.
  • the interleaving part is further configured to interleave-read a group of memory addresses of the first memory address in a plurality of channels by using the NEON interleaved-load instruction vld3q_f32;
  • the acquiring part is also configured to acquire a group of data from a group of memory addresses.
  • the storage unit is a register; the register stores a memory address or stores specific data in the memory address.
  • the pooling processing includes: maximum pooling processing and average pooling processing,
  • the pooling part is also configured to perform maximum pooling or average pooling on the target pooled data, and remove data from a group of to-be-pooled data except for the target pooled data.
  • the pooling processing device includes a processor, a memory, and a communication bus; when the processor executes an operating program stored in the memory, the method according to any one of the above is implemented.
  • the embodiment of the present application provides a storage medium on which a computer program is stored, which is applied to a pooling processing device, and when the computer program is executed by a processor, the method as described in any one of the above is implemented.
  • the embodiments of the present application provide a pooling processing method and device, and a storage medium.
  • the method includes: acquiring multiple memory addresses in a target channel of a picture to be pooled, where the number of memory addresses is the same as the side length of the pooling window and the picture to be pooled is laid out in memory according to the NHWC layout type, N being the number of pictures, C the number of channels, H the picture height, and W the picture width;
  • sequentially interleaving the multiple memory addresses to determine multiple sets of memory addresses corresponding to the multiple memory addresses, and obtaining multiple sets of data from the multiple sets of memory addresses, where one memory address corresponds to one set of memory addresses and the number of data items in one set of data is determined by the bit width of the storage unit;
  • dividing the multiple sets of data into multiple sets of data to be pooled corresponding to multiple channels, where the multiple channels include the target channel and one channel corresponds to one set of data to be pooled; and performing, according to the pooling window, corresponding pooling processing on the set of data to be pooled corresponding to any one of the multiple channels.
  • with this implementation, the multiple memory addresses are read by interleaved access to obtain multiple sets of memory addresses, and the data at those addresses is divided into the sets of data to be pooled for the respective channels. A single interleaved access yields the data required for pooling, which improves the memory cache hit rate and the pooling speed.
  • FIG. 1 is a first flowchart of a pooling processing method provided by an embodiment of the application
  • FIG. 2 is an exemplary schematic diagram of obtaining three memory addresses in the first column of the R channel according to an embodiment of the application;
  • FIG. 3 is an exemplary schematic diagram of interleaving A1 according to an embodiment of the application.
  • FIG. 4 is an exemplary schematic diagram of interleaving A1, A2, and A3 provided by an embodiment of the application;
  • FIG. 5 is an exemplary schematic diagram of performing maximum processing on the 12 values (3×4) of the R channel according to an embodiment of the application;
  • FIG. 6 is an exemplary schematic diagram of setting the value of the fourth digit to the minimum value provided by an embodiment of the application.
  • FIG. 7 is a first structural diagram of a pooling processing device provided by an embodiment of the application.
  • FIG. 8 is a second structural diagram of a pooling processing device provided by an embodiment of the application.
  • the existing process of pooling a picture with the NHWC memory layout is as follows: determine, from the first column of the R channel, the memory addresses of A1 in the first row, A2 in the second row, and A3 in the third row; then add 3 and 6 to the memory address of A1 to determine the two other memory addresses in the R channel on the same row as A1; add 3 and 6 to the memory address of A2 to determine the two other memory addresses in the R channel on the same row as A2; and add 3 and 6 to the memory address of A3 to determine the two other memory addresses in the R channel on the same row as A3.
  • in this way, a 3×3 memory address window is obtained on the R channel, the 3×3 data of the corresponding area is read from the 3×3 memory address window, and a maximum pooling or average pooling operation is performed on the 3×3 data. The same process also applies to the G channel and the B channel.
  • the existing process of pooling a color channel thus involves calculating the address of every data element individually, which makes pooling slow. The solution of the present application is therefore proposed and is described in detail below.
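  • The per-element address calculation of the existing process can be sketched as follows. The function name `window_addresses` and the example row start addresses are hypothetical; the only facts taken from the description are the offsets of 3 and 6, that is, one and two pixels at three channels per pixel.

```python
def window_addresses(row_starts, channels=3, kernel_width=3):
    """For each window row, compute every element address individually.

    row_starts: addresses of A1, A2, A3 (first column of the window,
    one per row), given here as element offsets.
    """
    return [
        [start + col * channels for col in range(kernel_width)]
        for start in row_starts
    ]


# Hypothetical addresses of A1, A2 and A3 in the R channel.
addresses = window_addresses([0, 30, 60])
print(addresses)  # [[0, 3, 6], [30, 33, 36], [60, 63, 66]]
```

  • Nine separate addresses are computed and read for one channel, and the same work is repeated for the G and B channels.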
  • the embodiment of the present application provides a pooling processing method. As shown in FIG. 1, the method may include:
  • the pooling processing method provided in the embodiments of the present application is applicable to a scenario where a pooling processing device performs pooling processing on pictures.
  • the pictures to be pooled are stored according to the memory layout of the NHWC dimension.
  • the pooling processing device determines the number of memory addresses to be acquired according to the side length of the pooling window.
  • the pooling processing device determines to acquire 3 memory addresses in the target channel of the picture to be pooled.
  • the target channel may be the R channel, the G channel, the B channel, or another color channel, selected according to actual conditions; the embodiments of this application impose no specific limitation. In practical applications, the target channel is the R channel.
  • the picture to be pooled includes data of three color channels, namely the R channel, the G channel, and the B channel, and the pooling processing device needs to pool the picture to be pooled with a 3×3 pooling window.
  • on the R color channel, the pooling processing device frames data of the pooling window size starting from the upper right corner, and obtains the three memory addresses A1, A2, and A3 in the first column of the pooling window.
  • the pooling processing device uses formula (1) to obtain the memory address of A1, formula (2) to obtain the memory address of A2, and formula (3) to obtain the memory address of A3.
  • kernelWidth is the width of the pooling window, with a value of 3; channelNums is the number of channels, with a value of 3; T is the starting address.
  • S102: Interleave the multiple memory addresses in sequence, determine multiple sets of memory addresses corresponding to the multiple memory addresses, and obtain multiple sets of data from the multiple sets of memory addresses, where one memory address of the multiple memory addresses corresponds to one set of memory addresses, and the number of data items in one group of the multiple groups of data is determined by the bit width of the storage unit.
  • after the pooling processing device obtains multiple memory addresses in the target channel of the picture to be pooled, the pooling processing device sequentially interleaves the multiple memory addresses to determine multiple sets of memory addresses corresponding to the multiple memory addresses, and obtains multiple sets of data from the multiple sets of memory addresses.
  • the pooling processing device sequentially determines the first number of a group of memory addresses corresponding to the first memory address among the multiple memory addresses according to the bit width of the storage unit and the bit width of the data in the memory address;
  • it then uses the single instruction, multiple data (SIMD) extension NEON instruction to interleave-read from the first memory address to obtain a group of the first number of memory addresses, and determines the multiple groups of memory addresses according to the multiple groups of first-number data corresponding to the multiple memory addresses.
  • the storage unit is a register.
  • the pooling processing device determines, according to the bit width of the register and the bit width of the data in the memory address, the first number of data items that the register stores at a time, and stores into one register the group of data of one channel that is interleave-read starting from the first memory address.
  • therefore, for the three RGB channels, three registers are required to store the three groups of data, one per channel, that are interleave-read starting from the first memory address.
  • when a 128-bit register is used and the data involved in the operation is a 32-bit floating point number, the bit width of the register is divided by the bit width of the data in the memory address, that is, 128 is divided by 32, so the register stores the data of 4 memory addresses at a time.
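  • The arithmetic above is small enough to write out. The helper name `lanes_per_register` is an assumption for illustration; the 128-bit register, the 32-bit elements, and the three channels come from the description.

```python
def lanes_per_register(register_bits, element_bits):
    """Number of elements one register holds (the 'first number')."""
    if register_bits % element_bits != 0:
        raise ValueError("register width must be a multiple of element width")
    return register_bits // element_bits


lanes = lanes_per_register(128, 32)  # 4 values per register
values_per_load = lanes * 3          # one register each for R, G and B
print(lanes, values_per_load)        # 4 12
```

  • One interleaved read starting at a single address therefore covers 12 consecutive values in memory, 4 for each of the three channels.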
  • the pooling processing device uses the NEON interleaved-load instruction vld3q_f32 to interleave-read a group of memory addresses of the first memory address in multiple channels, and obtains a group of data from the group of memory addresses.
  • the multiple channels include the target channel, and one channel of the multiple channels corresponds to a set of data to be pooled.
  • after the pooling processing device determines multiple sets of memory addresses corresponding to the multiple memory addresses and obtains multiple sets of data from the multiple sets of memory addresses, the pooling processing device divides the multiple sets of data into multiple sets of data to be pooled corresponding to multiple channels.
  • since the multiple sets of data obtained by interleaving the multiple memory addresses contain data of multiple channels, the pooling processing device divides the multiple sets of data by channel, so that the data belonging to one channel forms one group of data to be pooled.
  • the pooling processing device uses the NEON interleaved-load instruction vld3q_f32 to read the four data items, from left to right, in the first row of each of the RGB channels.
  • the pooling processing device stores the multiple sets of data to be pooled corresponding to the multiple channels into multiple storage units, respectively.
  • one storage unit among the multiple storage units stores a group of data to be pooled corresponding to one channel.
  • the storage unit may store a memory address or the specific data at a memory address; the choice is made according to actual conditions, and the embodiment of the present application imposes no specific limitation.
  • the three memory addresses of A1, A2, and A3 are respectively interleaved to obtain 12*3 values corresponding to the RGB channels.
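  • The interleaved read can be emulated in plain Python to show where the 12*3 values come from. This is a scalar sketch of the de-interleaving behaviour of `vld3q_f32` (12 consecutive floats split into three 4-lane vectors by stride 3), not NEON code. The helper name `deinterleave3`, the 4-pixel-wide picture, and the row start addresses are assumptions for illustration.

```python
def deinterleave3(memory, start, lanes=4):
    """Emulate a 3-way interleaved load: read 3 * lanes consecutive values
    and split them into three vectors, one per channel."""
    block = memory[start:start + 3 * lanes]
    return [block[c::3] for c in range(3)]


# Hypothetical NHWC picture, 3 rows x 4 columns x 3 channels, values 0..35.
memory = [float(i) for i in range(36)]
row_starts = [0, 12, 24]  # stand-ins for the addresses A1, A2, A3

rows = [deinterleave3(memory, a) for a in row_starts]
r_rows = [row[0] for row in rows]  # three 4-lane vectors for the R channel
total_values = sum(len(vec) for row in rows for vec in row)

print(r_rows[0])     # [0.0, 3.0, 6.0, 9.0]
print(total_values)  # 36, i.e. 12 values for each of R, G and B
```

  • Each of the three reads touches one contiguous 12-value span, in contrast to nine strided reads per channel in the existing process.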
  • S104 According to the pooling window, perform corresponding pooling processing on a group of data to be pooled corresponding to any one of the multiple channels.
  • after the pooling processing device divides multiple sets of data into multiple sets of data to be pooled corresponding to multiple channels, the pooling processing device performs, according to the pooling window, corresponding pooling processing on the set of data to be pooled corresponding to any one of the multiple channels.
  • the pooling processing device determines the target pooling data from a set of data to be pooled according to the size of the pooling window; after that, the pooling processing device performs corresponding pooling processing on the target pooling data, where the pooling processing includes maximum pooling processing and average pooling processing.
  • the pooling processing device delimits the target pooling data from a group of data to be pooled according to the size of the pooling window, performs maximum pooling or average pooling on the target pooling data, and eliminates the data in the group other than the target pooling data.
  • for example, the data in a group of to-be-pooled data other than the target pooling data is set to the minimum value, so that it is filtered out when the maximum value is calculated.
  • alternatively, the data in a group of to-be-pooled data other than the target pooling data is dropped.
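  • A minimal sketch of the "dropped" variant for average pooling follows. It assumes that dropping means excluding the extra lane from both the sum and the count, which the description does not spell out. The function name `avg_pool_rows` and the sample values are hypothetical.

```python
def avg_pool_rows(rows, window=3):
    """Average the first `window` lanes of each row; extra lanes are dropped."""
    target = [value for row in rows for value in row[:window]]
    return sum(target) / len(target)


# Three 4-lane rows of one channel; the 4th lane lies outside the 3x3 window.
rows = [
    [1.0, 2.0, 3.0, 100.0],
    [4.0, 5.0, 6.0, 100.0],
    [7.0, 8.0, 9.0, 100.0],
]
average = avg_pool_rows(rows)
print(average)  # 5.0
```

  • The values in the fourth lane do not influence the result, because only the nine values inside the window are averaged.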
  • for example, the pooling window size for the maximum pooling operation is 3×3; the pooling processing device sets the value in the fourth position of each row to the minimum value, -max, as shown in Figure 6, and then computes the maximum of the 3×3 values with vmaxq_f32, and the result is 5.
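  • The masking step can be emulated with plain lists. In this sketch `vmax` stands in for the lane-wise maximum that `vmaxq_f32` computes on two vectors, and `NEG_MAX` uses the most negative Python float as a stand-in for the -max constant of a 32-bit float. The sample values are hypothetical and are not the data of Figure 5.

```python
import sys

NEG_MAX = -sys.float_info.max  # stand-in for the "-max" sentinel


def vmax(a, b):
    """Lane-wise maximum of two vectors (scalar emulation)."""
    return [max(x, y) for x, y in zip(a, b)]


def max_pool_rows(rows, window=3):
    """Mask lanes outside the window with NEG_MAX, then reduce to one maximum."""
    masked = [row[:window] + [NEG_MAX] * (len(row) - window) for row in rows]
    acc = masked[0]
    for row in masked[1:]:
        acc = vmax(acc, row)  # combine rows lane by lane
    return max(acc)           # horizontal maximum over the lanes


# The 4th lane holds larger values that must not win.
rows = [
    [1.0, 2.0, 3.0, 9.0],
    [4.0, 5.0, 0.0, 8.0],
    [2.0, 1.0, 3.0, 7.0],
]
result = max_pool_rows(rows)
print(result)  # 5.0
```

  • Masking keeps all four lanes in the register, so the vector maximum can run unchanged while the out-of-window lane can never be selected.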
  • a register stores data in a memory address of a channel
  • the pooling processing device performs pooling processing on the data corresponding to the memory address in each register to obtain a pooling result corresponding to a channel.
  • the side length of the pooling window is less than or equal to the number of data items that the storage unit can store, so that at least as much data as the pooling window requires can be obtained and the corresponding pooling operation can be completed.
  • with this implementation, multiple memory addresses are read by interleaved access to obtain multiple sets of memory addresses, and the multiple sets of data at those addresses are then divided into multiple sets of data to be pooled corresponding to multiple channels.
  • a single interleaved access yields the data to be pooled that is required for pooling, which improves the memory cache hit rate and the pooling speed.
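  • Steps S101 to S104 can be put together in one scalar sketch. This is an illustration under stated assumptions, not the claimed implementation: the function name `pool_nhwc_window`, the picture contents, and the row start addresses are hypothetical, and the interleaved load is emulated with list slicing.

```python
def pool_nhwc_window(memory, row_starts, window=3, lanes=4, channels=3, mode="max"):
    """Pool one window for every channel of an NHWC buffer.

    row_starts: one start address per window row (S101).
    """
    per_channel = [[] for _ in range(channels)]
    for start in row_starts:
        block = memory[start:start + channels * lanes]  # S102: one contiguous read
        for c in range(channels):                       # S103: split by channel
            per_channel[c].extend(block[c::channels][:window])
    if mode == "max":                                   # S104: pool each channel
        return [max(values) for values in per_channel]
    return [sum(values) / len(values) for values in per_channel]


# Hypothetical 3x4 RGB picture: R = 10*row + col, G = R + 100, B = R + 200.
memory = []
for row in range(3):
    for col in range(4):
        base = float(10 * row + col)
        memory += [base, base + 100.0, base + 200.0]

max_result = pool_nhwc_window(memory, [0, 12, 24], mode="max")
avg_result = pool_nhwc_window(memory, [0, 12, 24], mode="avg")
print(max_result)  # [22.0, 122.0, 222.0]
print(avg_result)  # [11.0, 111.0, 211.0]
```

  • Three contiguous reads provide the window data for all three channels at once, which is the source of the cache-hit improvement stated above.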
  • the pooling processing device 1 may include:
  • the acquiring part 10 is configured to acquire multiple memory addresses in the target channel of the picture to be pooled, the number of the multiple memory addresses is the same as the side length of the pooling window, wherein the picture to be pooled is laid out in memory according to the NHWC layout type, N is the number of pictures, C is the number of channels, H is the height of the picture, and W is the width of the picture;
  • the interleaving part 11 is configured to sequentially interleave the multiple memory addresses, determine multiple sets of memory addresses corresponding to the multiple memory addresses, and obtain multiple sets of data from the multiple sets of memory addresses, wherein one memory address in the multiple memory addresses corresponds to a group of memory addresses, and the number of data items in one group of the multiple groups of data is determined by the bit width of the storage unit;
  • the dividing part 12 is configured to divide the multiple sets of data into multiple sets of data to be pooled corresponding to multiple channels, the multiple channels include the target channel, and one channel of the multiple channels corresponds to a set of data to be pooled;
  • the pooling part 13 is configured to perform corresponding pooling processing on a group of data to be pooled corresponding to any one of the multiple channels according to the pooling window.
  • the device further includes: a determining part 14;
  • the determining part 14 is configured to sequentially determine the first number of a group of memory addresses corresponding to the first memory address among the plurality of memory addresses according to the bit width of the storage unit and the bit width of the data in the memory address, and to determine the multiple sets of memory addresses according to the multiple sets of first-number data corresponding to the multiple memory addresses;
  • the interleaving part 11 is also configured to use a single instruction, multiple data (SIMD) NEON instruction to perform interleaved access to the first memory address to obtain a set of memory addresses of the first number.
  • the device further includes: a storage part 15;
  • the storage part 15 is configured to store the plurality of groups of data to be pooled corresponding to the plurality of channels into a plurality of storage units, and one storage unit of the plurality of storage units stores a group of data to be pooled corresponding to one channel.
  • the determining part 14 is further configured to determine target pooling data from the set of data to be pooled according to the size of the pooling window;
  • the pooling part 13 is further configured to perform corresponding pooling processing on the target pooling data.
  • the size of the pooling window is smaller than or equal to the storage capacity of the memory address of the storage unit.
  • the storage unit is a register
  • the determining part 14 is further configured to determine, according to the bit width of the register and the bit width of the data in the memory address, the first number of data items that the register stores at a time;
  • the storage part 15 is also configured to store into a register the group of data of one channel that is interleave-read starting from the first memory address.
  • the interleaving part 11 is further configured to interleave-read a group of memory addresses of the first memory address in multiple channels by using the NEON interleaved-load instruction vld3q_f32;
  • the acquiring part 10 is also configured to acquire a group of data from a group of memory addresses.
  • the storage unit is a register; the register stores a memory address or stores specific data in the memory address.
  • the pooling processing includes: maximum pooling processing and average pooling processing,
  • the pooling part 13 is also configured to perform maximum pooling or average pooling on the target pooled data, and remove data from a group of to-be-pooled data except for the target pooled data.
  • An embodiment of the present application provides a pooling processing device that acquires multiple memory addresses in a target channel of a picture to be pooled, where the number of the multiple memory addresses is the same as the side length of the pooling window, the picture to be pooled is laid out in memory according to the NHWC layout type, N is the number of pictures, C is the number of channels, H is the picture height, and W is the picture width;
  • the device sequentially interleaves the multiple memory addresses to determine multiple groups of memory addresses corresponding to the multiple memory addresses, and obtains multiple sets of data from the multiple sets of memory addresses, where one memory address in the multiple memory addresses corresponds to a set of memory addresses, and the number of data items in one set of the multiple sets of data is determined by the bit width of the storage unit;
  • the device divides the multiple sets of data into multiple groups of data to be pooled corresponding to multiple channels, where the multiple channels include the target channel and one channel of the multiple channels corresponds to a group of data to be pooled; and, according to the pooling window, it performs corresponding pooling processing on the group of data to be pooled corresponding to any one of the multiple channels.
  • the pooling processing device proposed in this embodiment thus uses interleaved access to read multiple memory addresses to obtain multiple sets of memory addresses, and then divides the data at those addresses into multiple sets of data to be pooled corresponding to the multiple channels. A single interleaved access yields the data to be pooled that is required for pooling, which improves the memory cache hit rate and the pooling speed.
  • FIG. 8 is a second schematic diagram of the composition structure of a pooling processing device 1 provided by an embodiment of the application.
  • the pooling processing device 1 of this embodiment includes a processor 16, a memory 17, and a communication bus 18.
  • the acquiring part 10, the interleaving part 11, the dividing part 12, the pooling part 13, and the determining part 14 may be implemented by the processor 16 located on the pooling processing device 1, and the storage part 15 may be implemented by the memory 17 located on the pooling processing device 1.
  • the above-mentioned processor 16 may be at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a CPU, a controller, a microcontroller, and a microprocessor. It can be understood that, for different devices, other electronic devices may also be used to implement the above-mentioned processor functions, which is not specifically limited in this embodiment.
  • the above-mentioned communication bus 18 is used to realize the connection and communication between the processor 16 and the memory 17; when the above-mentioned processor 16 executes the operating program stored in the memory 17, the pooling processing method described in the first embodiment is implemented.
  • the embodiments of the present application provide a computer-readable storage medium on which a computer program is stored; the storage medium stores one or more programs that can be executed by one or more processors and are applied to the pooling processing device.
  • when executed by a processor, the computer program implements the pooling processing method described in the first embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Input (AREA)

Abstract

A pooling processing method and apparatus, and a storage medium. The method comprises: acquiring a plurality of memory addresses from a target channel of a picture to be pooled, wherein the number of the plurality of memory addresses is the same as the edge length of a pooling window (S101); successively performing interleaving access on the plurality of memory addresses, determining a plurality of groups of memory addresses corresponding to the plurality of memory addresses, and acquiring a plurality of groups of data from the plurality of groups of memory addresses, wherein one memory address from among the plurality of memory addresses corresponds to one group of memory addresses, and the number of one group of data from among the plurality of groups of data is determined by the bit width of a storage unit (S102); dividing the plurality of groups of data into a plurality of groups of data to be pooled corresponding to a plurality of channels, wherein the plurality of channels comprise the target channel, and one channel from among the plurality of channels corresponds to one group of data to be pooled (S103); and according to the pooling window, performing corresponding pooling processing on a group of data to be pooled corresponding to any channel from among the plurality of channels (S104).

Description

Pooling processing method and device, and storage medium
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 201910797622.4, filed on August 27, 2019, the entire content of which is hereby incorporated into this application by reference.
Technical Field
This application relates to the field of image processing, and in particular to a pooling processing method and device, and a storage medium.
Background
In recent years, neural networks, with their self-organization, self-learning, and association capabilities, have been successfully applied to many aspects of image processing, such as image compression, image segmentation, edge detection, image enhancement, and image recognition, and pooling is an indispensable operation in neural networks. Because images have a "static" property, that is, a feature that is useful in one image region is very likely to be equally applicable in another region, an image can be described by pooling, which aggregates statistics of features at different locations. Pooling can include calculating the average or maximum value of a feature over an image region. It retains useful information and removes redundant information, which benefits the subsequent extraction of effective information.
At present, there are two main memory layouts in neural networks: NCHW and NHWC. In the NCHW format, an exemplary memory layout is (RRRR GGGG BBBB); because each channel is contiguous in memory, pooling on NCHW data mostly uses the "sliding window method". In the NHWC format, an exemplary memory layout is (RGB RGB RGB RGB); the data of each channel is not contiguous, so the "sliding window method" cannot be used, and the memory address of each value to be compared must be calculated first. Pooling on NHWC data therefore has to fetch memory at intervals, which lowers the cache hit rate and reduces the pooling speed.
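The difference between the two layouts can be illustrated with a minimal Python sketch. The helper names `nchw_offset` and `nhwc_offset` and the 1×4 RGB example are assumptions made for illustration; they do not appear in the application. Offsets are counted in elements, not bytes.

```python
def nchw_offset(n, c, h, w, C, H, W):
    """Flat element offset of (n, c, h, w) in an NCHW buffer (illustrative helper)."""
    return ((n * C + c) * H + h) * W + w


def nhwc_offset(n, h, w, c, C, H, W):
    """Flat element offset of (n, h, w, c) in an NHWC buffer (illustrative helper)."""
    return ((n * H + h) * W + w) * C + c


C, H, W = 3, 1, 4  # assumed example: three channels (R, G, B), one row of four pixels

# Offsets of the four R values (channel 0) of that row in each layout.
r_nchw = [nchw_offset(0, 0, 0, w, C, H, W) for w in range(W)]
r_nhwc = [nhwc_offset(0, 0, w, 0, C, H, W) for w in range(W)]

print(r_nchw)  # [0, 1, 2, 3]  contiguous, as in RRRR GGGG BBBB
print(r_nhwc)  # [0, 3, 6, 9]  strided by the channel count, as in RGB RGB RGB RGB
```

In the NHWC case, neighbouring values of one channel are separated by C elements, which is why each value needs its own address calculation.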
Summary of the Invention
The embodiments of the present application provide a pooling processing method and device, and a storage medium, which can increase the pooling speed.
The technical solution of this application is realized as follows:
An embodiment of the present application provides a pooling processing method, the method including:
acquiring multiple memory addresses in a target channel of a picture to be pooled, the number of the multiple memory addresses being the same as the side length of the pooling window, where the picture to be pooled is laid out in memory according to the NHWC layout type, N is the number of pictures, C is the number of channels, H is the picture height, and W is the picture width;
sequentially interleaving the multiple memory addresses to determine multiple sets of memory addresses corresponding to the multiple memory addresses, and obtaining multiple sets of data from the multiple sets of memory addresses, where one memory address of the multiple memory addresses corresponds to one set of memory addresses, and the number of data items in one set of the multiple sets of data is determined by the bit width of the storage unit;
dividing the multiple sets of data into multiple sets of data to be pooled corresponding to multiple channels, where the multiple channels include the target channel, and one channel of the multiple channels corresponds to one set of data to be pooled; and
according to the pooling window, performing corresponding pooling processing on the set of data to be pooled corresponding to any one of the multiple channels.
In the above method, sequentially interleaving the multiple memory addresses to determine the multiple sets of memory addresses corresponding to the multiple memory addresses includes:
according to the bit width of the storage unit and the bit width of the data in a memory address, sequentially determining a first number, which is the number of memory addresses in the set of memory addresses corresponding to a first memory address among the multiple memory addresses;
using a single instruction, multiple data (SIMD) NEON instruction to interleave the first memory address to obtain a set of the first number of memory addresses; and
determining the multiple sets of memory addresses according to the multiple sets of the first number of data items corresponding to the multiple memory addresses.
In the above method, after dividing the multiple sets of data into the multiple sets of data to be pooled corresponding to the multiple channels, the method further includes:
storing the multiple sets of data to be pooled corresponding to the multiple channels in multiple storage units respectively, where one storage unit of the multiple storage units stores the set of data to be pooled corresponding to one channel.
In the above method, performing corresponding pooling processing on the set of data to be pooled corresponding to any one of the multiple channels according to the pooling window includes:
determining target pooling data from the set of data to be pooled according to the size of the pooling window; and
performing corresponding pooling processing on the target pooling data.
In the above method, the size of the pooling window is smaller than or equal to the storage capacity of the storage unit for storing memory addresses.
In the above method, the storage unit is a register, and sequentially determining, according to the bit width of the storage unit and the bit width of the data in a memory address, the first number of the set of memory addresses corresponding to the first memory address among the multiple memory addresses includes:
according to the bit width of the register and the bit width of the data in a memory address, determining the first number, which is the number of memory-address data items that the register stores at one time; and
storing, in one register, the set of data in the set of memory addresses of one channel that is interleave-read according to the first memory address.
In the above method, using the single instruction, multiple data (SIMD) extension NEON instruction to interleave the first memory address to obtain a set of the first number of memory addresses includes:
using the NEON interleaved load vld3q_f32 to interleave-read the set of memory addresses of the first memory address in the multiple channels;
correspondingly, the method further includes:
obtaining a set of data from the set of memory addresses.
In the above method, the storage unit is a register; the register stores memory addresses or stores the specific data in the memory addresses.
In the above method, the pooling processing includes maximum pooling processing and average pooling processing; performing corresponding pooling processing on the target pooling data includes:
performing maximum pooling or average pooling on the target pooling data, and eliminating the data in the set of data to be pooled other than the target pooling data.
An embodiment of the present application provides a pooling processing device, the device including:
an acquiring part, configured to acquire multiple memory addresses in a target channel of a picture to be pooled, the number of the multiple memory addresses being the same as the side length of the pooling window, where the picture to be pooled is laid out in memory according to the NHWC layout type, N is the number of pictures, C is the number of channels, H is the picture height, and W is the picture width;
an interleaving part, configured to sequentially interleave the multiple memory addresses to determine multiple sets of memory addresses corresponding to the multiple memory addresses, and to obtain multiple sets of data from the multiple sets of memory addresses, where one memory address of the multiple memory addresses corresponds to one set of memory addresses, and the number of data items in one set of the multiple sets of data is determined by the bit width of the storage unit;
a dividing part, configured to divide the multiple sets of data into multiple sets of data to be pooled corresponding to multiple channels, where the multiple channels include the target channel, and one channel of the multiple channels corresponds to one set of data to be pooled; and
a pooling part, configured to perform, according to the pooling window, corresponding pooling processing on the set of data to be pooled corresponding to any one of the multiple channels.
In the above device, the device further includes a determining part;
the determining part is configured to sequentially determine, according to the bit width of the storage unit and the bit width of the data in a memory address, the first number of the set of memory addresses corresponding to a first memory address among the multiple memory addresses, and to determine the multiple sets of memory addresses according to the multiple sets of the first number of data items corresponding to the multiple memory addresses;
the interleaving part is further configured to use a single instruction, multiple data (SIMD) NEON instruction to interleave the first memory address to obtain a set of the first number of memory addresses.
In the above device, the device further includes a storage part;
the storage part is configured to store the multiple sets of data to be pooled corresponding to the multiple channels in multiple storage units respectively, where one storage unit of the multiple storage units stores the set of data to be pooled corresponding to one channel.
In the above device, the determining part is further configured to determine target pooling data from the set of data to be pooled according to the size of the pooling window;
the pooling part is further configured to perform corresponding pooling processing on the target pooling data.
In the above device, the size of the pooling window is smaller than or equal to the storage capacity of the storage unit for storing memory addresses.
In the above device, the storage unit is a register;
the determining part is further configured to determine, according to the bit width of the register and the bit width of the data in a memory address, the first number, which is the number of memory-address data items that the register stores at one time;
the storage part is further configured to store, in one register, the set of data in the set of memory addresses of one channel that is interleave-read according to the first memory address.
In the above device, the interleaving part is further configured to use the NEON interleaved load vld3q_f32 to interleave-read the set of memory addresses of the first memory address in the multiple channels;
the acquiring part is further configured to obtain a set of data from the set of memory addresses.
In the above device, the storage unit is a register; the register stores memory addresses or stores the specific data in the memory addresses.
In the above device, the pooling processing includes maximum pooling processing and average pooling processing;
the pooling part is further configured to perform maximum pooling or average pooling on the target pooling data, and to eliminate the data in the set of data to be pooled other than the target pooling data.
An embodiment of the present application provides a pooling processing device, including a processor, a memory, and a communication bus; when the processor executes an operating program stored in the memory, the method according to any one of the above is implemented.
An embodiment of the present application provides a storage medium on which a computer program is stored, applied to a pooling processing device; when the computer program is executed by a processor, the method according to any one of the above is implemented.
The embodiments of the present application provide a pooling processing method and device, and a storage medium. The method includes: acquiring multiple memory addresses in a target channel of a picture to be pooled, the number of the multiple memory addresses being the same as the side length of the pooling window, where the picture to be pooled is laid out in memory according to the NHWC layout type, N is the number of pictures, C is the number of channels, H is the picture height, and W is the picture width; sequentially interleaving the multiple memory addresses to determine multiple sets of memory addresses corresponding to the multiple memory addresses, and obtaining multiple sets of data from the multiple sets of memory addresses, where one memory address of the multiple memory addresses corresponds to one set of memory addresses and the number of data items in one set of data is determined by the bit width of the storage unit; dividing the multiple sets of data into multiple sets of data to be pooled corresponding to multiple channels, where the multiple channels include the target channel and one channel corresponds to one set of data to be pooled; and, according to the pooling window, performing corresponding pooling processing on the set of data to be pooled corresponding to any one of the multiple channels. With this implementation, multiple memory addresses are read by interleaving to obtain multiple sets of memory addresses, and the data in those sets of memory addresses are then divided into multiple sets of data to be pooled corresponding to multiple channels. A single interleaved access obtains the data needed for one pooling operation, thereby improving the memory cache hit rate and the pooling speed.
Description of the Drawings
FIG. 1 is a first flowchart of a pooling processing method provided by an embodiment of the application;
FIG. 2 is an exemplary schematic diagram of obtaining the three memory addresses in the first column of the R channel, provided by an embodiment of the application;
FIG. 3 is an exemplary schematic diagram of interleaving A1, provided by an embodiment of the application;
FIG. 4 is an exemplary schematic diagram of interleaving A1, A2, and A3, provided by an embodiment of the application;
FIG. 5 is an exemplary schematic diagram of performing maximum pooling processing on the 3×4 (12) values of the R channel, provided by an embodiment of the application;
FIG. 6 is an exemplary schematic diagram of setting the value in the fourth position to the minimum value, provided by an embodiment of the application;
FIG. 7 is a first structural diagram of a pooling processing device provided by an embodiment of the application;
FIG. 8 is a second structural diagram of a pooling processing device provided by an embodiment of the application.
Detailed Description
It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
Taking a 3×3 pooling window and the three RGB color channels as an example, the existing process of pooling a picture whose memory layout is NHWC is as follows. The memory addresses of A1 in the first row, A2 in the second row, and A3 in the third row are determined from the first column of the R channel. Then 3 and 6 are added to the memory address of A1 to determine the two other R-channel memory addresses in the same row as A1; 3 and 6 are added to the memory address of A2 to determine the two other R-channel memory addresses in the same row as A2; and 3 and 6 are added to the memory address of A3 to determine the two other R-channel memory addresses in the same row as A3. At this point a 3×3 window of memory addresses is obtained for the R channel; the 3×3 data of the corresponding region is read from these addresses, and maximum pooling or average pooling is performed on it. The same process applies to the G channel and the B channel. Because this existing process computes an address for every data item when pooling a color channel, pooling is very slow. The solution of the present application is proposed for this reason and is described in detail below.
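The existing per-element approach described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `naive_max_pool_3x3`, the image width of 4 pixels, and the test buffer in which each value equals its own address are all hypothetical, and addresses are counted in elements.

```python
def naive_max_pool_3x3(buf, width, channels, channel, top, left):
    """Max over a 3x3 window of one channel in an HWC buffer, one address per value."""
    values = []
    for row in range(3):
        # Address of the first-column element of this window row (A1, A2 or A3).
        first = ((top + row) * width + left) * channels + channel
        # The same-row neighbours sit `channels` and 2 * `channels` elements further on
        # (the "+3" and "+6" of the RGB example).
        for step in range(3):
            values.append(buf[first + step * channels])
    return max(values)


width, channels = 4, 3                      # assumed image width and RGB channels
buf = list(range(3 * width * channels))     # each stored value equals its own address

r_max = naive_max_pool_3x3(buf, width, channels, channel=0, top=0, left=0)
print(r_max)  # 30, the largest R-channel address inside the window
```

Nine separate address calculations and nine strided reads are needed for a single channel, which is the cost the application aims to avoid.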
Embodiment 1
An embodiment of the present application provides a pooling processing method. As shown in FIG. 1, the method may include:
S101. Acquire multiple memory addresses in a target channel of a picture to be pooled, the number of the multiple memory addresses being the same as the side length of the pooling window, where the picture to be pooled is laid out in memory according to the NHWC layout type, N is the number of pictures, C is the number of channels, H is the picture height, and W is the picture width.
The pooling processing method provided in the embodiments of the present application is applicable to scenarios in which a pooling processing device performs pooling processing on pictures.
In the embodiment of the present application, the picture to be pooled is stored according to the NHWC memory layout. When the picture to be pooled needs to be pooled, the pooling processing device determines, according to the side length of the pooling window, the number of memory addresses to be acquired.
Exemplarily, when the preset pooling window is 3×3, the pooling processing device determines that 3 memory addresses are to be acquired in the target channel of the picture to be pooled.
In the embodiment of this application, the target channel may be the R channel, the G channel, the B channel, or another color channel, selected according to the actual situation; the embodiment of this application does not specifically limit this. In practical applications it is the R channel.
Exemplarily, as shown in FIG. 2, the picture to be pooled includes data of three color channels, namely the R channel, the G channel, and the B channel, and the pooling processing device needs to pool the picture with a 3×3 pooling window. In this case, on the R color channel, the pooling processing device frames data of the pooling window size starting from the upper right corner and obtains the three memory addresses A1, A2, and A3 in the first column of the pooling window.
In the embodiment of the present application, assuming that a 3×3 pooling window is used for the pooling operation and that the multiple memory addresses are A1, A2, and A3, the pooling processing device obtains the memory address of A1 using formula (1), the memory address of A2 using formula (2), and the memory address of A3 using formula (3).
A1 = T + 2*channelNums    (1)
A2 = T + (kernelWidth + 2)*channelNums    (2)
A3 = T + (kernelWidth + kernelWidth + 2)*channelNums    (3)
where kernelWidth is the width of the pooling window, with a value of 3; channelNums is the number of channels, with a value of 3; and T is the starting address.
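As a worked example, formulas (1) to (3) can be transcribed literally into Python. The function name `first_column_addresses` is hypothetical, the starting address T = 0 is an assumed value, and addresses are assumed to be counted in elements; the formulas themselves are taken as written above.

```python
def first_column_addresses(T, kernel_width=3, channel_nums=3):
    """Literal transcription of formulas (1)-(3); returns (A1, A2, A3)."""
    a1 = T + 2 * channel_nums                                   # formula (1)
    a2 = T + (kernel_width + 2) * channel_nums                  # formula (2)
    a3 = T + (kernel_width + kernel_width + 2) * channel_nums   # formula (3)
    return a1, a2, a3


A1, A2, A3 = first_column_addresses(T=0)
print(A1, A2, A3)  # 6 15 24
```

With kernelWidth = 3 and channelNums = 3, consecutive addresses differ by kernelWidth*channelNums = 9 elements.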
S102. Sequentially interleave the multiple memory addresses to determine multiple sets of memory addresses corresponding to the multiple memory addresses, and obtain multiple sets of data from the multiple sets of memory addresses, where one memory address of the multiple memory addresses corresponds to one set of memory addresses, and the number of data items in one set of the multiple sets of data is determined by the bit width of the storage unit.
After the pooling processing device has acquired the multiple memory addresses in the target channel of the picture to be pooled, it sequentially interleaves the multiple memory addresses, determines the multiple sets of memory addresses corresponding to them, and obtains multiple sets of data from the multiple sets of memory addresses.
In the embodiment of the present application, the pooling processing device sequentially determines, according to the bit width of the storage unit and the bit width of the data in a memory address, the first number of the set of memory addresses corresponding to a first memory address among the multiple memory addresses; uses a single instruction, multiple data (SIMD) extension NEON instruction to interleave the first memory address to obtain a set of the first number of memory addresses; and determines the multiple sets of memory addresses according to the multiple sets of the first number of data items corresponding to the multiple memory addresses.
In the embodiment of the present application, the storage unit is a register.
In the embodiment of the present application, the pooling processing device determines, according to the bit width of the register and the bit width of the data in a memory address, the first number, which is the number of memory-address data items that the register stores at one time. The pooling processing device stores, in one register, the set of data in the set of memory addresses of one channel that is interleave-read according to the first memory address. For the three RGB channels, three registers are therefore needed to store the three sets of data of the three channels that are interleave-read according to the first memory address.
Exemplarily, when a 128-bit register is used and the data involved in the operation is determined to be 32-bit floating-point numbers, the bit width of the register is divided by the bit width of the data in a memory address, that is, 128 divided by 32, giving that the register stores the data of 4 memory addresses at a time.
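The arithmetic of this example can be written as a one-line calculation. The function name `lanes_per_register` is hypothetical, and the 64-bit case is an extra illustration that does not come from the application.

```python
def lanes_per_register(register_bits, element_bits):
    """Number of data items one register holds at a time (the 'first number')."""
    if register_bits % element_bits != 0:
        raise ValueError("register width must be a multiple of the element width")
    return register_bits // element_bits


first_number = lanes_per_register(128, 32)
print(first_number)  # 4, as in the 128-bit register / 32-bit float example
```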
In the embodiment of the present application, the pooling processing device uses the NEON interleaved load vld3q_f32 to interleave-read the set of memory addresses of the first memory address in the multiple channels, and obtains a set of data from the set of memory addresses.
S103. Divide the multiple sets of data into multiple sets of data to be pooled corresponding to multiple channels, where the multiple channels include the target channel, and one channel of the multiple channels corresponds to one set of data to be pooled.
After the pooling processing device has determined the multiple sets of memory addresses corresponding to the multiple memory addresses and obtained the multiple sets of data from them, it divides the multiple sets of data into multiple sets of data to be pooled corresponding to the multiple channels.
In the embodiment of this application, because the multiple sets of data obtained by interleaving the multiple memory addresses are data of multiple channels, the pooling processing device, channel by channel, assigns each set of data among the multiple sets of data to one set of data to be pooled.
Exemplarily, as shown in FIG. 3, once the pooling processing device has determined the memory address of A1, it uses the NEON interleaved load vld3q_f32 to read the four data items from left to right in the first row of the RGB channels. The four data items of the R channel are stored in register R1, namely R1(1,1,1,1); the four data items of the G channel are stored in register R2, namely R2(2,2,2,2); and the four data items of the B channel are stored in register R3, namely R3(3,3,3,3). When the memory address of A1 is 0, the 12 values read by interleaving from the NHWC format are located in memory at addresses {0,1,2,3,4,5,6,7,8,9,10,11}, and the corresponding 12 values are ordered {1,2,3,1,2,3,1,2,3,1,2,3}. The addresses of the four values in register R1 are therefore {0,3,6,9}, those in register R2 are {1,4,7,10}, and those in register R3 are {2,5,8,11}.
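The de-interleaving in this example can be modeled in plain Python. The sketch below only imitates the element ordering of a 3-way interleaved load such as vld3q_f32 (12 consecutive elements split into three groups of four); it is not NEON code, and the function name `deinterleave3` is hypothetical.

```python
def deinterleave3(buf, start, lanes=4):
    """Model of a 3-way interleaved load: three lists of `lanes` values each."""
    block = buf[start:start + 3 * lanes]
    if len(block) < 3 * lanes:
        raise ValueError("not enough elements for a full interleaved load")
    # Every third element, starting at offsets 0, 1 and 2, goes to one register.
    return block[0::3], block[1::3], block[2::3]


values = [1, 2, 3] * 4         # R, G, B values of the example, in NHWC order
addresses = list(range(12))    # the address of each of the 12 values (A1 = 0)

r1, r2, r3 = deinterleave3(values, 0)
a1, a2, a3 = deinterleave3(addresses, 0)

print(r1, r2, r3)  # [1, 1, 1, 1] [2, 2, 2, 2] [3, 3, 3, 3]
print(a1, a2, a3)  # [0, 3, 6, 9] [1, 4, 7, 10] [2, 5, 8, 11]
```

One load therefore yields four same-channel values per register without computing the strided addresses individually.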
Further, after dividing the multiple groups of data into multiple groups of to-be-pooled data corresponding to the multiple channels, the pooling processing device stores these groups in multiple storage units, respectively. Each storage unit stores the group of to-be-pooled data corresponding to one channel.
It should be noted that a register may store either memory addresses or the data held at those memory addresses. The choice depends on the actual situation and is not specifically limited in this embodiment of the application.
For example, as shown in FIG. 4, interleaved access is performed on each of the three memory addresses A1, A2, and A3, yielding 12*3 values across the RGB channels.
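As a rough illustration of the three-row case, the sketch below computes one start address per window row in a flat NHWC buffer and de-interleaves 12 elements from each. It assumes a 3×3 window, three channels, four lanes per register, and the usual NHWC offset formula ((n·H + h)·W + w)·C + c; the helper names are hypothetical, and the buffer contents are simply the element indices so the result is easy to read.

```python
def nhwc_offset(n, h, w, c, H, W, C):
    """Element offset of (n, h, w, c) in a flat NHWC buffer."""
    return ((n * H + h) * W + w) * C + c


def load_window_rows(mem, h0, w0, H, W, C=3, k=3, lanes=4, n=0):
    """For each of the k window rows, read C * lanes contiguous elements
    from the row's start address and split them by channel."""
    starts = []
    per_channel = [[] for _ in range(C)]
    for dh in range(k):
        base = nhwc_offset(n, h0 + dh, w0, 0, H, W, C)
        starts.append(base)
        block = mem[base:base + C * lanes]
        for c in range(C):
            per_channel[c].append(block[c::C])
    return starts, per_channel


# A single 3x4 picture with 3 channels; each element stores its own index.
H, W, C = 3, 4, 3
mem = [float(i) for i in range(H * W * C)]
starts, per_channel = load_window_rows(mem, h0=0, w0=0, H=H, W=W, C=C)
total_values = sum(len(row) for ch in per_channel for row in ch)

print(starts)          # start addresses playing the role of A1, A2, A3
print(per_channel[0])  # three rows of four values for the first channel
print(total_values)    # 12 * 3 values in total
```

The number of start addresses equals the side length of the pooling window, which matches the way the method acquires its memory addresses.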
S104: According to the pooling window, perform the corresponding pooling processing on the group of to-be-pooled data corresponding to any one of the multiple channels.
After dividing the multiple groups of data into multiple groups of to-be-pooled data corresponding to the multiple channels, the pooling processing device performs, according to the pooling window, the corresponding pooling processing on the group of to-be-pooled data corresponding to any one of the multiple channels.
In this embodiment of the application, the pooling processing device determines target pooling data from a group of to-be-pooled data according to the size of the pooling window, and then performs the corresponding pooling processing on the target pooling data. The pooling processing includes max pooling and average pooling.
In this embodiment of the application, the pooling processing device frames the target pooling data within a group of to-be-pooled memory addresses according to the size of the pooling window, performs max pooling or average pooling on the target pooling data, and discards the data in the group of to-be-pooled data other than the target pooling data.
Optionally, when max pooling is performed on a group of to-be-pooled memory addresses, the data in the group of to-be-pooled data other than the target pooling data is set to the minimum value, so that this data is filtered out when the maximum is computed.
For example, as shown in FIG. 5, for the 3×4 = 12 values obtained for the R channel, the pooling window for the max pooling operation is 3×3. The pooling processing device sets the value in the fourth position of each row to the minimum value, -max, as shown in FIG. 6, and then computes the maximum of the 3×3 values with vmaxq_f32; the result is 5.
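The masking step can be sketched as follows. This is a plain-Python emulation under stated assumptions, not the application's implementation: `lanewise_max` stands in for the per-lane behaviour of vmaxq_f32, `FLT_MAX` is taken to be the largest finite 32-bit float so that `-FLT_MAX` plays the role of "-max", and the input values are made up for illustration because the figures are not reproduced here.

```python
FLT_MAX = 3.4028234663852886e38  # largest finite float32 (assumed meaning of "max")


def lanewise_max(a, b):
    """Per-lane maximum of two equal-length vectors."""
    return [max(x, y) for x, y in zip(a, b)]


def max_pool_window(rows, k=3):
    """Max-pool one k x k window from k register rows of one channel.

    Lanes beyond the first k are overwritten with -FLT_MAX so they can
    never win the maximum.
    """
    masked = [r[:k] + [-FLT_MAX] * (len(r) - k) for r in rows[:k]]
    acc = masked[0]
    for r in masked[1:]:
        acc = lanewise_max(acc, r)
    return max(acc)  # horizontal reduction over the lanes


# Hypothetical R-channel data: 3 rows x 4 lanes; the 4th lane is outside the window.
rows = [[1.0, 2.0, 3.0, 9.0],
        [4.0, 5.0, 1.0, 8.0],
        [2.0, 0.0, 3.0, 7.0]]
result = max_pool_window(rows)
print(result)  # 5.0, although larger values sit in the masked 4th lane
```

Masking keeps the computation fully vectorised: the extra lane is still processed, but it cannot affect the result.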
In this embodiment of the application, one register stores the data from the memory addresses of one channel, and the pooling processing device pools the data corresponding to the memory addresses in each register to obtain the pooling result for the corresponding channel.
In this embodiment of the application, the size of the pooling window is less than or equal to the storage capacity of the storage unit for storing memory addresses, so that the data read covers at least one pooling window and the corresponding pooling operation can be completed.
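The capacity constraint can be made concrete with a small calculation. The sketch assumes a 128-bit register and 32-bit floating-point data, which is the combination that a four-lane load such as vld3q_f32 operates on, and it reads "size of the pooling window" as the window's side length; both the function names and that reading are assumptions for illustration.

```python
def lanes_per_register(register_bits=128, element_bits=32):
    """Number of data items one register holds at a time."""
    if register_bits % element_bits != 0:
        raise ValueError("register width must be a multiple of the element width")
    return register_bits // element_bits


def window_fits(window_side, register_bits=128, element_bits=32):
    """True if one register row can cover one row of the pooling window."""
    return window_side <= lanes_per_register(register_bits, element_bits)


print(lanes_per_register())  # 4 lanes for 128-bit registers and 32-bit data
print(window_fits(3))        # a 3x3 window fits in 4 lanes
print(window_fits(5))        # a 5x5 window does not
```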
It can be understood that the multiple memory addresses are read by interleaved access to obtain multiple groups of memory addresses, and the data at those groups of memory addresses is then divided into multiple groups of to-be-pooled data corresponding to multiple channels. One round of interleaved access obtains the to-be-pooled data needed for one pooling operation, which improves the cache hit rate and the pooling speed.
Embodiment 2
An embodiment of this application provides a pooling processing device 1. As shown in FIG. 7, the pooling processing device 1 may include:
an acquiring part 10, configured to acquire multiple memory addresses in a target channel of a picture to be pooled, where the number of the multiple memory addresses equals the side length of the pooling window, the picture to be pooled is laid out in memory according to the NHWC layout type, N is the number of pictures, C is the number of channels, H is the picture height, and W is the picture width;
an interleaved access part 11, configured to perform interleaved access on the multiple memory addresses in sequence, determine multiple groups of memory addresses corresponding to the multiple memory addresses, and obtain multiple groups of data from the multiple groups of memory addresses, where one memory address of the multiple memory addresses corresponds to one group of memory addresses, and the number of data items in one group of the multiple groups of data is determined by the bit width of a storage unit;
a dividing part 12, configured to divide the multiple groups of data into multiple groups of to-be-pooled data corresponding to multiple channels, where the multiple channels include the target channel, and one channel of the multiple channels corresponds to one group of to-be-pooled data; and
a pooling part 13, configured to perform, according to the pooling window, the corresponding pooling processing on the group of to-be-pooled data corresponding to any one of the multiple channels.
Optionally, the device further includes a determining part 14.
The determining part 14 is configured to determine in sequence, according to the bit width of the storage unit and the bit width of the data at a memory address, a first number of memory addresses in the group of memory addresses corresponding to a first memory address among the multiple memory addresses, and to determine the multiple groups of memory addresses according to the multiple groups of the first number of data items corresponding to the multiple memory addresses.
The interleaved access part 11 is further configured to perform interleaved access on the first memory address by using NEON single-instruction, multiple-data (SIMD) instructions, obtaining a group of the first number of memory addresses.
Optionally, the device further includes a storage part 15.
The storage part 15 is configured to store the multiple groups of to-be-pooled data corresponding to the multiple channels in multiple storage units, respectively, where one storage unit of the multiple storage units stores the group of to-be-pooled data corresponding to one channel.
Optionally, the determining part 14 is further configured to determine target pooling data from the group of to-be-pooled data according to the size of the pooling window.
The pooling part 13 is further configured to perform the corresponding pooling processing on the target pooling data.
Optionally, the size of the pooling window is less than or equal to the storage capacity of the storage unit for storing memory addresses.
Optionally, the storage unit is a register, and the determining part 14 is further configured to determine, according to the bit width of the register and the bit width of the data at a memory address, the first number of data items from memory addresses that the register stores at one time.
The storage part 15 is further configured to store, in one register, the group of data from the group of memory addresses of one channel obtained by interleaved reading according to the first memory address.
Optionally, the interleaved access part 11 is further configured to use the NEON interleaved load vld3q_f32 to read, in interleaved fashion, the group of memory addresses of the first memory address across multiple channels.
The acquiring part 10 is further configured to obtain a group of data from a group of memory addresses.
Optionally, the storage unit is a register, and the register stores memory addresses or the data held at the memory addresses.
Optionally, the pooling processing includes max pooling and average pooling.
The pooling part 13 is further configured to perform max pooling or average pooling on the target pooling data, and to discard the data in a group of to-be-pooled data other than the target pooling data.
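For average pooling, the text states only that data outside the target pooling data is discarded; it does not say how. One plausible way to do this in a lane-wise computation, shown below purely as an assumption, is to zero the extra lanes before summing and then divide by the window area. The function name and the sample values are hypothetical.

```python
def avg_pool_window(rows, k=3):
    """Average-pool one k x k window from k register rows of one channel.

    Assumption: lanes beyond the first k are zeroed so they add nothing
    to the sum, and the divisor is the window area k * k.
    """
    masked = [r[:k] + [0.0] * (len(r) - k) for r in rows[:k]]
    acc = [0.0] * len(masked[0])
    for r in masked:
        acc = [a + x for a, x in zip(acc, r)]  # lane-wise accumulation
    return sum(acc) / (k * k)                  # horizontal sum, then average


# Hypothetical channel data: 3 rows x 4 lanes; the 4th lane is outside the window.
rows = [[1.0, 2.0, 3.0, 9.0],
        [4.0, 5.0, 1.0, 8.0],
        [2.0, 0.0, 3.0, 7.0]]
average = avg_pool_window(rows)
print(average)
```

Zero is the neutral element for addition in the same way that the minimum value is neutral for the maximum, so the same masking idea serves both pooling types.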
The pooling processing device provided by this embodiment of the application acquires multiple memory addresses in a target channel of a picture to be pooled, where the number of the multiple memory addresses equals the side length of the pooling window, the picture to be pooled is laid out in memory according to the NHWC layout type, N is the number of pictures, C is the number of channels, H is the picture height, and W is the picture width. The device performs interleaved access on the multiple memory addresses in sequence, determines multiple groups of memory addresses corresponding to the multiple memory addresses, and obtains multiple groups of data from the multiple groups of memory addresses, where one memory address corresponds to one group of memory addresses, and the number of data items in one group of data is determined by the bit width of the storage unit. The device divides the multiple groups of data into multiple groups of to-be-pooled data corresponding to multiple channels, where the multiple channels include the target channel and each channel corresponds to one group of to-be-pooled data. According to the pooling window, the device then performs the corresponding pooling processing on the group of to-be-pooled data corresponding to any one of the multiple channels.
It can be seen that the pooling processing device proposed in this embodiment reads the multiple memory addresses by interleaved access to obtain multiple groups of memory addresses, and then divides the data at those groups of memory addresses into multiple groups of to-be-pooled data corresponding to multiple channels. One round of interleaved access obtains the to-be-pooled data needed for one pooling operation, which improves the cache hit rate and the pooling speed.
FIG. 8 is a second schematic structural diagram of the pooling processing device 1 provided by an embodiment of this application. In practical applications, based on the same disclosed concept as the foregoing embodiment, as shown in FIG. 8, the pooling processing device 1 of this embodiment includes a processor 16, a memory 17, and a communication bus 18.
In a specific embodiment, the acquiring part 10, the interleaved access part 11, the dividing part 12, the pooling part 13, and the determining part 14 may be implemented by the processor 16 of the pooling processing device 1, and the storage part 15 may be implemented by the memory 17 of the pooling processing device 1. The processor 16 may be at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a CPU, a controller, a microcontroller, and a microprocessor. It can be understood that, for different devices, other electronic components may be used to implement the processor functions, which is not specifically limited in this embodiment.
In this embodiment of the application, the communication bus 18 is used to implement connection and communication between the processor 16 and the memory 17. When the processor 16 executes the program stored in the memory 17, it implements the pooling processing method described in Embodiment 1.
An embodiment of this application provides a storage medium on which a computer program is stored. The computer-readable storage medium stores one or more programs that can be executed by one or more processors and is applied to a pooling processing device; when executed, the computer program implements the pooling processing method described in Embodiment 1.
The foregoing describes only preferred embodiments of this application and is not intended to limit the protection scope of this application.
Industrial Applicability
The embodiments of this application provide a pooling processing method and device, and a storage medium. The method includes: acquiring multiple memory addresses in a target channel of a picture to be pooled, where the number of the multiple memory addresses equals the side length of the pooling window, the picture to be pooled is laid out in memory according to the NHWC layout type, N is the number of pictures, C is the number of channels, H is the picture height, and W is the picture width; performing interleaved access on the multiple memory addresses in sequence, determining multiple groups of memory addresses corresponding to the multiple memory addresses, and obtaining multiple groups of data from the multiple groups of memory addresses, where one memory address corresponds to one group of memory addresses, and the number of data items in one group of data is determined by the bit width of the storage unit; dividing the multiple groups of data into multiple groups of to-be-pooled data corresponding to multiple channels, where the multiple channels include the target channel and each channel corresponds to one group of to-be-pooled data; and performing, according to the pooling window, the corresponding pooling processing on the group of to-be-pooled data corresponding to any one of the multiple channels. With this implementation, the multiple memory addresses are read by interleaved access to obtain multiple groups of memory addresses, and the data at those groups of memory addresses is then divided into multiple groups of to-be-pooled data corresponding to multiple channels. One round of interleaved access obtains the to-be-pooled data needed for one pooling operation, which improves the cache hit rate and the pooling speed.

Claims (20)

  1. A pooling processing method, the method comprising:
    acquiring multiple memory addresses in a target channel of a picture to be pooled, wherein the number of the multiple memory addresses equals the side length of a pooling window, the picture to be pooled is laid out in memory according to the NHWC layout type, N is the number of pictures, C is the number of channels, H is the picture height, and W is the picture width;
    performing interleaved access on the multiple memory addresses in sequence, determining multiple groups of memory addresses corresponding to the multiple memory addresses, and obtaining multiple groups of data from the multiple groups of memory addresses, wherein one memory address of the multiple memory addresses corresponds to one group of memory addresses, and the number of data items in one group of the multiple groups of data is determined by the bit width of a storage unit;
    dividing the multiple groups of data into multiple groups of to-be-pooled data corresponding to multiple channels, wherein the multiple channels include the target channel, and one channel of the multiple channels corresponds to one group of to-be-pooled data; and
    performing, according to the pooling window, corresponding pooling processing on the group of to-be-pooled data corresponding to any one of the multiple channels.
  2. The method according to claim 1, wherein the performing interleaved access on the multiple memory addresses in sequence and determining multiple groups of memory addresses corresponding to the multiple memory addresses comprises:
    determining in sequence, according to the bit width of the storage unit and the bit width of the data at a memory address, a first number of memory addresses in the group of memory addresses corresponding to a first memory address among the multiple memory addresses;
    performing interleaved access on the first memory address by using instructions of the single-instruction, multiple-data (SIMD) extension NEON, to obtain a group of the first number of memory addresses; and
    determining the multiple groups of memory addresses according to the multiple groups of the first number of data items corresponding to the multiple memory addresses.
  3. The method according to claim 1 or 2, wherein after the dividing the multiple groups of data into multiple groups of to-be-pooled data corresponding to multiple channels, the method further comprises:
    storing the multiple groups of to-be-pooled data corresponding to the multiple channels in multiple storage units, respectively, wherein one storage unit of the multiple storage units stores the group of to-be-pooled data corresponding to one channel.
  4. The method according to claim 1, wherein the performing, according to the pooling window, corresponding pooling processing on the group of to-be-pooled data corresponding to any one of the multiple channels comprises:
    determining target pooling data from the group of to-be-pooled data according to the size of the pooling window; and
    performing corresponding pooling processing on the target pooling data.
  5. The method according to claim 1 or 4, wherein the size of the pooling window is less than or equal to the storage capacity of the storage unit for storing memory addresses.
  6. The method according to claim 2, wherein the storage unit is a register, and the determining in sequence, according to the bit width of the storage unit and the bit width of the data at a memory address, a first number of memory addresses in the group of memory addresses corresponding to a first memory address among the multiple memory addresses comprises:
    determining, according to the bit width of the register and the bit width of the data at a memory address, the first number of data items from memory addresses that the register stores at one time; and
    storing, in one register, the group of data from the group of memory addresses of one channel obtained by interleaved reading according to the first memory address.
  7. The method according to claim 2, wherein the performing interleaved access on the first memory address by using instructions of the single-instruction, multiple-data (SIMD) extension NEON, to obtain a group of the first number of memory addresses comprises:
    using the NEON interleaved load vld3q_f32 to read, in interleaved fashion, the group of memory addresses of the first memory address across multiple channels;
    correspondingly, the method further comprises:
    obtaining a group of data from a group of memory addresses.
  8. The method according to claim 3, wherein the storage unit is a register, and the register stores memory addresses or the data held at the memory addresses.
  9. The method according to claim 4, wherein the pooling processing includes max pooling and average pooling, and the performing corresponding pooling processing on the target pooling data comprises:
    performing max pooling or average pooling on the target pooling data, and discarding the data in a group of to-be-pooled data other than the target pooling data.
  10. A pooling processing device, the device comprising:
    an acquiring part, configured to acquire multiple memory addresses in a target channel of a picture to be pooled, wherein the number of the multiple memory addresses equals the side length of a pooling window, the picture to be pooled is laid out in memory according to the NHWC layout type, N is the number of pictures, C is the number of channels, H is the picture height, and W is the picture width;
    an interleaved access part, configured to perform interleaved access on the multiple memory addresses in sequence, determine multiple groups of memory addresses corresponding to the multiple memory addresses, and obtain multiple groups of data from the multiple groups of memory addresses, wherein one memory address of the multiple memory addresses corresponds to one group of memory addresses, and the number of data items in one group of the multiple groups of data is determined by the bit width of a storage unit;
    a dividing part, configured to divide the multiple groups of data into multiple groups of to-be-pooled data corresponding to multiple channels, wherein the multiple channels include the target channel, and one channel of the multiple channels corresponds to one group of to-be-pooled data; and
    a pooling part, configured to perform, according to the pooling window, corresponding pooling processing on the group of to-be-pooled data corresponding to any one of the multiple channels.
  11. The device according to claim 10, wherein the device further comprises a determining part;
    the determining part is configured to determine in sequence, according to the bit width of the storage unit and the bit width of the data at a memory address, a first number of memory addresses in the group of memory addresses corresponding to a first memory address among the multiple memory addresses, and to determine the multiple groups of memory addresses according to the multiple groups of the first number of data items corresponding to the multiple memory addresses; and
    the interleaved access part is further configured to perform interleaved access on the first memory address by using NEON instructions, to obtain a group of the first number of memory addresses.
  12. The device according to claim 10 or 11, wherein the device further comprises a storage part;
    the storage part is configured to store the multiple groups of to-be-pooled data corresponding to the multiple channels in multiple storage units, respectively, wherein one storage unit of the multiple storage units stores the group of to-be-pooled data corresponding to one channel.
  13. The device according to claim 10, wherein
    the determining part is further configured to determine target pooling data from the group of to-be-pooled data according to the size of the pooling window; and
    the pooling part is further configured to perform corresponding pooling processing on the target pooling data.
  14. The device according to claim 10 or 13, wherein the size of the pooling window is less than or equal to the storage capacity of the storage unit for storing memory addresses.
  15. The device according to claim 10, wherein the storage unit is a register,
    the determining part is further configured to determine, according to the bit width of the register and the bit width of the data at a memory address, the first number of data items from memory addresses that the register stores at one time; and
    the storage part is further configured to store, in one register, the group of data from the group of memory addresses of one channel obtained by interleaved reading according to the first memory address.
  16. The device according to claim 11, wherein
    the interleaved access part is further configured to use the NEON interleaved load vld3q_f32 to read, in interleaved fashion, the group of memory addresses of the first memory address across multiple channels; and
    the acquiring part is further configured to obtain a group of data from a group of memory addresses.
  17. The device according to claim 12, wherein the storage unit is a register, and the register stores memory addresses or the data held at the memory addresses.
  18. The device according to claim 13, wherein the pooling processing includes max pooling and average pooling,
    and the pooling part is further configured to perform max pooling or average pooling on the target pooling data, and to discard the data in a group of to-be-pooled data other than the target pooling data.
  19. A pooling processing device, comprising a processor, a memory, and a communication bus, wherein the processor, when executing a program stored in the memory, implements the method according to any one of claims 1-9.
  20. A storage medium on which a computer program is stored, applied to a pooling processing device, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-9.
PCT/CN2020/111277 2019-08-27 2020-08-26 Pooling processing method and apparatus, and storage medium WO2021037042A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910797622.4A CN110516793B (en) 2019-08-27 2019-08-27 Pooling processing method and device and storage medium
CN201910797622.4 2019-08-27

Publications (1)

Publication Number Publication Date
WO2021037042A1 true WO2021037042A1 (en) 2021-03-04

Family

ID=68627315

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/111277 WO2021037042A1 (en) 2019-08-27 2020-08-26 Pooling processing method and apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN110516793B (en)
WO (1) WO2021037042A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516793B (en) * 2019-08-27 2022-06-17 Oppo广东移动通信有限公司 Pooling processing method and device and storage medium
CN111506520B (en) * 2020-07-01 2020-09-22 腾讯科技(深圳)有限公司 Address generation method, related device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107506822A (en) * 2017-07-26 2017-12-22 天津大学 A kind of deep neural network method based on Space integration pond
US20180232629A1 (en) * 2017-02-10 2018-08-16 Kneron, Inc. Pooling operation device and method for convolutional neural network
CN109165733A (en) * 2018-07-11 2019-01-08 中国人民解放军国防科技大学 Multi-input multi-output matrix maximum pooling vectorization implementation method
CN109902804A (en) * 2017-08-31 2019-06-18 北京中科寒武纪科技有限公司 A kind of convolution algorithm method and device
CN110516793A (en) * 2019-08-27 2019-11-29 Oppo广东移动通信有限公司 A kind of pond processing method and processing device, storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9996350B2 (en) * 2014-12-27 2018-06-12 Intel Corporation Hardware apparatuses and methods to prefetch a multidimensional block of elements from a multidimensional array
US10489703B2 (en) * 2015-05-20 2019-11-26 Nec Corporation Memory efficiency for convolutional neural networks operating on graphics processing units
US20170177352A1 (en) * 2015-12-18 2017-06-22 Intel Corporation Instructions and Logic for Lane-Based Strided Store Operations
US10338920B2 (en) * 2015-12-18 2019-07-02 Intel Corporation Instructions and logic for get-multiple-vector-elements operations
CN109389215B (en) * 2017-08-03 2020-07-31 杭州海康威视数字技术股份有限公司 Network structure determination method and device of deep learning network
CN109754359B (en) * 2017-11-01 2021-12-07 腾讯科技(深圳)有限公司 Pooling processing method and system applied to convolutional neural network
US11061402B2 (en) * 2017-11-15 2021-07-13 Uatc, Llc Sparse convolutional neural networks
US10779186B2 (en) * 2017-12-01 2020-09-15 At&T Intellectual Property I, L.P. Dynamic access slice pooling and software defined network controlled capabilities

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180232629A1 (en) * 2017-02-10 2018-08-16 Kneron, Inc. Pooling operation device and method for convolutional neural network
CN107506822A (en) * 2017-07-26 2017-12-22 天津大学 Deep neural network method based on spatial fusion pooling
CN109902804A (en) * 2017-08-31 2019-06-18 北京中科寒武纪科技有限公司 Convolution operation method and device
CN109165733A (en) * 2018-07-11 2019-01-08 中国人民解放军国防科技大学 Multi-input multi-output matrix maximum pooling vectorization implementation method
CN110516793A (en) * 2019-08-27 2019-11-29 Oppo广东移动通信有限公司 Pooling processing method and device, and storage medium

Also Published As

Publication number Publication date
CN110516793A (en) 2019-11-29
CN110516793B (en) 2022-06-17

Similar Documents

Publication Publication Date Title
WO2021037042A1 (en) Pooling processing method and apparatus, and storage medium
TWI796490B (en) Providing multi-element multi-vector (memv) register file access in vector-processor-based devices
US11436017B2 (en) Data temporary storage apparatus, data temporary storage method and operation method
EP3414909B1 (en) A method for enabling processing of a video stream and a device thereof
JP2009524127A5 (en)
US20200012694A1 (en) Apparatus and method for searching linked lists
US10169295B2 (en) Convolution operation device and method
US10929965B2 (en) Histogram statistics circuit and multimedia processing system
CN111091572A (en) Image processing method and device, electronic equipment and storage medium
CN107667355A (en) Providing memory management unit (MMU) partitioned translation caches, and related apparatuses, methods, and computer-readable media
CN111984189A (en) Neural network computing device, data reading method, data storage method and related equipment
US20070022261A1 (en) Method of interleaving asymmetric memory arrays
WO2015043445A1 (en) Method and device for correlating virtual large page and physical large page
WO2022068328A1 (en) Data migration method and apparatus, and processor and calculation device
TWI696949B (en) Direct memory access method, device, dedicated computing chip and heterogeneous computing system
US20200117989A1 (en) Memory chip capable of performing artificial intelligence operation and method thereof
CN107766021B (en) Image processing method, image processing apparatus, display system, and storage medium
US20200327638A1 (en) Connected component detection method, circuit, device and computer-readable storage medium
US8135229B1 (en) Image processing method and device
JP2020191012A (en) Image processing apparatus, imaging apparatus, and image processing method
WO2019114044A1 (en) Image processing method and device, electronic apparatus, and computer readable storage medium
CN101996390B (en) Image copying method and device
US9792988B2 (en) Parallel turbine ternary content addressable memory for high-speed applications
TWI765446B (en) Pipelining data transmission method and data pipeline device
JPH07234948A (en) Picture processor

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20859090

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20859090

Country of ref document: EP

Kind code of ref document: A1