WO2022160703A1 - Pooling method, and chip, device and storage medium - Google Patents

Pooling method, and chip, device and storage medium

Info

Publication number
WO2022160703A1
Authority
WO
WIPO (PCT)
Prior art keywords
pooling
shift register
feature map
register
sub
Prior art date
Application number
PCT/CN2021/115667
Other languages
French (fr)
Chinese (zh)
Inventor
周军
常亮
周亮
杨雨桐
Original Assignee
成都商汤科技有限公司
电子科技大学
Priority date
Filing date
Publication date
Application filed by 成都商汤科技有限公司, 电子科技大学

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map

Definitions

  • The present application relates to computer technology, and in particular to a pooling method, chip, device and storage medium.
  • Pooling refers to down-sampling the input feature map, which reduces the number of features and the computational complexity of convolutional networks while maintaining the invariance of features in certain dimensions (e.g., rotation, translation, scaling).
  • Pooling processing usually needs to rely on artificial intelligence chips (hereinafter referred to as AI chips).
  • The present application discloses at least one pooling method, which may include: acquiring a target feature map; splitting the target feature map to obtain several sub-feature maps, wherein at least some of the pixel values in the same pooling window of the target feature map are in different sub-feature maps, and the pixel values at the same position in each pooling window are in the same sub-feature map; and processing the pixels belonging to different pooling windows in each sub-feature map in parallel to obtain the pooling result corresponding to the target feature map.
  • The parallel processing of pixels belonging to different pooling windows in each sub-feature map to obtain the pooling result corresponding to the target feature map includes: loading the pixel values respectively included in the several sub-feature maps into the shift register array, and, according to the pooling instruction, pooling the pixel values at the same position in the several sub-feature maps in parallel to obtain the pooling result corresponding to the target feature map.
  • The parallel processing of pixels belonging to different pooling windows in each sub-feature map to obtain the pooling result corresponding to the target feature map includes: determining the shift operation mode of the shift register array corresponding to each sub-feature map according to the positions, within the same pooling window, of the pixel values in that sub-feature map; loading the several sub-feature maps into the shift register array respectively, and, for each sub-feature map, performing a shift operation on the pixel values stored in the shift registers of the shift register array according to the shift operation mode determined for that sub-feature map, and performing pooling in parallel according to the pooling instruction to obtain partial pooling results of the sub-feature map corresponding to different pooling windows; and determining the pooling result corresponding to the target feature map according to the partial pooling results of each sub-feature map corresponding to different pooling windows.
  • The splitting of the target feature map to obtain several sub-feature maps includes: determining the pixel values at odd rows and odd columns of the target feature map as the first sub-feature map; determining the pixel values at odd rows and even columns as the second sub-feature map; determining the pixel values at even rows and odd columns as the third sub-feature map; and determining the pixel values at even rows and even columns as the fourth sub-feature map.
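  • The odd/even split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature map is modelled as a list of lists, rows and columns are counted from 1 as in the text, and the function name `split_feature_map` is hypothetical rather than taken from the application.

```python
def split_feature_map(fm):
    """Split a 2-D list into four sub-feature maps by row/column parity.

    Rows and columns are counted from 1, as in the text, so an "odd row"
    corresponds to Python index 0, 2, 4, ...
    """
    first = [row[0::2] for row in fm[0::2]]   # odd rows, odd columns
    second = [row[1::2] for row in fm[0::2]]  # odd rows, even columns
    third = [row[0::2] for row in fm[1::2]]   # even rows, odd columns
    fourth = [row[1::2] for row in fm[1::2]]  # even rows, even columns
    return first, second, third, fourth


fm = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
first, second, third, fourth = split_feature_map(fm)
```

  • With a 2*2 pooling window and a step size of 2, each sub-feature map then holds exactly one pixel from every pooling window, always taken from the same corner of the window.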
  • Loading the pixel values respectively included in the several sub-feature maps into the shift register array, and pooling the pixel values at the same position in the several sub-feature maps in parallel according to the pooling instruction to obtain the pooling result corresponding to the target feature map, includes: moving each pixel value included in the first sub-feature map to at least part of the shift registers included in the shift register array; moving each pixel value included in the second sub-feature map to the partial shift registers, so that the computing kernel corresponding to each shift register in the partial shift registers pools the two received pixel values according to the pooling instruction to obtain a first pooling processing result; moving each pixel value included in the third sub-feature map to the partial shift registers, so that the computing kernel corresponding to each shift register in the partial shift registers pools the first pooling processing result and the received pixel value according to the pooling instruction to obtain a second pooling processing result; moving each pixel value included in the fourth sub-feature map to the partial shift registers, so that the computing kernel corresponding to each shift register in the partial shift registers pools the second pooling processing result and the received pixel value according to the pooling instruction to obtain a third pooling processing result; and outputting the third pooling processing result obtained by the computing kernel corresponding to each shift register in the partial shift registers, to obtain the pooling result corresponding to the target feature map.
  • The pooling processing includes maximum pooling processing, and the pooling instruction includes taking the maximum of two values. Moving each pixel value included in the first sub-feature map to at least part of the shift registers included in the shift register array includes: moving each pixel value included in the first sub-feature map to the first registers of at least part of the shift registers included in the shift register array. Moving each pixel value included in the second sub-feature map to the partial shift registers to obtain the first pooling processing result includes: moving each pixel value included in the second sub-feature map to the second registers of the partial shift registers, so that the computing kernel corresponding to each shift register in the partial shift registers obtains, according to the pooling instruction, the maximum of the values stored in the first register and the second register, and stores the maximum in the first register as the first pooling processing result.
  • Moving each pixel value included in the third sub-feature map to the partial shift registers to obtain the second pooling processing result includes: moving each pixel value included in the third sub-feature map to the second registers of the partial shift registers, so that the computing kernel corresponding to each shift register in the partial shift registers obtains, according to the pooling instruction, the maximum of the values stored in the first register and the second register, and stores the maximum in the first register as the second pooling processing result. Moving each pixel value included in the fourth sub-feature map to the partial shift registers to obtain the third pooling processing result includes: moving each pixel value included in the fourth sub-feature map to the second registers of the partial shift registers, so that the computing kernel corresponding to each shift register in the partial shift registers obtains, according to the pooling instruction, the maximum of the values stored in the first register and the second register, and stores the maximum in the first register as the third pooling processing result.
  • Outputting the third pooling processing result obtained by the computing kernel corresponding to each shift register in the partial shift registers, to obtain the pooling result corresponding to the target feature map, includes: outputting the values stored in the first registers of the partial shift registers to obtain the pooling result corresponding to the target feature map.
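  • The two-register max pooling flow above can be modelled for a single shift register as follows. This is a minimal software sketch, not the hardware design: the class `TwoRegisterPE` and its method names are hypothetical, and the sequential loop stands in for what every PE in the array would do at the same time.

```python
class TwoRegisterPE:
    """Hypothetical model of one shift register with two registers and a max ALU."""

    def __init__(self):
        self.first = None   # holds the running pooling result
        self.second = None  # holds the most recently received pixel value

    def load_first(self, value):
        # A pixel from the first sub-feature map goes straight into the first register.
        self.first = value

    def load_and_pool(self, value):
        # A pixel from a later sub-feature map goes into the second register;
        # the larger of the two registers is written back to the first register.
        self.second = value
        self.first = max(self.first, self.second)


def max_pool_window(pixels):
    """Pool the four pixels of one 2*2 window, given in sub-feature-map order."""
    pe = TwoRegisterPE()
    pe.load_first(pixels[0])
    for value in pixels[1:]:
        pe.load_and_pool(value)
    return pe.first


result = max_pool_window([1, 2, 5, 6])
```

  • Only the first register is read at the end, which matches the output step described above.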
  • The parallel processing of pixels belonging to different pooling windows in each sub-feature map to obtain the pooling result corresponding to the target feature map includes: moving each pixel value included in the first sub-feature map to at least part of the shift registers included in the shift register array; pooling, according to the pooling instruction, the pixel values in any four up, down, left and right adjacent shift registers included in the shift register array to obtain a first partial pooling processing result, and storing the first partial pooling processing result in the target shift register at a preset position among the four adjacent shift registers; moving each pixel value included in the second sub-feature map to the partial shift registers; pooling, according to the pooling instruction, the pixel values in any two vertically adjacent shift registers included in the shift register array to obtain a second partial pooling processing result, and storing the second partial pooling processing result in the target shift register; moving each pixel value included in the third sub-feature map to the partial shift registers; pooling, according to the pooling instruction, the pixel values in any two horizontally adjacent shift registers included in the shift register array to obtain a third partial pooling processing result, and storing the third partial pooling processing result in the target shift register; and obtaining the pooling result corresponding to the target feature map according to the first partial pooling processing result, the second partial pooling processing result and the third partial pooling processing result in each target shift register included in the shift register array, and the pixel values of the fourth sub-feature map.
  • The pooling processing includes maximum pooling processing; the pooling instruction includes taking the maximum of two values; and the target shift register at the preset position includes the shift register at the lower left corner of the four up, down, left and right adjacent shift registers.
  • Moving each pixel value included in the first sub-feature map to at least part of the shift registers included in the shift register array includes: moving each pixel value included in the first sub-feature map to the first registers of at least part of the shift registers included in the shift register array. Obtaining the first partial pooling processing result and storing it in the target shift register at the preset position among the four adjacent shift registers includes: moving the value in the first register of each first shift register in the partial shift registers to the first shift register.
  • moving each pixel value included in the second sub-feature map to the partial shift register includes: moving each pixel value included in the second sub-feature map to the above-mentioned partial shift register.
  • the above-mentioned pooling operation is performed on the pixel values in any two upper and lower adjacent shift registers included in the above-mentioned shift register array according to the above-mentioned pooling instruction, and the second partial pooling is obtained.
  • the above-mentioned moving each pixel value included in the third sub-feature map to the partial shift register includes: moving each pixel value included in the third sub-feature map to the above-mentioned In the third register of the partial shift register; the above-mentioned pooling operation is performed on the pixel values in any two left and right adjacent shift registers included in the above-mentioned shift register array according to the above-mentioned pooling instruction, so as to obtain a third partial pooling operation.
  • moving each pixel value included in the fourth sub-feature map to the partial shift register includes: moving each pixel value included in the fourth sub-feature map to the aforementioned part In the fourth register of the shift register; the above-mentioned moving the pixel values in each of the shift registers in the above-mentioned partial shift registers to the above-mentioned target shift register includes: The values in the four registers are moved to the fourth register of the target shift register below the first shift register.
  • Obtaining the pooling result corresponding to the target feature map according to the first partial pooling processing result, the second partial pooling processing result and the third partial pooling processing result in each target shift register included in the shift register array, and the pixel values corresponding to the fourth sub-feature map, includes: storing the larger of the values in the first register and the second register of each target shift register in the first register of that target shift register; storing the larger of the values in the first register and the third register of each target shift register in the first register of that target shift register; storing the larger of the values in the first register and the fourth register of each target shift register in the first register of that target shift register; and outputting the value stored in the first register of each target shift register to obtain the pooling result corresponding to the target feature map.
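  • The final reduction inside each target shift register amounts to three successive compare-and-store steps. The sketch below illustrates only that last step; the function name `reduce_target_register` and the sample register contents are hypothetical, and the earlier shift operations that fill the four registers are not modelled.

```python
def reduce_target_register(registers):
    """Reduce [first, second, third, fourth] register values to one maximum.

    Mirrors the three compare steps in the text: each step writes the
    larger value back into the first register (index 0).
    """
    regs = list(registers)
    for other in (1, 2, 3):
        regs[0] = max(regs[0], regs[other])
    return regs[0]


# Hypothetical contents of three target shift registers.
targets = [[4, 9, 2, 7], [1, 1, 8, 3], [5, 0, 0, 6]]
outputs = [reduce_target_register(t) for t in targets]
```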
  • a plurality of temporary registers are connected to the periphery of the above-mentioned shift register array; the above-mentioned temporary registers are used to store the pixel values that overflow the above-mentioned shift register array when performing a numerical value transfer operation.
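  • One way to picture the temporary registers is as a buffer that catches the values pushed past the edge of the array during a shift. The sketch below assumes a shift to the right, one temporary register per row, and `None` as the value shifted in on the left; none of these details is specified in the text, and the function name `shift_right` is hypothetical.

```python
def shift_right(array, fill=None):
    """Shift every row of the array one position to the right.

    Returns the shifted array and the values pushed out of the right edge,
    which play the role of the temporary registers.
    """
    temp_registers = [row[-1] for row in array]
    shifted = [[fill] + row[:-1] for row in array]
    return shifted, temp_registers


array = [[1, 2, 3], [4, 5, 6]]
shifted, temp = shift_right(array)
```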
  • the number of pixels included in at least some of the sub-feature maps in the above-mentioned several sub-feature maps is consistent with the number of shift registers included in the above-mentioned shift register array.
  • The present application also proposes a pooling method, which may include: obtaining an original feature map; dividing the original feature map into several target feature maps; pooling each target feature map according to the pooling method shown in any of the foregoing embodiments to obtain the pooling result corresponding to each target feature map; and outputting the pooling results corresponding to the target feature maps to obtain the pooling result corresponding to the original feature map.
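  • Dividing an original feature map into target feature maps, pooling each one and stitching the results back together can be sketched as below. The assumptions are that the original dimensions are multiples of the tile size, that tile sizes are even, and that a plain 2*2, step-2 max pooling (`max_pool_2x2`) stands in for the per-tile method; all function names are hypothetical.

```python
def max_pool_2x2(fm):
    """Reference 2*2, step-2 max pooling; stands in for the per-tile method."""
    return [
        [max(fm[r][c], fm[r][c + 1], fm[r + 1][c], fm[r + 1][c + 1])
         for c in range(0, len(fm[0]), 2)]
        for r in range(0, len(fm), 2)
    ]


def pool_by_tiles(original, tile_h, tile_w):
    """Split `original` into tiles, pool each tile, and stitch the results."""
    pooled = []
    for r in range(0, len(original), tile_h):
        # Pool every tile in this band of rows.
        band = [
            max_pool_2x2([row[c:c + tile_w] for row in original[r:r + tile_h]])
            for c in range(0, len(original[0]), tile_w)
        ]
        # Concatenate the pooled tiles row by row.
        for i in range(len(band[0])):
            pooled.append([v for tile in band for v in tile[i]])
    return pooled


original = [[r * 8 + c for c in range(8)] for r in range(8)]
result = pool_by_tiles(original, tile_h=4, tile_w=4)
```

  • Because the tile boundaries here coincide with pooling window boundaries, the stitched result equals pooling the original feature map directly.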
  • The application also proposes a chip, which may include a controller. The controller is used to: obtain a target feature map; split the target feature map to obtain several sub-feature maps, wherein at least some of the pixel values in the same pooling window of the target feature map are in different sub-feature maps, and the pixel values at the same position in each pooling window are in the same sub-feature map; and process the pixels belonging to different pooling windows in each sub-feature map in parallel to obtain the pooling result corresponding to the target feature map.
  • The controller is configured to: load the pixel values respectively included in the several sub-feature maps into the shift register array; and, according to the pooling instruction, pool the pixel values at the same position in the several sub-feature maps in parallel to obtain the pooling result corresponding to the target feature map.
  • The controller is configured to: determine the shift operation mode of the shift register array corresponding to each sub-feature map according to the positions, within the same pooling window, of the pixel values in that sub-feature map; load the several sub-feature maps into the shift register array respectively, and, for each sub-feature map, perform a shift operation on the pixel values stored in the shift registers of the shift register array according to the shift operation mode determined for that sub-feature map, and perform pooling in parallel according to the pooling instruction to obtain partial pooling results of the sub-feature map corresponding to different pooling windows; and determine the pooling result corresponding to the target feature map according to the partial pooling results of each sub-feature map corresponding to different pooling windows.
  • The controller is configured to: determine the pixel values at odd rows and odd columns of the target feature map as the first sub-feature map; determine the pixel values at odd rows and even columns as the second sub-feature map; determine the pixel values at even rows and odd columns as the third sub-feature map; and determine the pixel values at even rows and even columns as the fourth sub-feature map.
  • The controller is configured to: move each pixel value included in the first sub-feature map to at least part of the shift registers included in the shift register array; move each pixel value included in the second sub-feature map to the partial shift registers, so that the computing kernel corresponding to each shift register pools the two received pixel values according to the pooling instruction to obtain a first pooling processing result; move each pixel value included in the third sub-feature map to the partial shift registers, so that the computing kernel corresponding to each shift register in the partial shift registers pools the first pooling processing result and the received pixel value according to the pooling instruction to obtain a second pooling processing result; move each pixel value included in the fourth sub-feature map to the partial shift registers, so that the computing kernel corresponding to each shift register in the partial shift registers pools the second pooling processing result and the received pixel value according to the pooling instruction to obtain a third pooling processing result; and output the third pooling processing result obtained by the computing kernel corresponding to each shift register in the partial shift registers, to obtain the pooling result corresponding to the target feature map.
  • The pooling processing includes maximum pooling processing, and the pooling instruction includes taking the maximum of two values. The controller is configured to: move each pixel value included in the first sub-feature map to the first registers of at least part of the shift registers included in the shift register array; and move each pixel value included in the second sub-feature map to the second registers of the partial shift registers, so that the computing kernel corresponding to each shift register in the partial shift registers obtains, according to the pooling instruction, the maximum of the values stored in the first register and the second register, and stores the maximum in the first register as the first pooling processing result.
  • The controller is configured to: move each pixel value included in the third sub-feature map to the second registers of the partial shift registers, so that the computing kernel corresponding to each shift register in the partial shift registers obtains, according to the pooling instruction, the maximum of the values stored in the first register and the second register, and stores the maximum in the first register as the second pooling processing result.
  • the controller is configured to: output the value stored in the first register of the partial shift register to obtain a pooling result corresponding to the target feature map.
  • The present application also proposes a chip, which may include a controller. The controller is used to: obtain an original feature map; divide the original feature map into several target feature maps; pool each target feature map to obtain the pooling result corresponding to each target feature map; and output the pooling results corresponding to the target feature maps to obtain the pooling result corresponding to the original feature map.
  • the present application also provides an electronic device, including the chip shown in any of the foregoing embodiments.
  • the present application also provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by the controller, any one of the pooling methods described above is implemented.
  • FIG. 1 is a schematic structural diagram of a shift register array according to an embodiment of the application.
  • FIG. 2 is a schematic structural diagram of a PE shown in an embodiment of the application.
  • FIG. 3 is a flowchart of a pooling method according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a splitting process of a target feature map shown in an embodiment of the application.
  • FIG. 5 is a schematic diagram of a splitting process of a target feature map shown in an embodiment of the application.
  • FIG. 6 is a schematic diagram of a pooling window shown in an embodiment of the present application.
  • FIG. 7 is a schematic diagram of shifting pixel values of a first sub-feature map according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of transferring pixel values for a second sub-feature map according to an embodiment of the present application.
  • FIG. 9 is a schematic diagram of transferring pixel values for a third sub-feature map according to an embodiment of the present application.
  • FIG. 10 is a flowchart of a pooling method according to an embodiment of the present application.
  • FIG. 1 is a schematic structural diagram of a shift register array shown in the present application.
  • The shift register array may include a plurality of shift registers arranged in rows and columns. Each shift register may uniquely correspond to a computing kernel (Processing Element, hereinafter referred to as PE), and each PE is used to perform operations on the values in its shift register.
  • As shown in FIG. 1, the shift register array can be regarded as including a plurality of PEs arranged in rows and columns. Data can be moved between any two adjacent PEs (that is, between the shift registers corresponding to the PEs), and each row of PEs can obtain data from a corresponding RAM (Random Access Memory). Assume that the size of the shift register array is 8*8 and that the size of the feature map to be input is also 8*8.
  • The feature map may be split into 8 rows of pixel values by a controller (e.g., an array controller, also referred to as a processor), and the 8 rows of pixel values are input into the RAMs corresponding to the respective rows of PEs.
  • Through a data moving instruction, the controller can move the pixel values in each RAM to the corresponding shift registers according to the position of each pixel value in the feature map, thereby completing the input of the feature map into the shift registers.
  • the above-mentioned PE can perform data operation on the data in the shift register in response to the instruction.
  • The PE corresponding to the shift register may include a register and an ALU (arithmetic logic unit).
  • the above-mentioned register may be a register obtained by dividing the storage space of the above-mentioned shift register.
  • the above-mentioned shift register may be configured as several registers (for example, register 1 and register 2 shown in FIG. 2 ) that can move data between each other according to actual requirements.
  • the above-mentioned PE can perform arithmetic processing on the data in the multiple registers according to the arithmetic instruction.
  • The ALU described above is used to perform logical operations. For example, when the PE receives an operation instruction such as adding or subtracting values or comparing their sizes, the ALU can perform the relevant operation on the values stored in the registers.
  • the above-mentioned shift register array in the embodiments of the present application includes a plurality of shift registers, where the shift registers correspond to respective PEs, and each shift register (or each PE) includes a plurality of registers (For example: the first register, the second register, the third register, the fourth register, etc., the number of registers per PE is not limited here).
  • This application proposes a pooling method.
  • The method splits at least part of the pixel values in the same pooling window of the target feature map into different sub-feature maps, and processes the pixels belonging to different pooling windows in each sub-feature map in parallel to obtain the pooling result corresponding to the target feature map. This can improve the efficiency of pooling processing on the chip, reduce the computational burden of the chip, and reduce the difficulty of chip design.
  • FIG. 3 is a flowchart of a pooling method shown in this application. As shown in FIG. 3 , the above-mentioned pooling method may include steps S302 to S306.
  • the above target feature map is a feature map that needs to be pooled.
  • the above target feature map may be a feature map that needs to be pooled after convolution processing.
  • the target feature map may be a target feature map obtained by performing convolution processing on each PE in the shift register array. It is understandable that, the above-mentioned target feature map may be stored in the RAM corresponding to each row of PE.
  • the pooling process usually includes a pooling window of a preset size and a step size of a preset size set according to business requirements.
  • Taking a 2*2 pooling window and a step size of 2 as an example, performing the pooling operation on a feature map can be understood as starting from the first pixel value in the upper left corner of the feature map and taking that pixel value as the upper-left element to form a pooling window of size 2*2.
  • After the pooling operation in that pooling window is completed, the window slides two pixel values to the right of the first pixel value to form a new 2*2 pooling window, and the pooling operation is performed on the pixel values in the current pooling window.
  • The maximum pixel value output by each pooling window is combined to obtain the pooling result corresponding to the feature map.
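  • For reference, the conventional sliding-window procedure described above can be written directly in Python. This is a plain software baseline, not the shift-register method of the application; the function name `max_pool` and its `window` and `step` parameters are illustrative.

```python
def max_pool(fm, window=2, step=2):
    """Slide a window over a 2-D list and keep the maximum of each window."""
    pooled = []
    for r in range(0, len(fm) - window + 1, step):
        pooled_row = []
        for c in range(0, len(fm[0]) - window + 1, step):
            # Collect every pixel covered by the window whose upper-left
            # corner is at (r, c) and keep the largest one.
            pooled_row.append(
                max(fm[i][j]
                    for i in range(r, r + window)
                    for j in range(c, c + window))
            )
        pooled.append(pooled_row)
    return pooled


fm = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = max_pool(fm)
```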
  • the pooling window may include several pixel values. Each pixel value can be in a different position of the pooling window. Take the pooling window as 2*2 as an example. The 4 pixel values in the pooling window can be located in the upper left corner, lower left corner, upper right corner and lower right corner of the pooling window, respectively.
  • A splitting method can be determined under the conditions that at least some of the pixel values in the same pooling window are in different sub-feature maps and that the pixel values at the same position in each pooling window are in the same sub-feature map.
  • The target feature map is split according to the determined splitting method to obtain several sub-feature maps.
  • For example, when the pooling window is 2*2, the target feature map can be split so that at least some pixel values in the same pooling window are in different sub-feature maps and the pixel values at the same position in each pooling window are in the same sub-feature map, and 4 sub-feature maps are obtained.
  • the correspondence between pooling and splitting schemes may be maintained in advance.
  • parameters such as the pooling window and step size of the pooling process can be determined first. Then, according to the determined parameters, the above-mentioned corresponding relationship is queried to obtain a corresponding splitting scheme, and the above-mentioned target feature map is split according to the above-mentioned splitting scheme.
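  • A pre-maintained correspondence between pooling parameters and splitting schemes could be as simple as a lookup table. In the sketch below, the table `SPLIT_SCHEMES`, its key format (window size, step size) and its encoding of a scheme as a list of (row, column) offsets are all assumptions made for illustration; only the 2*2, step-2 entry is grounded in the text.

```python
# Hypothetical table: (window size, step size) -> offset of each sub-feature map
# inside the pooling window, as (row offset, column offset).
SPLIT_SCHEMES = {
    (2, 2): [(0, 0), (0, 1), (1, 0), (1, 1)],
}


def split_by_scheme(fm, window, step):
    """Look up the splitting scheme for the pooling parameters and apply it."""
    offsets = SPLIT_SCHEMES[(window, step)]
    return [
        [row[dc::step] for row in fm[dr::step]]
        for dr, dc in offsets
    ]


fm = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
subs = split_by_scheme(fm, window=2, step=2)
```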
  • S306 can be executed to perform parallel processing on pixels belonging to different pooling windows in each sub-feature map to obtain the pooling result corresponding to the above target feature map.
  • The pixel values respectively included in the several sub-feature maps can be loaded into the shift register array, and, according to the pooling instruction, the pixel values at the same position in the several sub-feature maps can be pooled in parallel to obtain the pooling result corresponding to the target feature map.
  • The pooling instruction may include an instruction corresponding to the current pooling process. This instruction can be pre-generated. When the pooling process is max pooling, the pooling instruction can be to take the maximum of two values. When the pooling process is average pooling, the pooling instruction may be to compute a sum or an average.
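  • The relationship between the pooling type and the pooling instruction can be illustrated with a small dispatch table. The table `POOL_INSTRUCTIONS`, the mode names and the choice to implement average pooling as a running sum followed by a single division are assumptions of this sketch, not details given in the text.

```python
import operator

# Hypothetical mapping: pooling mode -> (pairwise instruction, final step).
POOL_INSTRUCTIONS = {
    "max": (max, lambda acc, n: acc),               # keep the larger of two values
    "avg": (operator.add, lambda acc, n: acc / n),  # sum, then divide once
}


def pool_window_values(values, mode):
    """Combine the pixel values of one pooling window two at a time."""
    combine, finalize = POOL_INSTRUCTIONS[mode]
    acc = values[0]
    for value in values[1:]:
        acc = combine(acc, value)
    return finalize(acc, len(values))


max_result = pool_window_values([1, 2, 5, 6], "max")
avg_result = pool_window_values([1, 2, 5, 6], "avg")
```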
  • the above pooling result may include a pooling result obtained after the target feature map is pooled.
  • the pixel values included in each sub-feature map may be moved in sequence to the shift registers of the shift register array according to a preset moving method, so that the pixel values at the same position in each sub-feature map are moved into the same shift register.
  • in one preset moving method, each pixel value is moved into the shift register array according to its order in the target feature map.
  • the above-mentioned transfer method is not particularly limited in this application. It is understandable that the pixel values of each sub-feature map are transferred according to the same transfer method to ensure that the pixel values in the same position in each sub-feature map are transferred to the same shift register.
  • the PE corresponding to each shift register can perform parallel pooling processing on the received pixel values according to the pooling instruction to obtain the pooling result corresponding to the above target feature map. For example, take the pooling process as max pooling.
  • each time a shift register receives a new pixel value, it can compare the newly received pixel value with the stored pixel value through its corresponding PE, obtain the maximum, and overwrite the stored value with that maximum. Therefore, after all the sub-feature maps split from the target feature map have been input, the shift registers hold the maximum pixel value of each pooling window. The maximum pixel values stored in the shift register array are then output to obtain the pooling result for the target feature map.
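  • the compare-and-overwrite behaviour described above can be sketched in a few lines. This is an illustrative model only: the shift register array is represented as a plain 2-D list named `registers`, and the function name `max_pool_by_streaming` is an assumption, not a name from the application.

```python
def max_pool_by_streaming(sub_maps):
    """Stream equally sized sub-feature maps into one register array, keeping maxima."""
    rows, cols = len(sub_maps[0]), len(sub_maps[0][0])
    # The first sub-feature map simply fills the registers.
    registers = [row[:] for row in sub_maps[0]]
    for sub in sub_maps[1:]:
        for r in range(rows):
            for c in range(cols):
                # Each PE compares the stored value with the new one and keeps the larger.
                registers[r][c] = max(registers[r][c], sub[r][c])
    return registers


# Four 2*2 sub-feature maps of a 4*4 map with values 0..15 (2*2 window, step 2).
sub_maps = [
    [[0, 2], [8, 10]],
    [[1, 3], [9, 11]],
    [[4, 6], [12, 14]],
    [[5, 7], [13, 15]],
]
pooled = max_pool_by_streaming(sub_maps)
```

  • on real hardware the inner two loops would run in parallel across the PEs; the loops here only stand in for that parallelism.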
  • in the split result of the target feature map, all pixel values included in the same pooling window are in different sub-feature maps.
  • the pixel values included in the sub-feature maps can be loaded into the shift register array, and the pixel values at the same position in the sub-feature maps are pooled in parallel according to the pooling instruction to obtain the pooling result corresponding to the target feature map. The multiple PEs corresponding to the shift register array can thus perform the pooling operations of the pooling windows in parallel, thereby improving pooling efficiency, reducing the computational burden on the chip, and reducing the difficulty of chip design.
  • the shift operation mode of the shift register array corresponding to each sub-feature map may be determined according to the positions of the pixel values in the same pooling window in each sub-feature map.
  • the sub-feature maps can be loaded into the shift register array respectively and, for each sub-feature map, the pixel values stored in the shift registers of the shift register array are shifted according to the shift operation mode determined for that sub-feature map, after which parallel pooling is performed according to the pooling instruction to obtain the partial pooling results corresponding to the different pooling windows in that sub-feature map. The pooling result corresponding to the target feature map is then determined from the partial pooling results that the sub-feature maps yield for the different pooling windows.
  • when some pixel values in the same pooling window are in the same sub-feature map, the pixel values belonging to the same pooling window within each sub-feature map can be pooled first to obtain the partial pooling results corresponding to the different pooling windows in each sub-feature map; the partial pooling results corresponding to the same pooling window across the sub-feature maps are then pooled again to obtain the final pooling result.
  • Multiple PEs corresponding to the shift register array can be used to perform the pooling processing operation of each pooling window in parallel, thereby improving the pooling processing efficiency, reducing the computational burden of the chip, and reducing the difficulty of chip design.
  • the number of pixels included in at least some of the sub-feature maps is the same as the number of shift registers included in the shift register array.
  • that is, among the sub-feature maps obtained by splitting the target feature map, the number of pixels included in some or all of the sub-feature maps is the same as the number of shift registers.
  • Scenario 1: The size of the target feature map is 16*16, the size of the pooling window is 2*2, the step size is 2, the pooling process is maximum pooling, and the pooling instruction includes comparing two values and taking the maximum.
  • the size of the shift register array included in the AI chip for pooling is 8*8.
  • the above target feature map can be split to obtain four sub-feature maps.
  • pixel values in odd rows and odd columns in the target feature map may be determined as the first sub-feature map; pixel values in odd rows and even columns in the target feature map may be determined as the second sub-feature map; pixel values in even rows and odd columns in the target feature map may be determined as the third sub-feature map; and pixel values in even rows and even columns in the target feature map may be determined as the fourth sub-feature map.
  • FIG. 4 is a schematic diagram of a splitting process of a target feature map shown in this application.
  • the target feature map is 16*16.
  • black squares indicate pixels in odd rows and odd columns in the target feature map
  • dark gray squares indicate pixels in odd rows and even columns in the target feature map
  • white squares indicate pixels in even rows and odd columns in the target feature map
  • the light gray squares refer to the pixels in the even-numbered rows and even-numbered columns in the target feature map.
  • the target feature map is split according to the aforementioned splitting method to obtain the first to fourth sub-feature maps.
  • the size of each sub-feature map is 8*8, which is consistent with the size of the above-mentioned shift register array.
  • each pixel value included in the above-mentioned first sub-feature map can be moved to at least part of the shift registers corresponding to the above-mentioned shift register array according to the preset moving method.
  • each pixel value included in the first sub-feature map may be moved to the first registers of at least part of the shift registers included in the shift register array according to a preset moving method.
  • if the pixel values need to be moved as a whole among the shift registers in the shift register array, then, according to the moving direction and the moving step size, idle shift registers can be reserved at preset positions in the shift register array to store the moved pixel values. Assuming that the pixel values need to be moved one step to the right as a whole, at least the idle shift registers immediately to the right of the rightmost column of stored pixel values need to be reserved. Other moving methods are handled in the same way and are not repeated here. In this way, all the pixel values included in the first sub-feature map can be transferred to the shift registers included in the shift register array.
  • each pixel value included in the second sub-feature map can be moved to the partial shift registers according to the preset moving method, so that the computing kernel corresponding to each shift register can pool the two received pixel values to obtain a first pooling result.
  • each pixel value included in the second sub-feature map can be moved to the second registers of the partial shift registers according to the preset moving method, so that each computing core can, according to the pooling instruction, obtain the maximum of the values stored in the first register and the second register and store that maximum in the first register as the first pooling processing result. In this way, the pixel values that the first sub-feature map and the second sub-feature map contribute to the same pooling window are compared, and the maximum is stored in the shift register.
  • each pixel value included in the third sub-feature map can be moved to the partial shift registers according to the preset moving method, so that each computing kernel can pool the first pooling processing result with the received pixel value according to the pooling instruction to obtain a second pooling result.
  • each pixel value included in the third sub-feature map is moved to the second registers of the partial shift registers, so that each computing core can, according to the pooling instruction, obtain the maximum of the values stored in the first register and the second register and store that maximum in the first register as the second pooling processing result. In this way, the pixel values that the first, second and third sub-feature maps contribute to the same pooling window are compared, and the maximum is stored in the shift register.
  • each pixel value included in the fourth sub-feature map can be moved to the partial shift registers according to the preset moving method, so that each computing kernel can pool the second pooling processing result with the received pixel value according to the pooling instruction to obtain a third pooling result.
  • each pixel value included in the fourth sub-feature map is moved to the second registers of the partial shift registers according to the preset moving method, so that each computing core can obtain the maximum of the values stored in the first register and the second register and store it in the first register as the third pooling processing result.
  • in this way, the pixel values that the first, second, third and fourth sub-feature maps contribute to the same pooling window are compared, and the maximum is stored in the shift register.
  • the third pooling processing result obtained by performing pooling processing in each computing core can be outputted to obtain the pooling result corresponding to the above-mentioned target feature map.
  • the value stored in the first register of the partial shift register can be output to obtain the pooling result corresponding to the target feature map. In this way, the maximum pooling process for the above target feature map can be completed, and the corresponding pooling result can be obtained.
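  • Scenario 1 can be checked end to end with a small simulation. The sketch below is an assumption-laden model rather than the chip's logic: each shift register is modelled as a two-element list `[first_register, second_register]`, the pixel values are random test data, and the result is compared with a direct 2*2, step-2 max pooling of the same map.

```python
import random

SIZE, ARRAY = 16, 8  # 16*16 target feature map, 8*8 shift register array
random.seed(0)
target = [[random.randint(0, 255) for _ in range(SIZE)] for _ in range(SIZE)]

# First to fourth sub-feature maps: (odd, odd), (odd, even), (even, odd), (even, even)
# in the document's 1-based row/column numbering, i.e. 0-based offsets (0,0), (0,1), (1,0), (1,1).
subs = [[row[c::2] for row in target[r::2]] for r in (0, 1) for c in (0, 1)]

# Load the first sub-feature map into the first register of every shift register.
array = [[[subs[0][r][c], None] for c in range(ARRAY)] for r in range(ARRAY)]

for sub in subs[1:]:
    for r in range(ARRAY):
        for c in range(ARRAY):
            reg = array[r][c]
            reg[1] = sub[r][c]            # incoming pixel goes to the second register
            reg[0] = max(reg[0], reg[1])  # PE keeps the larger value in the first register

pooled = [[array[r][c][0] for c in range(ARRAY)] for r in range(ARRAY)]

# Reference: direct max pooling with a 2*2 window and step size 2.
reference = [
    [max(target[2 * r + i][2 * c + j] for i in (0, 1) for j in (0, 1)) for c in range(ARRAY)]
    for r in range(ARRAY)
]
```

  • the streamed result equals the reference because each pooling window contributes exactly one pixel to each of the four sub-feature maps, always at the same position.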
  • the specific process may refer to the above-mentioned embodiment, only the pooling instructions are different, which will not be described in detail here.
  • Scenario 2: The size of the target feature map is 17*17, the size of the pooling window is 3*3, the step size is 2, the pooling process is maximum pooling, and the pooling instruction includes comparing two values and taking the maximum.
  • the size of the shift register array included in the AI chip for pooling is 9*9.
  • the above target feature map can be split to obtain four sub-feature maps.
  • FIG. 5 is a schematic diagram of a splitting process of a target feature map shown in the present application.
  • the target feature map is 17*17.
  • black squares indicate pixels in odd rows and odd columns in the target feature map
  • dark gray squares indicate pixels in odd rows and even columns in the target feature map
  • white squares indicate pixels in even rows and odd columns in the target feature map
  • the light gray squares refer to the pixels in the even-numbered rows and even-numbered columns in the target feature map.
  • the target feature map is split according to the aforementioned splitting method to obtain the first to fourth sub-feature maps.
  • the first sub-feature map is 9*9
  • the second sub-feature map is 9*8
  • the third sub-feature map is 8*9
  • the fourth sub-feature map is 8*8.
  • the size of the above-mentioned first sub-feature map is consistent with the size of the above-mentioned shift register array.
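  • the four sizes listed above follow from counting odd and even indices. A minimal sketch, with the hypothetical helper `sub_map_shapes` and shapes written as (rows, columns):

```python
def sub_map_shapes(height, width):
    """Shapes of the first to fourth sub-feature maps under the odd/even split."""
    odd_rows, even_rows = (height + 1) // 2, height // 2
    odd_cols, even_cols = (width + 1) // 2, width // 2
    return [
        (odd_rows, odd_cols),    # first: odd rows, odd columns
        (odd_rows, even_cols),   # second: odd rows, even columns
        (even_rows, odd_cols),   # third: even rows, odd columns
        (even_rows, even_cols),  # fourth: even rows, even columns
    ]


shapes = sub_map_shapes(17, 17)
# Number of pooling windows per side for a 3*3 window with step size 2.
windows_per_side = (17 - 3) // 2 + 1
```

  • only the first sub-feature map fills the whole 9*9 shift register array; the pooled output has 8 windows per side.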
  • FIG. 6 is a schematic diagram of a pooling window shown in this application.
  • FIG. 6 shows a pooling window used when the target feature map is pooled with a pooling window size of 3*3 and a step size of 2.
  • the dashed box represents a pooling window in the target feature map.
  • the pooling window can include 4 black blocks, 2 dark gray blocks, 2 white blocks and 1 light gray block.
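  • the 4/2/2/1 composition can be verified by counting, for one 3*3 window, how many of its pixels fall in each sub-feature map. The function `window_composition` below is a hypothetical helper; it uses 0-based indices, so the document's odd rows and columns correspond to even indices here.

```python
from collections import Counter


def window_composition(top, left, size=3):
    """Count pixels of one window per sub-feature map (1 = first ... 4 = fourth)."""
    counts = Counter()
    for r in range(top, top + size):
        for c in range(left, left + size):
            # 0-based even index == 1-based odd row/column.
            counts[(r % 2) * 2 + (c % 2) + 1] += 1
    return dict(counts)


composition = window_composition(0, 0)
# With step size 2 every window of a 17*17 map starts at an even 0-based index.
all_windows = [window_composition(2 * i, 2 * j) for i in range(8) for j in range(8)]
```

  • because every window starts at an even 0-based index, all windows share this composition, which is why one fixed sequence of shift operations serves the whole array.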
  • the maximum value among the four pixel values in the upper left corner, the upper right corner, the lower left corner, and the lower right corner, that is, the first maximum value among the four upper, lower, left, and right adjacent pixel values corresponding to the pooling window.
  • execute S63-S64 to determine the maximum value of the two pixel values in the first row, second column, and the third row, second column, that is, the pooling window corresponds to the upper and lower adjacent sub-feature maps in the second sub-feature map.
  • S61 can be executed to move each pixel value included in the first sub-feature map to at least part of the shift registers included in the shift register array according to a preset moving method.
  • each pixel value included in the first sub-feature map may be moved to the first registers of at least part of the shift registers included in the shift register array according to a preset moving method.
  • S62 can be executed, and the computing kernel corresponding to the above-mentioned at least part of the shift registers can perform a pooling operation on the pixel values in any four adjacent shift registers included in the above-mentioned shift register array according to the above-mentioned pooling instruction , obtain the first partial pooling processing result, and store the first partial pooling processing result in the target shift register at the preset position among the four adjacent shift registers.
  • the above-mentioned preset position includes the position of the lower left corner of any four adjacent shift registers. It is understandable that, for the solution in which the above-mentioned preset positions are other positions, reference may be made to this embodiment.
  • the above-mentioned target shift register includes a shift register located at the lower left corner among the above-mentioned four adjacent shift registers.
  • FIG. 7 is a schematic diagram of transferring pixel values of the first sub-feature map according to the present application. It should be noted that any four adjacent shift registers in the shift register array can be regarded as a group of shift registers shown in FIG. 7. The second shift register in the group shown in FIG. 7, for example, may be the first shift register, the third shift register, or the target shift register in another group of shift registers.
  • FIG. 7 only schematically illustrates the movement flow of pixel values in one group of shift registers; the movement flow in other groups of shift registers can be understood with reference to FIG. 7 and is not described in detail in this application.
  • the shift register at the upper left corner can be regarded as the first shift register, the shift register to the right of the first shift register can be regarded as the second shift register, the shift register below the second shift register can be regarded as the third shift register, and the shift register to the left of the third shift register can be regarded as the target shift register.
  • each computing core can store the larger value in the first register and the second register in each second shift register into the first register in each second shift register.
  • the value in the first register of the second shift registers can be moved to the second register of the third shift register below the second shift register.
  • each computing core can store the larger value in the first register and the second register in each third shift register into the first register in each third shift register.
  • the value in the first register of the third shift registers can be moved to the second register of the target shift register on the left side of the third shift register.
  • each computing core can store the larger value in the first register and the second register in each target shift register as the above-mentioned first partial pooling processing result in the first register of each target shift register.
  • in this way, the maximum of the four pixel values in the upper left, upper right, lower left and lower right corners of the same pooling window, that is, the first maximum value among the four adjacent pixel values in the first sub-feature map, can be determined and stored in the target shift register.
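  • the movement within one group of four shift registers can be sketched as follows. This models a single group only, as FIG. 7 does; it does not model the whole array moving at once. Each shift register is represented by a dictionary with a first register `r1` and a second register `r2`, and the function name `first_partial_pool` is an assumption.

```python
def first_partial_pool(upper_left, upper_right, lower_right, lower_left):
    """Walk one value chain: first -> second -> third -> target (lower-left)."""
    first = {"r1": upper_left, "r2": None}
    second = {"r1": upper_right, "r2": None}
    third = {"r1": lower_right, "r2": None}
    target = {"r1": lower_left, "r2": None}

    # Move right: first register of the first shift register -> second register of the second.
    second["r2"] = first["r1"]
    second["r1"] = max(second["r1"], second["r2"])

    # Move down: second shift register -> third shift register.
    third["r2"] = second["r1"]
    third["r1"] = max(third["r1"], third["r2"])

    # Move left: third shift register -> target shift register.
    target["r2"] = third["r1"]
    target["r1"] = max(target["r1"], target["r2"])

    return target["r1"]  # first partial pooling processing result


result_a = first_partial_pool(3, 9, 1, 4)
result_b = first_partial_pool(1, 2, 3, 8)
```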
  • S63 can be executed to move each pixel value included in the second sub-feature map to the partial shift register according to the preset moving method.
  • each pixel value included in the second sub-feature map can be respectively moved to the second register of the partial shift register according to the preset moving method.
  • each computing core can execute S64, and according to the above-mentioned pooling instruction, perform a pooling operation on the pixel values in any two upper and lower adjacent shift registers included in the above-mentioned shift register array, to obtain a second partial pooling processing result , and store the above-mentioned second partial pooling processing result in the above-mentioned target shift register.
  • FIG. 8 is a schematic diagram of transferring pixel values of the second sub-feature map according to the present application. It should be noted that any two upper and lower adjacent shift registers in the shift register array can be regarded as a group of shift registers shown in FIG. 8.
  • FIG. 8 only schematically illustrates the movement flow of pixel values in one group of shift registers, and the movement flow of pixel values in other groups of shift registers can be illustrated with reference to FIG. 8 , and will not be described in detail in this application.
  • the shift register at the upper position in a group of shift registers can be regarded as the first shift register, and the shift register below the first shift register can be regarded as the target shift register.
  • each computing core may store the larger value in the second register and the third register in each target shift register as the result of the above-mentioned second partial pooling processing in the second register of each target shift register.
  • in this way, the maximum of the two pixel values at the first row, second column and the third row, second column of the same pooling window, that is, the second maximum value among the two upper and lower adjacent pixel values in the second sub-feature map, can be determined and stored in the target shift register.
  • each pixel value included in the third sub-feature map is respectively moved to the partial shift register according to the preset moving method.
  • each pixel value included in the third sub-feature map is respectively moved to the third register of the partial shift register according to the above-mentioned preset moving method.
  • each computing core can execute S66, and according to the above-mentioned pooling instruction, perform a pooling operation on the pixel values in any two left and right adjacent shift registers included in the above-mentioned shift register array, and obtain a third partial pooling processing result , and store the above-mentioned third partial pooling processing result in the above-mentioned target shift register.
  • FIG. 9 is a schematic diagram of transferring pixel values of the third sub-feature map according to the present application. It should be noted that any two left and right adjacent shift registers in the shift register array can be regarded as a group of shift registers shown in FIG. 9 . FIG. 9 only schematically illustrates the movement flow of pixel values in one group of shift registers. The movement flow of pixel values in other groups of shift registers can be illustrated with reference to FIG. 9 , and will not be described in detail in this application. As shown in FIG. 9 (the register is not shown in FIG.
  • the shift register in the upper left corner of a group of shift registers can be regarded as the first shift register, the shift register to the right of the first shift register can be regarded as the second shift register, the shift register below the second shift register can be regarded as the third shift register, and the shift register to the left of the third shift register can be regarded as the target shift register.
  • each computing core may store the larger value in the third register and the fourth register in each second shift register as the above-mentioned third partial pooling processing result in the third register of each second shift register.
  • the third maximum value is stored in the second shift register.
  • the third maximum value (the third partial pooling processing result) can be moved to the target shift register. That is, S93 is executed: the third partial pooling processing result in the third register of the second shift register is moved to the third register of the first shift register on the left side of the second shift register.
  • S94: the third partial pooling processing result in the third register of the first shift register is moved to the third register of the target shift register below the first shift register. In this way, the maximum of the two pixel values at the second row, first column and the second row, third column of the same pooling window, that is, the third maximum value among the left and right adjacent pixel values in the third sub-feature map, is stored in the target shift register. In some examples, when data is moved from the second shift register to the target shift register, the data can also be moved to the third shift register first and then moved to the target shift register.
  • each pixel value included in the fourth sub-feature map is respectively moved to the partial shift register according to the above-mentioned preset moving method.
  • each pixel value included in the fourth sub-feature map can be respectively moved to the fourth register of the partial shift register according to the above-mentioned preset moving method.
  • S68 can be executed to move the pixel values in each of the shift registers in the partial shift registers to the target shift register.
  • the values in the fourth register of each of the first shift registers in the above-mentioned partial shift registers may be moved to the fourth register of the target shift register below the first shift register.
  • the target shift register then holds the first partial pooling processing result (the first maximum value), the second partial pooling processing result (the second maximum value), the third partial pooling processing result (the third maximum value), and the pixel value in the middle of the pooling window (the pixel value corresponding to the fourth sub-feature map).
  • the first partial pooling processing result, the second partial pooling processing result, the third partial pooling processing result and the pixel value corresponding to the fourth sub-feature map in each target shift register of the shift register array can be compared, and the maximum among them is output to obtain the pooling result corresponding to the target feature map.
  • each computing core may store the larger value of the first register and the second register of each target shift register into the first register of each target shift register.
  • the larger value of the first register and the third register of each target shift register may be stored in the first register of each target shift register.
  • the larger value of the first register and the fourth register of each target shift register may be stored in the first register of each target shift register.
  • the value stored in the first register of each target shift register is output, and the pooling result corresponding to the above target feature map is obtained.
  • the maximum value obtained by performing maximum pooling on each pooling window can thus be stored in the target shift register, and the pooling result corresponding to the target feature map is obtained by outputting the maximum value in each target shift register.
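  • the arithmetic of Scenario 2 can be checked without modelling the register movements. The sketch below is only a numerical check under stated assumptions: the pixel values are random test data, the variable names are hypothetical, and the four intermediate values are combined pairwise in the order first, second, third, fourth, as the steps above describe. The result is compared with a direct 3*3, step-2 max pooling.

```python
import random

random.seed(1)
N, OUT = 17, 8  # 17*17 target feature map, 8*8 pooled output
target = [[random.randint(0, 255) for _ in range(N)] for _ in range(N)]

# First to fourth sub-feature maps with sizes 9*9, 9*8, 8*9 and 8*8.
s1, s2, s3, s4 = [[row[c::2] for row in target[r::2]] for r in (0, 1) for c in (0, 1)]

pooled = []
for i in range(OUT):
    out_row = []
    for j in range(OUT):
        # First partial result: four adjacent pixels of the first sub-feature map.
        first = max(s1[i][j], s1[i][j + 1], s1[i + 1][j], s1[i + 1][j + 1])
        # Second partial result: two upper and lower adjacent pixels.
        second = max(s2[i][j], s2[i + 1][j])
        # Third partial result: two left and right adjacent pixels.
        third = max(s3[i][j], s3[i][j + 1])
        # The fourth sub-feature map contributes the centre pixel of the window.
        fourth = s4[i][j]
        result = first
        for value in (second, third, fourth):
            result = max(result, value)  # pairwise compare, keep the larger value
        out_row.append(result)
    pooled.append(out_row)

# Reference: direct max pooling with a 3*3 window and step size 2.
reference = [
    [max(target[2 * i + a][2 * j + b] for a in range(3) for b in range(3)) for j in range(OUT)]
    for i in range(OUT)
]
```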
  • a plurality of temporary registers are connected to the periphery of the shift register array. The temporary registers are used to store the pixel values that overflow the shift register array when a value transfer operation is performed. Therefore, during data transfer, the pixel values that overflow the shift register array do not need to be stored in RAM; they only need to be stored in the temporary registers, which improves data transfer efficiency and thus pooling efficiency.
  • FIG. 10 is a method flowchart of a pooling method shown in this application. As shown in Figure 10, the above method may include:
  • S1008 output the pooling result corresponding to each target feature map, and obtain the pooling result corresponding to the above-mentioned original feature map.
  • the original feature map can first be divided into several target feature maps, and each target feature map can then be pooled according to the pooling method shown in any of the above embodiments to obtain the pooling result corresponding to each target feature map. Finally, the pooling result corresponding to each target feature map is output to obtain the pooling result corresponding to the original feature map.
  • efficient pooling of the above-mentioned original feature maps larger than the shift register array can be achieved.
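  • the divide-pool-stitch flow can be sketched as follows. The application does not specify in this passage how the original feature map is divided, so the sketch assumes non-overlapping 16*16 target feature maps, a 2*2 window with step size 2, and an original size that is a multiple of the tile size; `max_pool_2x2` and `pool_original` are hypothetical helper names.

```python
def max_pool_2x2(tile):
    """Direct 2*2, step-2 max pooling of one target feature map."""
    return [
        [
            max(tile[2 * r][2 * c], tile[2 * r][2 * c + 1],
                tile[2 * r + 1][2 * c], tile[2 * r + 1][2 * c + 1])
            for c in range(len(tile[0]) // 2)
        ]
        for r in range(len(tile) // 2)
    ]


def pool_original(original, tile=16):
    """Divide the original map into tiles, pool each tile, and stitch the results."""
    height, width = len(original), len(original[0])
    out = [[None] * (width // 2) for _ in range(height // 2)]
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            target = [row[left:left + tile] for row in original[top:top + tile]]
            for r, pooled_row in enumerate(max_pool_2x2(target)):
                for c, value in enumerate(pooled_row):
                    out[top // 2 + r][left // 2 + c] = value
    return out


# A deterministic 32*32 original feature map, larger than one 16*16 target feature map.
original = [[(7 * r + 13 * c) % 101 for c in range(32)] for r in range(32)]
result = pool_original(original)
```

  • with a 3*3 window and step size 2, neighbouring target feature maps would need to share a border row and column; that case is not covered by this sketch.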
  • the present application also proposes a chip.
  • the chip may include a controller; the controller is used to obtain a target feature map; split the target feature map to obtain several sub-feature maps, wherein at least some of the pixel values in the same pooling window in the target feature map are in different sub-feature maps, and the pixel values at the same position in each pooling window are in the same sub-feature map; and process the pixels belonging to different pooling windows in each sub-feature map in parallel to obtain the pooling result corresponding to the target feature map.
  • the controller is configured to: load the pixel values included in the sub-feature maps into the shift register array, and, according to the pooling instruction, pool the pixel values at the same position in the sub-feature maps in parallel to obtain the pooling result corresponding to the target feature map.
  • the controller is configured to: determine the shift operation mode of the shift register array corresponding to each sub-feature map according to the positions, within each sub-feature map, of the pixel values in the same pooling window; load the sub-feature maps into the shift register array respectively and, for each sub-feature map, shift the pixel values stored in the shift registers of the shift register array according to the shift operation mode determined for that sub-feature map, then perform parallel pooling according to the pooling instruction to obtain the partial pooling results corresponding to the different pooling windows in that sub-feature map; and determine the pooling result corresponding to the target feature map according to the partial pooling results that the sub-feature maps yield for the different pooling windows.
  • the controller is configured to: determine the pixel values in odd rows and odd columns in the target feature map as the first sub-feature map; determine the pixel values in odd rows and even columns in the target feature map as the second sub-feature map; determine the pixel values in even rows and odd columns in the target feature map as the third sub-feature map; and determine the pixel values in even rows and even columns in the target feature map as the fourth sub-feature map.
  • the controller is configured to: move each pixel value included in the first sub-feature map to at least part of the shift registers included in the shift register array; move each pixel value included in the second sub-feature map to the partial shift registers, so that the computing kernel corresponding to each shift register pools the two received pixel values according to the pooling instruction to obtain a first pooling processing result; move each pixel value included in the third sub-feature map to the partial shift registers, so that each computing kernel pools the first pooling processing result and the received pixel value according to the pooling instruction.
  • the pooling processing includes maximum pooling processing; the pooling instruction includes comparing two values and taking the maximum; the controller is configured to: move each pixel value included in the first sub-feature map to the first registers of at least part of the shift registers included in the shift register array; and move each pixel value included in the second sub-feature map to the second registers of the partial shift registers, so that the computing kernel corresponding to each shift register in the partial shift registers obtains, according to the pooling instruction, the maximum of the values stored in the first register and the second register and stores that maximum in the first register as the first pooling processing result.
  • the controller is configured to: move each pixel value included in the third sub-feature map to the second registers of the partial shift registers, so that the computing kernel corresponding to each shift register in the partial shift registers obtains, according to the pooling instruction, the maximum of the values stored in the first register and the second register and stores that maximum in the first register as the second pooling processing result.
  • the above-mentioned controller is configured to: output the value stored in the first register of the above-mentioned partial shift register, and obtain the pooling result corresponding to the above-mentioned target feature map.
  • the controller is configured to: move each pixel value included in the first sub-feature map to at least some of the shift registers included in the shift register array; according to the pooling instruction, perform a pooling operation on the pixel values in any four vertically and horizontally adjacent shift registers included in the shift register array to obtain a first partial pooling result, and store the first partial pooling result in the target shift register at a preset position among those four shift registers; move each pixel value included in the second sub-feature map to those shift registers; according to the pooling instruction, perform a pooling operation on the pixel values in any two vertically adjacent shift registers to obtain a second partial pooling result, and store it in the target shift register; move each pixel value included in the third sub-feature map to those shift registers; according to the pooling instruction, perform a pooling operation on the pixel values in any two horizontally adjacent shift registers to obtain a third partial pooling result, and store it in the target shift register; move each pixel value included in the fourth sub-feature map to those shift registers, and move the pixel values to the target shift register; and obtain the pooling result corresponding to the target feature map according to the first partial pooling result, the second partial pooling result and the third partial pooling result in each target shift register included in the shift register array, and the fourth sub-feature map.
  • the pooling is max pooling; the pooling instruction includes taking the maximum of two values; the preset position is the lower-left corner of any four vertically and horizontally adjacent shift registers, and the target shift register is the shift register at that lower-left corner; the controller is configured to: move each pixel value included in the first sub-feature map to the first register of at least some of the shift registers included in the shift register array; move the value in the first register of each first shift register among them to the second register of the second shift register to the right of that first shift register; store the larger of the values in the first and second registers of each second shift register in the first register of that second shift register; move the value in the first register of each second shift register to the second register of the third shift register below that second shift register; store the larger of the values in the first and second registers of each third shift register in the first register of that third shift register; move the value in the first register of each third shift register to the second register of the target shift register to the left of that third shift register; and store the larger of the values in the first and second registers of each target shift register in the first register of that target shift register as the first partial pooling result.
  • the controller is configured to: move each pixel value included in the second sub-feature map to the second register of those shift registers; move the value in the second register of each first shift register among them to the third register of the target shift register below that first shift register; and store the larger of the values in the second and third registers of each target shift register in the second register of that target shift register as the second partial pooling result.
  • the controller is configured to: move each pixel value included in the third sub-feature map to the third register of those shift registers; move the value in the third register of each first shift register among them to the fourth register of the second shift register to the right of that first shift register; store the larger of the values in the third and fourth registers of each second shift register in the third register of that second shift register as the third partial pooling result; move the third partial pooling result in the third register of each second shift register to the third register of the first shift register to its left; and move the third partial pooling result in the third register of each first shift register to the third register of the target shift register below that first shift register.
  • the controller is configured to: move each pixel value included in the fourth sub-feature map to the fourth register of those shift registers; and move the value in the fourth register of each first shift register among them to the fourth register of the target shift register below that first shift register.
  • the controller is configured to: store the larger of the values in the first and second registers of each target shift register in the first register of that target shift register; store the larger of the values in the first and third registers of each target shift register in the first register of that target shift register; store the larger of the values in the first and fourth registers of each target shift register in the first register of that target shift register; and output the value stored in the first register of each target shift register, to obtain the pooling result corresponding to the target feature map.
  • a plurality of temporary registers are connected to the periphery of the above-mentioned shift register array; the above-mentioned temporary registers are used to store the pixel values that overflow the above-mentioned shift register array when performing a numerical value transfer operation.
  • the number of pixels included in at least some of the sub-feature maps in the above-mentioned several sub-feature maps is consistent with the number of shift registers included in the above-mentioned shift register array.
  • the present application also proposes a chip.
  • the above-mentioned chip may include a controller; the above-mentioned controller is used to obtain an original feature map; the above-mentioned original feature map is divided into several target feature maps; and each target feature map is pooled according to the pooling method shown in any of the foregoing embodiments. , obtain the pooling result corresponding to each target feature map; output the pooling result corresponding to each target feature map, and obtain the pooling result corresponding to the above-mentioned original feature map.
  • the present application also provides an electronic device, which includes the chip shown in any of the foregoing embodiments.
  • the electronic device may be a smart terminal such as a mobile phone, or may be other devices that have a camera and can perform image processing.
  • the chip of the embodiment of the present application may be used to perform the pooling task. Since the above chip has higher pooling processing efficiency and higher performance, the use of this chip can assist in improving the processing efficiency of the pooling task, thereby improving the performance of electronic equipment.
  • the present application also provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by the controller, any one of the pooling methods described above is implemented.
  • one or more embodiments of the present application may be provided as a method, system or computer program product. Accordingly, one or more embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • Embodiments of the subject matter and functional operations described in this application can be implemented in digital electronic circuits, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in this application and their structural equivalents, or in a combination of one or more of them.
  • Embodiments of the subject matter described in this application may be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, a data processing chip.
  • the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical or electromagnetic signal, which is generated to encode and transmit information to a suitable receiver chip for execution by the data processing chip.
  • the computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of these.
  • the processes and logic flows described in this application can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output.
  • the processes and logic flows described above can also be performed by, and chips can also be implemented as, special purpose logic circuits, such as FPGAs (field programmable gate arrays) or ASICs (application specific integrated circuits).
  • Computers suitable for the execution of a computer program include, for example, general and/or special purpose microprocessors, or any other type of central processing unit.
  • the central processing unit will receive instructions and data from read only memory and/or random access memory.
  • the basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to, one or more mass storage devices for storing data, such as magnetic, magneto-optical or optical disks, to receive data from them, transmit data to them, or both.
  • the computer does not have to have such a device.
  • the computer may be embedded in another device, such as a mobile phone, personal digital assistant (PDA), mobile audio or video player, game console, global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name a few.
  • Computer-readable media suitable for storage of computer program instructions and data include all forms of non-volatile memory, media, and memory devices including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
  • the processor and memory may be supplemented by or incorporated in special purpose logic circuitry.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

Provided are a pooling method, and a chip, a device and a storage medium. The method may comprise: acquiring a target feature map; splitting the target feature map to obtain a plurality of feature sub-maps, wherein at least some pixel values, within the same pooling window, in the target feature map are respectively in different feature sub-maps, and pixel values, at the same positions, within pooling windows are in the same feature sub-map; and performing parallel processing on pixels, which belong to different pooling windows, in each feature sub-map, so as to obtain a pooling result corresponding to the target feature map.

Description

Pooling method, chip, device and storage medium
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese patent application No. 202110127626.9, filed with the Chinese Patent Office on January 29, 2021 and entitled "A pooling method, chip, device and storage medium", the entire contents of which are incorporated into this disclosure by reference.
TECHNICAL FIELD
The present application relates to computer technology, and in particular to a pooling method, chip, device and storage medium.
BACKGROUND
Pooling refers to down-sampling an input feature map, which reduces the number of features and simplifies the computational complexity of a convolutional network while preserving the invariance of features in certain dimensions (for example, rotation, translation, scaling). There are two types of pooling: first, average pooling, which averages the feature values within the pooling window; second, max pooling, which takes the maximum of the feature values within the pooling window. Pooling usually relies on artificial intelligence chips (hereinafter, AI chips). At present, methods for accelerating pooling are urgently needed.
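As a point of reference for the two pooling types described above, the following is a minimal plain-Python sketch. The function names `pool2d` and `mean`, and the 4x4 sample values, are illustrative assumptions and do not come from the application; the sketch shows only the arithmetic, not the accelerated hardware method.

```python
def pool2d(x, size, stride, reduce_fn):
    """Slide a size x size window over x with the given stride and
    reduce each window with reduce_fn (e.g. max or mean)."""
    h, w = len(x), len(x[0])
    return [
        [
            reduce_fn([x[i + di][j + dj] for di in range(size) for dj in range(size)])
            for j in range(0, w - size + 1, stride)
        ]
        for i in range(0, h - size + 1, stride)
    ]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical 4x4 feature map used only for illustration.
feature_map = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 2, 1, 0],
    [3, 4, 5, 6],
]

max_pooled = pool2d(feature_map, 2, 2, max)   # [[7, 8], [9, 6]]
avg_pooled = pool2d(feature_map, 2, 2, mean)  # [[4.0, 5.0], [4.5, 3.0]]
```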
SUMMARY OF THE INVENTION
In view of this, the present application discloses at least a pooling method. The method may include: acquiring a target feature map; splitting the target feature map to obtain several sub-feature maps, where at least some of the pixel values within the same pooling window of the target feature map are in different sub-feature maps, and the pixel values at the same position within each pooling window are in the same sub-feature map; and processing in parallel the pixels in each sub-feature map that belong to different pooling windows, to obtain the pooling result corresponding to the target feature map.
In some illustrated embodiments, processing in parallel the pixels in each sub-feature map that belong to different pooling windows, to obtain the pooling result corresponding to the target feature map, includes: loading the pixel values included in each of the sub-feature maps into a shift register array and, according to a pooling instruction, pooling in parallel the pixel values at the same position in the sub-feature maps, to obtain the pooling result corresponding to the target feature map.
In some illustrated embodiments, processing in parallel the pixels in each sub-feature map that belong to different pooling windows, to obtain the pooling result corresponding to the target feature map, includes: determining, according to the positions of the pixel values in each sub-feature map that belong to the same pooling window, the shift operation mode of the shift register array corresponding to each sub-feature map; loading the sub-feature maps into the shift register array respectively and, for each sub-feature map, performing a shift operation on the pixel values stored in the shift registers of the array according to the shift operation mode determined for that sub-feature map, and pooling in parallel according to a pooling instruction to obtain the partial pooling results of that sub-feature map for the different pooling windows; and determining the pooling result corresponding to the target feature map according to the partial pooling results of the sub-feature maps for the different pooling windows.
In some illustrated embodiments, splitting the target feature map to obtain several sub-feature maps includes: determining the pixel values at odd-row, odd-column positions in the target feature map as a first sub-feature map; determining the pixel values at odd-row, even-column positions as a second sub-feature map; determining the pixel values at even-row, odd-column positions as a third sub-feature map; and determining the pixel values at even-row, even-column positions as a fourth sub-feature map.
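The odd/even split described above can be sketched with list slicing. This is an illustration only: the function name `split_into_sub_feature_maps` and the sample values are assumptions, and rows and columns are counted from 1 as in the text, so "odd" rows and columns correspond to Python indices 0, 2, 4, and so on.

```python
def split_into_sub_feature_maps(x):
    """Split a 2D feature map into four sub-feature maps by row/column parity.

    Rows and columns are numbered from 1 in the description, so
    odd positions map to even Python indices (0, 2, ...).
    """
    first = [row[0::2] for row in x[0::2]]   # odd rows, odd columns
    second = [row[1::2] for row in x[0::2]]  # odd rows, even columns
    third = [row[0::2] for row in x[1::2]]   # even rows, odd columns
    fourth = [row[1::2] for row in x[1::2]]  # even rows, even columns
    return first, second, third, fourth


# Hypothetical 4x4 map; each value encodes its (row, column) position.
feature_map = [
    [11, 12, 13, 14],
    [21, 22, 23, 24],
    [31, 32, 33, 34],
    [41, 42, 43, 44],
]

first, second, third, fourth = split_into_sub_feature_maps(feature_map)
# first  == [[11, 13], [31, 33]]
# second == [[12, 14], [32, 34]]
# third  == [[21, 23], [41, 43]]
# fourth == [[22, 24], [42, 44]]
```

With a 2x2 window and stride 2, each window contributes exactly one pixel to each sub-feature map, and those four pixels sit at the same index in all four maps.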
In some illustrated embodiments, loading the pixel values included in each of the sub-feature maps into the shift register array and, according to the pooling instruction, pooling in parallel the pixel values at the same position in the sub-feature maps, to obtain the pooling result corresponding to the target feature map, includes: moving each pixel value included in the first sub-feature map to at least some of the shift registers included in the shift register array; moving each pixel value included in the second sub-feature map to those shift registers, so that the computing kernel corresponding to each of those shift registers pools the two received pixel values according to the pooling instruction to obtain a first pooling result; moving each pixel value included in the third sub-feature map to those shift registers, so that the computing kernel corresponding to each of those shift registers pools the first pooling result and the received pixel value according to the pooling instruction to obtain a second pooling result; moving each pixel value included in the fourth sub-feature map to those shift registers, so that the computing kernel corresponding to each of those shift registers pools the second pooling result and the received pixel value according to the pooling instruction to obtain a third pooling result; and outputting the third pooling results obtained by the computing kernels corresponding to those shift registers, to obtain the pooling result corresponding to the target feature map.
In some illustrated embodiments, the pooling is max pooling, and the pooling instruction includes taking the maximum of two values. Moving each pixel value included in the first sub-feature map to at least some of the shift registers included in the shift register array includes: moving each pixel value included in the first sub-feature map to the first register of each of those shift registers. Moving each pixel value included in the second sub-feature map to those shift registers, so that the corresponding computing kernels obtain the first pooling result, includes: moving each pixel value included in the second sub-feature map to the second register of each of those shift registers, so that the computing kernel corresponding to each of those shift registers obtains, according to the pooling instruction, the maximum of the values stored in the first register and the second register, and stores that maximum in the first register as the first pooling result.
In some illustrated embodiments, moving each pixel value included in the third sub-feature map to those shift registers, so that the corresponding computing kernels obtain the second pooling result, includes: moving each pixel value included in the third sub-feature map to the second register of each of those shift registers, so that the computing kernel corresponding to each of those shift registers obtains, according to the pooling instruction, the maximum of the values stored in the first register and the second register, and stores that maximum in the first register as the second pooling result. Moving each pixel value included in the fourth sub-feature map to those shift registers, so that the corresponding computing kernels obtain the third pooling result, includes: moving each pixel value included in the fourth sub-feature map to the second register of each of those shift registers, so that the computing kernel corresponding to each of those shift registers obtains, according to the pooling instruction, the maximum of the values stored in the first register and the second register, and stores that maximum in the first register as the third pooling result.
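The register-level sequence described above (first sub-feature map into the first register, then each remaining sub-feature map into the second register followed by a compare-and-keep-maximum) can be simulated in software. The sketch below is a rough model under stated assumptions: the class `ShiftRegisterCell`, its fields `r1` and `r2`, and the function names are hypothetical, the nested loops stand in for what the hardware would do in parallel, and the input is assumed to have even height and width with a 2x2 window and stride 2.

```python
def split_into_sub_feature_maps(x):
    """Parity split; rows/columns are 1-based in the description."""
    return (
        [row[0::2] for row in x[0::2]],  # first: odd rows, odd columns
        [row[1::2] for row in x[0::2]],  # second: odd rows, even columns
        [row[0::2] for row in x[1::2]],  # third: even rows, odd columns
        [row[1::2] for row in x[1::2]],  # fourth: even rows, even columns
    )


class ShiftRegisterCell:
    """Hypothetical model of one shift register with two registers."""

    def __init__(self):
        self.r1 = None  # first register: holds the running pooling result
        self.r2 = None  # second register: holds the newly loaded pixel value

    def compare_max(self):
        """Model of the pooling instruction: keep the larger value in r1."""
        self.r1 = max(self.r1, self.r2)


def max_pool_2x2_with_registers(feature_map):
    first, second, third, fourth = split_into_sub_feature_maps(feature_map)
    rows, cols = len(first), len(first[0])
    cells = [[ShiftRegisterCell() for _ in range(cols)] for _ in range(rows)]

    # Load the first sub-feature map into the first registers.
    for i in range(rows):
        for j in range(cols):
            cells[i][j].r1 = first[i][j]

    # Load each remaining sub-feature map into the second registers and
    # apply the comparison; in hardware all cells would do this at once.
    for sub_map in (second, third, fourth):
        for i in range(rows):
            for j in range(cols):
                cells[i][j].r2 = sub_map[i][j]
                cells[i][j].compare_max()

    # The first registers now hold the third (final) pooling results.
    return [[cell.r1 for cell in row] for row in cells]


feature_map = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 2, 1, 0],
    [3, 4, 5, 6],
]
pooled = max_pool_2x2_with_registers(feature_map)  # [[7, 8], [9, 6]]
```

In this model only three load-and-compare rounds are needed regardless of the feature map size, which is the source of the parallel speed-up the text describes.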
In some illustrated embodiments, outputting the third pooling results obtained by the computing kernels corresponding to those shift registers, to obtain the pooling result corresponding to the target feature map, includes: outputting the values stored in the first registers of those shift registers, to obtain the pooling result corresponding to the target feature map.
In some illustrated embodiments, processing in parallel the pixels in each sub-feature map that belong to different pooling windows, to obtain the pooling result corresponding to the target feature map, includes: moving each pixel value included in the first sub-feature map to at least some of the shift registers included in the shift register array; according to the pooling instruction, performing a pooling operation on the pixel values in any four vertically and horizontally adjacent shift registers included in the shift register array to obtain a first partial pooling result, and storing the first partial pooling result in the target shift register at a preset position among those four shift registers; moving each pixel value included in the second sub-feature map to those shift registers; according to the pooling instruction, performing a pooling operation on the pixel values in any two vertically adjacent shift registers included in the shift register array to obtain a second partial pooling result, and storing the second partial pooling result in the target shift register; moving each pixel value included in the third sub-feature map to those shift registers; according to the pooling instruction, performing a pooling operation on the pixel values in any two horizontally adjacent shift registers included in the shift register array to obtain a third partial pooling result, and storing the third partial pooling result in the target shift register; and moving each pixel value included in the fourth sub-feature map to those shift registers, and moving the pixel value in each of those shift registers to the target shift register.
The pooling result corresponding to the target feature map is then obtained according to the first partial pooling result, the second partial pooling result and the third partial pooling result in each target shift register included in the shift register array, and the fourth sub-feature map.
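The grouping above (four adjacent values from the first sub-feature map, two vertically adjacent values from the second, two horizontally adjacent values from the third, one value from the fourth) matches what a 3x3 max-pooling window with stride 2 would need. That window size and stride are an interpretation made here for illustration and are not stated in this passage. Under that assumption, the following plain-Python sketch combines the four partial results and checks them against direct pooling; all function names are hypothetical and no register movement is modeled.

```python
def split_into_sub_feature_maps(x):
    """Parity split; rows/columns are 1-based in the description."""
    return (
        [row[0::2] for row in x[0::2]],  # first: odd rows, odd columns
        [row[1::2] for row in x[0::2]],  # second: odd rows, even columns
        [row[0::2] for row in x[1::2]],  # third: even rows, odd columns
        [row[1::2] for row in x[1::2]],  # fourth: even rows, even columns
    )


def max_pool_3x3_stride2_via_sub_maps(x):
    """Assumed 3x3 window, stride 2, odd-sized input (e.g. 5x5, 7x7)."""
    s1, s2, s3, s4 = split_into_sub_feature_maps(x)
    out = []
    for k in range(len(s4)):
        row = []
        for l in range(len(s4[0])):
            # First partial result: four adjacent values of the first sub-map.
            p1 = max(s1[k][l], s1[k][l + 1], s1[k + 1][l], s1[k + 1][l + 1])
            # Second partial result: two vertically adjacent values.
            p2 = max(s2[k][l], s2[k + 1][l])
            # Third partial result: two horizontally adjacent values.
            p3 = max(s3[k][l], s3[k][l + 1])
            # The fourth sub-map contributes a single value per window.
            p4 = s4[k][l]
            row.append(max(p1, p2, p3, p4))
        out.append(row)
    return out


def max_pool_direct(x, size=3, stride=2):
    """Reference implementation: scan every window directly."""
    h, w = len(x), len(x[0])
    return [
        [
            max(x[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, w - size + 1, stride)
        ]
        for i in range(0, h - size + 1, stride)
    ]


# Hypothetical 5x5 input generated by an arbitrary formula.
sample = [[(7 * i + 3 * j * j) % 11 for j in range(5)] for i in range(5)]
assert max_pool_3x3_stride2_via_sub_maps(sample) == max_pool_direct(sample)
```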
In the illustrated embodiments, the pooling is max pooling, and the pooling instruction includes taking the maximum of two values. The preset position is the lower-left corner of any four vertically and horizontally adjacent shift registers, and the target shift register is the shift register at that lower-left corner. Moving each pixel value included in the first sub-feature map to at least some of the shift registers included in the shift register array includes: moving each pixel value included in the first sub-feature map to the first register of each of those shift registers. Performing, according to the pooling instruction, the pooling operation on the pixel values in any four vertically and horizontally adjacent shift registers to obtain the first partial pooling result, and storing it in the target shift register at the preset position, includes: moving the value in the first register of each first shift register among those shift registers to the second register of the second shift register to the right of that first shift register; storing the larger of the values in the first and second registers of each second shift register in the first register of that second shift register; moving the value in the first register of each second shift register to the second register of the third shift register below that second shift register; storing the larger of the values in the first and second registers of each third shift register in the first register of that third shift register; moving the value in the first register of each third shift register to the second register of the target shift register to the left of that third shift register; and storing the larger of the values in the first and second registers of each target shift register in the first register of that target shift register as the first partial pooling result.
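The move-and-compare chain above (right, then down, then left, ending in the lower-left target) can be traced for a single group of four adjacent shift registers. The sketch below is a simplified illustration: the position labels and the dictionary layout are assumptions, and it models one group in isolation rather than an entire array in which every shift register would take part in the moves at the same time.

```python
def first_partial_pooling_result(block):
    """Trace the chain for one 2x2 group of the first sub-feature map.

    block is [[upper_left, upper_right], [lower_left, lower_right]].
    Each position is modeled as a shift register with registers r1 and r2.
    """
    regs = {
        "upper_left": {"r1": block[0][0], "r2": None},   # first shift register
        "upper_right": {"r1": block[0][1], "r2": None},  # second shift register
        "lower_right": {"r1": block[1][1], "r2": None},  # third shift register
        "lower_left": {"r1": block[1][0], "r2": None},   # target shift register
    }

    # Order of the moves: right, then down, then left.
    chain = ["upper_left", "upper_right", "lower_right", "lower_left"]
    for src, dst in zip(chain, chain[1:]):
        # Move the running value from src.r1 into dst.r2 ...
        regs[dst]["r2"] = regs[src]["r1"]
        # ... then keep the larger of dst.r1 and dst.r2 in dst.r1.
        regs[dst]["r1"] = max(regs[dst]["r1"], regs[dst]["r2"])

    # The target (lower-left) first register holds the first partial result.
    return regs["lower_left"]["r1"]


result = first_partial_pooling_result([[1, 2], [8, 3]])  # 8
```

Three moves and three comparisons are enough to reduce the four values, and the result lands in the lower-left register where the later partial results are also gathered.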
在示出的一些实施例中,上述将上述第二子特征图包括的各像素值分别搬移至上述部分移位寄存器中,包括:将上述第二子特征图包括的各像素值分别搬移至上述部分移位寄存器的第二寄存器中;上述根据上述池化指令,对上述移位寄存器阵列包括的任意两个上下相邻的移位寄存器中的像素值进行池化操作,得到第二部分池化处理结果,并将上述第二部分池化处理结果存储至上述目标移位寄存器,包括:将上述部分移位寄存器中各第一移位寄存器的第二寄存器中的数值搬移至该第一移位寄存器下方的目标移位寄存器的第三寄存器中;将各目标移位寄存器中第二寄存器与第三寄存器中较大的数值作为上述第二部分池化处理结果,存储至各目标移位寄存器的第二寄存器中。In some of the illustrated embodiments, moving each pixel value included in the second sub-feature map to the partial shift register includes: moving each pixel value included in the second sub-feature map to the above-mentioned partial shift register. In the second register of the partial shift register; the above-mentioned pooling operation is performed on the pixel values in any two upper and lower adjacent shift registers included in the above-mentioned shift register array according to the above-mentioned pooling instruction, and the second partial pooling is obtained. processing the result, and storing the second partial pooling processing result in the target shift register, including: moving the value in the second register of each first shift register in the partial shift register to the first shift register In the third register of the target shift register below the register; the larger value in the second register and the third register in each target shift register is used as the result of the above-mentioned second part of the pooling processing, and is stored in each target shift register. in the second register.
In some illustrated embodiments, moving the pixel values included in the third sub-feature map into the subset of shift registers includes: moving the pixel values included in the third sub-feature map into the third registers of the subset of shift registers. Performing, according to the pooling instruction, a pooling operation on the pixel values in any two horizontally adjacent shift registers included in the shift register array to obtain a third partial pooling result, and storing the third partial pooling result in the target shift register, includes: moving the value in the third register of each first shift register in the subset to the fourth register of the second shift register to the right of that first shift register; storing the larger of the values in the third register and the fourth register of each second shift register, as the third partial pooling result, in the third register of that second shift register; moving the third partial pooling result in the third register of the second shift register to the third register of the first shift register to the left of that second shift register; and moving the third partial pooling result in the third register of the first shift register to the third register of the target shift register below that first shift register.
In some illustrated embodiments, moving the pixel values included in the fourth sub-feature map into the subset of shift registers includes: moving the pixel values included in the fourth sub-feature map into the fourth registers of the subset of shift registers. Moving the pixel values in each shift register of the subset to the target shift register includes: moving the value in the fourth register of each first shift register in the subset to the fourth register of the target shift register below that first shift register.
In some illustrated embodiments, obtaining the pooling result corresponding to the target feature map from the first partial pooling result, the second partial pooling result, the third partial pooling result and the pixel values corresponding to the fourth sub-feature map held in each target shift register of the shift register array includes: storing the larger of the values in the first register and the second register of each target shift register into the first register of that target shift register; storing the larger of the values in the first register and the third register of each target shift register into the first register of that target shift register; storing the larger of the values in the first register and the fourth register of each target shift register into the first register of that target shift register; and outputting the values stored in the first registers of the target shift registers to obtain the pooling result corresponding to the target feature map.
In some illustrated embodiments, a plurality of temporary registers are connected to the periphery of the shift register array; the temporary registers store the pixel values that overflow the shift register array during a value-moving operation.
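The role of the peripheral temporary registers can be sketched as follows. This is only an illustration under assumptions that the application does not state: the function name `shift_right`, one temporary register per row, and the fill value shifted in on the left are all hypothetical.

```python
from typing import List, Tuple

Grid = List[List[int]]

def shift_right(grid: Grid, fill: int = 0) -> Tuple[Grid, List[int]]:
    """Shift every row of the array one position to the right.

    Returns the shifted grid together with the values pushed off the
    right edge, which a temporary register per row would capture.
    `fill` (an assumed placeholder) enters on the left.
    """
    shifted = [[fill] + row[:-1] for row in grid]
    overflow = [row[-1] for row in grid]
    return shifted, overflow

grid = [[1, 2, 3],
        [4, 5, 6]]
shifted, temp_registers = shift_right(grid)
print(shifted)         # [[0, 1, 2], [0, 4, 5]]
print(temp_registers)  # [3, 6]
```

Keeping the overflowed values means a later shift in the opposite direction could restore them instead of losing the edge pixels.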
In some illustrated embodiments, the number of pixels included in at least some of the sub-feature maps equals the number of shift registers included in the shift register array.
The present application further provides a pooling method, which may include: acquiring an original feature map; dividing the original feature map into several target feature maps; performing pooling on each target feature map according to the pooling method of any of the foregoing embodiments to obtain a pooling result corresponding to each target feature map; and outputting the pooling results corresponding to the target feature maps to obtain the pooling result corresponding to the original feature map.
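A minimal sketch of this tiling flow is given below, assuming 2*2 max pooling with a stride of 2 and square tiles whose side divides the original map. The helper `max_pool_2x2` is a plain reference pooling used as a stand-in for the split-based method, and all function names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]

def max_pool_2x2(tile: Matrix) -> Matrix:
    """Stand-in pooling for one target feature map (2*2 window, stride 2)."""
    return [[max(tile[r][c], tile[r][c + 1], tile[r + 1][c], tile[r + 1][c + 1])
             for c in range(0, len(tile[0]), 2)]
            for r in range(0, len(tile), 2)]

def pool_original(original: Matrix, tile: int) -> Matrix:
    """Divide the original map into tile*tile target maps, pool each, stitch."""
    rows, cols = len(original), len(original[0])
    assert rows % tile == 0 and cols % tile == 0 and tile % 2 == 0
    out = [[0] * (cols // 2) for _ in range(rows // 2)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            target = [row[c0:c0 + tile] for row in original[r0:r0 + tile]]
            pooled = max_pool_2x2(target)
            # Write the tile's result at the matching place in the output.
            for i, pooled_row in enumerate(pooled):
                out[r0 // 2 + i][c0 // 2:c0 // 2 + len(pooled_row)] = pooled_row
    return out

original = [[r * 4 + c for c in range(4)] for r in range(4)]
print(pool_original(original, tile=2))  # [[5, 7], [13, 15]]
```

Because the tile side is even, no pooling window straddles two tiles, so the stitched result matches pooling the original map in one pass.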
The present application further provides a chip, which may include a controller. The controller is configured to: acquire a target feature map; split the target feature map into several sub-feature maps, wherein at least some of the pixel values located in the same pooling window of the target feature map are placed in different sub-feature maps, and the pixel values located at the same position of each pooling window are placed in the same sub-feature map; and process in parallel the pixels of each sub-feature map that belong to different pooling windows to obtain the pooling result corresponding to the target feature map.
In some illustrated embodiments, the controller is configured to: load the pixel values included in each sub-feature map into the shift register array, and, according to a pooling instruction, pool in parallel the pixel values located at the same position in the sub-feature maps to obtain the pooling result corresponding to the target feature map.
In some illustrated embodiments, the controller is configured to: determine, according to the positions of the pixel values of each sub-feature map within the same pooling window, the shift operation mode of the shift register array corresponding to each sub-feature map; load the sub-feature maps into the shift register array and, for each sub-feature map, shift the pixel values stored in the shift registers of the array according to the shift operation mode determined for that sub-feature map, and pool in parallel according to the pooling instruction to obtain the partial pooling results of that sub-feature map for the different pooling windows; and determine the pooling result corresponding to the target feature map from the partial pooling results of the sub-feature maps for the different pooling windows.
In some illustrated embodiments, the controller is configured to: determine the pixel values in the odd rows and odd columns of the target feature map as a first sub-feature map; determine the pixel values in the odd rows and even columns of the target feature map as a second sub-feature map; determine the pixel values in the even rows and odd columns of the target feature map as a third sub-feature map; and determine the pixel values in the even rows and even columns of the target feature map as a fourth sub-feature map.
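The parity-based split can be illustrated with list slicing. This is a sketch rather than the chip's actual data path; the function name is hypothetical, and rows and columns are counted from 1 as in the text, so an odd row corresponds to an even 0-based index.

```python
from typing import List, Tuple

Matrix = List[List[int]]

def split_into_sub_maps(fmap: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """Split a feature map by row/column parity (1-based, as in the text)."""
    first = [row[0::2] for row in fmap[0::2]]   # odd rows, odd columns
    second = [row[1::2] for row in fmap[0::2]]  # odd rows, even columns
    third = [row[0::2] for row in fmap[1::2]]   # even rows, odd columns
    fourth = [row[1::2] for row in fmap[1::2]]  # even rows, even columns
    return first, second, third, fourth

fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
first, second, third, fourth = split_into_sub_maps(fmap)
print(first)   # [[1, 3], [9, 11]]
print(second)  # [[2, 4], [10, 12]]
print(third)   # [[5, 7], [13, 15]]
print(fourth)  # [[6, 8], [14, 16]]
```

Position (i, j) of every sub-feature map then belongs to the same 2*2 pooling window, which is what allows the windows to be processed in parallel.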
In some illustrated embodiments, the controller is configured to: move the pixel values included in the first sub-feature map into at least a subset of the shift registers included in the shift register array; move the pixel values included in the second sub-feature map into the subset of shift registers, so that the processing element corresponding to each shift register pools the two received pixel values according to the pooling instruction to obtain a first pooling result; move the pixel values included in the third sub-feature map into the subset of shift registers, so that the processing element corresponding to each shift register in the subset pools the first pooling result and the received pixel value according to the pooling instruction to obtain a second pooling result; move the pixel values included in the fourth sub-feature map into the subset of shift registers, so that the processing element corresponding to each shift register in the subset pools the second pooling result and the received pixel value according to the pooling instruction to obtain a third pooling result; and output the third pooling results obtained by the processing elements corresponding to the shift registers in the subset to obtain the pooling result corresponding to the target feature map.
In some illustrated embodiments, the pooling includes max pooling, and the pooling instruction includes taking the maximum of two values. The controller is configured to: move the pixel values included in the first sub-feature map into the first registers of at least a subset of the shift registers included in the shift register array; and move the pixel values included in the second sub-feature map into the second registers of the subset of shift registers, so that the processing element corresponding to each shift register in the subset obtains, according to the pooling instruction, the maximum of the values stored in the first register and the second register and stores that maximum in the first register as the first pooling result.
In some illustrated embodiments, the controller is configured to: move the pixel values included in the third sub-feature map into the second registers of the subset of shift registers, so that the processing element corresponding to each shift register in the subset obtains, according to the pooling instruction, the maximum of the values stored in the first register and the second register and stores that maximum in the first register as the second pooling result; and move the pixel values included in the fourth sub-feature map into the second registers of the subset of shift registers, so that the processing element corresponding to each shift register in the subset obtains, according to the pooling instruction, the maximum of the values stored in the first register and the second register and stores that maximum in the first register as the third pooling result.
In some illustrated embodiments, the controller is configured to: output the values stored in the first registers of the subset of shift registers to obtain the pooling result corresponding to the target feature map.
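The register-level sequence described in the preceding embodiments (first sub-feature map into the first register, each later sub-feature map into the second register, maximum written back to the first register) might be modelled as below. The dictionary-based PE and the names `reg1` and `reg2` are illustrative assumptions, not the chip's actual interface.

```python
from typing import List

Matrix = List[List[int]]

def max_pool_via_registers(sub_maps: List[Matrix]) -> Matrix:
    """Model one PE per pooling window, each with two registers."""
    first, *others = sub_maps
    # Load the first sub-feature map into reg1 of every PE.
    pes = [[{"reg1": v, "reg2": None} for v in row] for row in first]
    for sub in others:
        for r, row in enumerate(sub):
            for c, value in enumerate(row):
                pe = pes[r][c]
                pe["reg2"] = value                        # incoming pixel
                pe["reg1"] = max(pe["reg1"], pe["reg2"])  # pooling instruction
    # Output the first registers as the pooling result.
    return [[pe["reg1"] for pe in row] for row in pes]

sub_maps = [
    [[1, 3], [9, 11]],    # first sub-feature map
    [[2, 4], [10, 12]],   # second
    [[5, 7], [13, 15]],   # third
    [[6, 8], [14, 16]],   # fourth
]
print(max_pool_via_registers(sub_maps))  # [[6, 8], [14, 16]]
```

In hardware the two inner loops would run simultaneously across all PEs; the Python loops only serialize what the array does in parallel.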
The present application further provides a chip, which may include a controller. The controller is configured to: acquire an original feature map; divide the original feature map into several target feature maps; perform pooling on each target feature map according to the pooling method of any of the foregoing embodiments to obtain the pooling result corresponding to each target feature map; and output the pooling results corresponding to the target feature maps to obtain the pooling result corresponding to the original feature map.
The present application further provides an electronic device including the chip of any of the foregoing embodiments.
The present application further provides a computer-readable storage medium storing a computer program which, when executed by a controller, implements any of the pooling methods described above.
In the above solution, at least some of the pixel values located in the same pooling window of the target feature map are split into different sub-feature maps, and the pixels of each sub-feature map that belong to different pooling windows are processed in parallel to obtain the pooling result corresponding to the target feature map. This improves the pooling efficiency of the chip, reduces its computational burden, and lowers the difficulty of chip design.
It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present application.
Brief Description of the Drawings
To describe the technical solutions in one or more embodiments of the present application or in the related art more clearly, the drawings needed for describing the embodiments or the related art are briefly introduced below. The drawings described below show only some of the embodiments recorded in one or more embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a shift register array according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a PE according to an embodiment of the present application;
FIG. 3 is a flowchart of a pooling method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a process of splitting a target feature map according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a process of splitting a target feature map according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a pooling window according to an embodiment of the present application;
FIG. 7 is a schematic diagram of moving the pixel values of a first sub-feature map according to an embodiment of the present application;
FIG. 8 is a schematic diagram of moving the pixel values of a second sub-feature map according to an embodiment of the present application;
FIG. 9 is a schematic diagram of moving the pixel values of a third sub-feature map according to an embodiment of the present application;
FIG. 10 is a flowchart of a pooling method according to an embodiment of the present application.
Detailed Description
Exemplary embodiments are described in detail below, with examples shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of devices and methods consistent with some aspects of the present application as detailed in the appended claims.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to limit the present application. The singular forms "a", "said" and "the" used in the present application and the appended claims are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items. It should further be understood that the word "if" as used herein may, depending on the context, be interpreted as "at the time of", "when" or "in response to determining".
The shift register array used by the AI chip is introduced first.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a shift register array according to the present application.
The shift register array may include a plurality of shift registers arranged in rows and columns. Each shift register may uniquely correspond to a processing element (hereinafter PE), and each PE performs operations on the values in its shift register. As shown in FIG. 1, the shift register array can therefore be regarded as a plurality of PEs arranged in rows and columns. Data can be moved between any two adjacent PEs (that is, between their corresponding shift registers), and each row of PEs can fetch data from a corresponding RAM (Random Access Memory). Assume that the shift register array has a size of 8*8 and that the feature map to be input also has a size of 8*8. To input the feature map into the shift register array, a controller (for example, an array controller, which may also be called a processor) may split the feature map into 8 rows of pixel values and write each row into the RAM corresponding to one row of PEs. The controller may then issue data-moving instructions to move the pixel values in each RAM into the corresponding shift registers in the order of their positions in the feature map, which completes the input of the feature map into the shift registers.
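The loading step can be sketched as follows. The application does not specify how values travel from a row's RAM into that row's shift registers, so this sketch assumes each row is fed from its left end and shifted right once per step; the function name and the use of `None` for an empty register are likewise hypothetical.

```python
from collections import deque
from typing import List, Optional

def load_feature_map(fmap: List[List[int]]) -> List[List[Optional[int]]]:
    """Write each row to its own RAM, then shift the values into the array."""
    rams = [deque(row) for row in fmap]        # one RAM per row of PEs
    width = len(fmap[0])
    array: List[List[Optional[int]]] = [[None] * width for _ in fmap]
    for _ in range(width):
        for r, ram in enumerate(rams):
            # Shift the row right by one and feed the next RAM value on the
            # left; popping from the end of the RAM preserves pixel order.
            array[r] = [ram.pop()] + array[r][:-1]
    return array

fmap = [[1, 2, 3],
        [4, 5, 6]]
print(load_feature_map(fmap))  # [[1, 2, 3], [4, 5, 6]]
```

After `width` steps every pixel sits in the shift register whose position matches its position in the feature map.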
In response to an instruction, the PE can perform data operations on the data in its shift register.
Referring to FIG. 2, FIG. 2 is a schematic structural diagram of a PE according to the present application. As shown in FIG. 2, the PE corresponding to a shift register may include registers and an ALU (arithmetic and logic unit). The registers may be obtained by partitioning the storage space of the shift register. In some examples, the shift register may be configured, according to actual requirements, as several registers between which data can be moved (for example, register 1 and register 2 shown in FIG. 2). The PE can operate on the data in these registers according to an operation instruction. The ALU is used to perform arithmetic and logical operations. For example, when the PE receives an operation instruction such as addition, subtraction or comparison, it can use the ALU to perform the corresponding operation on the values stored in the registers.
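A PE of this kind could be modelled roughly as below. The class name, the instruction names (`add`, `sub`, `max`) and the register-index interface are assumptions made for illustration; the application only states that the PE holds several registers and an ALU.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical ALU instruction set: each operation takes two register values.
ALU_OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "max": max,
}

@dataclass
class ProcessingElement:
    """A PE with a small register file carved out of its shift register."""
    registers: List[int] = field(default_factory=lambda: [0, 0])

    def move(self, dst: int, src: int) -> None:
        """Move data between two registers of the same PE."""
        self.registers[dst] = self.registers[src]

    def execute(self, op: str, dst: int, a: int, b: int) -> None:
        """Run one ALU operation on registers a and b, storing into dst."""
        self.registers[dst] = ALU_OPS[op](self.registers[a], self.registers[b])

pe = ProcessingElement([3, 7])
pe.execute("max", dst=0, a=0, b=1)
print(pe.registers)  # [7, 7]
```

With `max` available as an instruction, max pooling reduces to a sequence of moves and comparisons on each PE.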
In some examples, the shift register array in the embodiments of the present application includes a plurality of shift registers, each corresponding to its own PE, and each shift register (in other words, each PE) contains a plurality of registers (for example, a first register, a second register, a third register and a fourth register; the number of registers per PE is not limited here).
The present application provides a pooling method. The method splits at least some of the pixel values located in the same pooling window of the target feature map into different sub-feature maps, and processes in parallel the pixels of each sub-feature map that belong to different pooling windows to obtain the pooling result corresponding to the target feature map. This improves the pooling efficiency of the chip, reduces its computational burden, and lowers the difficulty of chip design.
Referring to FIG. 3, FIG. 3 is a flowchart of a pooling method according to the present application. As shown in FIG. 3, the pooling method may include steps S302 to S306.
S302: acquire a target feature map.
The target feature map may be a feature map that needs to be pooled. In some examples, it may be a feature map that has undergone convolution and is to be pooled next. In some examples, it may be obtained through convolution performed by the PEs in the shift register array. It can be understood that the target feature map may be stored in the RAMs corresponding to the rows of PEs.
S304: split the target feature map into several sub-feature maps, wherein at least some of the pixel values located in the same pooling window of the target feature map are placed in different sub-feature maps, and the pixel values located at the same position of each pooling window are placed in the same sub-feature map.
Pooling usually involves a pooling window of a preset size and a stride of a preset size, both set according to service requirements. Take a 2*2 pooling window with a stride of 2 as an example. Pooling a feature map can be understood as starting from the first pixel value in the upper-left corner of the feature map and forming a 2*2 pooling window with that pixel value as its upper-left element. The maximum of the pixel values in the window is then taken, which completes the pooling operation for that window. Next, the window slides two pixel values to the right of the first pixel value, in accordance with the stride of 2, to form a new 2*2 pooling window, and the pooling operation is performed on the pixel values in the current window. This continues until all pooling windows have been processed; combining the maximum pixel values output by the windows then gives the pooling result corresponding to the feature map.
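The sliding-window procedure just described corresponds to the following reference implementation. It is a plain sequential sketch of conventional max pooling, not the parallel method of the application, and the function name and parameter names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]

def max_pool(fmap: Matrix, window: int = 2, stride: int = 2) -> Matrix:
    """Slide a window*window pooling window over fmap and take each maximum."""
    rows, cols = len(fmap), len(fmap[0])
    return [[max(fmap[r + i][c + j]
                 for i in range(window) for j in range(window))
             for c in range(0, cols - window + 1, stride)]
            for r in range(0, rows - window + 1, stride)]

fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(max_pool(fmap))  # [[6, 8], [14, 16]]
```

Each output value depends only on its own window, so the windows are independent of one another; the split into sub-feature maps exploits this independence.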
A pooling window may include several pixel values, each at a different position in the window. Take a 2*2 pooling window as an example: its 4 pixel values are located at the upper-left, lower-left, upper-right and lower-right corners of the window. In some examples, when S304 is performed, a splitting method may be determined from the size of the preset pooling window and the positional distribution of the pixel values, such that at least some of the pixel values located in the same pooling window of the target feature map are placed in different sub-feature maps and the pixel values located at the same position of each pooling window are placed in the same sub-feature map. The target feature map is then split according to the determined splitting method to obtain several sub-feature maps. For example, when the pooling window is 2*2, the pixel values in a pooling window are located at an odd row and odd column, an odd row and even column, an even row and odd column, and an even row and even column, respectively. Therefore, the pixel values in the odd rows and odd columns of the target feature map may be determined as the first sub-feature map; the pixel values in the odd rows and even columns as the second sub-feature map; the pixel values in the even rows and odd columns as the third sub-feature map; and the pixel values in the even rows and even columns as the fourth sub-feature map.
The target feature map is thereby split into 4 sub-feature maps that satisfy the condition that at least some of the pixel values located in the same pooling window of the target feature map are placed in different sub-feature maps and the pixel values located at the same position of each pooling window are placed in the same sub-feature map.
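The two conditions on the split can be checked mechanically for the 2*2 case. The sketch below assumes 0-based coordinates and numbers the sub-feature maps 1 to 4 by parity; both helper names are hypothetical.

```python
def sub_map_index(row: int, col: int) -> int:
    """Map 0-based pixel coordinates to sub-feature map 1..4 by parity.

    0-based even indices are the 1-based odd rows/columns of the text.
    """
    return 1 + 2 * (row % 2) + (col % 2)

def check_split(rows: int, cols: int) -> bool:
    """Verify both split conditions for 2*2 windows with a stride of 2."""
    position_owner = {}
    for r0 in range(0, rows, 2):
        for c0 in range(0, cols, 2):
            owners = set()
            for dr in range(2):
                for dc in range(2):
                    idx = sub_map_index(r0 + dr, c0 + dc)
                    owners.add(idx)
                    # Same window position must always map to the same sub-map.
                    if position_owner.setdefault((dr, dc), idx) != idx:
                        return False
            # Pixels of one window must land in four different sub-maps.
            if len(owners) != 4:
                return False
    return True

print(check_split(8, 8))  # True
```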
In some examples, a correspondence between pooling configurations and splitting schemes may be maintained in advance. To split the target feature map, parameters of the pooling such as the pooling window and the stride are determined first. The correspondence is then queried with the determined parameters to obtain the matching splitting scheme, and the target feature map is split according to that scheme.
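One way such a correspondence might be kept is a lookup table keyed by window size and stride. The table layout below, which stores the (row, column) offset that starts each sub-feature map, is purely an assumption, as are the names `SPLIT_SCHEMES` and `split_by_scheme`; only the 2*2, stride-2 entry comes from the text.

```python
from typing import Dict, List, Tuple

Matrix = List[List[int]]

# (window, stride) -> start offsets (row, col) of each sub-feature map.
SPLIT_SCHEMES: Dict[Tuple[int, int], List[Tuple[int, int]]] = {
    (2, 2): [(0, 0), (0, 1), (1, 0), (1, 1)],
}

def split_by_scheme(fmap: Matrix, window: int, stride: int) -> List[Matrix]:
    """Look up the splitting scheme for the pooling parameters and apply it."""
    offsets = SPLIT_SCHEMES[(window, stride)]
    return [[row[dc::stride] for row in fmap[dr::stride]]
            for dr, dc in offsets]

fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
subs = split_by_scheme(fmap, window=2, stride=2)
print(subs[0])  # [[1, 3], [9, 11]]
print(subs[3])  # [[6, 8], [14, 16]]
```

Supporting another pooling configuration would then only require adding a table entry, provided the same offset-and-stride pattern fits that configuration.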
After the target feature map has been split, S306 may be performed: process in parallel the pixels of each sub-feature map that belong to different pooling windows to obtain the pooling result corresponding to the target feature map.
In some examples, the pixel values included in the sub-feature maps may be loaded into the shift register array, and the pixel values located at the same position in the sub-feature maps may be pooled in parallel according to a pooling instruction to obtain the pooling result corresponding to the target feature map. The pooling instruction may include an instruction corresponding to the current pooling and may be generated in advance. When the pooling is max pooling, the pooling instruction may be taking the maximum of two values. When the pooling is average pooling, the pooling instruction may be summation or averaging. The pooling result may include the result obtained by pooling the target feature map.
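The dependence of the pooling instruction on the pooling type might be sketched as follows. The mode strings and the function name are hypothetical, and average pooling is modelled here as a sum followed by a final scaling, which is one possible reading of "summation or averaging".

```python
from operator import add
from typing import Callable, List

Matrix = List[List[float]]

def pool_sub_maps(sub_maps: List[Matrix], mode: str) -> Matrix:
    """Combine same-position pixels of all sub-feature maps."""
    if mode == "max":
        combine: Callable[[float, float], float] = max  # compare two values
        scale = 1.0
    elif mode == "avg":
        combine = add                                   # accumulate a sum
        scale = 1.0 / len(sub_maps)                     # then average
    else:
        raise ValueError(f"unsupported pooling mode: {mode}")
    acc = [row[:] for row in sub_maps[0]]
    for sub in sub_maps[1:]:
        acc = [[combine(a, b) for a, b in zip(ra, rb)]
               for ra, rb in zip(acc, sub)]
    return [[v * scale for v in row] for row in acc]

sub_maps = [
    [[1, 3], [9, 11]],
    [[2, 4], [10, 12]],
    [[5, 7], [13, 15]],
    [[6, 8], [14, 16]],
]
print(pool_sub_maps(sub_maps, "max"))  # [[6.0, 8.0], [14.0, 16.0]]
print(pool_sub_maps(sub_maps, "avg"))  # [[3.5, 5.5], [11.5, 13.5]]
```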
In some examples, when S306 is performed, the pixel values included in each sub-feature map may be moved in turn, according to a preset moving method, into the shift registers included in the shift register array, so that the pixel values located at the same position in the sub-feature maps are moved into the same shift register. The preset moving method may be to move the pixel values into the shift register array in the order in which they appear in the target feature map. The moving method is not specifically limited in the present application. It can be understood that moving the pixel values of every sub-feature map with the same moving method ensures that the pixel values located at the same position in the sub-feature maps are moved into the same shift register.
Then the PE corresponding to each shift register may pool the received pixel values in parallel according to the pooling instruction to obtain the pooling result corresponding to the target feature map. Take max pooling as an example. Each time a shift register receives a new pixel value, its PE compares the new value with the stored value, takes the larger of the two, and overwrites the shift register with it. Once all sub-feature maps split from the target feature map have been input, the shift registers hold the maximum pixel value of each pooling window. Outputting the maximum values stored in the register array then gives the pooling result for the target feature map.
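The running comparison described above can be sketched in software. This is a minimal illustration, not the chip's implementation: the function name `pool_by_position` and the use of nested lists to stand in for the shift register array are assumptions made for the example, and the loops run sequentially where the hardware would update all positions in parallel.

```python
def pool_by_position(sub_maps, combine=max):
    """Combine same-position values of equally sized sub-feature maps.

    Each (row, col) entry plays the role of one shift register with its PE:
    it keeps a running result that is overwritten whenever a new value arrives.
    """
    rows, cols = len(sub_maps[0]), len(sub_maps[0][0])
    # Load the first sub-feature map into the simulated register array.
    registers = [row[:] for row in sub_maps[0]]
    for sub in sub_maps[1:]:
        for r in range(rows):      # on hardware these updates happen in parallel
            for c in range(cols):
                registers[r][c] = combine(registers[r][c], sub[r][c])
    return registers


a = [[1, 5], [3, 2]]
b = [[4, 0], [9, 2]]
c = [[2, 7], [1, 8]]
print(pool_by_position([a, b, c]))  # [[4, 7], [9, 8]]
```

Passing a different `combine` function (for example addition) models a different pooling instruction without changing the data movement.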
In the above example, the split places all pixel values of any one pooling window in different sub-feature maps. The pixel values of the sub-feature maps can therefore be loaded into the shift register array, and the values at the same position can be pooled in parallel according to the pooling instruction to obtain the pooling result corresponding to the target feature map. The multiple PEs corresponding to the shift register array thus pool the pooling windows in parallel, which improves pooling efficiency, reduces the computational burden on the chip, and reduces the difficulty of chip design.
In some examples, when S306 is performed, the shift operation mode of the shift register array for each sub-feature map may be determined according to the positions, within that sub-feature map, of the pixel values that belong to the same pooling window.
After the shift operation modes are determined, the sub-feature maps may each be loaded into the shift register array. For each sub-feature map, a shift operation is performed on the pixel values stored in the shift registers according to the shift operation mode determined for that sub-feature map, and parallel pooling according to the pooling instruction then yields that sub-feature map's partial pooling results for the different pooling windows. The pooling result corresponding to the target feature map is then determined from the partial pooling results of all the sub-feature maps.
In the above example, the split leaves some pixel values of the same pooling window in the same sub-feature map. In this case, the pixel values of each sub-feature map that belong to the same pooling window are pooled first, giving each sub-feature map's partial pooling results for the different pooling windows. The partial pooling results that correspond to the same pooling window are then pooled again across the sub-feature maps to obtain the final pooling result. The multiple PEs corresponding to the shift register array can thus pool the pooling windows in parallel, which improves pooling efficiency, reduces the computational burden on the chip, and reduces the difficulty of chip design.
In some examples, in order to make maximum use of the PEs corresponding to the shift register array and further improve pooling efficiency, at least some of the sub-feature maps contain the same number of pixels as the shift register array contains shift registers.
In some examples, the splitting strategy may be determined according to the number of shift registers in the shift register array, so that some or all of the sub-feature maps obtained by splitting the target feature map contain the same number of pixels as there are shift registers.
As a result, all PEs corresponding to the shift registers can pool in parallel, which makes maximum use of the PEs corresponding to the shift register array, further improves pooling efficiency, reduces the computational burden on the chip, and reduces the difficulty of chip design.
Embodiments are described below with reference to specific scenarios.
Scenario 1: the target feature map is 16*16, the pooling window is 2*2, the stride is 2, the pooling operation is max pooling, and the pooling instruction is to take the larger of two values. The shift register array of the AI chip that performs the pooling is 8*8.
When the AI chip performs the pooling operation, the target feature map may be split into four sub-feature maps. In some examples, the pixel values in odd rows and odd columns of the target feature map form the first sub-feature map; the pixel values in odd rows and even columns form the second sub-feature map; the pixel values in even rows and odd columns form the third sub-feature map; and the pixel values in even rows and even columns form the fourth sub-feature map.
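The parity-based split can be illustrated with list slicing. The following is a sketch only; the function name `split_by_parity` and the nested-list representation are assumptions introduced for the example. Rows and columns are counted from 1 in the text, so an "odd row" corresponds to Python index 0, 2, 4 and so on.

```python
def split_by_parity(feature_map):
    """Split a 2-D list into four sub-feature maps by row/column parity."""
    odd_rows = feature_map[0::2]    # rows 1, 3, 5, ... in 1-based counting
    even_rows = feature_map[1::2]   # rows 2, 4, 6, ... in 1-based counting
    first = [row[0::2] for row in odd_rows]     # odd rows, odd columns
    second = [row[1::2] for row in odd_rows]    # odd rows, even columns
    third = [row[0::2] for row in even_rows]    # even rows, odd columns
    fourth = [row[1::2] for row in even_rows]   # even rows, even columns
    return first, second, third, fourth


# A 16*16 map whose entries encode their own position.
fmap = [[r * 16 + c for c in range(16)] for r in range(16)]
subs = split_by_parity(fmap)
print([(len(s), len(s[0])) for s in subs])  # [(8, 8), (8, 8), (8, 8), (8, 8)]
```

With a 2*2 window and stride 2, the four pixels of every pooling window land at the same position of four different sub-feature maps.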
Refer to FIG. 4, which is a schematic diagram of a process of splitting a target feature map according to the present application. As shown in FIG. 4, the target feature map is 16*16. Black squares indicate pixels in odd rows and odd columns of the target feature map; dark gray squares indicate pixels in odd rows and even columns; white squares indicate pixels in even rows and odd columns; and light gray squares indicate pixels in even rows and even columns.
The target feature map is split according to the foregoing splitting method to obtain the first to fourth sub-feature maps. Each sub-feature map is 8*8, the same size as the shift register array. After the split, the pixel values of the first sub-feature map may be moved, according to the preset moving method, into at least some of the shift registers of the shift register array. In some examples, the pixel values of the first sub-feature map may be moved into the first registers of at least some of the shift registers of the array. In some embodiments, the pixel values need to be moved as a whole between the shift registers of the array. In that case, idle shift registers may be reserved at preset positions in the array according to the direction and step of the move, so that the moved pixel values can be stored. For example, if the pixel values need to move one step to the right as a whole, at least the idle shift registers immediately to the right of the rightmost column of stored pixel values must be reserved. Other moves are handled in the same way and are not described again here. In this way, all pixel values of the first sub-feature map can be moved into the shift registers of the shift register array.
Then the pixel values of the second sub-feature map may be moved into the same shift registers according to the preset moving method, so that the computing core corresponding to each shift register can pool the two received pixel values according to the pooling instruction to obtain a first pooling result. In some examples, the pixel values of the second sub-feature map may be moved into the second registers of those shift registers, so that each computing core can, according to the pooling instruction, take the larger of the values stored in the first register and the second register and store it in the first register as the first pooling result. In this way, the pixel values of the same pooling window that lie in the first and second sub-feature maps are compared, and the maximum is stored in the shift register.
Next, the pixel values of the third sub-feature map may be moved into the same shift registers according to the preset moving method, so that each computing core pools the first pooling result and the received pixel value according to the pooling instruction to obtain a second pooling result. In some examples, the pixel values of the third sub-feature map are moved into the second registers of those shift registers, so that each computing core can, according to the pooling instruction, take the larger of the values stored in the first register and the second register and store it in the first register as the second pooling result. In this way, the pixel values of the same pooling window that lie in the first, second, and third sub-feature maps are compared, and the maximum is stored in the shift register.
After that, the pixel values of the fourth sub-feature map may be moved into the same shift registers according to the preset moving method, so that each computing core pools the second pooling result and the received pixel value according to the pooling instruction to obtain a third pooling result. In some examples, the pixel values of the fourth sub-feature map are moved into the second registers of those shift registers, so that each computing core can, according to the pooling instruction, take the larger of the values stored in the first register and the second register and store it in the first register as the third pooling result. In this way, the pixel values of the same pooling window that lie in the first, second, third, and fourth sub-feature maps are compared, and the maximum is stored in the shift register.
Finally, the third pooling results obtained by the computing cores may be output to obtain the pooling result corresponding to the target feature map. In some examples, the values stored in the first registers of those shift registers may be output to obtain the pooling result corresponding to the target feature map. This completes max pooling of the target feature map and yields the corresponding pooling result.
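Scenario 1 can be traced end to end in software. The sketch below is an illustration under stated assumptions, not the chip's implementation: the function names are hypothetical, nested lists stand in for the first and second registers of the 8*8 shift register array, and a direct window-by-window computation is included only as a cross-check.

```python
def max_pool_2x2_via_submaps(feature_map):
    """2*2, stride-2 max pooling using a first and a second register per position."""
    odd_rows, even_rows = feature_map[0::2], feature_map[1::2]
    sub_maps = [
        [row[0::2] for row in odd_rows],    # first sub-feature map
        [row[1::2] for row in odd_rows],    # second sub-feature map
        [row[0::2] for row in even_rows],   # third sub-feature map
        [row[1::2] for row in even_rows],   # fourth sub-feature map
    ]
    # The first sub-feature map goes into the first registers.
    first_reg = [row[:] for row in sub_maps[0]]
    for sub in sub_maps[1:]:
        # Each later sub-feature map goes into the second registers ...
        second_reg = [row[:] for row in sub]
        # ... and every position keeps the larger value in its first register.
        first_reg = [
            [max(a, b) for a, b in zip(row1, row2)]
            for row1, row2 in zip(first_reg, second_reg)
        ]
    return first_reg


def max_pool_2x2_reference(feature_map):
    """Direct window-by-window max pooling of a square map, used as a cross-check."""
    size = len(feature_map) // 2
    return [
        [max(feature_map[2 * i + di][2 * j + dj] for di in (0, 1) for dj in (0, 1))
         for j in range(size)]
        for i in range(size)
    ]


fmap = [[(r * 7 + c * 13) % 31 for c in range(16)] for r in range(16)]
pooled = max_pool_2x2_via_submaps(fmap)
print(len(pooled), len(pooled[0]))             # 8 8
print(pooled == max_pool_2x2_reference(fmap))  # True
```

Only three compare steps are needed for the whole 16*16 map, because each step updates all 64 positions at once.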
It should be noted that the order in which the sub-feature maps are input to the shift registers above is only illustrative. Any input order may be used in practice.
When the pooling operation is average pooling, the process is the same as in the above embodiment except for the pooling instruction, and is not described in detail here.
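As an illustration of how only the instruction changes, the sketch below replaces the comparison with a sum and divides by the window size at the end. It assumes a 2*2 window with stride 2; the function name and the choice to divide after accumulating are assumptions made for the example, since the text does not detail the averaging step.

```python
def avg_pool_2x2_via_submaps(feature_map):
    """2*2, stride-2 average pooling: accumulate by position, then divide by 4."""
    odd_rows, even_rows = feature_map[0::2], feature_map[1::2]
    sub_maps = [
        [row[0::2] for row in odd_rows],
        [row[1::2] for row in odd_rows],
        [row[0::2] for row in even_rows],
        [row[1::2] for row in even_rows],
    ]
    acc = [row[:] for row in sub_maps[0]]
    for sub in sub_maps[1:]:
        # The "sum" instruction replaces the "larger of two values" instruction.
        acc = [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(acc, sub)]
    return [[value / 4 for value in row] for row in acc]


fmap = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(avg_pool_2x2_via_submaps(fmap))  # [[3.5, 5.5], [11.5, 13.5]]
```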
Scenario 2: the target feature map is 17*17, the pooling window is 3*3, the stride is 2, the pooling operation is max pooling, and the pooling instruction is to take the larger of two values. The shift register array of the AI chip that performs the pooling is 9*9.
When the AI chip performs the pooling operation, the target feature map may be split into four sub-feature maps.
Refer to FIG. 5, which is a schematic diagram of a process of splitting a target feature map according to the present application. As shown in FIG. 5, the target feature map is 17*17. Black squares indicate pixels in odd rows and odd columns of the target feature map; dark gray squares indicate pixels in odd rows and even columns; white squares indicate pixels in even rows and odd columns; and light gray squares indicate pixels in even rows and even columns.
The target feature map is split according to the foregoing splitting method to obtain the first to fourth sub-feature maps. The first sub-feature map is 9*9, the second is 9*8, the third is 8*9, and the fourth is 8*8. The first sub-feature map is the same size as the shift register array.
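The sub-feature map sizes follow directly from counting odd and even indices. A small sketch, with a hypothetical helper name, reproduces the sizes stated for both scenarios:

```python
def sub_map_shapes(height, width):
    """Shapes of the four parity sub-feature maps of a height*width feature map."""
    odd_rows, even_rows = (height + 1) // 2, height // 2   # rows 1, 3, ... and 2, 4, ...
    odd_cols, even_cols = (width + 1) // 2, width // 2
    return [
        (odd_rows, odd_cols),     # first sub-feature map
        (odd_rows, even_cols),    # second sub-feature map
        (even_rows, odd_cols),    # third sub-feature map
        (even_rows, even_cols),   # fourth sub-feature map
    ]


print(sub_map_shapes(17, 17))  # [(9, 9), (9, 8), (8, 9), (8, 8)]
print(sub_map_shapes(16, 16))  # [(8, 8), (8, 8), (8, 8), (8, 8)]
```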
Refer to FIG. 6, which is a schematic diagram of a pooling window according to the present application. FIG. 6 shows a pooling window used when the target feature map is pooled with a 3*3 window and a stride of 2. As shown in FIG. 6, the dashed box marks one pooling window in the target feature map. The pooling window contains 4 black squares, 2 dark gray squares, 2 white squares, and 1 light gray square. When this pooling window is pooled, the positions of its pixel values within each sub-feature map determine the procedure. For each pooling window, S61-S62 may be executed first to determine the maximum of the four pixel values at the upper-left, upper-right, lower-left, and lower-right corners, that is, the first maximum, taken over the four mutually adjacent pixel values of the first sub-feature map that belong to the window. S63-S64 are then executed to determine the maximum of the two pixel values in the first row, second column and the third row, second column, that is, the second maximum, taken over the two vertically adjacent pixel values of the second sub-feature map that belong to the window. S65-S66 are then executed to determine the maximum of the two pixel values in the second row, first column and the second row, third column, that is, the third maximum, taken over the two horizontally adjacent pixel values of the third sub-feature map that belong to the window. Finally, the maximum of the first maximum, the second maximum, the third maximum, and the pixel value in the middle of the window (the window's pixel value in the fourth sub-feature map) is determined and taken as the max pooling result for the pooling window.
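The decomposition of a 3*3, stride-2 window into three partial maxima and one centre value can be checked in software. The sketch below is illustrative only: the function names are hypothetical, the maps are nested lists, an odd-sized input such as 17*17 is assumed, and the direct computation serves purely as a cross-check.

```python
def max_pool_3x3_s2_via_submaps(fm):
    """3*3, stride-2 max pooling built from per-sub-feature-map partial maxima."""
    odd_rows, even_rows = fm[0::2], fm[1::2]
    first = [row[0::2] for row in odd_rows]     # window corners
    second = [row[1::2] for row in odd_rows]    # top-middle and bottom-middle
    third = [row[0::2] for row in even_rows]    # middle-left and middle-right
    fourth = [row[1::2] for row in even_rows]   # window centre
    pooled = []
    for i in range(len(fourth)):
        out_row = []
        for j in range(len(fourth[0])):
            m1 = max(first[i][j], first[i][j + 1],
                     first[i + 1][j], first[i + 1][j + 1])   # first maximum
            m2 = max(second[i][j], second[i + 1][j])          # second maximum
            m3 = max(third[i][j], third[i][j + 1])            # third maximum
            out_row.append(max(m1, m2, m3, fourth[i][j]))
        pooled.append(out_row)
    return pooled


def max_pool_reference(fm, k=3, stride=2):
    """Direct window-by-window max pooling, used only as a cross-check."""
    out_h = (len(fm) - k) // stride + 1
    out_w = (len(fm[0]) - k) // stride + 1
    return [
        [max(fm[stride * i + di][stride * j + dj]
             for di in range(k) for dj in range(k))
         for j in range(out_w)]
        for i in range(out_h)
    ]


fmap = [[(r * 7 + c * 13) % 31 for c in range(17)] for r in range(17)]
pooled = max_pool_3x3_s2_via_submaps(fmap)
print(len(pooled), len(pooled[0]))         # 8 8
print(pooled == max_pool_reference(fmap))  # True
```

Adjacent windows overlap by one row and one column, which is why neighbouring windows share entries of the first, second, and third sub-feature maps but never of the fourth.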
In some examples, S61 may be executed to move the pixel values of the first sub-feature map, according to the preset moving method, into at least some of the shift registers of the shift register array. In some examples, the pixel values of the first sub-feature map may be moved into the first registers of at least some of the shift registers of the array.
S62 may then be executed: the computing cores corresponding to those shift registers pool, according to the pooling instruction, the pixel values in any four shift registers of the array that are adjacent to one another vertically and horizontally (a 2*2 group) to obtain a first partial pooling result, and store the first partial pooling result in the target shift register, which is the shift register at a preset position within the group of four. In some examples, the preset position is the lower-left corner of the group of four adjacent shift registers. It can be understood that solutions in which the preset position is a different position may refer to this embodiment.
The target shift register is the shift register at the lower-left corner of the group of four adjacent shift registers. Refer to FIG. 7, which is a schematic diagram of moving pixel values of the first sub-feature map according to the present application. It should be noted that any four mutually adjacent shift registers in the array can be regarded as the group of shift registers shown in FIG. 7. For example, the second shift register of the group shown in FIG. 7 may be the first shift register, the third shift register, or the target shift register of another group. FIG. 7 only illustrates the flow of pixel values within one group of shift registers; the flow in other groups follows the same pattern and is not described in detail in this application. As shown in FIG. 7 (the PEs are not shown in FIG. 7), within a group, the shift register at the upper-left corner is the first shift register, the shift register to the right of the first shift register is the second shift register, the shift register below the second shift register is the third shift register, and the shift register to the left of the third shift register is the target shift register.
In FIG. 7, at S71, the value in the first register of each first shift register is moved to the second register of the second shift register to its right.
Each computing core then stores the larger of the values in the first register and the second register of each second shift register in the first register of that second shift register.
At S72, the value in the first register of each second shift register is moved to the second register of the third shift register below it.
Each computing core then stores the larger of the values in the first register and the second register of each third shift register in the first register of that third shift register.
At S73, the value in the first register of each third shift register is moved to the second register of the target shift register to its left.
Each computing core then stores the larger of the values in the first register and the second register of each target shift register in the first register of that target shift register as the first partial pooling result. In this way, the maximum of the four pixel values at the upper-left, upper-right, lower-left, and lower-right corners of the pooling window, that is, the first maximum over the four mutually adjacent pixel values of the first sub-feature map, is stored in the target shift register.
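Steps S71 to S73 can be traced for a single group of four shift registers. The sketch below models each shift register as a dictionary with a first register `r1` and a second register `r2`; these names and the helper functions are assumptions made for illustration. It follows one group in isolation, assuming each first register still holds that position's own pixel value when the incoming value arrives, and does not model how overlapping groups are scheduled on the array.

```python
def move_and_max(src, dst):
    """Move src's first register into dst's second register, then keep the larger."""
    dst["r2"] = src["r1"]
    dst["r1"] = max(dst["r1"], dst["r2"])


def first_partial_max(upper_left, upper_right, lower_left, lower_right):
    """Trace S71-S73 for one group; returns the target's first register."""
    first = {"r1": upper_left, "r2": None}     # first shift register
    second = {"r1": upper_right, "r2": None}   # right of the first
    third = {"r1": lower_right, "r2": None}    # below the second
    target = {"r1": lower_left, "r2": None}    # left of the third
    move_and_max(first, second)    # S71 and the comparison that follows
    move_and_max(second, third)    # S72 and the comparison that follows
    move_and_max(third, target)    # S73 and the comparison that follows
    return target["r1"]


print(first_partial_max(3, 9, 4, 1))  # 9
```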
Next, S63 may be executed to move the pixel values of the second sub-feature map into the same shift registers according to the preset moving method. In some examples, the pixel values of the second sub-feature map may be moved into the second registers of those shift registers.
Each computing core may then execute S64: according to the pooling instruction, pool the pixel values in any two vertically adjacent shift registers of the array to obtain a second partial pooling result, and store the second partial pooling result in the target shift register.
Refer to FIG. 8, which is a schematic diagram of moving pixel values of the second sub-feature map according to the present application. It should be noted that any two vertically adjacent shift registers in the array can be regarded as the group of shift registers shown in FIG. 8. FIG. 8 only illustrates the flow of pixel values within one group of shift registers; the flow in other groups follows the same pattern and is not described in detail in this application. As shown in FIG. 8 (the registers are not shown in FIG. 8), the upper shift register of a group is the first shift register, and the shift register below the first shift register is the target shift register.
At S81, the value in the second register of each first shift register is moved to the third register of the target shift register below it.
At S82, each computing core stores the larger of the values in the second register and the third register of each target shift register in the second register of that target shift register as the second partial pooling result.
In this way, the maximum of the two pixel values in the first row, second column and the third row, second column of the pooling window, that is, the second maximum over the two vertically adjacent pixel values of the second sub-feature map, is stored in the target shift register.
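S81 and S82 can be traced for one vertically adjacent pair in the same style. The dictionary keys `r2` and `r3` stand for the second and third registers; the function name is hypothetical and the sketch follows a single pair in isolation.

```python
def second_partial_max(upper_pixel, lower_pixel):
    """Trace S81-S82 for one vertically adjacent pair of shift registers."""
    first = {"r2": upper_pixel, "r3": None}    # upper (first) shift register
    target = {"r2": lower_pixel, "r3": None}   # target shift register below it
    target["r3"] = first["r2"]                      # S81: move down into the third register
    target["r2"] = max(target["r2"], target["r3"])  # S82: keep the larger in the second register
    return target["r2"]


print(second_partial_max(6, 2))  # 6
```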
Next, S65 may be executed to move the pixel values of the third sub-feature map into the same shift registers according to the preset moving method. In some examples, the pixel values of the third sub-feature map are moved into the third registers of those shift registers.
Each computing core may then execute S66: according to the pooling instruction, pool the pixel values in any two horizontally adjacent shift registers of the array to obtain a third partial pooling result, and store the third partial pooling result in the target shift register.
Refer to FIG. 9, which is a schematic diagram of moving pixel values of the third sub-feature map according to the present application. It should be noted that any two horizontally adjacent shift registers in the array can be regarded as the group of shift registers shown in FIG. 9. FIG. 9 only illustrates the flow of pixel values within one group of shift registers; the flow in other groups follows the same pattern and is not described in detail in this application. As shown in FIG. 9 (the registers are not shown in FIG. 9), the shift register at the upper-left corner of a group is the first shift register, the shift register to the right of the first shift register is the second shift register, the shift register below the second shift register is the third shift register, and the shift register to the left of the third shift register is the target shift register.
At S91, the value in the third register of each first shift register is moved to the fourth register of the second shift register to its right.
At S92, each computing core stores the larger of the values in the third register and the fourth register of each second shift register in the third register of that second shift register as the third partial pooling result.
At this point, the maximum of the two pixel values in the second row, first column and the second row, third column of the pooling window, that is, the third maximum over the two horizontally adjacent pixel values of the third sub-feature map, is stored in the second shift register.
The third maximum (the third partial pooling result) may then be moved to the target shift register. That is, at S93, the third partial pooling result in the third register of the second shift register is moved to the third register of the first shift register to its left. At S94, the third partial pooling result in the third register of the first shift register is moved to the third register of the target shift register below it. In this way, the maximum of the two pixel values in the second row, first column and the second row, third column of the pooling window, that is, the third maximum over the two horizontally adjacent pixel values of the third sub-feature map, is stored in the target shift register. In some examples, when data is moved from the second shift register to the target shift register, it may instead be moved first to the third shift register and then to the target shift register.
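S91 to S94 can likewise be traced for one group. As before, the dictionaries and key names (`r3`, `r4` for the third and fourth registers) and the function name are assumptions made for illustration, and only the route through the first shift register is shown.

```python
def third_partial_max(left_pixel, right_pixel):
    """Trace S91-S94 for one group; returns the target's third register."""
    first = {"r3": left_pixel, "r4": None}     # first shift register (upper left)
    second = {"r3": right_pixel, "r4": None}   # second shift register (upper right)
    target = {"r3": None}                      # target shift register (lower left)
    second["r4"] = first["r3"]                      # S91: move right into the fourth register
    second["r3"] = max(second["r3"], second["r4"])  # S92: keep the larger in the third register
    first["r3"] = second["r3"]                      # S93: move the result back to the left
    target["r3"] = first["r3"]                      # S94: move the result down to the target
    return target["r3"]


print(third_partial_max(5, 8))  # 8
```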
Afterwards, S67 may be executed: the pixel values included in the fourth sub-feature map are moved to that subset of shift registers according to the preset moving method. In some examples, the pixel values included in the fourth sub-feature map may be moved to the fourth registers of that subset of shift registers according to the preset moving method.
Then, S68 may be executed: the pixel values in each shift register of that subset are moved to the target shift registers. In some examples, the value in the fourth register of each first shift register in the subset may be moved to the fourth register of the target shift register below that first shift register. At this point, each target shift register holds the first partial pooling result (the first maximum value), the second partial pooling result (the second maximum value), the third partial pooling result (the third maximum value), and the pixel value at the center of the pooling window (the pixel value from the fourth sub-feature map).
Finally, the first partial pooling result, the second partial pooling result, the third partial pooling result and the pixel value from the fourth sub-feature map held in each target shift register of the shift register array may be compared, and the maximum among them output, to obtain the pooling result corresponding to the target feature map. In some examples, each computing core may store the larger of the values in the first register and the second register of each target shift register in the first register of that target shift register.
Then, the larger of the values in the first register and the third register of each target shift register may be stored in the first register of that target shift register.
Afterwards, the larger of the values in the first register and the fourth register of each target shift register may be stored in the first register of that target shift register.
Finally, the value stored in the first register of each target shift register is output, to obtain the pooling result corresponding to the target feature map. In this way, the maximum obtained by max pooling each pooling window is stored in a target shift register, and outputting the maximum held in each target shift register yields the pooling result corresponding to the target feature map. In some examples, to further improve pooling efficiency, a plurality of temporary registers are connected to the periphery of the shift register array. The temporary registers store the pixel values that overflow the shift register array during a value-moving operation. As a result, pixel values that overflow the shift register array during data movement do not need to be written to RAM; they only need to be stored in the temporary registers, which improves data-movement efficiency and, in turn, pooling efficiency.
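The overall flow described above can be sketched in software. The following is a minimal illustration, not the chip's implementation: it assumes a 3×3 pooling window with stride 2 (consistent with the window positions named in the text) and a feature map with an odd number of rows and columns. The function names are hypothetical, and plain Python lists stand in for the shift register array.

```python
def max_pool_3x3_stride2(fmap):
    """3x3, stride-2 max pooling built from four parity sub-feature maps.

    Assumption: fmap is a list of equal-length rows with an odd number
    of rows and columns (at least 3 each).
    """
    # 0-based slicing: index 0 is the 1st (odd) row/column in the text.
    s1 = [row[0::2] for row in fmap[0::2]]  # odd row, odd column: window corners
    s2 = [row[1::2] for row in fmap[0::2]]  # odd row, even column: top/bottom middle
    s3 = [row[0::2] for row in fmap[1::2]]  # even row, odd column: left/right middle
    s4 = [row[1::2] for row in fmap[1::2]]  # even row, even column: window center

    out = []
    for i in range(len(s4)):
        out_row = []
        for j in range(len(s4[0])):
            # Four values that end up in one target shift register.
            r1 = max(s1[i][j], s1[i][j + 1], s1[i + 1][j], s1[i + 1][j + 1])
            r2 = max(s2[i][j], s2[i + 1][j])
            r3 = max(s3[i][j], s3[i][j + 1])
            r4 = s4[i][j]
            # Three pairwise comparisons, each stored back into r1.
            r1 = max(r1, r2)
            r1 = max(r1, r3)
            r1 = max(r1, r4)
            out_row.append(r1)
        out.append(out_row)
    return out


def direct_max_pool_3x3_stride2(fmap):
    """Reference: scan every 3x3 window directly."""
    rows, cols = len(fmap), len(fmap[0])
    return [
        [max(fmap[r + dr][c + dc] for dr in range(3) for dc in range(3))
         for c in range(0, cols - 2, 2)]
        for r in range(0, rows - 2, 2)
    ]


feature_map = [[(7 * r + 3 * c) % 10 for c in range(5)] for r in range(5)]
assert max_pool_3x3_stride2(feature_map) == direct_max_pool_3x3_stride2(feature_map)
```

The sketch computes the windows one after another in a loop; in the described chip, the same comparisons would be carried out for all windows at once by the computing cores.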
In some examples, the feature map to be pooled may be larger than the shift register array. The present application proposes a pooling method. Referring to FIG. 10, FIG. 10 is a flowchart of a pooling method shown in the present application. As shown in FIG. 10, the method may include:
S1002: obtain an original feature map.
S1004: divide the original feature map into several target feature maps.
S1006: pool each target feature map according to the pooling method shown in any of the foregoing embodiments, to obtain the pooling result corresponding to each target feature map.
S1008: output the pooling result corresponding to each target feature map, to obtain the pooling result corresponding to the original feature map.
In the above solution, the original feature map may first be divided into several target feature maps, and each target feature map may then be pooled according to the pooling method shown in any of the foregoing embodiments, to obtain the pooling result corresponding to each target feature map. Finally, the pooling results corresponding to the target feature maps are output, to obtain the pooling result corresponding to the original feature map. In this way, an original feature map larger than the shift register array can be pooled efficiently.
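Steps S1002 to S1008 can be illustrated with a short sketch. It is only an illustration under simplifying assumptions that are not taken from the application: the pooling is 2×2 max pooling with stride 2, so windows never straddle a tile boundary, and the original feature map's height and width are multiples of an even tile size. The names `max_pool_2x2` and `pool_in_tiles` are hypothetical.

```python
def max_pool_2x2(fmap):
    """2x2, stride-2 max pooling of one target feature map (even dimensions)."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]), 2)]
        for r in range(0, len(fmap), 2)
    ]


def pool_in_tiles(original, tile):
    """Divide `original` into tile x tile target feature maps, pool each,
    and write every tile's result into its place in the overall output."""
    rows, cols = len(original), len(original[0])
    out = [[None] * (cols // 2) for _ in range(rows // 2)]
    for top in range(0, rows, tile):
        for left in range(0, cols, tile):
            # S1004: cut one target feature map out of the original.
            target = [row[left:left + tile] for row in original[top:top + tile]]
            # S1006: pool that target feature map on its own.
            pooled = max_pool_2x2(target)
            # S1008: place the tile's result in the combined output.
            for di, pooled_row in enumerate(pooled):
                for dj, value in enumerate(pooled_row):
                    out[top // 2 + di][left // 2 + dj] = value
    return out


original = [[(3 * r + 7 * c) % 11 for c in range(8)] for r in range(8)]
assert pool_in_tiles(original, 4) == max_pool_2x2(original)
```

With overlapping windows, such as 3×3 with stride 2, neighboring tiles would presumably need to share a boundary row and column; the sketch does not cover that case.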
The present application also proposes a chip. The chip may include a controller. The controller is configured to: obtain a target feature map; split the target feature map to obtain several sub-feature maps, wherein at least some of the pixel values of the target feature map that lie in the same pooling window are in different sub-feature maps, and the pixel values at the same position in each pooling window are in the same sub-feature map; and process in parallel the pixels of each sub-feature map that belong to different pooling windows, to obtain the pooling result corresponding to the target feature map.
In some illustrated embodiments, the controller is configured to: load the pixel values included in each of the sub-feature maps into a shift register array; and, according to a pooling instruction, pool in parallel the pixel values located at the same position in the sub-feature maps, to obtain the pooling result corresponding to the target feature map.
In some illustrated embodiments, the controller is configured to: determine, according to the positions of the pixel values of each sub-feature map that lie in the same pooling window, the shift operation mode of the shift register array corresponding to each sub-feature map; load the sub-feature maps into the shift register array respectively and, for each sub-feature map, perform shift operations on the pixel values stored in the shift registers of the array according to the shift operation mode determined for that sub-feature map, and perform parallel pooling according to the pooling instruction to obtain the partial pooling results of that sub-feature map for the different pooling windows; and determine the pooling result corresponding to the target feature map from the partial pooling results of the sub-feature maps for the different pooling windows.
In some illustrated embodiments, the controller is configured to: determine the pixel values at odd-row, odd-column positions of the target feature map as a first sub-feature map; determine the pixel values at odd-row, even-column positions of the target feature map as a second sub-feature map; determine the pixel values at even-row, odd-column positions of the target feature map as a third sub-feature map; and determine the pixel values at even-row, even-column positions of the target feature map as a fourth sub-feature map.
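As an illustration of this parity-based split, the sketch below uses Python list slicing. It assumes rows and columns are counted from 1, as in the text, so "odd" rows correspond to 0-based indices 0, 2, 4, and so on. The function name is hypothetical.

```python
def split_into_sub_feature_maps(fmap):
    """Return the four sub-feature maps selected by row/column parity."""
    odd_rows, even_rows = fmap[0::2], fmap[1::2]   # 1-based parity
    first = [row[0::2] for row in odd_rows]        # odd row, odd column
    second = [row[1::2] for row in odd_rows]       # odd row, even column
    third = [row[0::2] for row in even_rows]       # even row, odd column
    fourth = [row[1::2] for row in even_rows]      # even row, even column
    return first, second, third, fourth


# Each entry encodes its own 1-based position as <row><column>.
fmap = [
    [11, 12, 13],
    [21, 22, 23],
    [31, 32, 33],
]
first, second, third, fourth = split_into_sub_feature_maps(fmap)
assert first == [[11, 13], [31, 33]]
assert second == [[12], [32]]
assert third == [[21, 23]]
assert fourth == [[22]]
```

For a single 3×3 window, the first sub-feature map holds the four corners, the second the top and bottom middle pixels, the third the left and right middle pixels, and the fourth the center.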
In some illustrated embodiments, the controller is configured to: move the pixel values included in the first sub-feature map to at least some of the shift registers in the shift register array; move the pixel values included in the second sub-feature map to that subset of shift registers, so that the computing core corresponding to each shift register pools the two received pixel values according to the pooling instruction, to obtain a first pooling result; move the pixel values included in the third sub-feature map to that subset of shift registers, so that each computing core pools the first pooling result and the received pixel value according to the pooling instruction, to obtain a second pooling result; move the pixel values included in the fourth sub-feature map to that subset of shift registers, so that each computing core pools the second pooling result and the received pixel value according to the pooling instruction, to obtain a third pooling result; and output the third pooling results obtained by the computing cores, to obtain the pooling result corresponding to the target feature map.
In some illustrated embodiments, the pooling includes max pooling, and the pooling instruction includes taking the larger of two values. The controller is configured to: move the pixel values included in the first sub-feature map to the first registers of at least some of the shift registers in the shift register array; and move the pixel values included in the second sub-feature map to the second registers of that subset of shift registers, so that the computing core corresponding to each shift register in the subset obtains, according to the pooling instruction, the larger of the values stored in the first register and the second register, and stores it in the first register as the first pooling result.
In some illustrated embodiments, the controller is configured to: move the pixel values included in the third sub-feature map to the second registers of that subset of shift registers, so that the computing core corresponding to each shift register in the subset obtains, according to the pooling instruction, the larger of the values stored in the first register and the second register, and stores it in the first register as the second pooling result; and move the pixel values included in the fourth sub-feature map to the second registers of that subset of shift registers, so that the computing core corresponding to each shift register in the subset obtains, according to the pooling instruction, the larger of the values stored in the first register and the second register, and stores it in the first register as the third pooling result.
In some illustrated embodiments, the controller is configured to: output the values stored in the first registers of that subset of shift registers, to obtain the pooling result corresponding to the target feature map.
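The sequence in the preceding embodiments (load one sub-feature map into the first registers, then repeatedly load the next one into the second registers and keep the larger value in the first registers) can be sketched as follows. The sketch assumes a 2×2 pooling window with stride 2 and a feature map with even dimensions, so that all four sub-feature maps have the same shape. The two lists named `first` and `second` are stand-ins for the first and second registers of every shift register, and the function name is hypothetical.

```python
def max_pool_2x2_by_sub_feature_maps(fmap):
    """2x2, stride-2 max pooling as three elementwise compare steps."""
    sub_maps = [
        [row[0::2] for row in fmap[0::2]],  # first sub-feature map
        [row[1::2] for row in fmap[0::2]],  # second sub-feature map
        [row[0::2] for row in fmap[1::2]],  # third sub-feature map
        [row[1::2] for row in fmap[1::2]],  # fourth sub-feature map
    ]
    # Load the first sub-feature map into the first registers.
    first = [row[:] for row in sub_maps[0]]
    for sub_map in sub_maps[1:]:
        # Load the next sub-feature map into the second registers.
        second = sub_map
        # Every computing core keeps the larger value in its first register.
        first = [
            [max(a, b) for a, b in zip(first_row, second_row)]
            for first_row, second_row in zip(first, second)
        ]
    return first  # values read out of the first registers


fmap = [
    [1, 2, 5, 6],
    [3, 4, 7, 8],
    [9, 1, 2, 3],
    [0, 5, 4, 1],
]
assert max_pool_2x2_by_sub_feature_maps(fmap) == [[4, 8], [9, 4]]
```

Each list comprehension over `first` and `second` corresponds to one pooling instruction applied to all windows at once, so the whole map needs three compare steps regardless of how many windows it contains.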
In some illustrated embodiments, the controller is configured to: move the pixel values included in the first sub-feature map to at least some of the shift registers in the shift register array; according to the pooling instruction, pool the pixel values in any four shift registers of the array that are adjacent vertically and horizontally, to obtain a first partial pooling result, and store the first partial pooling result in the target shift register, which is the one at a preset position among those four shift registers; move the pixel values included in the second sub-feature map to that subset of shift registers; according to the pooling instruction, pool the pixel values in any two vertically adjacent shift registers of the array, to obtain a second partial pooling result, and store the second partial pooling result in the target shift register; move the pixel values included in the third sub-feature map to that subset of shift registers; according to the pooling instruction, pool the pixel values in any two horizontally adjacent shift registers of the array, to obtain a third partial pooling result, and store the third partial pooling result in the target shift register; move the pixel values included in the fourth sub-feature map to that subset of shift registers, and move the pixel values in each shift register of the subset to the target shift register; and obtain the pooling result corresponding to the target feature map from the first partial pooling result, the second partial pooling result and the third partial pooling result held in each target shift register of the array, together with the fourth sub-feature map.
In some illustrated embodiments, the pooling includes max pooling; the pooling instruction includes taking the larger of two values; the preset position is the lower-left position among any four shift registers that are adjacent vertically and horizontally; and the target shift register is the shift register at the lower-left position among those four shift registers. The controller is configured to: move the pixel values included in the first sub-feature map to the first registers of at least some of the shift registers in the shift register array. The controller is further configured to: move the value in the first register of each first shift register in that subset to the second register of the second shift register to the right of that first shift register; store the larger of the values in the first register and the second register of each second shift register in the first register of that second shift register; move the value in the first register of each second shift register to the second register of the third shift register below that second shift register; store the larger of the values in the first register and the second register of each third shift register in the first register of that third shift register; move the value in the first register of each third shift register to the second register of the target shift register to the left of that third shift register; and store the larger of the values in the first register and the second register of each target shift register in the first register of that target shift register, as the first partial pooling result.
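The right, down, left sequence of moves can be traced for one group of four adjacent shift registers. The sketch below is a simplification: it follows a single 2×2 group in isolation and does not model how overlapping groups interact when the whole array shifts in parallel. The dictionaries standing in for shift registers and the function name are hypothetical.

```python
def first_partial_result(top_left, top_right, bottom_left, bottom_right):
    """Trace one 2x2 group; each shift register has registers r1 and r2."""
    first = {"r1": top_left, "r2": None}       # first shift register
    second = {"r1": top_right, "r2": None}     # to the right of `first`
    third = {"r1": bottom_right, "r2": None}   # below `second`
    target = {"r1": bottom_left, "r2": None}   # to the left of `third`

    # Move right, then keep the larger value in the second shift register.
    second["r2"] = first["r1"]
    second["r1"] = max(second["r1"], second["r2"])

    # Move down, then keep the larger value in the third shift register.
    third["r2"] = second["r1"]
    third["r1"] = max(third["r1"], third["r2"])

    # Move left, then keep the larger value in the target shift register.
    target["r2"] = third["r1"]
    target["r1"] = max(target["r1"], target["r2"])

    return target["r1"]  # first partial pooling result


assert first_partial_result(3, 9, 4, 1) == 9
assert first_partial_result(3, 2, 8, 1) == 8
```

After the three moves, the lower-left register has seen all four values of the group, which is why the text designates it as the target shift register.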
In some illustrated embodiments, the controller is configured to: move the pixel values included in the second sub-feature map to the second registers of that subset of shift registers. The controller is further configured to: move the value in the second register of each first shift register in the subset to the third register of the target shift register below that first shift register; and store the larger of the values in the second register and the third register of each target shift register in the second register of that target shift register, as the second partial pooling result.
In some illustrated embodiments, the controller is configured to: move the pixel values included in the third sub-feature map to the third registers of that subset of shift registers. The controller is further configured to: move the value in the third register of each first shift register in the subset to the fourth register of the second shift register to the right of that first shift register; store the larger of the values in the third register and the fourth register of each second shift register in the third register of that second shift register, as the third partial pooling result; move the third partial pooling result in the third register of the second shift register to the third register of the first shift register to the left of that second shift register; and move the third partial pooling result in the third register of the first shift register to the third register of the target shift register below that first shift register.
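The moves for the third sub-feature map can be simulated on a small grid. The sketch below is an illustration under stated assumptions, not the chip's behavior: the register array has one more row than the third sub-feature map, values shifted past the array edge are simply dropped (the temporary registers are not modeled), and `None` marks an empty register. The function names are hypothetical.

```python
def larger(a, b):
    """Return the larger value, treating None as an empty register."""
    if a is None:
        return b
    if b is None:
        return a
    return max(a, b)


def third_partial_results(sub3):
    """Return the third registers of the array after the right/compare/
    left/down sequence; entry [i + 1][j] holds max(sub3[i][j], sub3[i][j + 1])."""
    cols = len(sub3[0])
    # Third registers, with one extra (empty) bottom row of shift registers.
    r3 = [row[:] for row in sub3] + [[None] * cols]
    # Move right: each fourth register receives r3 of its left neighbor.
    r4 = [[None] + row[:-1] for row in r3]
    # Compare: keep the larger of r3 and r4 in r3.
    r3 = [[larger(a, b) for a, b in zip(r3_row, r4_row)]
          for r3_row, r4_row in zip(r3, r4)]
    # Move left: each r3 receives r3 of its right neighbor.
    r3 = [row[1:] + [None] for row in r3]
    # Move down: each r3 receives r3 of the shift register above it.
    r3 = [[None] * cols] + r3[:-1]
    return r3


sub3 = [
    [1, 5, 2],
    [7, 3, 4],
]
assert third_partial_results(sub3) == [
    [None, None, None],
    [5, 5, None],
    [7, 4, None],
]
```

Because the left and down moves overwrite the destination registers, every shift register can execute the same operation at the same time and each target register still ends up with the maximum of exactly one horizontally adjacent pair.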
In some illustrated embodiments, the controller is configured to: move the pixel values included in the fourth sub-feature map to the fourth registers of that subset of shift registers. The controller is further configured to: move the value in the fourth register of each first shift register in the subset to the fourth register of the target shift register below that first shift register.
In some illustrated embodiments, the controller is configured to: store the larger of the values in the first register and the second register of each target shift register in the first register of that target shift register; store the larger of the values in the first register and the third register of each target shift register in the first register of that target shift register; store the larger of the values in the first register and the fourth register of each target shift register in the first register of that target shift register; and output the value stored in the first register of each target shift register, to obtain the pooling result corresponding to the target feature map.
In some illustrated embodiments, a plurality of temporary registers are connected to the periphery of the shift register array; the temporary registers store the pixel values that overflow the shift register array during a value-moving operation.
In some illustrated embodiments, the number of pixels included in at least some of the sub-feature maps equals the number of shift registers included in the shift register array.
The present application also proposes a chip. The chip may include a controller. The controller is configured to: obtain an original feature map; divide the original feature map into several target feature maps; pool each target feature map according to the pooling method shown in any of the foregoing embodiments, to obtain the pooling result corresponding to each target feature map; and output the pooling result corresponding to each target feature map, to obtain the pooling result corresponding to the original feature map.
The present application also proposes an electronic device that includes the chip shown in any of the foregoing embodiments. For example, the electronic device may be a smart terminal such as a mobile phone, or another device that has a camera and can perform image processing. For example, when the electronic device pools captured images, the chip of the embodiments of the present application may be used to perform the pooling task. Because the chip pools more efficiently and offers higher performance, using it can help improve the processing efficiency of pooling tasks and thereby the performance of the electronic device.
The present application also proposes a computer-readable storage medium on which a computer program is stored; when executed by a controller, the program implements any of the pooling methods described above.
Those skilled in the art will appreciate that one or more embodiments of the present application may be provided as a method, a system or a computer program product. Accordingly, one or more embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM and optical storage) that contain computer-usable program code.
"And/or" as used in the present application means at least one of the two; for example, "A and/or B" covers three cases: A, B, and "A and B".
The embodiments in the present application are described in a progressive manner; for parts that are the same or similar across embodiments, reference may be made between them, and each embodiment focuses on its differences from the others. In particular, the data processing device embodiment is described relatively briefly because it is substantially similar to the method embodiment; for the relevant parts, refer to the corresponding description of the method embodiment.
Specific embodiments of the present application have been described above. Other embodiments are within the scope of the appended claims. In some cases, the acts or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the figures do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
Embodiments of the subject matter and functional operations described in the present application may be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in the present application and their structural equivalents, or in a combination of one or more of them. Embodiments of the subject matter described in the present application may be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, a data processing chip. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical or electromagnetic signal, that is generated to encode information for transmission to a suitable receiver chip for execution by a data processing chip. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in the present application may be performed by one or more programmable computers executing one or more computer programs to perform the corresponding functions by operating on input data and generating output. The processes and logic flows may also be performed by special-purpose logic circuitry, such as an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and the chip may also be implemented as special-purpose logic circuitry.
Computers suitable for executing a computer program include, for example, general-purpose and/or special-purpose microprocessors, or any other type of central processing unit. Typically, a central processing unit receives instructions and data from a read-only memory and/or a random access memory. The basic components of a computer include a central processing unit for carrying out or executing instructions and one or more memory devices for storing instructions and data. Typically, a computer also includes one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks or optical disks, or is operatively coupled to such mass storage devices to receive data from them, transfer data to them, or both. However, a computer need not have such devices. Moreover, a computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name just a few.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM and flash memory devices), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in, special-purpose logic circuitry.
Although the present application contains many specific implementation details, these should not be construed as limiting the scope of any disclosure or of what is claimed; they mainly describe features of specific embodiments of particular disclosures. Certain features described in the present application in the context of multiple embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination. Moreover, although features may be described above as acting in certain combinations, and even initially claimed as such, one or more features of a claimed combination may in some cases be removed from the combination, and the claimed combination may be directed to a sub-combination or a variation of a sub-combination.
Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of the various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
The above are merely some embodiments of the present application and are not intended to limit one or more embodiments of the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of one or more embodiments of the present application shall fall within the scope of protection of one or more embodiments of the present application.

Claims (28)

  1. A pooling method, comprising:
    acquiring a target feature map;
    splitting the target feature map to obtain several sub-feature maps, wherein at least some of the pixel values located in the same pooling window of the target feature map are located in different sub-feature maps, and the pixel values located at the same position in each pooling window are located in the same sub-feature map;
    processing, in parallel, the pixels belonging to different pooling windows in each sub-feature map to obtain a pooling result corresponding to the target feature map.
  2. The method according to claim 1, wherein processing, in parallel, the pixels belonging to different pooling windows in each sub-feature map to obtain the pooling result corresponding to the target feature map comprises:
    loading the pixel values included in each of the sub-feature maps into a shift register array;
    performing, according to a pooling instruction, parallel pooling on the pixel values located at the same position in the sub-feature maps to obtain the pooling result corresponding to the target feature map.
  3. The method according to claim 1, wherein processing, in parallel, the pixels belonging to different pooling windows in each sub-feature map to obtain the pooling result corresponding to the target feature map comprises:
    determining, according to the positions of the pixel values that are located in the same pooling window within each sub-feature map, a shift operation mode of the shift register array corresponding to each sub-feature map;
    loading the several sub-feature maps into the shift register array respectively;
    for each sub-feature map, performing a shift operation on the pixel values stored in the shift registers of the shift register array according to the shift operation mode determined for that sub-feature map, and performing parallel pooling according to a pooling instruction to obtain partial pooling results of that sub-feature map corresponding to different pooling windows;
    determining the pooling result corresponding to the target feature map according to the partial pooling results of the sub-feature maps corresponding to the different pooling windows.
  4. The method according to any one of claims 1 to 3, wherein splitting the target feature map to obtain several sub-feature maps comprises:
    determining the pixel values located at odd rows and odd columns of the target feature map as a first sub-feature map;
    determining the pixel values located at odd rows and even columns of the target feature map as a second sub-feature map;
    determining the pixel values located at even rows and odd columns of the target feature map as a third sub-feature map;
    determining the pixel values located at even rows and even columns of the target feature map as a fourth sub-feature map.
  5. The method according to claim 4, wherein loading the pixel values included in each of the several sub-feature maps into the shift register array and performing, according to the pooling instruction, parallel pooling on the pixel values located at the same position in the several sub-feature maps to obtain the pooling result corresponding to the target feature map comprises:
    moving the pixel values included in the first sub-feature map into at least some of the shift registers included in the shift register array;
    moving the pixel values included in the second sub-feature map into the at least some shift registers, so that the computing core corresponding to each of the at least some shift registers pools, according to the pooling instruction, the two pixel values it has received to obtain a first pooling result;
    moving the pixel values included in the third sub-feature map into the at least some shift registers, so that the computing core corresponding to each of the at least some shift registers pools, according to the pooling instruction, the first pooling result and the received pixel value to obtain a second pooling result;
    moving the pixel values included in the fourth sub-feature map into the at least some shift registers, so that the computing core corresponding to each of the at least some shift registers pools, according to the pooling instruction, the second pooling result and the received pixel value to obtain a third pooling result;
    outputting the third pooling results obtained by the computing cores corresponding to the at least some shift registers to obtain the pooling result corresponding to the target feature map.
  6. The method according to claim 5, wherein the pooling comprises max pooling, and the pooling instruction comprises comparing two values to determine the maximum;
    moving the pixel values included in the first sub-feature map into at least some of the shift registers included in the shift register array comprises:
    moving the pixel values included in the first sub-feature map into the first registers of at least some of the shift registers included in the shift register array;
    moving the pixel values included in the second sub-feature map into the at least some shift registers, so that the computing core corresponding to each of the at least some shift registers pools, according to the pooling instruction, the two pixel values it has received to obtain the first pooling result, comprises:
    moving the pixel values included in the second sub-feature map into the second registers of the at least some shift registers, so that the computing core corresponding to each of the at least some shift registers obtains, according to the pooling instruction, the maximum of the values stored in the first register and the second register, and stores the maximum in the first register as the first pooling result.
  7. The method according to claim 6, wherein moving the pixel values included in the third sub-feature map into the at least some shift registers, so that the computing core corresponding to each of the at least some shift registers pools, according to the pooling instruction, the first pooling result and the received pixel value to obtain the second pooling result, comprises:
    moving the pixel values included in the third sub-feature map into the second registers of the at least some shift registers, so that the computing core corresponding to each of the at least some shift registers obtains, according to the pooling instruction, the maximum of the values stored in the first register and the second register, and stores the maximum in the first register as the second pooling result;
    moving the pixel values included in the fourth sub-feature map into the at least some shift registers, so that the computing core corresponding to each of the at least some shift registers pools, according to the pooling instruction, the second pooling result and the received pixel value to obtain the third pooling result, comprises:
    moving the pixel values included in the fourth sub-feature map into the second registers of the at least some shift registers, so that the computing core corresponding to each of the at least some shift registers obtains, according to the pooling instruction, the maximum of the values stored in the first register and the second register, and stores the maximum in the first register as the third pooling result.
  8. The method according to claim 7, wherein outputting the third pooling results obtained by the computing cores corresponding to the at least some shift registers to obtain the pooling result corresponding to the target feature map comprises:
    outputting the values stored in the first registers of the at least some shift registers to obtain the pooling result corresponding to the target feature map.
  9. The method according to claim 4, wherein processing, in parallel, the pixels belonging to different pooling windows in each sub-feature map to obtain the pooling result corresponding to the target feature map comprises:
    moving the pixel values included in the first sub-feature map into at least some of the shift registers included in the shift register array;
    performing, according to the pooling instruction, a pooling operation on the pixel values in any four vertically and horizontally adjacent shift registers included in the shift register array to obtain a first partial pooling result, and storing the first partial pooling result in a target shift register located at a preset position among the four vertically and horizontally adjacent shift registers;
    moving the pixel values included in the second sub-feature map into the at least some shift registers;
    performing, according to the pooling instruction, a pooling operation on the pixel values in any two vertically adjacent shift registers included in the shift register array to obtain a second partial pooling result, and storing the second partial pooling result in the target shift register;
    moving the pixel values included in the third sub-feature map into the at least some shift registers;
    performing, according to the pooling instruction, a pooling operation on the pixel values in any two horizontally adjacent shift registers included in the shift register array to obtain a third partial pooling result, and storing the third partial pooling result in the target shift register;
    moving the pixel values included in the fourth sub-feature map into the at least some shift registers, and moving the pixel values in each of the at least some shift registers to the target shift register;
    obtaining the pooling result corresponding to the target feature map according to the first partial pooling result, the second partial pooling result, the third partial pooling result, and the fourth sub-feature map in each target shift register included in the shift register array.
  10. The method according to claim 9, wherein the pooling comprises max pooling; the pooling instruction comprises comparing two values to determine the maximum; the preset position comprises the lower-left position among any four vertically and horizontally adjacent shift registers; and the target shift register comprises the shift register at the lower-left position among the four vertically and horizontally adjacent shift registers;
    moving the pixel values included in the first sub-feature map into at least some of the shift registers included in the shift register array comprises:
    moving the pixel values included in the first sub-feature map into the first registers of at least some of the shift registers included in the shift register array;
    performing, according to the pooling instruction, the pooling operation on the pixel values in any four vertically and horizontally adjacent shift registers included in the shift register array to obtain the first partial pooling result, and storing the first partial pooling result in the target shift register located at the preset position among the four vertically and horizontally adjacent shift registers, comprises:
    moving the value in the first register of each first shift register among the at least some shift registers into the second register of the second shift register to the right of that first shift register;
    storing the larger of the values in the first register and the second register of each second shift register into the first register of that second shift register;
    moving the value in the first register of each second shift register into the second register of the third shift register below that second shift register;
    storing the larger of the values in the first register and the second register of each third shift register into the first register of that third shift register;
    moving the value in the first register of each third shift register into the second register of the target shift register to the left of that third shift register;
    storing the larger of the values in the first register and the second register of each target shift register, as the first partial pooling result, into the first register of that target shift register.
  11. The method according to claim 10, wherein moving the pixel values included in the second sub-feature map into the at least some shift registers comprises:
    moving the pixel values included in the second sub-feature map into the second registers of the at least some shift registers;
    performing, according to the pooling instruction, the pooling operation on the pixel values in any two vertically adjacent shift registers included in the shift register array to obtain the second partial pooling result, and storing the second partial pooling result in the target shift register, comprises:
    moving the value in the second register of each first shift register among the at least some shift registers into the third register of the target shift register below that first shift register;
    storing the larger of the values in the second register and the third register of each target shift register, as the second partial pooling result, into the second register of that target shift register.
  12. The method according to claim 11, wherein moving the pixel values included in the third sub-feature map into the at least some shift registers comprises:
    moving the pixel values included in the third sub-feature map into the third registers of the at least some shift registers;
    performing, according to the pooling instruction, the pooling operation on the pixel values in any two horizontally adjacent shift registers included in the shift register array to obtain the third partial pooling result, and storing the third partial pooling result in the target shift register, comprises:
    moving the value in the third register of each first shift register among the at least some shift registers into the fourth register of the second shift register to the right of that first shift register;
    storing the larger of the values in the third register and the fourth register of each second shift register, as the third partial pooling result, into the third register of that second shift register;
    moving the third partial pooling result in the third register of the second shift register into the third register of the first shift register to the left of that second shift register;
    moving the third partial pooling result in the third register of the first shift register into the third register of the target shift register below that first shift register.
  13. The method according to claim 12, wherein moving the pixel values included in the fourth sub-feature map into the at least some shift registers comprises:
    moving the pixel values included in the fourth sub-feature map into the fourth registers of the at least some shift registers;
    moving the pixel values in each of the at least some shift registers to the target shift register comprises:
    moving the value in the fourth register of each first shift register among the at least some shift registers into the fourth register of the target shift register below that first shift register.
  14. The method according to claim 13, wherein obtaining the pooling result corresponding to the target feature map according to the first partial pooling result, the second partial pooling result, the third partial pooling result, and the pixel values corresponding to the fourth sub-feature map in each target shift register included in the shift register array comprises:
    storing the larger of the values in the first register and the second register of each target shift register into the first register of that target shift register;
    storing the larger of the values in the first register and the third register of each target shift register into the first register of that target shift register;
    storing the larger of the values in the first register and the fourth register of each target shift register into the first register of that target shift register;
    outputting the value stored in the first register of each target shift register to obtain the pooling result corresponding to the target feature map.
  15. The method according to any one of claims 9 to 14, wherein a plurality of temporary registers are connected around the periphery of the shift register array, and the temporary registers are used to store pixel values that overflow the shift register array during a value-moving operation.
  16. The method according to any one of claims 1 to 15, wherein the number of pixels included in at least some of the sub-feature maps is equal to the number of shift registers included in the shift register array.
  17. A pooling method, comprising:
    acquiring an original feature map;
    dividing the original feature map into several target feature maps;
    pooling each target feature map by the pooling method according to any one of claims 1 to 16 to obtain a pooling result corresponding to each target feature map;
    outputting the pooling result corresponding to each target feature map to obtain a pooling result corresponding to the original feature map.
  18. A chip, comprising a controller;
    the controller being configured to acquire a target feature map;
    split the target feature map to obtain several sub-feature maps, wherein at least some of the pixel values located in the same pooling window of the target feature map are located in different sub-feature maps, and the pixel values located at the same position in each pooling window are located in the same sub-feature map;
    and process, in parallel, the pixels belonging to different pooling windows in each sub-feature map to obtain a pooling result corresponding to the target feature map.
  19. The chip according to claim 18, wherein the controller is configured to:
    load the pixel values included in each of the sub-feature maps into a shift register array;
    perform, according to a pooling instruction, parallel pooling on the pixel values located at the same position in the sub-feature maps to obtain the pooling result corresponding to the target feature map.
  20. The chip according to claim 18, wherein the controller is configured to:
    determine, according to the positions of the pixel values that are located in the same pooling window within each sub-feature map, a shift operation mode of the shift register array corresponding to each sub-feature map;
    load the several sub-feature maps into the shift register array respectively and, for each sub-feature map, perform a shift operation on the pixel values stored in the shift registers of the shift register array according to the shift operation mode determined for that sub-feature map, and perform parallel pooling according to a pooling instruction to obtain partial pooling results of that sub-feature map corresponding to different pooling windows;
    determine the pooling result corresponding to the target feature map according to the partial pooling results of the sub-feature maps corresponding to the different pooling windows.
  21. The chip according to any one of claims 18 to 20, wherein the controller is configured to:
    determine the pixel values located at odd rows and odd columns of the target feature map as a first sub-feature map;
    determine the pixel values located at odd rows and even columns of the target feature map as a second sub-feature map;
    determine the pixel values located at even rows and odd columns of the target feature map as a third sub-feature map;
    determine the pixel values located at even rows and even columns of the target feature map as a fourth sub-feature map.
  22. The chip according to claim 21, wherein the controller is configured to:
    move the pixel values included in the first sub-feature map into at least some of the shift registers included in the shift register array;
    move the pixel values included in the second sub-feature map into the at least some shift registers, so that the computing core corresponding to each of the at least some shift registers pools, according to the pooling instruction, the two pixel values it has received to obtain a first pooling result;
    move the pixel values included in the third sub-feature map into the at least some shift registers, so that the computing core corresponding to each of the at least some shift registers pools, according to the pooling instruction, the first pooling result and the received pixel value to obtain a second pooling result;
    move the pixel values included in the fourth sub-feature map into the at least some shift registers, so that the computing core corresponding to each of the at least some shift registers pools, according to the pooling instruction, the second pooling result and the received pixel value to obtain a third pooling result;
    output the third pooling results obtained by the computing cores corresponding to the at least some shift registers to obtain the pooling result corresponding to the target feature map.
  23. The chip according to claim 22, wherein the pooling comprises max pooling; the pooling instruction comprises comparing two values to determine the maximum; and the controller is configured to:
    move the pixel values included in the first sub-feature map into the first registers of at least some of the shift registers included in the shift register array; and
    move the pixel values included in the second sub-feature map into the second registers of the at least some shift registers, so that the computing core corresponding to each of the at least some shift registers obtains, according to the pooling instruction, the maximum of the values stored in the first register and the second register, and stores the maximum in the first register as the first pooling result.
  24. The chip according to claim 23, wherein the controller is configured to:
    move the pixel values included in the third sub-feature map into the second registers of the at least some shift registers, so that the computing core corresponding to each of the at least some shift registers obtains, according to the pooling instruction, the maximum of the values stored in the first register and the second register, and stores the maximum in the first register as the second pooling result; and
    move the pixel values included in the fourth sub-feature map into the second registers of the at least some shift registers, so that the computing core corresponding to each of the at least some shift registers obtains, according to the pooling instruction, the maximum of the values stored in the first register and the second register, and stores the maximum in the first register as the third pooling result.
  25. The chip according to claim 24, wherein the controller is configured to:
    output the values stored in the first registers of said part of the shift registers, to obtain the pooling result corresponding to the target feature map.
  26. A chip, comprising a controller;
    the controller being configured to obtain an original feature map;
    divide the original feature map into several target feature maps;
    pool each target feature map using the pooling method according to any one of claims 1 to 16, to obtain a pooling result corresponding to each target feature map; and
    output the pooling results corresponding to the target feature maps, to obtain the pooling result corresponding to the original feature map.
  27. An electronic device, comprising the chip according to any one of claims 18 to 25 or according to claim 26.
  28. A computer-readable storage medium storing a computer program which, when executed by a controller, implements the method according to any one of claims 1 to 17.
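
The register-level max pooling recited in claims 23 to 25 and the tiling recited in claim 26 can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the claimed hardware: it assumes a square pooling window whose stride equals its size (2x2 by default, which yields the four sub-feature maps named in the claims), assumes all dimensions are exact multiples of the window and tile sizes, models the first and second registers as Python lists, and runs the per-kernel comparisons one after another where the chip would run them in parallel. The function names and the tile-size parameters are hypothetical.

```python
def split_into_sub_feature_maps(feature_map, window=2):
    """Group pixels by their position inside each pooling window.

    Sub-feature map k holds, for every window, the pixel at the k-th
    position of that window, so pixels of one window land in different
    sub-feature maps.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    if rows % window or cols % window:
        raise ValueError("dimensions must be multiples of the window size")
    subs = []
    for dr in range(window):
        for dc in range(window):
            subs.append([feature_map[r + dr][c + dc]
                         for r in range(0, rows, window)
                         for c in range(0, cols, window)])
    return subs


def max_pool_with_registers(feature_map, window=2):
    """Max-pool one target feature map using a first/second register model."""
    subs = split_into_sub_feature_maps(feature_map, window)
    # First sub-feature map goes into the first registers.
    first = list(subs[0])
    for sub in subs[1:]:
        # Each remaining sub-feature map goes into the second registers.
        second = list(sub)
        # Every kernel keeps the larger value in its own first register.
        # (Sequential here; modelled as parallel in the claimed chip.)
        first = [max(a, b) for a, b in zip(first, second)]
    # The first registers now hold the pooling result; restore the 2-D shape.
    out_cols = len(feature_map[0]) // window
    return [first[i:i + out_cols] for i in range(0, len(first), out_cols)]


def pool_original(original, tile_rows, tile_cols, window=2):
    """Divide an original feature map into tiles, pool each, and reassemble."""
    rows, cols = len(original), len(original[0])
    if rows % tile_rows or cols % tile_cols:
        raise ValueError("original dimensions must be multiples of the tile size")
    out = [[None] * (cols // window) for _ in range(rows // window)]
    for r0 in range(0, rows, tile_rows):
        for c0 in range(0, cols, tile_cols):
            tile = [row[c0:c0 + tile_cols] for row in original[r0:r0 + tile_rows]]
            pooled = max_pool_with_registers(tile, window)
            for i, pooled_row in enumerate(pooled):
                for j, value in enumerate(pooled_row):
                    out[r0 // window + i][c0 // window + j] = value
    return out


if __name__ == "__main__":
    fm = [[9, 1, 0, 2],
          [3, 4, 7, 5],
          [0, 0, 1, 1],
          [2, 8, 3, 6]]
    print(max_pool_with_registers(fm))   # whole map as one target feature map
    print(pool_original(fm, 2, 2))       # same map processed as four 2x2 tiles
```

Because every comparison only touches one register pair, the number of sequential steps in this model depends on the window size (three comparisons for a 2x2 window) rather than on the number of windows, which is the property the claims rely on for parallel processing.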
PCT/CN2021/115667 2021-01-29 2021-08-31 Pooling method, and chip, device and storage medium WO2022160703A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110127626.9 2021-01-29
CN202110127626.9A CN112862667A (en) 2021-01-29 2021-01-29 Pooling method, chip, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2022160703A1 (en) 2022-08-04

Family

ID=75986898

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/115667 WO2022160703A1 (en) 2021-01-29 2021-08-31 Pooling method, and chip, device and storage medium

Country Status (2)

Country Link
CN (2) CN112862667A (en)
WO (1) WO2022160703A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112862667A (en) * 2021-01-29 2021-05-28 成都商汤科技有限公司 Pooling method, chip, equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN110135556A (en) * 2019-04-04 2019-08-16 平安科技(深圳)有限公司 Neural network accelerated method, device, computer equipment and storage medium based on systolic arrays
US20200175313A1 (en) * 2018-12-03 2020-06-04 Samsung Electronics Co., Ltd. Method and apparatus with dilated convolution
US20200302215A1 (en) * 2017-10-25 2020-09-24 Nec Corporation Information processing apparatus, information processing method, and non-transitory computer readable medium
CN112862667A (en) * 2021-01-29 2021-05-28 成都商汤科技有限公司 Pooling method, chip, equipment and storage medium

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN110232665B (en) * 2019-06-13 2021-08-20 Oppo广东移动通信有限公司 Maximum pooling method and device, computer equipment and storage medium
CN110490813B (en) * 2019-07-05 2021-12-17 特斯联(北京)科技有限公司 Feature map enhancement method, device, equipment and medium for convolutional neural network

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
US20200302215A1 (en) * 2017-10-25 2020-09-24 Nec Corporation Information processing apparatus, information processing method, and non-transitory computer readable medium
US20200175313A1 (en) * 2018-12-03 2020-06-04 Samsung Electronics Co., Ltd. Method and apparatus with dilated convolution
CN110135556A (en) * 2019-04-04 2019-08-16 平安科技(深圳)有限公司 Neural network accelerated method, device, computer equipment and storage medium based on systolic arrays
CN112862667A (en) * 2021-01-29 2021-05-28 成都商汤科技有限公司 Pooling method, chip, equipment and storage medium
CN113052760A (en) * 2021-01-29 2021-06-29 成都商汤科技有限公司 Pooling method, chip, equipment and storage medium

Also Published As

Publication number Publication date
CN113052760A (en) 2021-06-29
CN112862667A (en) 2021-05-28

Legal Events

Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21922298; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 21922298; Country of ref document: EP; Kind code of ref document: A1)