CN114330684A - Hardware acceleration method, device and system of pooling algorithm and readable storage medium

Info

Publication number: CN114330684A
Application number: CN202111460867.1A
Authority: CN
Inventors: 王佳东; 蔡权雄; 牛昕宇
Original assignee: Shenzhen Corerain Technologies Co Ltd
Current assignee: Shenzhen Corerain Technologies Co Ltd
Priority date: 2021-12-01
Filing date: 2021-12-01
Publication date: 2022-04-12

Abstract

The invention discloses a hardware acceleration method, a device, a system and a readable storage medium of a pooling algorithm, wherein the method comprises the following steps: when a calculation instruction is detected, acquiring non-filling data in a calculation window through a data sorting module, and sending the non-filling data to a first calculation module; inputting first filling data to a first calculation module through a control module, and obtaining a first calculation result according to the non-filling data and the first filling data through the first calculation module; inputting the first calculation result into a second calculation module through a first calculation module, and inputting second filling data into the second calculation module through a control module; obtaining a second calculation result through a second calculation module according to the first calculation result and the second filling data; according to the invention, the control module inputs the first filling data to the first calculation module and inputs the second filling data to the second calculation module, so that the performance of the hardware accelerator is improved, and the calculation speed of the pooling algorithm is further improved.

Description

Hardware acceleration method, device and system of pooling algorithm and readable storage medium

Technical Field

The invention relates to the technical field of hardware design, in particular to a hardware acceleration method, a device and a system of a pooling algorithm and a readable storage medium.

Background

The pooling (pool) algorithm is widely used in neural network computing; it reduces the number of output feature vectors by downsampling. Most existing neural network hardware accelerators handle padding by storing the padding data in a memory and repeatedly reading it back. The stored padding data occupies storage space, reading it from the memory and sending it to the calculation module adds transmission time, and the padding data must be calculated separately when the pooling algorithm runs, which also takes a large amount of calculation time. As a result, the performance of the hardware accelerator is greatly reduced, which in turn lowers the calculation speed of the pooling algorithm.

Therefore, how to improve the performance of the hardware accelerator to improve the computation speed of the pooling algorithm is an urgent problem to be solved.

Disclosure of Invention

The invention mainly aims to provide a hardware acceleration method, a device, a system and a readable storage medium of a pooling algorithm, aiming at solving the problem of how to improve the performance of a hardware accelerator to improve the calculation speed of the pooling algorithm.

In order to achieve the above object, the present invention provides a hardware acceleration method for a pooling algorithm, which comprises the following steps:

when a calculation instruction is detected, acquiring non-filling data in a calculation window through a data sorting module, and sending the non-filling data to a first calculation module;

inputting first filling data to the first calculation module through a control module, and obtaining a first calculation result according to the non-filling data and the first filling data through the first calculation module;

inputting the first calculation result into a second calculation module through the first calculation module, and inputting second filling data into the second calculation module through the control module;

and obtaining a second calculation result through the second calculation module according to the first calculation result and the second filling data.

Preferably, when a calculation instruction is detected, the step of acquiring non-padding data in the calculation window by the data sorting module includes:

when a calculation instruction is detected, acquiring a corresponding target picture in the calculation instruction, and framing the target picture through a calculation window according to a preset rule;

and identifying and acquiring the non-filling data of the part framed and selected by the calculation window in the target picture through a data sorting module, and respectively sending the non-filling data positioned in the same column in the calculation window to a first calculation module.

Preferably, the step of inputting first padding data to the first calculation module by the control module comprises:

determining a first serial port to be enabled corresponding to the first calculation module according to the number of filling data lines above or below the target picture, the height of the target picture, the column step length accumulated value corresponding to the calculation window and the window height corresponding to the calculation window;

enabling the first serial port to be enabled through the control module so as to input first filling data to the first calculation module.

Preferably, the step of obtaining, by the first calculation module, a first calculation result according to the non-padding data and the first padding data includes:

inputting, by the first computation module, the non-padding data and the first padding data located in each column in the computation window into an adder or a comparator to obtain a first computation result.

Preferably, the step of inputting second padding data to the second calculation module by the control module includes:

determining a second serial port to be enabled corresponding to the second computing module according to the number of columns of the filling data on the left or right of the target picture, the width of the target picture, the row step length accumulated value corresponding to the computing window and the window width corresponding to the computing window;

and according to a first preset clock period, enabling the second serial port to be enabled through the control module so as to input second filling data to the second computing module.

Preferably, the step of obtaining, by the second calculation module, a second calculation result according to the first calculation result and the second padding data includes:

and storing the first calculation result and the second filling data in a register through the second calculation module, and inputting the first calculation result and the second filling data into an adder and a divider according to a second preset clock cycle, or inputting the first calculation result and the second filling data into a comparator to obtain a second calculation result.

Preferably, after the step of obtaining, by the second computing module, a second computing result according to the first computing result and the second padding data, the hardware acceleration method of the pooling algorithm further includes:

controlling the calculation window to move according to a preset moving rule by the row step length or the column step length corresponding to the calculation window, and returning to the step of acquiring non-filling data in the calculation window through a data sorting module, so as to obtain a target feature map.

In addition, to achieve the above object, the present invention further provides a hardware acceleration apparatus for a pooling algorithm, including:

the acquisition module is used for acquiring non-filling data in a calculation window through the data sorting module when a calculation instruction is detected, and sending the non-filling data to the first calculation module;

the first input module is used for inputting first filling data to the first calculation module through the control module and obtaining a first calculation result according to the non-filling data and the first filling data through the first calculation module;

the second input module is used for inputting the first calculation result into the second calculation module through the first calculation module and inputting second filling data into the second calculation module through the control module;

and the calculation module is used for obtaining a second calculation result according to the first calculation result and the second filling data through the second calculation module.

Preferably, the obtaining module is further configured to:

Preferably, the first input module is further configured to:

Preferably, the first input module further comprises a calculation module, the calculation module is configured to:

Preferably, the second input module is further configured to:

Preferably, the calculation module is further configured to:

In addition, to achieve the above object, the present invention further provides a hardware acceleration system for a pooling algorithm, including: a memory, a processor and a hardware acceleration program of a pooling algorithm stored on the memory and executable on the processor, the hardware acceleration program of the pooling algorithm when executed by the processor implementing the steps of the hardware acceleration method of a pooling algorithm as described above.

In addition, to achieve the above object, the present invention further provides a readable storage medium, which is a computer readable storage medium, on which a hardware acceleration program of a pooling algorithm is stored, and the hardware acceleration program of the pooling algorithm, when executed by a processor, implements the steps of the hardware acceleration method of the pooling algorithm as described above.

According to the hardware acceleration method of the pooling algorithm, when a calculation instruction is detected, non-filling data in a calculation window is obtained through a data sorting module, and the non-filling data is sent to a first calculation module; inputting first filling data to a first calculation module through a control module, and obtaining a first calculation result according to the non-filling data and the first filling data through the first calculation module; inputting the first calculation result into a second calculation module through a first calculation module, and inputting second filling data into the second calculation module through a control module; obtaining a second calculation result through a second calculation module according to the first calculation result and the second filling data; according to the invention, the control module inputs the first filling data to the first calculation module and inputs the second filling data to the second calculation module, so that the performance of the hardware accelerator is improved, and the calculation speed of the pooling algorithm is further improved.

Drawings

FIG. 1 is a system diagram of a hardware operating environment according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a hardware acceleration method of a pooling algorithm according to a first embodiment of the present invention;

FIG. 3 is a schematic view of a target picture according to the present invention;

FIG. 4 is a schematic diagram showing the connection of modules in the hardware accelerator according to the present invention;

FIG. 5 is a schematic diagram of the connection between the first computing module and the control module according to the present invention;

FIG. 6 is a schematic diagram of the connection between the second computing module and the control module according to the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further described with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

As shown in fig. 1, fig. 1 is a system structural diagram of a hardware operating environment according to an embodiment of the present invention.

The system of the embodiment of the invention can be a PC or a server system.

As shown in fig. 1, the system may include a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 enables connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). Alternatively, the memory 1005 may be a storage device separate from the processor 1001.

Those skilled in the art will appreciate that the system architecture shown in FIG. 1 is not intended to be limiting of the system, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.

As shown in fig. 1, a memory 1005, which is a type of computer storage medium, may include an operating system, a network communication module, a user interface module, and a hardware acceleration program of a pooling algorithm.

The operating system is a program that manages and controls the hardware and software resources of the hardware acceleration tool of the pooling algorithm, and supports the operation of the network communication module, the user interface module, the hardware acceleration program of the pooling algorithm, and other programs or software; the network communication module is used for managing and controlling the network interface 1004; the user interface module is used for managing and controlling the user interface 1003.

In the hardware acceleration tool of the pooling algorithm shown in fig. 1, the hardware acceleration tool of the pooling algorithm calls a hardware acceleration program of the pooling algorithm stored in the memory 1005 through the processor 1001 and performs operations in various embodiments of the hardware acceleration method of the pooling algorithm described below.

Based on the hardware structure, the embodiment of the hardware acceleration method of the pooling algorithm is provided.

Referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of a hardware acceleration method of the pooling algorithm of the present invention, the method comprising:

step S10, when a calculation instruction is detected, acquiring non-filling data in a calculation window through a data sorting module, and sending the non-filling data to a first calculation module;

step S20, inputting first filling data to the first calculation module through a control module, and obtaining a first calculation result according to the non-filling data and the first filling data through the first calculation module;

step S30, inputting the first calculation result into a second calculation module through the first calculation module, and inputting second padding data into the second calculation module through the control module;

and step S40, obtaining a second calculation result according to the first calculation result and the second filling data through the second calculation module.

The hardware acceleration method of the pooling algorithm of this embodiment is applied to a hardware accelerator of the pooling algorithm in an image processing mechanism. The hardware accelerator can be applied to a terminal or a PC device to increase the speed at which that device runs the pooling algorithm. For convenience, the description below takes the hardware accelerator as the executing entity; it includes, but is not limited to, a control module, a data sorting module, a first calculation module and a second calculation module. When a calculation instruction is detected, the hardware accelerator acquires the target picture corresponding to the calculation instruction and frames the target picture with a calculation window according to a preset rule. Through the data sorting module, it identifies and acquires the non-filling data in the part of the target picture framed by the calculation window and sends the non-filling data to the first calculation module. Through the control module, it inputs first filling data to the first calculation module, and the first calculation module obtains a first calculation result from the non-filling data and the first filling data. The first calculation module then inputs the first calculation result into the second calculation module, and the control module inputs second filling data into the second calculation module. Finally, the second calculation module obtains a second calculation result from the first calculation result and the second filling data.
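The data flow of steps S10 to S40 can be illustrated in software. The following Python sketch is only an illustration under stated assumptions and is not the hardware implementation. The function names, the symmetric padding width `pad`, the single padding value `pad_value`, the treatment of an all-padding column as `kh` copies of the padding value, and the averaging divisor `kh * kw` are all assumptions introduced here. The 8 x 8 picture with values 0 to 63 is inferred from the values quoted for fig. 3. The sketch shows that a window result can be computed from the unpadded picture alone, with padding values injected at the column stage and at the window stage.

```python
def reduce_values(values, mode):
    """Comparator for max pooling, adder for average pooling."""
    return max(values) if mode == "max" else sum(values)


def pool_window(picture, top, left, kh, kw, pad, pad_value, mode="max"):
    """Pool one window; top/left are coordinates in the (virtual) padded frame.

    The padded picture is never built: padding values are injected on the fly.
    """
    height, width = len(picture), len(picture[0])
    column_results = []
    for c in range(left, left + kw):
        pc = c - pad  # column index in the unpadded picture
        if 0 <= pc < width:
            # First calculation: non-padding data of this column plus
            # first padding data for rows above or below the picture.
            column = []
            for r in range(top, top + kh):
                pr = r - pad
                column.append(picture[pr][pc] if 0 <= pr < height else pad_value)
            column_results.append(reduce_values(column, mode))
        else:
            # Second padding data: the whole column lies left or right
            # of the picture (assumed to act as kh copies of pad_value).
            column_results.append(reduce_values([pad_value] * kh, mode))
    # Second calculation: combine the per-column results.
    total = reduce_values(column_results, mode)
    return total if mode == "max" else total / (kh * kw)


# Hypothetical 8 x 8 picture holding the values 0..63 in row-major order.
picture = [[r * 8 + c for c in range(8)] for r in range(8)]
upper_left_max = pool_window(picture, 0, 0, 3, 3, pad=1, pad_value=0, mode="max")
upper_left_avg = pool_window(picture, 0, 0, 3, 3, pad=1, pad_value=0, mode="avg")
```

With a padding value of 0, the upper-left window covers the non-padding data 0, 1, 8 and 9, so the maximum is 9 and the average over the nine window positions is 2.0.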

It should be noted that a target picture obtained by the hardware accelerator is shown in fig. 3, where "P" represents padding data (padding values) in the target picture and the numbers represent non-padding data, that is, the data carried by the target picture itself. It is understood that fig. 3 is merely an illustration and does not limit the target picture; padding data may be inserted at other positions, or more padding data may be inserted, according to actual situations. As shown in fig. 4, which is a schematic connection diagram of the modules in the hardware accelerator, the control module is responsible for controlling the data sorting module, the first calculation module and the second calculation module; the data sorting module inputs the corresponding data into the first calculation module, and the first calculation module inputs the corresponding data into the second calculation module, so as to obtain the result of applying the pooling algorithm to the target picture.

In the hardware acceleration method of the pooling algorithm of the embodiment, when a calculation instruction is detected, non-filling data in a calculation window is acquired through a data sorting module, and the non-filling data is sent to a first calculation module; inputting first filling data to a first calculation module through a control module, and obtaining a first calculation result according to the non-filling data and the first filling data through the first calculation module; inputting the first calculation result into a second calculation module through a first calculation module, and inputting second filling data into the second calculation module through a control module; obtaining a second calculation result through a second calculation module according to the first calculation result and the second filling data; according to the invention, the control module inputs the first filling data to the first calculation module and inputs the second filling data to the second calculation module, so that the performance of the hardware accelerator is improved, and the calculation speed of the pooling algorithm is further improved.

The respective steps will be described in detail below:

In this embodiment, when the hardware accelerator detects a calculation instruction, it acquires the target picture corresponding to the calculation instruction and frames the target picture with a calculation window according to a preset rule. Through the control module, it then controls the data sorting module to identify and acquire the non-padding data in the framed part of the target picture and to send the non-padding data to the first calculation module. For example, the black frame at the upper left corner of fig. 3 is the calculation window; it contains padding data "P" and non-padding data "0, 1, 8, and 9". The data sorting module identifies and acquires the non-padding data "0, 1, 8, and 9" in the calculation window and inputs it into the first calculation module.

Specifically, when a calculation instruction is detected, the step of acquiring non-padding data in the calculation window through the data sorting module includes:

Step a, when a calculation instruction is detected, acquiring a corresponding target picture in the calculation instruction, and framing the target picture through a calculation window according to a preset rule;

In this step, when the hardware accelerator detects a calculation instruction, it acquires the target picture corresponding to the calculation instruction and frames the target picture with a calculation window according to a preset rule. For example, the developers set the height and width of the calculation window to 3 and set both the row step length and the column step length to 1. The preset rule is that the calculation window starts at the upper left corner of the target picture, moves in the row direction of the target picture by the row step length, and moves in the column direction of the target picture by the column step length. The hardware accelerator frames the target picture with the calculation window according to this preset rule. The row step length is the distance by which the calculation window is translated in the row direction of the target picture, i.e., the width direction; the column step length is the distance by which the calculation window is translated in the column direction of the target picture, i.e., the height direction.
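The window movement described in this step can be sketched as a simple enumeration of window positions. The Python sketch below is an illustration only: the function name `window_origins`, the symmetric padding width `pad`, and the row-major traversal order (all positions of one row of windows before moving down) are assumptions, since the preset rule is left to the developers.

```python
def window_origins(height, width, pad, kh, kw, row_step, col_step):
    """Yield (top, left) of each calculation window in the padded frame.

    row_step moves the window in the width direction,
    col_step moves it in the height direction.
    """
    padded_h = height + 2 * pad
    padded_w = width + 2 * pad
    for top in range(0, padded_h - kh + 1, col_step):
        for left in range(0, padded_w - kw + 1, row_step):
            yield top, left


# Assumed example: 8 x 8 picture, one ring of padding, 3 x 3 window, steps of 1.
origins = list(window_origins(8, 8, pad=1, kh=3, kw=3, row_step=1, col_step=1))
```

Under these assumed sizes the window visits 8 x 8 = 64 positions, starting at the upper left corner (0, 0) of the padded frame.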

Step b, identifying and acquiring the non-filling data of the part framed and selected by the calculation window in the target picture through a data sorting module, and respectively sending the non-filling data positioned in the same column in the calculation window to a first calculation module.

In this step, the hardware accelerator controls, through the control module, the data sorting module to identify and acquire the non-padding data in the part of the target picture framed by the calculation window, to pack the non-padding data located in the same column of the calculation window, and to send each pack to the first calculation module. For example, referring to fig. 3, when the calculation window is at the position of the black frame at the upper left corner, it frames padding data "P" and non-padding data "0, 1, 8, 9" in the target picture, where the non-padding data "0 and 8" are located in one column and the non-padding data "1 and 9" are located in another column. The control module controls the data sorting module to pack the non-padding data "0 and 8" into a set [8,0] and the non-padding data "1 and 9" into [9,1], and then to send [8,0] and [9,1] to the first calculation module. Further, when the calculation window moves to the position of the gray frame in fig. 3, the first row of the calculation window is all padding data "P", and the second and third rows are all non-padding data "0, 1, 2, 8, 9, 10", where "0 and 8", "1 and 9", and "2 and 10" are each located in the same column. Since the non-padding data "0 and 8" and "1 and 9" were already stored in the register of the second calculation module during the previous calculation, the control module controls the data sorting module to pack only the non-padding data "2 and 10" into [2,10] and to send [2,10] to the first calculation module. The situation when the calculation window is at other positions on the target picture is similar and is not repeated here.
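The packing behaviour of the data sorting module can be sketched as follows. This Python sketch is illustrative and rests on assumptions: the helper names `pack_columns` and `columns_to_send` are hypothetical, packs are ordered top to bottom (the text does not fix an element order), the padding is symmetric with width `pad`, and the picture is the 8 x 8 grid of values 0 to 63 inferred from fig. 3. Reuse of earlier columns is shown only for a move in the row direction at an unchanged window height position.

```python
def pack_columns(picture, top, left, kh, kw, pad):
    """Return {picture column index: non-padding values} for one window."""
    height, width = len(picture), len(picture[0])
    packs = {}
    for c in range(left, left + kw):
        pc = c - pad
        if not 0 <= pc < width:
            continue  # this window column holds only padding
        column = [picture[r - pad][pc]
                  for r in range(top, top + kh)
                  if 0 <= r - pad < height]
        if column:
            packs[pc] = column
    return packs


def columns_to_send(previous, current):
    """Keep only columns that were not already handled for the previous window."""
    return {c: v for c, v in current.items() if c not in previous}


picture = [[r * 8 + c for c in range(8)] for r in range(8)]
black = pack_columns(picture, top=0, left=0, kh=3, kw=3, pad=1)
gray = pack_columns(picture, top=0, left=1, kh=3, kw=3, pad=1)
new_for_gray = columns_to_send(black, gray)
```

For the black frame the packs are the columns holding 0 and 8, and 1 and 9; after the move to the gray frame only the column holding 2 and 10 still has to be sent.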

In this embodiment, the hardware accelerator inputs the first padding data to the first calculation module through the control module, and the first calculation module obtains a first calculation result from the non-padding data and the first padding data. For example, referring to fig. 5, which is a schematic diagram of the connection between the first calculation module and the control module, PH represents the first padding data; pub_kh0_en, pub_kh1_en, and pub_kh2_en represent three serial ports to be enabled in the first calculation module, all of which are connected to the control module; and "+/M" represents an adder or a comparator. In the initial stage, pub_kh0_en = pub_kh1_en = pub_kh2_en = 0, that is, all three serial ports to be enabled are at a low level. When the hardware accelerator recognizes that the first calculation module needs the first padding data for its calculation, it enables the corresponding serial port through the control module so that pub_kh0_en, pub_kh1_en, or pub_kh2_en equals 1, that is, is at a high level. The first padding data is then input to the first calculation module, and the first calculation module obtains the first calculation result from the non-padding data and the first padding data. It should be noted that PH corresponds to the padding data located above and below the target picture, for example the "P" values of the first and last rows shown in fig. 3. The number of serial ports to be enabled in the first calculation module may be modified according to the width of the calculation window; for example, when the calculation window has 5 rows and 5 columns, the first calculation module should have 5 serial ports to be enabled.

It should be noted that pub_kh0_en, pub_kh1_en, and pub_kh2_en in the first calculation module correspond to the first, second, and third rows of the calculation window, respectively; whichever serial port is enabled, the first padding data is input to the corresponding row.
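The behaviour of the first calculation module for one window column can be modelled in a few lines. The Python sketch below is a software analogy only, not a description of the circuit in fig. 5: the function name `first_calculation` and the list of 0/1 enable flags are assumptions, with index i of the list standing for pub_kh{i}_en. Each enabled flag contributes one copy of the first padding data PH to the adder or comparator.

```python
def first_calculation(column_data, enables, ph, mode="max"):
    """Reduce one window column.

    column_data: non-padding values of the column
    enables:     one 0/1 flag per window row; 1 means PH feeds that row
    ph:          first padding data
    mode:        "max" models the comparator, "sum" models the adder
    """
    padded_rows = sum(enables)
    if len(column_data) + padded_rows != len(enables):
        raise ValueError("data rows plus enabled rows must equal the window height")
    operands = list(column_data) + [ph] * padded_rows
    return max(operands) if mode == "max" else sum(operands)


# Upper-left window of fig. 3: the first row is padding, so only flag 0 is set.
col_max = first_calculation([0, 8], enables=[1, 0, 0], ph=0, mode="max")
col_sum = first_calculation([0, 8], enables=[1, 0, 0], ph=0, mode="sum")
```

When all flags are 0, the column is reduced from non-padding data alone, which corresponds to the initial low-level state of the three ports.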

Specifically, the step of inputting the first filling data to the first calculation module through the control module includes:

Step c, determining a first serial port to be enabled corresponding to the first calculation module according to the number of filling data lines above or below the target picture, the height of the target picture, the column step length accumulated value corresponding to the calculation window and the window height corresponding to the calculation window;

In this step, the hardware accelerator acquires the number of rows of filling data above or below the target picture, the height of the target picture, the column step length accumulated value corresponding to the calculation window, and the window height corresponding to the calculation window, and determines from these values the first serial port to be enabled corresponding to the first calculation module. The first serial port to be enabled may include one or more of the serial ports to be enabled in the first calculation module.

When judging whether the calculation window contains filling data above the target picture, let PU represent the number of rows of filling data above the target picture, H the height of the target picture, H_CNT the column step length accumulated value corresponding to the calculation window, and KH the window height corresponding to the calculation window. If PU > H_CNT does not hold, the calculation window contains no padding data above the target picture, and no first serial port to be enabled needs to be determined. For example, when the calculation window is at the position reached by moving the black frame in fig. 3 by one column step of 1 in the height direction of the target picture, the values in the calculation window are "P, 0, 1" in the first row, "P, 8, 9" in the second row, and "P, 16, 17" in the third row; the calculation window contains no padding data above the target picture, so no first serial port to be enabled needs to be determined. If PU > H_CNT holds, the calculation window contains padding data above the target picture, and the difference between PU and H_CNT is calculated; this difference is the number of rows of padding data at the top of the calculation window. For example, when the calculation window is at the position of the black frame in fig. 3, H_CNT is 0 and PU is 1, so PU > H_CNT holds and the difference between PU and H_CNT is 1. Since 1 is greater than 0, pub_kh0_en is used as the first serial port to be enabled. As another example, when the calculation window is at the position reached by moving the black frame in fig. 3 by one row step of 1 in the width direction of the target picture, the values in the calculation window are "P, P, P" in the first row, "0, 1, 2" in the second row, and "8, 9, 10" in the third row; the calculation window contains padding data above the target picture and the difference between PU and H_CNT is 1, which is greater than 0, so pub_kh0_en is used as the first serial port to be enabled. As a further example, if the first and second rows of the calculation window are all padding data, the difference between PU and H_CNT is 2, which is greater than both 0 and 1, so pub_kh0_en and pub_kh1_en are used as the first serial ports to be enabled. It should be noted that the above manner of determining the first serial port to be enabled is only used when the calculation window moves in the column direction of the target picture.
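The rule for padding above the target picture can be written compactly: port index i is enabled when i is smaller than PU minus H_CNT. The Python sketch below illustrates this reading; the function name `top_enables` and the list representation (index i standing for pub_kh{i}_en) are assumptions made for illustration.

```python
def top_enables(pu, h_cnt, kh):
    """Return one 0/1 flag per window row for padding above the picture.

    pu:    number of padding rows above the picture
    h_cnt: accumulated step of the window in the height direction
    kh:    window height
    """
    # Number of window rows that fall into the top padding (0 if PU > H_CNT fails).
    padded_rows = min(max(pu - h_cnt, 0), kh)
    return [1 if i < padded_rows else 0 for i in range(kh)]


black_frame = top_enables(pu=1, h_cnt=0, kh=3)      # only pub_kh0_en set
one_step_down = top_enables(pu=1, h_cnt=1, kh=3)    # no port enabled
two_padded_rows = top_enables(pu=2, h_cnt=0, kh=3)  # pub_kh0_en and pub_kh1_en set
```

The three calls reproduce the three cases discussed above: a difference of 1, a failed comparison, and a difference of 2.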

When judging whether the calculation window contains padding data below the target picture, consider the case where the calculation window frames the lower left corner region of the target picture shown in fig. 3: the first row is "P, 48, 49", the second row is "P, 56, 57", and the third row is "P, P, P". Here H_CNT + KH > PU + H holds, so the calculation window contains padding data below the target picture. The difference between H_CNT + KH and PU + H is then calculated; this difference is the number of rows of padding data at the bottom of the calculation window. In this case the difference is 1 and the window height KH is 3; solving difference + x >= KH for the row index x (0 to 2) gives x = 2, so pub_kh2_en is determined as the first serial port to be enabled. When the calculation window frames the first row "P, 40, 41", the second row "P, 48, 49", and the third row "P, 56, 57", it contains no padding data below the target picture, H_CNT + KH > PU + H does not hold, and no first serial port to be enabled needs to be determined. It should be noted that the above manner of determining the first serial port to be enabled is only used when the calculation window moves in the column direction, i.e., the height direction, of the target picture.
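The rule for padding below the target picture can be sketched in the same way. The Python sketch below assumes that the inequality relating the difference, the row index x and KH reads difference + x >= KH, which is the reading consistent with pub_kh2_en being selected when the difference is 1 and KH is 3. The function name `bottom_enables`, the list representation of the ports, and the sizes H = 8 and PU = 1 inferred from fig. 3 are likewise assumptions.

```python
def bottom_enables(pu, h, h_cnt, kh):
    """Return one 0/1 flag per window row for padding below the picture.

    pu:    number of padding rows above the picture
    h:     picture height
    h_cnt: accumulated step of the window in the height direction
    kh:    window height
    """
    # Rows of the window that extend past the last picture row
    # (0 if H_CNT + KH > PU + H fails).
    difference = max(h_cnt + kh - (pu + h), 0)
    # Assumed reading of the selection rule: enable row x when difference + x >= KH.
    return [1 if difference + x >= kh else 0 for x in range(kh)]


lower_left = bottom_enables(pu=1, h=8, h_cnt=7, kh=3)  # rows 48.., 56.., P
one_row_up = bottom_enables(pu=1, h=8, h_cnt=6, kh=3)  # rows 40.., 48.., 56..
```

For the lower left window, H_CNT + KH is 10 and PU + H is 9, so the difference is 1 and only the last flag is set; one row higher the comparison fails and no flag is set.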

Step d, enabling the first serial port to be enabled through the control module, so as to input the first padding data to the first calculation module.

In this step, after determining the first serial port to be enabled, the hardware accelerator enables it through the control module so as to input the first padding data to the first calculation module. For example, as shown in fig. 3, when the calculation window contains padding data above the target picture, the hardware accelerator determines pub_kh0_en as the first serial port to be enabled and enables it through the control module, so that pub_kh0_en = 1, that is, it is at a high level, and the first padding data is input to the first calculation module. When the calculation window contains padding data below the target picture, the hardware accelerator determines pub_kh2_en as the first serial port to be enabled and enables it through the control module, so that pub_kh2_en = 1, that is, it is at a high level, and the first padding data is input to the first calculation module. It should be noted that the calculation window cannot contain padding data above and below the target picture at the same time. Generally, the padding data above and below the target picture have the same value, so the first padding data is recorded in the hardware accelerator when the hardware accelerator acquires the target picture, and is input to the first calculation module when the corresponding serial port is enabled. The first padding data therefore does not need to be stored in the memory, which saves the time of reading it from the memory and improves the performance of the hardware accelerator.

Specifically, the step of obtaining a first calculation result according to the non-padding data and the first padding data by the first calculation module includes:

Step e, inputting, through the first calculation module, the non-padding data and the first padding data located in each column of the calculation window into an adder or a comparator to obtain the first calculation result.

In this step, the hardware accelerator, through the control module, controls the first calculation module to input the non-padding data and the first padding data located in each column of the calculation window into an adder or a comparator to obtain the first calculation result. For example, referring to the target picture of fig. 3 and assuming P = 1, when the calculation window is located at the position of the black frame, the data sorting module packs the non-padding data "0, 8" located in the second column and the non-padding data "1, 9" located in the third column to obtain [8,0] and [9,1] respectively, and inputs [8,0] and [9,1] into the first calculation module. The control module sets pub_kh0_en of the first calculation module to 1 and inputs the first padding data PH = 1 into the first calculation module, so that the first calculation module obtains [8,0,1] and [9,1,1]. When the pooling algorithm is MAX POOL (the maximum value of the calculation window), the first calculation module inputs [8,0,1] and [9,1,1] into the comparator, and the comparator finds the maximum value of each, giving a first calculation result of [8,9]. When the pooling algorithm is Average POOL (the average value of the calculation window), the first calculation module inputs [8,0,1] and [9,1,1] into the adder, and the adder sums the values of each, giving a first calculation result of [9,11]. The first padding data is the padding data above and below the target picture, that is, the data indicated by "P" above and below the target picture, and the first padding data PH in the first calculation module is equal to P.
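The worked example for the first calculation module can be reproduced with a small Python model. This is a sketch under stated assumptions, not the hardware design: the function name first_stage, the mode strings "max" and "sum", and the use of one PH value per enabled port are illustrative choices.

```python
def first_stage(columns, ph, enables, mode):
    """Reduce each non-padding column of the window together with the padding PH.

    columns -- non-padding values of each column, e.g. [[8, 0], [9, 1]]
    ph      -- first padding data (PH), equal to P in the text
    enables -- enable bits such as [pub_kh0_en, pub_kh1_en, pub_kh2_en]
    mode    -- "max" models the comparator, "sum" models the adder
    """
    # One PH value enters the computation for every enabled port (assumption).
    padding = [ph] * sum(enables)
    reduce_fn = max if mode == "max" else sum
    return [reduce_fn(list(column) + padding) for column in columns]


columns = [[8, 0], [9, 1]]   # packed by the data sorting module
enables = [1, 0, 0]          # pub_kh0_en = 1: padding above the picture
print(first_stage(columns, ph=1, enables=enables, mode="max"))  # [8, 9]
print(first_stage(columns, ph=1, enables=enables, mode="sum"))  # [9, 11]
```

Both results agree with the first calculation results [8,9] and [9,11] given in the text.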

In this step, the hardware accelerator inputs the first calculation result into the second calculation module through the first calculation module. The hardware accelerator also judges whether the calculation window contains padding data on the left or right of the target picture and, if so, inputs the second padding data into the second calculation module through the control module. For example, referring to fig. 6, fig. 6 is a schematic diagram of the connection between the second calculation module and the control module, where PW represents the second padding data; pub_kw0_en, pub_kw1_en, and pub_kw2_en represent the three serial ports to be enabled in the second calculation module, which are connected to the control module; Reg represents a register; "+/M" represents an adder or a comparator; and "DIV" represents a divider. In the initial stage, pub_kw0_en = pub_kw1_en = pub_kw2_en = 0, that is, all three serial ports to be enabled are at a low level. When the hardware accelerator recognizes that the second calculation module needs to use the second padding data for calculation, it enables the corresponding serial port through the control module, so that one or more of pub_kw0_en, pub_kw1_en, and pub_kw2_en is at a high level, and "PW", that is, the second padding data, is input to the second calculation module. It should be noted that, as shown in fig. 3, the "P" values in the first column and the last column correspond to the padding data located on the left and right sides of the target picture, respectively. When the pooling algorithm is MAX POOL, the second padding data PW is P; when the pooling algorithm is Average POOL, the second padding data PW is 3P. The number of serial ports to be enabled in the second calculation module may be modified according to the width of the calculation window; for example, when the size of the calculation window is 5 rows and 5 columns, the number of serial ports to be enabled in the second calculation module should be 5. It should also be noted that pub_kw0_en, pub_kw1_en, and pub_kw2_en in the second calculation module respectively represent the first column, the second column, and the third column of the calculation window; whichever serial port is enabled, the second padding data is input to the corresponding column.
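The choice of PW can be read as the result the first calculation module would have produced for a column made up entirely of padding. The Python sketch below illustrates that reading; the function name second_padding_value and the generalization from 3P to KH times P for other window heights are assumptions, since the text only states the 3-row case.

```python
def second_padding_value(p, kh, mode):
    """Value of PW for a window column that consists only of padding.

    p    -- padding value P
    kh   -- window height (3 in the example of the text)
    mode -- "max" for MAX POOL, "avg" for Average POOL
    """
    if mode == "max":
        # The maximum of kh copies of P is P itself.
        return p
    # The sum of kh copies of P is kh * P (3P for a 3-row window).
    return kh * p


print(second_padding_value(1, 3, "max"))  # 1
print(second_padding_value(1, 3, "avg"))  # 3
```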

Specifically, the step of inputting the second filling data to the second calculation module through the control module includes:

step f, determining a second serial port to be enabled corresponding to the second computing module according to the number of columns of the filling data on the left or right of the target picture, the width of the target picture, the row step length accumulated value corresponding to the computing window and the window width corresponding to the computing window;

In this step, the hardware accelerator determines the second serial port to be enabled corresponding to the second calculation module according to the number of columns of padding data on the left or right of the target picture, the width of the target picture, the row step accumulated value corresponding to the calculation window, and the window width corresponding to the calculation window; the second serial port to be enabled may include one or more of the serial ports to be enabled in the second calculation module.

When judging whether the calculation window contains padding data on the left side of the target picture, PL is used to represent the number of columns of padding data on the left of the target picture, W the width of the target picture, W_CNT the row step accumulated value corresponding to the calculation window, and KW the window width corresponding to the calculation window. For example, when the calculation window is located at the position reached by moving the black frame in fig. 3 by one step of 1 in the width direction of the target picture, the values in the calculation window are "P, P, P" in the first row, "0, 1, 2" in the second row, and "8, 9, 10" in the third row; PL > W_CNT does not hold, so it can be determined that the calculation window does not contain padding data on the left side of the target picture, and no second serial port to be enabled needs to be determined. As another example, when the calculation window is located at the position reached by moving the black frame in fig. 3 by one step of 1 in the height direction of the target picture, the values in the calculation window are "P, 0, 1" in the first row, "P, 8, 9" in the second row, and "P, 16, 17" in the third row; PL > W_CNT holds, so it is determined that the calculation window contains padding data on the left side of the target picture. The difference between PL and W_CNT is then calculated; this difference is the number of columns of padding data on the left of the calculation window. Here the calculation window contains 1 column of padding data on the left of the target picture, so the difference between PL and W_CNT is 1; since 1 is greater than 0, pub_kw0_en is taken as the second serial port to be enabled. Assuming instead that the first column and the second column of the calculation window are both padding data, the difference between PL and W_CNT is 2; since 2 is greater than both 0 and 1, pub_kw0_en and pub_kw1_en are taken as the second serial ports to be enabled. It should be noted that the above manner of determining the second serial port to be enabled is only used when the calculation window moves in the row direction of the target picture, that is, the width direction of the target picture.

When judging whether the calculation window contains padding data on the right side of the target picture: when the calculation window frames the upper right corner region of the target picture shown in fig. 3, the first row of the calculation window is "P, P, P", the second row is "6, 7, P", and the third row is "14, 15, P". In this case W_CNT + KW > PL + W holds, so it is determined that the calculation window contains padding data on the right side of the target picture. The difference between W_CNT + KW and PL + W is then calculated; this difference is the number of columns of padding data on the right of the calculation window. Here the difference is 1 and the window width KW is 3; accordingly, x = 2 is obtained from difference + x >= KW, and pub_kw2_en is determined as the second serial port to be enabled. As another example, when the data framed by the calculation window is "P, P, P" in the first row, "5, 6, 7" in the second row, and "13, 14, 15" in the third row, the calculation window does not contain padding data on the right side of the target picture, W_CNT + KW > PL + W does not hold, and no second serial port to be enabled needs to be determined.
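The left and right checks in the width direction can be combined into one illustrative Python model. The function name width_padding_enables is hypothetical, and the example values PL = 1 and W = 8 are inferred from the picture values quoted in the text; the sketch only mirrors the comparisons described above.

```python
def width_padding_enables(pl, w, w_cnt, kw):
    """Model of the enable bits [pub_kw0_en, ..., pub_kw{kw-1}_en].

    pl    -- number of columns of padding on the left of the picture (PL)
    w     -- width of the target picture (W)
    w_cnt -- row step accumulated value of the calculation window (W_CNT)
    kw    -- window width (KW)
    """
    enables = [0] * kw
    # Left padding: present when PL > W_CNT; column k is padding if diff > k.
    left_diff = pl - w_cnt
    if left_diff > 0:
        for k in range(kw):
            if left_diff > k:
                enables[k] = 1
    # Right padding: present when W_CNT + KW > PL + W;
    # column x is padding if diff + x >= KW.
    right_diff = (w_cnt + kw) - (pl + w)
    if right_diff > 0:
        for x in range(kw):
            if right_diff + x >= kw:
                enables[x] = 1
    return enables


print(width_padding_enables(pl=1, w=8, w_cnt=0, kw=3))  # window at the left edge
print(width_padding_enables(pl=1, w=8, w_cnt=7, kw=3))  # window at the right edge
```

A window at the left edge sets only the first port, a window at the right edge sets only the last port, and any window in between sets none.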

Step g, enabling the second serial port to be enabled through the control module according to a first preset clock cycle, so as to input the second padding data to the second calculation module.

In this step, the hardware accelerator enables the second serial port to be enabled through the control module according to a first preset clock cycle, so as to input the second padding data to the second calculation module. For example, as shown in the target picture in fig. 3, when the calculation window contains padding data on the left side of the target picture, the hardware accelerator determines pub_kw0_en as the second serial port to be enabled and enables it through the control module according to the first preset clock cycle, so that pub_kw0_en = 1, that is, it is at a high level, and the second padding data is input to the second calculation module. When the calculation window contains padding data on the right side of the target picture, the hardware accelerator determines pub_kw2_en as the second serial port to be enabled and enables it through the control module, so that pub_kw2_en = 1, that is, it is at a high level, and the second padding data is input to the second calculation module. It should be noted that the calculation window cannot contain padding data on the left and on the right of the target picture at the same time. Generally, the padding data on the left and on the right of the target picture have the same value, so the second padding data is recorded in the hardware accelerator when the hardware accelerator acquires the target picture, and is input to the second calculation module when the corresponding serial port is enabled. The second padding data therefore does not need to be stored in the memory, which saves the time of reading it from the memory and improves the performance of the hardware accelerator.

In this embodiment, the hardware accelerator, through the control module, controls the second calculation module to obtain the second calculation result according to the first calculation result and the second padding data. For example, referring to fig. 6, fig. 6 is a schematic diagram of the connection between the second calculation module and the control module, where PW represents the second padding data; pub_kw0_en, pub_kw1_en, and pub_kw2_en represent the three serial ports to be enabled in the second calculation module, which are connected to the control module; Reg represents a register; "+/M" represents an adder or a comparator; and "DIV" represents a divider. In the initial stage, pub_kw0_en = pub_kw1_en = pub_kw2_en = 0, that is, all three serial ports to be enabled are at a low level. When the hardware accelerator recognizes that the second calculation module needs to use the second padding data for calculation, it enables the corresponding serial port through the control module, so that one or more of pub_kw0_en, pub_kw1_en, and pub_kw2_en is at a high level, and "PW", that is, the second padding data, is input to the second calculation module. According to the specific pooling algorithm, the second calculation module then inputs the first calculation result stored in the registers and the second padding data input by the control module into the adder and the divider, or into the comparator, to obtain the second calculation result.

Specifically, step S40 includes:

In this step, the hardware accelerator, through the control module, controls the second calculation module to store the first calculation result and the second padding data in the registers and, according to a second preset clock cycle, to input the first calculation result and the second padding data into the adder and the divider, or into the comparator, to obtain the second calculation result. For example, referring to the target picture of fig. 3 and assuming P = 1, when the calculation window is located at the position of the black frame, it is determined that the calculation window contains padding data on the left of the target picture, so pub_kw0_en needs to be enabled. When the pooling algorithm is MAX POOL (the maximum value of the calculation window), the first calculation result is [8,9]. The first calculation module inputs the first calculation result [8,9] into the second calculation module; the second calculation module stores "8" in register Reg1 and "9" in register Reg2, and stores the second padding data PW = P = 1, which the control module inputs by enabling pub_kw0_en, in register Reg0. The first calculation result and the second padding data are then input into the comparator according to the second preset clock cycle, that is, [1,8,9] is sent to the comparator, giving a second calculation result of 9. When the pooling algorithm is Average POOL (the average value of the calculation window), the first calculation result is [9,11]. The first calculation module inputs the first calculation result [9,11] into the second calculation module; the second calculation module stores "9" in register Reg1 and "11" in register Reg2, and stores the second padding data PW = 3P = 3, which the control module inputs by enabling pub_kw0_en, in register Reg0. The first calculation result and the second padding data are then input into the adder and the divider according to the second preset clock cycle, that is, [3,9,11] is input into the adder and the divider, giving a second calculation result of (11 + 9 + 3) / 3 ≈ 7.67. It should be noted that the second padding data is the padding data on the left and right sides of the target picture, that is, the data indicated by "P" on the left and right sides of the target picture, and is denoted by PW in the second calculation module.
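The register example for the second calculation module can be modelled as follows. This Python sketch is illustrative only: the function name second_stage and the divisor parameter are assumptions. The divisor defaults to 3 because that is the value used in the worked example of the text; it is exposed as a parameter rather than fixed, since the text does not describe the divider further.

```python
def second_stage(first_result, pw, enables, mode, divisor=3):
    """Combine the first calculation result with the second padding data PW.

    first_result -- per-column results from the first calculation module
    pw           -- second padding data (PW)
    enables      -- enable bits [pub_kw0_en, pub_kw1_en, pub_kw2_en]
    mode         -- "max" models the comparator, "avg" the adder and divider
    divisor      -- value applied by the divider (3 in the worked example)
    """
    results = iter(first_result)
    # An enabled column holds PW in its register (Reg0, Reg1, Reg2);
    # the remaining registers hold the first calculation result in order.
    registers = [pw if enabled else next(results) for enabled in enables]
    if mode == "max":
        return max(registers)
    return sum(registers) / divisor


# MAX POOL: registers hold [1, 8, 9].
print(second_stage([8, 9], pw=1, enables=[1, 0, 0], mode="max"))
# Average POOL: registers hold [3, 9, 11]; (3 + 9 + 11) / 3.
print(second_stage([9, 11], pw=3, enables=[1, 0, 0], mode="avg"))
```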

When detecting a calculation instruction, the hardware accelerator of the pooling algorithm of this embodiment acquires the target picture corresponding to the calculation instruction and frames the target picture with the calculation window according to a preset rule; identifies and acquires, through the data sorting module, the non-padding data of the part of the target picture framed by the calculation window, and sends the non-padding data to the first calculation module; inputs the first padding data to the first calculation module through the control module, and obtains the first calculation result through the first calculation module according to the non-padding data and the first padding data; inputs the first calculation result into the second calculation module through the first calculation module, and inputs the second padding data into the second calculation module through the control module; and obtains the second calculation result through the second calculation module according to the first calculation result and the second padding data. Because the padding data is supplied by the control module rather than read from the memory, the performance of the hardware accelerator is improved, and the calculation speed of the pooling algorithm is further improved.

Further, based on the first embodiment of the hardware acceleration method of the pooling algorithm of the present invention, a second embodiment of the hardware acceleration method of the pooling algorithm of the present invention is proposed.

The second embodiment of the hardware acceleration method of the pooling algorithm differs from the first embodiment of the hardware acceleration method of the pooling algorithm in that the step S40 is followed by the steps of:

Step h, controlling the calculation window to move according to a preset moving rule and according to the row step length or the column step length corresponding to the calculation window, and executing the step of acquiring the non-padding data in the calculation window through the data sorting module, so as to obtain a target feature map.

In this embodiment, the hardware accelerator controls the calculation window to start from the position of the black frame shown in fig. 3, and executes the step of acquiring the non-padding data in the calculation window through the data sorting module and the subsequent steps. The hardware accelerator then controls the calculation window to move in the row direction of the target picture, that is, the width direction, according to the row step length corresponding to the calculation window, and executes the step of acquiring the non-padding data in the calculation window through the data sorting module and the subsequent steps once each time the calculation window moves. When the calculation window has moved to the rightmost side of the target picture, that is, when the calculation window contains the first row "P, P, P", the second row "6, 7, P", and the third row "14, 15, P", and the non-padding data in the calculation window has been acquired through the data sorting module and the subsequent steps have been completed, the calculation window is controlled to return to the position of the black frame shown in fig. 3 and to move downward according to the column step length corresponding to the calculation window, so that the calculation window contains the first row "P, 0, 1", the second row "P, 8, 9", and the third row "P, 16, 17". The calculation window continues to move in this way until the whole picture has been calculated, and the target feature map corresponding to the target picture is obtained.
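The traversal order described above can be sketched as a software model. The Python code below is an illustration under stated assumptions, not the hardware accelerator: the names padded_value and pool, the 8 x 8 picture holding 0 to 63 with one ring of padding P = 1, and a step length of 1 are inferred from the values quoted in the text. Each window is reduced column by column first and then across the column results, mirroring the first and second calculation modules.

```python
def padded_value(picture, r, c, pad, p):
    """Value at padded coordinates (r, c); returns p outside the picture."""
    h, w = len(picture), len(picture[0])
    rr, cc = r - pad, c - pad
    return picture[rr][cc] if 0 <= rr < h and 0 <= cc < w else p


def pool(picture, k, stride, pad, p, mode):
    """Slide a k x k window across each row of positions, then move down."""
    reduce_fn = max if mode == "max" else sum
    h, w = len(picture), len(picture[0])
    out_h = (h + 2 * pad - k) // stride + 1
    out_w = (w + 2 * pad - k) // stride + 1
    feature_map = []
    for i in range(out_h):            # move down once a row of windows is done
        h_cnt = i * stride            # plays the role of H_CNT
        out_row = []
        for j in range(out_w):        # move in the width direction first
            w_cnt = j * stride        # plays the role of W_CNT
            # First stage: reduce every column of the window.
            column_results = [
                reduce_fn([padded_value(picture, h_cnt + dr, w_cnt + dc, pad, p)
                           for dr in range(k)])
                for dc in range(k)
            ]
            # Second stage: reduce across the column results.
            out_row.append(reduce_fn(column_results))
        feature_map.append(out_row)
    return feature_map


picture = [[8 * r + c for c in range(8)] for r in range(8)]
max_map = pool(picture, k=3, stride=1, pad=1, p=1, mode="max")
sum_map = pool(picture, k=3, stride=1, pad=1, p=1, mode="sum")
print(max_map[0][0], max_map[0][7], max_map[1][0])  # 9 15 17
print(sum_map[0][0])                                # 23
```

In this model the window at the black-frame position yields a maximum of 9 and a sum of 23, the same values that appear in the register example for the second calculation module.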

The hardware accelerator of the pooling algorithm of this embodiment controls the calculation window to move according to a preset moving rule and according to the row step length or the column step length corresponding to the calculation window, and executes the step of acquiring the non-padding data in the calculation window through the data sorting module, so as to obtain the target feature map; the performance of the hardware accelerator is thereby improved, and the calculation speed of the pooling algorithm is improved.

The invention also provides a hardware acceleration device of the pooling algorithm. The hardware acceleration device of the pooling algorithm of the invention comprises:

the receiving module is used for generating a corresponding configuration file set according to the IP file set when the IP file set is received;

Preferably, the obtaining module is further configured to:

Preferably, the first input module is further configured to:

Preferably, the second input module is further configured to:

Preferably, the calculation module is further configured to:

The invention also provides a hardware acceleration system of the pooling algorithm.

The hardware acceleration system of the pooling algorithm of the invention comprises: a hardware accelerator, a memory, a processor and a hardware acceleration program of a pooling algorithm stored on the memory and executable on the processor, the hardware acceleration program of the pooling algorithm when executed by the processor implementing the steps of the hardware acceleration method of the pooling algorithm as described above.

The method implemented when the hardware acceleration program of the pooling algorithm running on the processor is executed may refer to each embodiment of the hardware acceleration method of the pooling algorithm of the present invention, and is not described herein again.

The invention also provides a readable storage medium.

The readable storage medium is a computer readable storage medium, and the computer readable storage medium of the present invention stores thereon a hardware acceleration program of a pooling algorithm, which when executed by a processor implements the steps of the hardware acceleration method of a pooling algorithm as described above.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases the former is the better implementation. Based on such understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disk) as described above and includes instructions for enabling a terminal system (which may be a mobile phone, a computer, a server, or a network system) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A hardware acceleration method of a pooling algorithm is characterized by comprising the following steps:

2. The hardware acceleration method of a pooling algorithm of claim 1 wherein said step of obtaining non-padding data in a calculation window by a data sorting module upon detection of a calculation instruction and sending said non-padding data to a first calculation module comprises:

3. The hardware acceleration method of a pooling algorithm of claim 2 wherein said step of inputting a first fill data to said first computing module by a control module comprises:

4. The hardware acceleration method of a pooling algorithm of claim 1 wherein said step of deriving a first calculation result from said non-padding data and said first padding data by said first calculation module comprises:

5. A hardware acceleration method of a pooling algorithm as recited in claim 2 wherein said step of inputting second fill data to said second calculation module by said control module comprises:

6. The hardware acceleration method of a pooling algorithm of claim 1 wherein said step of deriving a second calculation result from said first calculation result and said second fill data by said second calculation module comprises:

7. The hardware acceleration method of a pooling algorithm of claim 1 wherein after said step of deriving a second calculation result from said first calculation result and said second padding data by said second calculation module, said hardware acceleration method of a pooling algorithm further comprises:

8. A hardware acceleration apparatus for a pooling algorithm, the hardware acceleration apparatus for the pooling algorithm comprising:

9. A hardware acceleration system for a pooling algorithm, the hardware acceleration system for the pooling algorithm comprising: a hardware accelerator, a memory, a processor and a hardware acceleration program of a pooling algorithm stored on the memory and executable on the processor, the hardware acceleration program of the pooling algorithm when executed by the processor implementing the steps of the hardware acceleration method of the pooling algorithm as claimed in any one of claims 1 to 7.

10. A readable storage medium, characterized in that the readable storage medium is a computer readable storage medium having stored thereon a hardware acceleration program of a pooling algorithm, which when executed by a processor implements the steps of the hardware acceleration method of a pooling algorithm according to any of the claims 1 to 7.