CN110399977A

CN110399977A - Pooling operation apparatus

Info

Publication number: CN110399977A
Application number: CN201810377097.6A
Authority: CN
Inventors: 梁晓峣; 景乃锋; 崔晓松; 陈云
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2018-04-25
Filing date: 2018-04-25
Publication date: 2019-11-01
Also published as: WO2019206161A1

Abstract

This application provides a pooling operation apparatus. The apparatus includes a plurality of register groups, configured to store a plurality of data, and a plurality of computing units, configured to perform pooling operations on the plurality of data, where the data operated on by different computing units are located in different register groups of the plurality of register groups. A first computing unit of the plurality of computing units is configured to: perform a first pooling operation on first data and second data of the plurality of data to obtain a first operation result; store the first operation result; obtain third data from a first register group of the plurality of register groups; and perform a second pooling operation on the first operation result and the third data. The application enables parallel pooling, and because intermediate calculation results can be stored inside the computing units, it improves data read/write efficiency and therefore pooling efficiency.

Description

Pooling operation apparatus

Technical field

This application relates to the field of neural networks, and in particular, to a pooling operation apparatus.

Background

Convolutional neural networks are commonly used for image recognition. A convolutional neural network generally comprises convolutional layers, pooling layers, and fully connected layers. The pooling layer, an important tool for dimensionality reduction, is typically placed after a convolutional layer. The operation performed by a pooling layer is called a pooling operation. In a pooling operation, a window of fixed size is slid across the entire image plane, and at each position an operation is performed on the data covered by the window, for example taking the maximum or the average, to produce the output. This window may be called a pooling window; its size is usually k1*k2, where k1 and k2 are each integers not less than 2. Pooling includes max pooling and average pooling.

Currently, some prior-art approaches perform pooling on a general-purpose image processor, implementing the pooling operation under the control of general-purpose instructions. The drawback of this scheme is that each operation result must be written back to the register file and then read out of the register file again when the next operation needs it. This repeated reading and writing lowers data read/write efficiency.

Summary of the invention

This application provides a pooling operation apparatus that avoids repeated reading and writing to a certain extent and can thereby effectively improve data read/write efficiency.

According to a first aspect, a pooling operation apparatus is provided, including: a plurality of register groups, configured to store a plurality of data; and a plurality of computing units, configured to perform pooling operations on the plurality of data, where the data operated on by different computing units are located in different register groups of the plurality of register groups. A first computing unit of the plurality of computing units is configured to: perform a first pooling operation on first data and second data of the plurality of data to obtain a first operation result; store the first operation result; obtain third data from a first register group of the plurality of register groups; and perform a second pooling operation on the first operation result and the third data.

In the same clock cycle, the data fetched by different computing units are located in different register groups, which effectively avoids data read conflicts. Parallel pooling can therefore be better realized, improving pooling efficiency.

It should be understood that the first operation result is an intermediate calculation result of the pooling operation for one pooling window, and this intermediate result also participates in subsequent operations (for example, the second pooling operation). Because the first computing unit stores this intermediate result, it can use it directly in subsequent operations without reading data from an external register file. Compared with the prior-art approach of performing pooling on an image processor, the pooling operation apparatus provided in this application can therefore effectively improve data read/write efficiency and, as a result, overall pooling efficiency.
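The saving can be illustrated by counting register-file accesses for one pooling window of n = k1*k2 operands. The sketch below rests on a simplified cost model that is an assumption, not taken from the patent: in the instruction-driven baseline, each of the n - 1 two-operand steps reads two values from the register file and writes its result back; in the local-storage scheme, each operand is read once and only the final result leaves the unit.

```python
from typing import Tuple

def baseline_accesses(n: int) -> Tuple[int, int]:
    """Assumed model: every intermediate result is written back to the
    register file and read again by the next two-operand instruction."""
    ops = n - 1           # two-operand steps needed to reduce n operands
    reads = 2 * ops       # each step reads both of its operands from the file
    writes = ops          # each step writes its result back to the file
    return reads, writes

def local_accesses(n: int) -> Tuple[int, int]:
    """Assumed model: the unit keeps the intermediate result internally."""
    return n, 1           # each operand read once; one final write

for k in (2, 3):
    n = k * k
    print(k, baseline_accesses(n), local_accesses(n))
# 2 (6, 3) (4, 1)
# 3 (16, 8) (9, 1)
```

Under this model the baseline needs 3n - 3 accesses per window against n + 1 for the local-storage scheme, and the gap widens with window size.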

In addition, each computing unit reads one pooling operand at a time, and each operation acts on two data. The pooling operation apparatus provided in this application is therefore not constrained by changes in pooling window size; in other words, the apparatus can be applied to pooling operations with pooling windows of any size.

Therefore, the pooling operation apparatus provided in this application can perform parallel pooling by means of the plurality of computing units and the plurality of register groups, improving pooling efficiency. Further, since each computing unit can store the intermediate calculation results of the pooling operation, data read/write efficiency is improved, which in turn improves overall pooling efficiency and maximally accelerates pooling operations.

For the same computing unit, the data obtained in different clock cycles may be located in different register groups or in the same register group.

Across different clock cycles, the data obtained by different computing units may be located in different register groups or in the same register group.

Each of the plurality of register groups has one read port. That is, one data item can be read from a register group in each clock cycle.

Optionally, the number of computing units is less than or equal to the number of register groups.

With reference to the first aspect, in a possible implementation of the first aspect, any computing unit of the plurality of computing units can read data from any register group of the plurality of register groups.

Specifically, the plurality of register groups and the plurality of computing units are connected as follows: each computing unit is connected to all of the register groups. This connection relationship may be called full connection.

Optionally, the plurality of register groups and the plurality of computing units may instead be connected as follows: each computing unit is connected to a subset of the register groups.

With reference to the first aspect, in a possible implementation of the first aspect, the first computing unit includes a storage module and a computing module. The computing module is configured to perform the first pooling operation on the first data and the second data obtained from the plurality of register groups to obtain the first operation result, and to store the first operation result in the storage module. The computing module is further configured to perform the second pooling operation on the first operation result stored in the storage module and the third data obtained from the plurality of register groups.

When the pooling operation is max pooling, the first pooling operation is a comparison, that is, the first data is compared with the second data; correspondingly, the computing module may include an adder or a comparator. When the pooling operation is average pooling, the first pooling operation is an accumulation, that is, the first data and the second data are added together; correspondingly, the computing module includes an adder. It should be understood that the computing module further includes a multiplier, which is used to average the total accumulation result over all operands in the pooling window.

The pooling operation apparatus provided in this application implements the operations involved in pooling in hardware rather than through instruction control.

With reference to the first aspect, in a possible implementation of the first aspect, the pooling operation includes a max pooling operation, and the first computing unit includes: a first data interface, configured to receive the first data obtained from the plurality of register groups; a second data interface, configured to receive the second data obtained from the plurality of register groups; a first storage module, configured to store the first data; a second storage module, configured to store the second data; a computing module, configured to compare the first data with the second data to obtain the first operation result and store the first operation result in a latch, where the first operation result indicates that the first data is greater than the second data; and the latch, configured to store the first operation result and to send, based on the first operation result, a feedback signal to the first data interface and the second data interface, where the feedback signal instructs the first data interface to close and the second data interface to open, and the opened second data interface is configured to receive the third data obtained from the first register group. The computing module is further configured to perform the second pooling operation on the first operation result and the third data.

The pooling operation apparatus provided in this application can thus be used to implement max pooling.

With reference to the first aspect, in a possible implementation of the first aspect, the pooling operation includes an average pooling operation, and the first computing unit specifically includes: a first data interface, configured to receive the first data from the plurality of register groups; a second data interface, configured to receive the second data from the plurality of register groups; a first storage module, configured to store the first data; a second storage module, configured to store the second data; and an adder, configured to add the first data and the second data to obtain the first operation result. The second storage module is further configured to store the first operation result; the first data interface is further configured to obtain the third data from the first register group; and the adder is further configured to perform the second pooling operation on the first operation result and the third data.

The first computing unit further includes a multiplier, configured to, when the adder has obtained the accumulation result of k1*k2 data, multiply that accumulation result by 1/(k1*k2) to obtain the average of the k1*k2 data, where k1*k2 is the size of the pooling window corresponding to the pooling operation, and k1 and k2 are each integers not less than 2.

The pooling operation apparatus provided in this application can thus be used to implement average pooling.

With reference to the first aspect, in a possible implementation of the first aspect, the pooling operation apparatus further includes a control unit, configured to send a control signal to the plurality of computing units, where the control signal indicates whether the pooling operation is max pooling or average pooling. The first computing unit is further configured to receive the control signal. When performing the first pooling operation on the first data and the second data, the first computing unit is specifically configured to: perform a max pooling operation on the first data and the second data when the control signal indicates max pooling; and perform an average pooling operation on the first data and the second data when the control signal indicates average pooling.
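A minimal sketch of how a control signal could select the per-step operation of a shared computing unit. The `PoolMode` enum and its encoding are hypothetical stand-ins for the control signal; the patent does not specify an encoding. Max mode compares two operands; average mode accumulates them and scales the final sum by 1/(k1*k2).

```python
from enum import Enum
from typing import List

class PoolMode(Enum):
    MAX = 0  # hypothetical encoding of the control signal
    AVG = 1

def two_operand_op(mode: PoolMode, a: float, b: float) -> float:
    """One pooling step on two operands, selected by the control signal."""
    if mode is PoolMode.MAX:
        return a if a >= b else b   # comparison
    return a + b                    # accumulation

def run_window(mode: PoolMode, operands: List[float]) -> float:
    acc = two_operand_op(mode, operands[0], operands[1])  # first pooling operation
    for x in operands[2:]:                                # second, third, ... operations
        acc = two_operand_op(mode, acc, x)
    if mode is PoolMode.AVG:
        acc *= 1.0 / len(operands)  # multiplier step: scale by 1/(k1*k2)
    return acc

print(run_window(PoolMode.MAX, [9, 5, 10, 32]))  # 32
print(run_window(PoolMode.AVG, [9, 5, 10, 32]))  # 14.0
```

Because only the step operation and the final scaling depend on the mode, the same datapath (registers, operand sequencing) can serve both pooling types, which is the hardware-reuse argument made in the text.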

The pooling operation apparatus provided in this application can handle both average pooling and max pooling, which improves hardware utilization and reduces hardware cost.

According to a second aspect, a computer device is provided, including a memory and the pooling operation apparatus provided in the first aspect, where the memory is configured to store the data on which the pooling operation apparatus is to perform pooling operations.

Brief description of the drawings

Fig. 1 is a schematic diagram of a pooling operation.

Fig. 2 is a schematic block diagram of a pooling operation apparatus according to an embodiment of this application.

Fig. 3, Fig. 4, Fig. 5, and Fig. 6 are structural schematic diagrams of computing units according to embodiments of this application.

Fig. 7 is a schematic diagram of data stored in a plurality of register groups according to an embodiment of this application.

Fig. 8 to Fig. 11 are schematic diagrams of computing units reading data from register groups according to embodiments of this application.

Fig. 12 and Fig. 13 are other schematic diagrams of data stored in a plurality of register groups according to embodiments of this application.

Detailed description of embodiments

The following describes the technical solutions of this application with reference to the accompanying drawings.

To facilitate understanding of the solutions provided in the embodiments of this application, the concept of pooling is first described with reference to Fig. 1.

Pooling refers to the operation of a pooling layer in a neural network. In a pooling operation, a window of fixed size is slid across the entire image plane, and at each position an operation is performed on the data covered by the window, taking the maximum or the average as the output. This window may be called a pooling window. The size of the pooling window may be k1*k2, where k1 and k2 are each integers not less than 2; k1 and k2 may or may not be equal, which is not limited in the embodiments of the present invention.

Fig. 1 is the schematic diagram of pondization operation.The size of input picture (image of i.e. pending pondization processing) is 4*4, pond The size for changing window is 2*2.The interval sliding that the pond window that pondization operation is a 2*2 is 2 with step-length on the image of 4*4, often 4 data of a pond window covering obtain an output as a result, all output results constitute output image, as shown in Figure 1, output The size of image is 2*2.

Each datum of the output image shown in Fig. 1 is obtained by the following formula:

o1 = op{d1, d2, d3, d4},

where d1 to d4 denote image data (that is, pixel values) of the input image, and o1 denotes image data (that is, a pixel value) of the output image.

The operator op may take the maximum (max) or the average (avg). When op takes the maximum, the corresponding pooling operation is called max pooling; when op takes the average, the corresponding pooling operation is called average pooling.
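As a plain software reference for the formula above (not the hardware described in this application), the sketch below slides a k1*k2 window with a given stride and applies `op` to each set of covered operands. The input is the 2-row strip whose two windows are listed in the embodiment (9, 5, 10, 32 and 5, 3, 2, 2); arranging those operands row by row is an assumption, though max and average do not depend on the order.

```python
from typing import Callable, List, Sequence

def pool2d(image: Sequence[Sequence[float]], k1: int, k2: int, stride: int,
           op: Callable[[List[float]], float]) -> List[List[float]]:
    """Slide a k1 x k2 window over `image` and apply `op` to each window."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - k1 + 1, stride):
        out_row = []
        for c in range(0, cols - k2 + 1, stride):
            window = [image[r + i][c + j] for i in range(k1) for j in range(k2)]
            out_row.append(op(window))  # o = op{d1, ..., d_(k1*k2)}
        out.append(out_row)
    return out

def avg(values: List[float]) -> float:
    return sum(values) / len(values)

# Rows 1 and 2 of the Fig. 1 input (assumed row-by-row arrangement).
strip = [[9, 5, 5, 3],
         [10, 32, 2, 2]]
print(pool2d(strip, 2, 2, 2, max))  # [[32, 5]]
print(pool2d(strip, 2, 2, 2, avg))  # [[14.0, 3.0]]
```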

The data of the input image covered by one pooling window may be called pooling operands. For example, if the pooling window size is k1*k2, one pooling operation involves k1*k2 pooling operands.

The pooling operation referred to herein may be an average pooling operation or a max pooling operation.

For ease of understanding and description, some embodiments herein take a pooling window of size k*k (that is, k1 = k2 = k) as an example, where k is an integer not less than 2; this does not limit this application. In practical applications, k1 and k2 may or may not be equal, which is not limited here.

Fig. 2 is a schematic block diagram of a pooling operation apparatus 200 according to an embodiment of this application. As shown in Fig. 2, the apparatus 200 includes a plurality of register groups 210 and a plurality of computing units 220.

The plurality of register groups 210 are configured to store a plurality of data.

Specifically, the data stored in the register groups are data awaiting the pooling operation. For example, in the scenario shown in Fig. 1, the data are those in rows 1 and 2 of the input image. It can be understood that each register group includes a plurality of registers.

Specifically, each register group 210 has one read port; in other words, one data item can be read from a register group at a time. As an example, a register group in this embodiment may be referred to as a bank, so the plurality of register groups form a plurality of banks.

The plurality of computing units 220 are configured to perform pooling operations on the plurality of data, where the data operated on by different computing units are located in different register groups of the plurality of register groups.

Specifically, the plurality of computing units 220 perform pooling operations on the data of different pooling windows in parallel.

Take the scenario shown in Fig. 1 as an example. Assume that the plurality of computing units 220 include two computing units (computing unit 1 and computing unit 2), that the data in rows 1 and 2 of the input image in Fig. 1 are stored in the register groups 210, and that the pooling window size is 2*2. These two rows of data can then be divided into two pooling windows: the data of the first pooling window are 9, 5, 10, and 32, and the data of the second pooling window are 5, 3, 2, and 2. Computing unit 1 and computing unit 2 can pool the two windows respectively; for example, computing unit 1 pools the first window and computing unit 2 pools the second window. Specifically, in clock cycle T, computing unit 1 reads pooling operand 9 from a register group and computing unit 2 reads pooling operand 5 from a register group; in clock cycle T+1, computing unit 1 reads pooling operand 5 and computing unit 2 reads pooling operand 3; in clock cycle T+2, computing unit 1 reads pooling operand 10 and computing unit 2 reads pooling operand 2; and in clock cycle T+3, computing unit 1 reads pooling operand 32 and computing unit 2 reads pooling operand 2. It should be understood that after clock cycle T+3, computing unit 1 and computing unit 2 obtain the pooling results of the two windows at the same time.
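The cycle-by-cycle schedule above can be traced with a small simulation. This is an illustrative model only: each unit reads one operand per cycle and folds it into the intermediate result it holds internally (the first cycle only loads). Max pooling is used as the operation; the dictionary layout and the `simulate` helper are hypothetical.

```python
# Operands of the two pooling windows, in the order each unit reads them.
windows = {1: [9, 5, 10, 32], 2: [5, 3, 2, 2]}

def simulate(windows, op=max):
    """Each unit reads one operand per clock cycle. The first read is only
    loaded; every later read is combined with the stored intermediate result."""
    state = {u: None for u in windows}
    trace = []
    n_cycles = len(next(iter(windows.values())))
    for t in range(n_cycles):
        reads = {}
        for unit, operands in windows.items():
            x = operands[t]
            reads[unit] = x
            state[unit] = x if state[unit] is None else op(state[unit], x)
        trace.append((f"T+{t}" if t else "T", reads))
    return state, trace

results, trace = simulate(windows)
for cycle, reads in trace:
    print(cycle, reads)
print(results)  # {1: 32, 2: 5}
```

Both results become available after the fourth cycle, matching the statement that the two units finish their windows together after clock cycle T+3.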

As can be seen from the above, the pooling operation apparatus provided in the embodiments of this application can perform pooling in parallel, which effectively improves pooling efficiency.

It should be noted that, to reduce the time the computing units spend reading data from the register groups and to improve computational efficiency, in the embodiments of the present invention the data fetched by different computing units in the same clock cycle are located in different register groups.

For example, in the example of Fig. 1 above: in clock cycle T, computing unit 1 reads pooling operand 9 and computing unit 2 reads pooling operand 5, and operands 9 and 5 are located in different register groups; in clock cycle T+1, computing unit 1 reads operand 5 and computing unit 2 reads operand 3, which are located in different register groups; in clock cycle T+2, computing unit 1 reads operand 10 and computing unit 2 reads operand 2, which are located in different register groups; and in clock cycle T+3, computing unit 1 reads operand 32 and computing unit 2 reads operand 2, which are located in different register groups.

Because the data fetched by different computing units in the same clock cycle are located in different register groups, data read conflicts are effectively avoided, so parallel pooling can be better realized and pooling efficiency improved.
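The conflict-free condition can be checked mechanically: in every cycle, no two units may address the same register group, since each group has a single read port. The storage layout of Fig. 7 is not described in this text, so `bank_of` below uses a hypothetical layout (image column c stored in register group c % n_banks); the read order of each unit is likewise an assumption.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (row, column) position in the input image

def bank_of(coord: Coord, n_banks: int) -> int:
    # Hypothetical layout: column c is stored in register group c % n_banks.
    return coord[1] % n_banks

def find_conflicts(schedule: Dict[int, List[Coord]], n_banks: int) -> List[Tuple[int, int]]:
    """Return (cycle, bank) pairs where two units would read the same
    single-read-port register group in the same cycle."""
    conflicts = []
    n_cycles = max(len(s) for s in schedule.values())
    for t in range(n_cycles):
        seen = set()
        for unit, reads in schedule.items():
            if t < len(reads):
                b = bank_of(reads[t], n_banks)
                if b in seen:
                    conflicts.append((t, b))
                seen.add(b)
    return conflicts

# Unit 1 reads the window at columns 0-1, unit 2 the window at columns 2-3.
schedule = {
    1: [(0, 0), (0, 1), (1, 0), (1, 1)],
    2: [(0, 2), (0, 3), (1, 2), (1, 3)],
}
print(find_conflicts(schedule, n_banks=4))  # []
print(find_conflicts(schedule, n_banks=2))  # [(0, 0), (1, 1), (2, 0), (3, 1)]
```

With four groups this hypothetical layout keeps the two units apart in every cycle; squeezing the same data into two groups would make them collide every cycle, which is the kind of read conflict the text says the data placement is meant to avoid.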

It should be noted that, for the same computing unit, the data obtained in different clock cycles may be located in different register groups or in the same register group; this is not limited in the embodiments of this application.

All of the computing units 220 have the same function and structure. For ease of understanding and description, the following describes the structure and function of a computing unit in the pooling operation apparatus provided in the embodiments of this application by taking one of the computing units 220 (denoted as the first computing unit 220) as an example. The description of the first computing unit 220 below applies to every computing unit 220.

The first computing unit 220 is configured to: perform a first pooling operation on first data and second data of the plurality of data to obtain a first operation result; store the first operation result; obtain third data from a first register group of the plurality of register groups; and perform a second pooling operation on the first operation result and the third data.

The first data and the second data are, respectively, the first and second pooling operands of a pooling window that the first computing unit is responsible for, and the third data is the third pooling operand of that window.

In addition, in the embodiments of this application, each computing unit reads one pooling operand at a time, and each operation acts on two data. The pooling operation apparatus provided in the embodiments of this application is therefore not affected by changes in pooling window size; in other words, it can be applied to pooling operations with pooling windows of any size.
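Why window size does not matter can be seen in a streaming sketch: the unit only ever combines one new operand with the intermediate result it holds, so k1*k2 enters only as a count telling it when a window is complete. `StreamingPoolUnit` and `run` are hypothetical names for illustration, not components named in the patent.

```python
from typing import Callable, List, Optional

class StreamingPoolUnit:
    """One operand per call, combined with the stored intermediate result;
    the same logic serves any k1 x k2 window."""

    def __init__(self, k1: int, k2: int, op: Callable[[float, float], float]):
        self.size = k1 * k2
        self.op = op
        self.acc: Optional[float] = None  # stored intermediate result
        self.count = 0

    def feed(self, x: float) -> Optional[float]:
        self.acc = x if self.acc is None else self.op(self.acc, x)
        self.count += 1
        if self.count == self.size:       # window complete: emit and reset
            result, self.acc, self.count = self.acc, None, 0
            return result
        return None

def run(unit: StreamingPoolUnit, operands: List[float]) -> List[float]:
    return [r for r in (unit.feed(x) for x in operands) if r is not None]

print(run(StreamingPoolUnit(2, 2, max), [9, 5, 10, 32, 5, 3, 2, 2]))  # [32, 5]
print(run(StreamingPoolUnit(3, 3, max), [4, 1, 7, 3, 9, 2, 8, 6, 5]))  # [9]
print(run(StreamingPoolUnit(2, 3, lambda a, b: a + b), [1, 2, 3, 4, 5, 6]))  # [21]
```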

Therefore, the pooling operation apparatus provided in the embodiments of this application can perform parallel pooling by means of the plurality of computing units and the plurality of register groups, improving pooling efficiency. Further, since each computing unit can store the intermediate calculation results of the pooling operation, data read/write efficiency is improved, which in turn improves overall pooling efficiency and maximally accelerates pooling operations.

Specifically, as shown in Fig. 2, the data in the register groups 210 are loaded from a dynamic memory 240. The dynamic memory is, for example, a dynamic random access memory (Dynamic Random Access Memory, DRAM). The dynamic memory 240 may be located inside or outside the pooling operation apparatus 200.

The number of computing units 220 is less than or equal to the number of register groups 210. For example, the pooling operation apparatus 200 includes n computing units 220 and n register groups 210.

Optionally, the register groups 210 and the computing units 220 are connected as follows: each computing unit is connected to all of the register groups. It should be understood that this connection relationship allows every computing unit to obtain data stored in any register group. This connection relationship may be called full connection.

Hereinafter, "the plurality of computing units are fully connected to the plurality of register groups" means that each computing unit is connected to all of the register groups.

Optionally, the register groups 210 and the computing units 220 are connected as follows: each computing unit is connected to some of the register groups. Specifically, the register groups connected to different computing units may be identical, completely different, or partially overlapping; this is not limited in the embodiments of this application.
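The two wiring options can be modeled as a map from each computing unit to the set of register groups it is connected to. Full connection gives every unit every group; the partial pattern below (each unit sees `fan_in` consecutive groups starting at its own index) is purely a hypothetical example, since the patent leaves the partial wiring open. The size of each set is the number of groups a unit must select among.

```python
from typing import Dict, Set

def full_connection(n_units: int, n_banks: int) -> Dict[int, Set[int]]:
    # Every computing unit is wired to every register group.
    return {u: set(range(n_banks)) for u in range(n_units)}

def partial_connection(n_units: int, n_banks: int, fan_in: int) -> Dict[int, Set[int]]:
    # Hypothetical partial wiring: unit u sees fan_in consecutive groups from u.
    return {u: {(u + i) % n_banks for i in range(fan_in)} for u in range(n_units)}

def can_read(conn: Dict[int, Set[int]], unit: int, bank: int) -> bool:
    return bank in conn[unit]

full = full_connection(4, 4)
part = partial_connection(4, 4, fan_in=2)
print(can_read(full, 0, 3), can_read(part, 0, 3))  # True False
print({u: len(b) for u, b in part.items()})        # {0: 2, 1: 2, 2: 2, 3: 2}
```

Full connection maximizes scheduling freedom at the cost of wider selection logic per unit; partial connection narrows that logic but constrains which operands a unit can fetch.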

To clearly illustrate how the embodiments of the present invention perform pooling, the computing unit provided in the embodiments is described below. The first computing unit 220 includes a storage module and a computing module. The computing module is configured to perform the first pooling operation on the first data and the second data obtained from the register groups to obtain the first operation result, and to store the first operation result in the storage module. The computing module is further configured to perform the second pooling operation on the first operation result stored in the storage module and the third data obtained from the register groups.

When the pooling operation is max pooling, the first pooling operation is a comparison, that is, the first data is compared with the second data; correspondingly, the computing module may include an adder or a comparator.

When the pooling operation is average pooling, the first pooling operation is an accumulation, that is, the first data and the second data are added together; correspondingly, the computing module includes an adder. It should be understood that the computing module further includes a multiplier, which is used to average the total accumulation result over all operands in the pooling window.

Optionally, in a first implementation, the first computing unit includes:

a first data interface, configured to receive the first data obtained from the plurality of register groups; a second data interface, configured to receive the second data obtained from the plurality of register groups; a first storage module, configured to store the first data; a second storage module, configured to store the second data; a computing module, configured to compare the first data with the second data to obtain the first operation result and store the first operation result in a latch, where the first operation result indicates that the first data is greater than the second data; and the latch, configured to store the first operation result and to send, based on the first operation result, a feedback signal to the first data interface and the second data interface, where the feedback signal instructs the first data interface to close and the second data interface to open, and the opened second data interface is configured to receive the third data obtained from the first register group. The computing module is further configured to perform the second pooling operation on the first operation result and the third data.

Specifically, as shown in Fig. 3, the first computing unit 220 includes a data interface 311, a data interface 312, a storage module 321, a storage module 322, a computing module 330, and a latch 340.

The data interface 311 is configured to obtain data from the register groups.

The data interface 312 is configured to obtain data from the register groups.

The storage module 321 is configured to store the data obtained by the data interface 311.

The storage module 322 is configured to store the data obtained by the data interface 312.

The computing module 330 is configured to obtain a first operand from the storage module 321 and a second operand from the storage module 322, compare the first operand with the second operand, and store the comparison result in the latch 340.

The latch 340 is configured to send a first feedback signal to the data interface 311 and the data interface 312 when the comparison result is that the first operand is greater than or equal to the second operand, and to send a second feedback signal to the data interface 311 and the data interface 312 when the comparison result is that the first operand is less than the second operand. The first feedback signal closes the data interface 311 and opens the data interface 312; the second feedback signal opens the data interface 311 and closes the data interface 312.

It should be understood that when the data interface 311 (or the data interface 312) is closed, it does not obtain data from the register groups, and when it is open, it obtains data from the register groups.

Take a pooling window of size 2*2 as an example. In clock cycle T, the data interface 311 receives the first data from a register group, and the storage module 321 stores the first data. In clock cycle T+1, the data interface 312 receives the second data from a register group, and the storage module 322 stores the second data; the computing module 330 obtains the first operand (the first data) from the storage module 321 and the second operand (the second data) from the storage module 322, compares them, and stores the comparison result in the latch 340. The latch 340 sends the first feedback signal to the data interfaces 311 and 312 when the first operand is greater than or equal to the second operand, and the second feedback signal when the first operand is less than the second operand. For ease of description and understanding, the following assumes that the first operand is greater than or equal to the second operand in every comparison, so the latch 340 sends the first feedback signal to the data interfaces 311 and 312.

In clock cycle T+2, the data interface 311 is closed and the data interface 312 is open; the data interface 312 receives the third data from a register group, and the storage module 322 stores the third data. The computing module 330 obtains the first operand (the larger value from the comparison in clock cycle T+1, namely the first data) from the storage module 321 and the second operand (the third data) from the storage module 322, and compares them. The comparison result is that the first operand is greater than or equal to the second operand; the result is stored in the latch 340, which sends the first feedback signal to the data interfaces 311 and 312. In clock cycle T+3, the data interface 311 is closed and the data interface 312 is open; the data interface 312 receives the fourth data from a register group, and the storage module 322 stores the fourth data. The computing module 330 obtains the first operand (the larger value from the comparison in clock cycle T+2, namely the first data) from the storage module 321 and the second operand (the fourth data) from the storage module 322, and compares them. The comparison result is that the first operand is greater than or equal to the second operand. The pooling result of this pooling operation, namely the first data, has now been obtained.

The storage module 321 and the storage module 322 may both be registers.

Optionally, the data interface 311 and the data interface 312 may be multiplexers. The number of inputs of each multiplexer equals the number of register groups connected to the first computing unit.

Optionally, the computing module 330 is an adder, which subtracts the second operand obtained from the storage module 322 from the first operand obtained from the storage module 321 and stores the result of the subtraction in the latch 340 as the comparison result.

Optionally, the computing module 330 is a comparator, which compares the first operand obtained from the storage module 321 with the second operand obtained from the storage module 322 and stores the comparison result in the latch 340.
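When an adder is used as the comparator, the comparison result has to be derived from the subtraction a - b. The sketch below assumes fixed-width two's-complement integer operands, which the text does not specify, and uses the standard signed-comparison rule a >= b exactly when N xor V == 0 (N: sign of the result, V: signed overflow). Reading the sign bit alone is not enough once the subtraction overflows: for a = -128, b = 1 in 8 bits the wrapped difference is 127 with sign bit 0, which would wrongly report a >= b.

```python
def ge_by_subtraction(a: int, b: int, width: int = 8) -> bool:
    """Return True if a >= b, decided from the flags of a width-bit subtraction.
    a and b must be representable as width-bit two's-complement values."""
    mask = (1 << width) - 1
    diff = (a - b) & mask                   # what a width-bit adder produces
    n = (diff >> (width - 1)) & 1           # N flag: sign bit of the result
    sa = ((a & mask) >> (width - 1)) & 1    # sign bit of a
    sb = ((b & mask) >> (width - 1)) & 1    # sign bit of b
    v = int(sa != sb and n != sa)           # V flag: signed overflow
    return (n ^ v) == 0                     # a >= b  <=>  N xor V == 0

print(ge_by_subtraction(5, 3))     # True
print(ge_by_subtraction(3, 5))     # False
print(ge_by_subtraction(-128, 1))  # False (overflow corrected by V)
print(ge_by_subtraction(127, -1))  # True
```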

It should be understood that the computing unit of the first implementation is suitable for the scenario in which the pooling operation is max pooling; that is, the pooling operation apparatus provided in the embodiments of this application can be used to implement max pooling.

Optionally, in a second implementation, the first computing unit specifically includes:

a first data interface, configured to receive the first data from the multiple register groups; a second data interface, configured to receive the second data from the multiple register groups; a first storage module, configured to store the first data; a second storage module, configured to store the second data; and an adder, configured to add the first data and the second data to obtain the first operation result. The second storage module is further configured to store the first operation result. The first data interface is further configured to obtain the third data from the first register group. The adder is further configured to perform the second pooling operation on the first operation result and the third data.

The first computing unit 220 further includes a multiplier. When the adder has obtained the accumulation result of k1*k2 data, the multiplier multiplies that accumulation result by 1/(k1*k2) to obtain the average value of the k1*k2 data. Here k1*k2 is the size of the pooling window corresponding to the pooling operation, and k1 and k2 are each integers not less than 2.

Specifically, as shown in Fig. 4, the first computing unit 220 includes data interface 411, data interface 412, storage module 421, storage module 422, adder 430 and multiplier 440.

Data interface 411 is configured to obtain data from a register group.

Data interface 412 is configured to obtain data from a register group.

Storage module 421 is configured to store the data obtained by data interface 411.

Storage module 422 is configured to store the data obtained by data interface 412.

Adder 430 is configured to obtain the first operand from storage module 421 and the second operand from storage module 422, add the two operands, and store the accumulation result in storage module 422. After the first computing unit has read k1*k2 data from the register groups, adder 430 sends the accumulation result to multiplier 440. Here k1*k2 is the size of the pooling window.

Once the accumulation result is stored in storage module 422, data interface 412 is closed. From then on, only data interface 411 receives data from the register groups.

Multiplier 440 is configured to multiply the accumulation result sent by adder 430 by 1/(k1*k2). The product is the pooling result of this pooling operation.

Take a 2*2 pooling window as an example. In clock cycle T, data interface 411 receives the first data from a register group, and storage module 421 stores the first data. In clock cycle T+1, data interface 412 receives the second data from a register group, and storage module 422 stores the second data. Adder 430 obtains the first operand (the first data) from storage module 421 and the second operand (the second data) from storage module 422, adds them, and stores the accumulation result (denoted accumulated value 1) in storage module 422. Data interface 412 is then closed. In clock cycle T+2, data interface 412 remains closed and data interface 411 is open. Data interface 411 receives the third data from a register group, and storage module 421 stores the third data. Adder 430 obtains the first operand (the third data) from storage module 421 and the second operand (accumulated value 1) from storage module 422, adds them, and stores the accumulation result (denoted accumulated value 2) in storage module 422. In clock cycle T+3, data interface 412 remains closed and data interface 411 is open. Data interface 411 receives the fourth data from a register group, and storage module 421 stores the fourth data. Adder 430 obtains the first operand (the fourth data) from storage module 421 and the second operand (accumulated value 2) from storage module 422, adds them, and sends the accumulation result (denoted accumulated value 3) to multiplier 440. Multiplier 440 multiplies accumulated value 3 by 1/4, and the product is the pooling result of this pooling operation.
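The walkthrough above can be modelled in the same hypothetical cycle-level style. In this sketch, `AvgPoolUnit` and `avg_pool_window` are invented names. `Fraction` stands in for whatever fixed-point format real hardware would use, so the average comes out exact. As the text describes, the multiplier receives the precomputed constant 1/(k1*k2) instead of performing a division.

```python
from fractions import Fraction
from typing import Iterable, Optional


class AvgPoolUnit:
    """Hypothetical cycle-level model of the second implementation (Fig. 4).

    reg421/reg422 model storage modules 421/422. reg422 holds the running sum
    once data interface 412 has been closed.
    """

    def __init__(self, k1: int, k2: int) -> None:
        if k1 < 2 or k2 < 2:
            raise ValueError("k1 and k2 must each be at least 2")
        self.window = k1 * k2
        self.reciprocal = Fraction(1, self.window)  # constant 1/(k1*k2) for multiplier 440
        self.reg421: Optional[int] = None
        self.reg422: Optional[int] = None
        self.iface412_open = True
        self.count = 0

    def clock(self, value: int) -> Optional[Fraction]:
        """One clock cycle; returns the pooling result on the last operand."""
        self.count += 1
        if self.count == 1:
            self.reg421 = value           # cycle T: interface 411 loads the first datum
            return None
        if self.iface412_open:
            self.reg422 = value           # cycle T+1: interface 412 loads the second datum
            self.iface412_open = False    # afterwards only interface 411 receives data
        else:
            self.reg421 = value           # cycles T+2, T+3, ...: interface 411 loads
        acc = self.reg421 + self.reg422   # adder 430
        if self.count == self.window:
            return acc * self.reciprocal  # multiplier 440
        self.reg422 = acc                 # running sum written back to 422
        return None


def avg_pool_window(values: Iterable[int], k1: int, k2: int) -> Fraction:
    """Feed exactly k1*k2 operands, one per clock cycle, into a fresh unit."""
    unit = AvgPoolUnit(k1, k2)
    out = None
    for v in values:
        out = unit.clock(v)
    if out is None:
        raise ValueError("window must contain exactly k1*k2 operands")
    return out
```

With the operands 1, 2, 19 and 20, the three accumulated values are 3, 22 and 42. The result is 42 * 1/4 = 21/2.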

By way of example and not limitation, the foregoing description uses storage module 422 to store the accumulation result of adder 430. In practice, the design may instead use storage module 421 to store this accumulation result. In that case, data interface 411 is closed and data interface 412 is opened. This embodiment does not limit this.

Storage module 421 and storage module 422 may both be registers.

Optionally, data interface 411 and data interface 412 may each be a multiplexer whose number of inputs equals the number of register groups connected to the first computing unit.

It should be understood that the computing unit of the second implementation suits scenarios in which the pooling operation is average pooling. In other words, the pooling arithmetic unit provided in the embodiments of this application can perform average pooling.

Optionally, in a third implementation, as shown in Fig. 5 and Fig. 6, the first computing unit includes data interface 511, data interface 512, storage module 521, storage module 522, adder 530, multiplier 540 and latch 550.

Data interface 511 is configured to obtain data from a register group.

Data interface 512 is configured to obtain data from a register group.

Storage module 521 is configured to store the data obtained by data interface 511.

Storage module 522 is configured to store the data obtained by data interface 512.

When the pooling operation is max pooling, the computing unit is configured as shown in Fig. 5.

Adder 530 is configured to obtain the first operand from storage module 521 and the second operand from storage module 522, compare the two operands, and store the comparison result in latch 550.

Latch 550 sends a first feedback signal to data interfaces 511 and 512 when the comparison result is that the first operand is greater than or equal to the second operand. It sends a second feedback signal to data interfaces 511 and 512 when the comparison result is that the first operand is less than the second operand. The first feedback signal closes data interface 511 and opens data interface 512. The second feedback signal opens data interface 511 and closes data interface 512.

It should be understood that a closed data interface (511 or 512) does not obtain data from the register groups, and an open data interface does.

When the pooling operation is average pooling, the computing unit is configured as shown in Fig. 6.

Adder 530 is configured to obtain the first operand from storage module 521 and the second operand from storage module 522, add the two operands, and store the accumulation result in storage module 522. After the first computing unit has read k1*k2 data from the register groups, adder 530 sends the accumulation result to multiplier 540. Here k1*k2 is the size of the pooling window.

Once the accumulation result is stored in storage module 522, data interface 512 is closed. From then on, only data interface 511 receives data from the register groups.

Multiplier 540 is configured to multiply the accumulation result sent by adder 530 by 1/(k1*k2). The product is the pooling result of this pooling operation.

It should be understood that the first computing unit of the third implementation supports two states: one for max pooling (Fig. 5) and one for average pooling (Fig. 6).

When the first computing unit 220 uses the third implementation, the pooling arithmetic unit also includes a control unit 230, as shown in Fig. 2. Control unit 230 sends a control signal to the multiple computing units, and the control signal indicates whether the pooling operation is max pooling or average pooling.

The first computing unit 220 is also configured to receive the control signal. When the control signal indicates max pooling, the first computing unit 220 performs a max pooling operation on the first data and the second data. When the control signal indicates average pooling, it performs an average pooling operation on the first data and the second data.

Specifically, when the control signal indicates max pooling, the first computing unit 220 switches to the state shown in Fig. 5. When the control signal indicates average pooling, the first computing unit 220 switches to the state shown in Fig. 6.
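The two states of the third implementation can be sketched as one hypothetical model. A mode value stands in for the control signal from control unit 230 and selects the unit's behaviour. The names `PoolMode`, `DualModeUnit`, `set_mode` and `run_window` are illustrative only. As in the text, the adder serves two roles: in max pooling it acts as the comparator by testing the sign of a - b, and in average pooling it acts as the accumulator.

```python
from enum import Enum
from fractions import Fraction
from typing import Iterable, List, Optional, Union


class PoolMode(Enum):
    MAX = "max"   # state of Fig. 5
    AVG = "avg"   # state of Fig. 6


class DualModeUnit:
    """Hypothetical model of the third implementation: one datapath whose
    behaviour is chosen by the control signal from control unit 230.

    regs[0]/regs[1] model storage modules 521/522 (fed by interfaces 511/512).
    """

    def __init__(self, k1: int, k2: int) -> None:
        self.window = k1 * k2
        self.set_mode(PoolMode.MAX)

    def set_mode(self, mode: PoolMode) -> None:
        """Receive the control signal: switch state and clear the datapath."""
        self.mode = mode
        self.regs: List[Optional[int]] = [None, None]
        self.open_iface = 0           # 0 -> interface 511 open, 1 -> 512 open
        self.count = 0

    def clock(self, value: int) -> Optional[Union[int, Fraction]]:
        """One clock cycle; returns the pooling result on the last operand."""
        self.regs[self.open_iface] = value
        self.count += 1
        if self.count == 1:
            self.open_iface = 1       # the second operand arrives through 512
            return None
        a, b = self.regs
        if self.mode is PoolMode.MAX:
            first_ge = (a - b) >= 0   # adder 530 used as a comparator
            # Latch 550: the first feedback signal closes 511 and opens 512 so the
            # larger value stays in 521; the second feedback signal does the reverse.
            self.open_iface = 1 if first_ge else 0
            if self.count == self.window:
                return a if first_ge else b
            return None
        acc = a + b                   # adder 530 accumulates
        if self.count == self.window:
            return Fraction(acc, self.window)  # equals acc * 1/(k1*k2) (multiplier 540)
        self.regs[1] = acc            # running sum kept in 522
        self.open_iface = 0           # 512 stays closed; only 511 loads data
        return None


def run_window(unit: DualModeUnit, values: Iterable[int]):
    """Run one pooling window in the unit's current mode."""
    unit.set_mode(unit.mode)          # start each window from a clean state
    out = None
    for v in values:
        out = unit.clock(v)
    return out
```

Both states share the storage modules, the adder and the data interfaces, which is what allows one computing unit to serve both pooling types. The states differ only in whether the latch or the multiplier is used.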

It should be understood that control unit 230 may also be located outside the pooling arithmetic unit 200 provided in this application. This application does not limit this.

Storage module 521 and storage module 522 may both be registers.

Optionally, data interface 511 and data interface 512 may each be a multiplexer whose number of inputs equals the number of register groups connected to the first computing unit 220.

It should be understood that the computing unit of the third implementation (Figs. 5 and 6) suits both average pooling and max pooling. In other words, the pooling arithmetic unit provided in the embodiments of this application can handle both average pooling and max pooling with the same hardware, which improves hardware utilization and reduces hardware cost.

The following describes how the multiple register groups store data. For ease of understanding and description, the following embodiments assume that every computing unit is connected to every register group. With suitable adaptation, the embodiments described below also apply to scenarios in which each computing unit is connected to only some of the register groups, and such variants also fall within the protection scope of this application. In the following embodiments, the multiple register groups 210 are referred to as multiple Banks.

In the embodiments of this application, the data to be pooled are stored in the multiple register groups so that, in each data read (that is, in each clock cycle), different computing units read data located in different register groups. This layout ensures that the computing units do not conflict when they read data from the register groups.

Assume that there are n computing units and n Banks, that the pooling window size is k*k, and that the image to be pooled is m*m, where k, n and m are integers greater than 1. Assume further that m = n*k. Suppose, for example, that the data to be pooled are k rows of n*k columns of the image. These data are stored in the n Banks as follows. For the j-th of the k rows (j = 1, 2, ..., k), the data in columns 1, k+1, 2k+1, ..., (n-1)*k+1 are stored in different register groups among the n register groups. Likewise, the data in columns 2, k+2, 2k+2, ..., (n-1)*k+2 are stored in different register groups among the n register groups. The same holds for each remaining column offset, up to the data in columns k, 2k, 3k, ..., n*k.
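The rule above only requires that columns read in the same cycle sit in different Banks. One simple way to satisfy it is to write each image row round-robin across the Banks. This layout is consistent with the Bank numbers in the Fig. 7 example. The sketch below assumes that layout and checks the rule for arbitrary n and k; its function names are hypothetical. With this layout, the rule holds exactly when gcd(n, k) = 1, as it does for n = 9 and k = 2. Other combinations, such as n = 8 and k = 2, would need a different placement, and the source does not describe one.

```python
from math import gcd


def bank_of(col: int, n: int) -> int:
    """1-based Bank holding 1-based image column `col` under an assumed
    round-robin layout (columns 1..n -> Banks 1..n, column n+1 -> Bank 1, ...)."""
    return (col - 1) % n + 1


def conflict_free(n: int, k: int) -> bool:
    """True if, for every read offset c0 = 1..k, the n columns
    c0, k + c0, ..., (n-1)*k + c0 (one per computing unit) fall in n distinct Banks."""
    for c0 in range(1, k + 1):
        banks = {bank_of(i * k + c0, n) for i in range(n)}
        if len(banks) != n:
            return False
    return True


def predicted_conflict_free(n: int, k: int) -> bool:
    """Number-theory shortcut: i*k mod n takes n distinct values exactly when gcd(n, k) == 1."""
    return gcd(n, k) == 1


print(conflict_free(9, 2), conflict_free(8, 2))
```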

Fig. 7 is a schematic diagram of how specific data to be pooled are stored in the multiple Banks. In Fig. 7, k = 2 and n = 9. That is, the pooling window is 2*2, and there are 9 computing units and 9 Banks, as shown in Fig. 7. Each Bank contains multiple registers (Fig. 7 schematically shows 5 registers per Bank). The registers of the 9 Banks therefore form a register array with 9 rows and multiple columns (Fig. 7 schematically shows a 9-row, 5-column register array). The image to be processed is 18*18. Assume the data to be pooled are those in rows 1 and 2 of the image, as shown in Fig. 7, where four data items with the same pattern belong to the same pooling window.

The data in rows 1 and 2 of the image are stored in the 9 Banks as follows:

The data in row 1 of the image are loaded starting from the first row of column r1 of the register array. They occupy 2 columns of registers (columns r1 and r2 in Fig. 7), which completes the loading of row 1.

The data in row 2 of the image are loaded starting from the first row of column (r1+2), that is, column r3, of the register array. They occupy 2 columns of registers (columns r3 and r4 in Fig. 7), which completes the loading of row 2.

The specific layout is shown in Fig. 7.

Fig. 8, Fig. 9, Fig. 10 and Fig. 11 show how the 9 computing units read data from the 9 Banks.

As shown in Fig. 8, in clock cycle T the data in the dashed boxes of the 9 Banks are read simultaneously. These are the data in columns 1, 3, 5, ..., 17 of row 1 of the image, that is, the first pooling operand of each pooling window. Computing unit 1 reads data "1" from Bank1, computing unit 2 reads data "3" from Bank3, computing unit 3 reads data "5" from Bank5, computing unit 4 reads data "7" from Bank7, computing unit 5 reads data "9" from Bank9, computing unit 6 reads data "11" from Bank2, computing unit 7 reads data "13" from Bank4, computing unit 8 reads data "15" from Bank6, and computing unit 9 reads data "17" from Bank8.

As shown in Fig. 9, in clock cycle T+1 the data in the dashed boxes of the 9 Banks are read simultaneously. These are the data in columns 2, 4, 6, ..., 18 of row 1 of the image, that is, the second pooling operand of each pooling window. Computing unit 1 reads data "2" from Bank2, computing unit 2 reads data "4" from Bank4, computing unit 3 reads data "6" from Bank6, computing unit 4 reads data "8" from Bank8, computing unit 5 reads data "10" from Bank1, computing unit 6 reads data "12" from Bank3, computing unit 7 reads data "14" from Bank5, computing unit 8 reads data "16" from Bank7, and computing unit 9 reads data "18" from Bank9.

As shown in Fig. 10, in clock cycle T+2 the data in the dashed boxes of the 9 Banks are read simultaneously. These are the data in columns 1, 3, 5, ..., 17 of row 2 of the image, that is, the third pooling operand of each pooling window. Computing unit 1 reads data "19" from Bank1, computing unit 2 reads data "21" from Bank3, computing unit 3 reads data "23" from Bank5, computing unit 4 reads data "25" from Bank7, computing unit 5 reads data "27" from Bank9, computing unit 6 reads data "29" from Bank2, computing unit 7 reads data "31" from Bank4, computing unit 8 reads data "33" from Bank6, and computing unit 9 reads data "35" from Bank8.

As shown in Fig. 11, in clock cycle T+3 the data in the dashed boxes of the 9 Banks are read simultaneously. These are the data in columns 2, 4, 6, ..., 18 of row 2 of the image, that is, the fourth pooling operand of each pooling window. Computing unit 1 reads data "20" from Bank2, computing unit 2 reads data "22" from Bank4, computing unit 3 reads data "24" from Bank6, computing unit 4 reads data "26" from Bank8, computing unit 5 reads data "28" from Bank1, computing unit 6 reads data "30" from Bank3, computing unit 7 reads data "32" from Bank5, computing unit 8 reads data "34" from Bank7, and computing unit 9 reads data "36" from Bank9. At this point, the 9 computing units can output 9 pooling results simultaneously.
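The read pattern of Figs. 8 to 11 can be regenerated with a short script. The script makes two assumptions. First, the data labels in the figures number the image row by row, so row r, column c carries the label (r-1)*m + c. Second, every row is placed round-robin across the Banks, as in Fig. 7. `read_schedule` is a hypothetical helper, and it raises an error if two computing units would access the same Bank in one cycle.

```python
from typing import List, Tuple

Read = Tuple[int, int, int]   # (computing unit, Bank, data label)


def read_schedule(n: int = 9, k: int = 2, m: int = 18) -> List[List[Read]]:
    """Reads in clock cycles T, T+1, ..., T+k*k-1 for one band of k image rows.

    Assumes m == n*k, the Fig. 7 labelling (row r, column c carries label
    (r-1)*m + c) and round-robin Bank placement of every row.
    """
    cycles = []
    for t in range(k * k):
        row = t // k + 1                   # image row read in this cycle
        offset = t % k + 1                 # column offset inside each window
        reads = []
        for unit in range(1, n + 1):
            col = (unit - 1) * k + offset  # unit u handles window u
            reads.append((unit, (col - 1) % n + 1, (row - 1) * m + col))
        if len({bank for _, bank, _ in reads}) != n:
            raise RuntimeError(f"Bank conflict in cycle T+{t}")
        cycles.append(reads)
    return cycles


for t, reads in enumerate(read_schedule()):
    print(f"T+{t}:", ", ".join(f"unit {u}<-Bank{b}:{v}" for u, b, v in reads))
```

In the printed output, the T+2 line lists the row-2 labels 19, 21, ..., 35, and the T+3 line lists 20, 22, ..., 36.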

As the description of Figs. 8 to 11 shows, the data read by different computing units in each clock cycle are located in different Banks. In addition, a given computing unit may read from different Banks in different clock cycles.

The computing units in the above embodiment may have the structure shown in Fig. 3, Fig. 4 or Fig. 5. This application does not limit this.

As an example, assume that in the embodiment described with reference to Fig. 7, the 9 computing units have the structure shown in Fig. 5, as illustrated in Fig. 7. To keep the drawing simple, Fig. 7 shows only the structure of computing unit 9. The other 8 computing units have the same structure as computing unit 9.

In the case where pondization operation is maximum pond, 9 computing units shown in fig. 7 are switched to as shown in Figure 5 State.It is described by taking computing unit 9 as an example below, computing unit 1-8 is equally applicable to for the description of computing unit 9, For sake of simplicity, repeating no more.For example, in clock cycle T, as shown in figure 8, data-interface 511 in computing unit 9 is from Bank1 It receives data " 1 ", 521 storing data of memory module " 1 "；In clock cycle T+1, as shown in figure 9, the data in computing unit 9 Interface 512 receives data " 2 " from Bank2, memory module storing data " 2 ", and adder 530 obtains the from memory module 521 One operand " 1 " obtains second operand " 2 " from memory module 522, compares to two operands, (i.e. by comparison result First operand is less than second operand) deposit latch 550；Latch 550 is sent out to data-interface 511 and data-interface 512 The second feedback signal is sent, which open data-interface 511, and data-interface 512 is closed；In clock cycle T+2, As shown in Figure 10, data-interface 511 receives data " 19 " from Bank1,521 storing data of memory module " 19 ", adder 530 First operand " 19 " are obtained from memory module 521, obtain second operand " 2 " (i.e. clock cycle T+1 from memory module 522 The larger value compared), two operands are compared, by comparison result (i.e. first operand is greater than second operand) deposit Latch 550；Latch 550 sends the first feedback signal, first feedback signal to data-interface 511 and data-interface 512 Close data-interface 511, data-interface 512 is opened；In clock cycle T+3, as shown in figure 11, data-interface 512 is from Bank2 Middle reception data " 20 ", 522 storing data of memory module " 20 ", adder 530 obtain first operand from memory module 521 " 19 " (i.e. 
clock cycle T+2 compare the larger value) obtains second operand " 20 " from memory module 522, operates to two Number compares, and obtains comparison result (i.e. first operand is less than second operand) to get the Chi Huajie operated to this pondization Fruit " 20 ".

In the case where pondization operation is maximum pond, 9 computing units shown in fig. 7 are switched to as shown in Figure 6 State.The process that computing unit reads data from register group is consistent with the maximum description of Chi Huazhong above, and difference is, State shown in fig. 6 is different from the data processing method of state shown in fig. 5, specifically describes the description of combination Fig. 5 as detailed above, For sake of simplicity, which is not described herein again.

It should be understood that the pooling result obtained by each computing unit can be written back to the multiple register groups. For example, the pooling results corresponding to one row of the original image are written to different register groups.

As the above shows, the pooling arithmetic unit provided in the embodiments of this application uses multiple computing units and multiple register groups. Different computing units can read data from different register groups without blocking one another, which enables parallel pooling and improves the efficiency of the pooling operation. In addition, each computing unit includes storage modules that hold intermediate results, which improves data read/write efficiency and therefore the overall efficiency of the pooling operation.

It should be understood that Fig. 7 and Figs. 8 to 11 are merely examples and are not limiting. On this basis, corresponding processing methods can be derived for different application scenarios through adaptation, and such schemes also fall within the protection scope of this application.

The description with reference to Fig. 7 and Figs. 8 to 11 assumes that m = n*k. In practice, m > n*k or m < n*k may also occur.

For example, consider m > n*k, again taking rows 1 to k of the image. The pooling operations on the first n*k columns of rows 1 to k keep all n computing units fully loaded, so the n computing units operate in parallel. After k*k clock cycles, however, the remaining (m-k*n) columns of rows 1 to k cannot fill all the units. In the example of Fig. 12 (where n = 9 and k = 2), these columns keep only computing units 1 to 5 busy, while computing units 6 to 9 receive no data. Some computing units therefore sit idle, which wastes resources. The same problem of idle computing units also arises when m < n*k.

To address this problem, the embodiments of this application provide the following solution. When the n computing units are not fully loaded, part of the data from other rows of the image is also stored in the n register groups, so that all n computing units can read data and perform pooling in parallel.

Take m > n*k as an example, as shown in Fig. 13, again with n = 9 and k = 2. Fig. 13 shows the data in rows 1 to 4 of the image. One pattern indicates the data in rows 1 and 2, and another pattern indicates the data in rows 3 and 4.

Pooling is first performed on the first 9*2 columns of rows 1 and 2 of the image. After 2*2 clock cycles, the remaining (m-2*9) columns of rows 1 and 2 are stored in the 9 register groups (Banks). The first x columns of rows 3 and 4 are stored in the 9 register groups as well, placed directly after the positions occupied by the remaining (m-2*9) columns of rows 1 and 2, as shown in Fig. 13. With the data spliced in this way, all 9 computing units can run at full load in the next 2*2 clock cycles.

The processing is similar when m < n*k. When processing of rows 1 and 2 of the image begins, the data of rows 1 and 2 are stored in the 9 register groups together with the first y columns of rows 3 and 4 of the image. These data allow all 9 computing units to run at full load in the next k*k clock cycles.
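A small scheduling sketch can estimate how splicing affects throughput. It only counts rounds of k*k clock cycles. It does not model where the spliced data are placed in the Banks (Fig. 13 shows that), so it does not check bank-conflict freedom after splicing. The helper names are hypothetical, and the sketch assumes m is a multiple of k. The example value m = 28 is an assumption, chosen so that five windows are left over after the first round of a band.

```python
from math import ceil
from typing import List, Tuple

Window = Tuple[int, int]   # (band of k rows, window index within the band), 0-based


def splice_schedule(m: int, n: int, k: int) -> List[List[Window]]:
    """Group all pooling windows of an m*m image into rounds of up to n windows.

    Windows left over at the end of one band are packed together with the
    first windows of the next band, so only the very last round can be partial.
    Each round occupies k*k clock cycles.
    """
    if m % k:
        raise ValueError("this sketch assumes m is a multiple of k")
    per_band = m // k
    windows = [(band, w) for band in range(m // k) for w in range(per_band)]
    return [windows[i:i + n] for i in range(0, len(windows), n)]


def rounds_without_splicing(m: int, n: int, k: int) -> int:
    """Each band is processed on its own, so a partly filled round still costs k*k cycles."""
    per_band = m // k
    return (m // k) * ceil(per_band / n)


for m in (28, 12):
    print(m, rounds_without_splicing(m, 9, 2), len(splice_schedule(m, 9, 2)))
```

For m = 28, n = 9 and k = 2, this gives 28 rounds without splicing and 22 with it. For m = 12 (m < n*k), it gives 6 rounds versus 4. In the m = 28 case, the second round pairs the last five windows of the first band with the first four windows of the next band.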

The method of storing data in the multiple register groups provided in this embodiment may be called a splicing method.

It should be understood that the schemes provided in the embodiments of this application let the pooling arithmetic unit perform parallel pooling for images and pooling windows of different sizes, thereby accelerating the pooling operation.

In conclusion pond arithmetic unit provided by the embodiments of the present application, passes through multiple computing units and multiple registers Parallel pond operation may be implemented in group, and pond efficiency can be improved；Further, since each computing unit can store Chi Huacao The results of intermediate calculations of work, therefore data read-write efficiency can be improved, and then pond efficiency can be improved on the whole, to realize most Bigization accelerates pond operation.

It should also be understood that the foregoing embodiments may be combined as appropriate according to their internal logical relationships. This application does not limit this.

It should also be understood that some of the foregoing embodiments use a pooling arithmetic unit with 9 computing units and 9 register groups, which is merely an example and not a limitation. In practice, the numbers of computing units and register groups in the pooling arithmetic unit can be chosen according to actual needs.

Optionally, the pooling arithmetic unit provided in the embodiments of this application may take the form of a chip.

An embodiment of this application further provides a computer device that includes a memory and the pooling arithmetic unit provided in the foregoing embodiments. The memory stores the data on which the pooling arithmetic unit is to perform pooling operations.

A person of ordinary skill in the art may be aware that the units and algorithm steps described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or by software depends on the particular application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of this application.

A person skilled in the art will clearly understand that, for convenience and brevity, the specific working processes of the systems, apparatuses and units described above are not repeated here. For these processes, refer to the corresponding processes in the foregoing method embodiments.

In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a logical functional division, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units. They may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.

If the functions are implemented as software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. On this understanding, the technical solutions of this application may be embodied as a software product, in whole, in the part that contributes to the prior art, or in part. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

The foregoing descriptions are merely specific embodiments of this application, and the protection scope of this application is not limited to them. Any variation or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims

1. A pooling arithmetic unit, comprising:

multiple register groups, configured to store multiple data;

multiple computing units, configured to perform a pooling operation on the multiple data, wherein the data operated on by different computing units are located in different register groups among the multiple register groups;

wherein a first computing unit among the multiple computing units is configured to:

perform a first pooling operation on first data and second data among the multiple data to obtain a first operation result;

store the first operation result;

obtain third data from a first register group among the multiple register groups; and

perform a second pooling operation on the first operation result and the third data.

2. The pooling arithmetic unit according to claim 1, wherein:

each computing unit among the multiple computing units can obtain the data stored in any register group among the multiple register groups.

3. The pooling arithmetic unit according to claim 1 or 2, further comprising:

a control unit, configured to send a control signal to the multiple computing units, the control signal indicating whether the pooling operation is max pooling or average pooling;

wherein the first computing unit is further configured to receive the control signal; and

in performing the first pooling operation on the first data and the second data among the multiple data, the first computing unit is specifically configured to:

when the control signal indicates that the pooling operation is max pooling, perform a max pooling operation on the first data and the second data; and

when the control signal indicates that the pooling operation is average pooling, perform an average pooling operation on the first data and the second data.

4. pond arithmetic unit according to any one of claim 1 to 3, which is characterized in that first pond operation Including the operation of maximum value pond,

First computing unit includes:

First data-interface, for receiving first data obtained from the multiple register group；

Second data-interface, for receiving second data obtained from the multiple register group；

First memory module, for storing first data；

Second memory module, for storing second data；

Computing module obtains first operation result for first data and second data, and will be described First operation result is stored in latch, and the comparison result is that first data are greater than second data；

The latch, configured to store the first operation result and, according to the first operation result, send a feedback signal to the first data interface and the second data interface, where the feedback signal instructs the first data interface to close and the second data interface to open, and the opened second data interface is configured to receive the third data obtained from the first register group;

The computing module is further configured to perform the second pooling operation on the first operation result and the third data.
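A behavioural model of the latch and feedback mechanism in claim 4 is sketched below. The claim only describes the case where the first data is greater than the second data; applying the symmetric rule when the second data wins, keeping the first operand on ties, and the function name `max_pool_with_latch` are assumptions. The model returns both the maximum and the sequence of interfaces that the feedback signal opened.

```python
from typing import List, Tuple


def max_pool_with_latch(stream: List[float]) -> Tuple[float, List[int]]:
    """Behavioural sketch of claim 4 (hypothetical model, not RTL).

    Interface 0 is the first data interface, interface 1 the second. The
    latch keeps the larger operand; the feedback signal leaves the winning
    interface closed and opens the other one for the next datum.
    """
    if len(stream) < 2:
        raise ValueError("need at least two data")
    data = iter(stream)
    operands = [next(data), next(data)]  # first and second data
    opened: List[int] = []               # interfaces opened by the feedback signal
    while True:
        # Ties keep the first interface's operand (assumption).
        winner = 0 if operands[0] >= operands[1] else 1
        latch = operands[winner]         # operation result held in the latch
        loser = 1 - winner
        nxt = next(data, None)           # next datum from the register group, if any
        if nxt is None:
            return latch, opened
        operands[loser] = nxt            # opened interface receives the new datum
        opened.append(loser)


print(max_pool_with_latch([5.0, 3.0, 7.0, 2.0]))  # (7.0, [1, 0])
```

For the stream 5, 3, 7, 2, the first comparison matches the claim: the first data (5) exceeds the second data (3), so the first interface stays closed and the second interface opens to receive the third data (7).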

5. The pooling arithmetic unit according to any one of claims 1 to 3, wherein the first pooling operation comprises an average-value pooling operation;

The first computing unit specifically comprises:

A first data interface, configured to receive the first data from the plurality of register groups;

A second data interface, configured to receive the second data from the plurality of register groups;

A first memory module, configured to store the first data;

A second memory module, configured to store the second data;

An adder, configured to add the first data and the second data to obtain the first operation result;

The second memory module is further configured to store the first operation result;

The first data interface is further configured to obtain the third data from the first register group;

The adder is further configured to perform the second pooling operation on the first operation result and the third data.
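The adder path in claim 5 can be modelled in the same way. This sketch assumes that each new datum fetched through the first data interface overwrites the first memory module, and that the running sum lives in the second memory module; `accumulate_window` is a hypothetical name.

```python
from typing import List


def accumulate_window(window: List[float]) -> float:
    """Sketch of the adder path in claim 5 (hypothetical model).

    The first and second data enter through the two data interfaces; the
    running sum is kept in the second memory module, and every later datum
    is assumed to arrive through the first data interface.
    """
    if len(window) < 2:
        raise ValueError("a pooling window holds at least two data")
    first_mem, second_mem = window[0], window[1]  # first and second memory modules
    second_mem = first_mem + second_mem           # first operation result replaces second data
    for value in window[2:]:
        first_mem = value                         # next datum via the first data interface
        second_mem = second_mem + first_mem       # second pooling operation: accumulate
    return second_mem


print(accumulate_window([1.0, 2.0, 3.0, 6.0]))  # 12.0
```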

6. The pooling arithmetic unit according to claim 5, wherein the first computing unit further comprises:

A multiplier, configured to: when the adder obtains an accumulation result of k1*k2 data, multiply the accumulation result of the k1*k2 data by 1/(k1*k2) to obtain the average value of the k1*k2 data, where k1*k2 is the size of the pooling window corresponding to the pooling operation, and k1 and k2 are each integers not less than 2.
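Claim 6 scales the accumulation result by a constant instead of dividing, that is, average = (sum of the k1*k2 data) * 1/(k1*k2). A minimal sketch follows; the function name `average_pool` and the use of floating-point arithmetic are assumptions (a hardware unit might use fixed-point values).

```python
from typing import List


def average_pool(window: List[float], k1: int, k2: int) -> float:
    """Average of a k1*k2 pooling window as described in claim 6:
    accumulate with the adder, then multiply by the constant 1/(k1*k2)."""
    if k1 < 2 or k2 < 2:
        raise ValueError("k1 and k2 must each be at least 2")
    if len(window) != k1 * k2:
        raise ValueError("window must hold exactly k1*k2 data")
    reciprocal = 1.0 / (k1 * k2)      # computed once per window size
    accumulation = sum(window)        # stands in for the adder's accumulation result
    return accumulation * reciprocal  # multiplier in place of a divider


print(average_pool([1.0, 2.0, 3.0, 6.0], 2, 2))  # 3.0
```

Because the window size is fixed for a given layer, the reciprocal 1/(k1*k2) can be computed once, which is one plausible reason to use a multiplier rather than a divider.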

7. A computer device, comprising a memory and the pooling arithmetic unit according to any one of claims 1 to 6, wherein the memory is configured to store the data on which the pooling arithmetic unit is to perform the pooling operation.