CN108573305A - Data processing method, device and apparatus - Google Patents

Data processing method, device and apparatus

Info

Publication number
CN108573305A
CN108573305A (application number CN201710152660.5A)
Authority
CN
China
Prior art keywords
data
row
convolution
caching
chip
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710152660.5A
Other languages
Chinese (zh)
Other versions
CN108573305B (en)
Inventor
胡睿
方颉翔
张铧铧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201710152660.5A priority Critical patent/CN108573305B/en
Publication of CN108573305A publication Critical patent/CN108573305A/en
Application granted granted Critical
Publication of CN108573305B publication Critical patent/CN108573305B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 - Digital computers in general; Data processing equipment in general
    • G06F15/76 - Architectures of general purpose stored program computers
    • G06F15/78 - Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 - System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/781 - On-chip cache; Off-chip memory
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Artificial Intelligence (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

An embodiment of the present invention provides a data processing method, device and apparatus. The data processing method includes: obtaining a preset convolution kernel and determining its convolution frame width; obtaining the chip cache capacity, a preset data size and a first preset row count, and determining a data column width from them; dividing the data matrix to be processed into multiple column regions according to the data column width; for any column region, extracting a second preset row count of rows as the data to be operated on, sending them to the chip cache, and performing convolution on them with the preset convolution kernel; after the first row of the data to be operated on has taken part in the convolution, deleting that first row, extracting the next row of the corresponding column region, and updating the data to be operated on; and performing convolution on the updated data to be operated on until every row of data in the region has taken part in the convolution. The invention reduces the power consumption generated by the chip during data processing and improves processing performance.

Description

Data processing method, device and apparatus
Technical field
The present invention relates to the field of chip design, and in particular to a data processing method, device and apparatus.
Background art
CNN (Convolutional Neural Network) is a kind of deep learning algorithm; it extracts information from data by simulating the working mode of the brain's neural network. The algorithm uses convolution to complete a preliminary extraction of information and, combined with some nonlinear operations, achieves high-performance target detection. With the continuing development of deep learning, CNNs are widely used in image processing fields such as target detection, data classification, and information extraction and matching. Because of the characteristics of the CNN algorithm itself, a large amount of data needs to take part in the computation repeatedly, which places high requirements on the cache space inside the chip: a sufficiently large storage space is needed to hold all of the information required by the CNN computation, and most existing chips cannot meet the requirement of directly storing all of the required information.
To address the problem that most chips cannot directly store all of the required information on chip, the prior art proposes a CNN implementation in which, before every convolution operation, all of the data required for the operation are imported again from the memory and then the computation is performed.
Since a large amount of data is reused in CNN computation, the data blocks imported from the memory each time by the prior art contain a large amount of duplicate data. As a result, a large amount of bandwidth is wasted on reading, the chip generates larger power consumption during data processing, and processing performance is affected.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a data processing method, device and apparatus, so as to reduce the power consumption generated by the chip during data processing and improve processing performance. The specific technical solutions are as follows:
In a first aspect, an embodiment of the present invention provides a data processing method. The method includes:
obtaining a preset convolution kernel, and determining the convolution frame width of the preset convolution kernel;
obtaining the chip cache capacity, a preset data size and a first preset row count, and determining a data column width from them, wherein the data column width is greater than or equal to the convolution frame width;
dividing the data matrix to be processed into columns according to the data column width to obtain multiple column regions, wherein the data matrix to be processed is a matrix, stored in memory, that contains all of the data to be processed;
when a data processing instruction is received, for any column region among all the column regions, extracting a second preset row count of rows as the data to be operated on and sending them to the chip cache, so as to perform convolution on the cached data to be operated on with the preset convolution kernel, wherein the second preset row count is greater than or equal to the convolution frame width and less than or equal to the first preset row count;
after the first row of the data to be operated on has taken part in the convolution, deleting the first row from the chip cache, extracting the next row from the corresponding column region, sending it to the chip cache as the last row of the data to be operated on, and updating the data to be operated on;
performing convolution on the updated data to be operated on with the preset convolution kernel until every row of data in the region has taken part in the convolution, and sending all of the operation results obtained by the convolution to the memory.
Optionally, the step of obtaining the chip cache capacity, the preset data size and the first preset row count and determining the data column width includes:
obtaining the chip cache capacity and the preset data size, and dividing the chip cache capacity by the preset data size to obtain the maximum number of data the chip cache can hold;
obtaining the first preset row count, and dividing the maximum number of data by the first preset row count to obtain the number of data in each row of the chip cache;
determining the number of data in each row of the chip cache as the data column width.
Optionally, before the step of dividing the data matrix to be processed into columns according to the data column width to obtain multiple column regions, the method further includes:
subtracting a preset value from the convolution frame width to obtain the width of the overlap region, wherein the overlap region is the region in which any data column overlaps an adjacent data column;
determining that the data column width includes the width of the overlap region.
Optionally, before the step of, when a data processing instruction is received, extracting for any column region a second preset row count of rows as the data to be operated on and sending them to the chip cache, the method further includes:
for any column region, adding a first zero row before the first row of data and setting the data of the first zero row to 0;
adding a second zero row after the last row of data and setting the data of the second zero row to 0.
The step of extracting the second preset row count of rows as the data to be operated on and sending them to the chip cache includes:
extracting, starting from the first zero row, the second preset row count of rows as the data to be operated on and sending them to the chip cache.
Optionally, the step of, after the first row of the data to be operated on has taken part in the convolution, deleting the first row from the chip cache, extracting the next row from the corresponding column region, sending it to the chip cache as the last row of the data to be operated on, and updating the data to be operated on, includes:
while the first row of the data to be operated on is taking part in the convolution, deleting the first row from the chip cache, extracting the next row from the corresponding column region, sending it to the chip cache as the last row of the data to be operated on, and updating the data to be operated on;
or,
after any convolution operation has been performed, deleting the first row from the chip cache, and, during any convolution operation performed after the first row has been deleted, extracting the next row from the corresponding column region, sending it to the chip cache as the last row of the data to be operated on, and updating the data to be operated on.
Optionally, the step of performing convolution on the updated data to be operated on with the preset convolution kernel until every row of data in the region has taken part in the convolution, and sending all of the operation results obtained by the convolution to the memory, includes:
performing convolution on the updated data to be operated on with the preset convolution kernel to obtain a convolution result;
storing the convolution result into the column of the next convolutional layer corresponding to the column index of the column region;
sending the next convolutional layer to the memory.
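To make the method of the first aspect concrete, the following is a minimal Python sketch under simplifying assumptions: a NumPy 2-D input, "valid" convolution without the optional zero rows described above, and an input with at least the second preset row count of rows. All function and variable names are illustrative and are not taken from the patent.

import numpy as np

def process(matrix, kernel, cache_capacity, datum_bytes, first_rows, second_rows):
    """Column-wise splitting of the input plus a rolling row buffer per column region."""
    k = kernel.shape[0]                                   # convolution frame width
    column_width = (cache_capacity // datum_bytes) // first_rows
    overlap = k - 1                                       # overlap of adjacent column regions
    step = column_width - overlap
    outputs = []
    for start in range(0, matrix.shape[1] - overlap, step):
        region = matrix[:, start:start + column_width]    # one column region read from memory
        buf = [region[r] for r in range(second_rows)]     # rows currently in the chip cache
        next_row, results = second_rows, []
        while len(buf) >= k:
            window = np.stack(buf[:k])                    # k rows take part in the convolution
            results.append([(window[:, c:c + k] * kernel).sum()
                            for c in range(window.shape[1] - k + 1)])
            buf.pop(0)                                    # the first row has taken part: delete it
            if next_row < region.shape[0]:
                buf.append(region[next_row])              # fetch the next row of this region
                next_row += 1
        outputs.append(np.array(results))                 # per-region results written back to memory
    return outputs

For example, process(img, np.ones((3, 3)) / 9, 100 * 1024, 256, 4, 3) would sweep a 3 × 3 averaging kernel over img one 100-wide column region at a time.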
In a second aspect, an embodiment of the present invention provides a data processing device. The device includes:
a main control unit, configured to receive a data processing instruction and send control commands to a rolling cache and a computing unit, so as to control the rolling cache to extract data from the memory and control the computing unit to perform convolution on the extracted data;
a rolling cache, configured to obtain a preset convolution kernel and determine the convolution frame width of the preset convolution kernel; obtain the chip cache capacity, a preset data size and a first preset row count and determine a data column width from them, wherein the data column width is greater than or equal to the convolution frame width; after receiving the control command sent by the main control unit, for any column region among the multiple column regions into which the data matrix to be processed in the memory is divided according to the data column width, extract a second preset row count of rows as the data to be operated on; and, after the first row of the data to be operated on has taken part in the convolution, delete the first row, extract the next row from the corresponding column region as the last row of the data to be operated on, and update the data to be operated on;
a computing unit, configured to, after receiving the data to be operated on sent by the rolling cache, perform convolution on the data to be operated on or the updated data to be operated on with the preset convolution kernel, until every row of data in the region has taken part in the convolution, and send all of the operation results obtained by the convolution to the memory.
Optionally, the rolling cache is further specifically configured to:
obtain the chip cache capacity and the preset data size, and divide the chip cache capacity by the preset data size to obtain the maximum number of data the rolling cache can hold;
obtain the first preset row count, and divide the maximum number of data by the first preset row count to obtain the number of data in each row of the rolling cache;
determine the number of data in each row of the rolling cache as the data column width.
Optionally, the rolling cache is further specifically configured to:
subtract a preset value from the convolution frame width to obtain the width of the overlap region, wherein the overlap region is the region in which any data column overlaps an adjacent data column;
determine that the data column width includes the width of the overlap region.
Optionally, the rolling cache is further specifically configured to:
for any column region, before extracting the first row of data, add a first zero row before the first row of data and set the data of the first zero row to 0;
before extracting the last row of data, add a second zero row after the last row of data and set the data of the second zero row to 0;
extract, starting from the first zero row, the second preset row count of rows as the data to be operated on.
Optionally, the rolling cache is further specifically configured to:
while the first row of the data to be operated on is taking part in the convolution, delete the first row, extract the next row from the corresponding column region as the last row of the data to be operated on, and update the data to be operated on;
or,
after any convolution operation has been performed, delete the first row from the chip cache, and, during any convolution operation performed after the first row has been deleted, extract the next row from the corresponding column region and send it to the chip cache as the last row of the data to be operated on, and update the data to be operated on.
Optionally, the computing unit is further specifically configured to:
perform convolution on the data to be operated on or the updated data to be operated on with the preset convolution kernel to obtain a convolution result;
store the convolution result into the column of the next convolutional layer corresponding to the column index of the column region;
send the next convolutional layer to the memory.
In a third aspect, an embodiment of the present invention provides a data processing apparatus. The apparatus includes:
a first determining module, configured to obtain a preset convolution kernel and determine the convolution frame width of the preset convolution kernel;
a second determining module, configured to obtain the chip cache capacity, a preset data size and a first preset row count and determine a data column width from them, wherein the data column width is greater than or equal to the convolution frame width;
a dividing module, configured to divide the data matrix to be processed into columns according to the data column width to obtain multiple column regions, wherein the data matrix to be processed is a matrix, stored in memory, that contains all of the data to be processed;
an extracting module, configured to, when a data processing instruction is received, for any column region among all the column regions, extract a second preset row count of rows as the data to be operated on, send them to the chip cache, and perform convolution on the cached data to be operated on with the preset convolution kernel, wherein the second preset row count is greater than or equal to the convolution frame width and less than or equal to the first preset row count;
an updating module, configured to, after the first row of the data to be operated on has taken part in the convolution, delete the first row from the chip cache, extract the next row from the corresponding column region and send it to the chip cache as the last row of the data to be operated on, and update the data to be operated on;
a first computing module, configured to perform convolution on the updated data to be operated on with the preset convolution kernel until every row of data in the region has taken part in the convolution, and send all of the operation results obtained by the convolution to the memory.
Optionally, the second determining module includes:
a first operation submodule, configured to obtain the chip cache capacity and the preset data size, and divide the chip cache capacity by the preset data size to obtain the maximum number of data the chip cache can hold;
a second operation submodule, configured to obtain the first preset row count, and divide the maximum number of data by the first preset row count to obtain the number of data in each row of the chip cache;
a determining submodule, configured to determine the number of data in each row of the chip cache as the data column width.
Optionally, the data processing apparatus further includes:
a second computing module, configured to subtract a preset value from the convolution frame width to obtain the width of the overlap region, wherein the overlap region is the region in which any data column overlaps an adjacent data column;
a third determining module, configured to determine that the data column width includes the width of the overlap region.
Optionally, the data processing apparatus further includes:
a first setting module, configured to, for any column region, add a first zero row before the first row of data and set the data of the first zero row to 0;
a second setting module, configured to add a second zero row after the last row of data and set the data of the second zero row to 0.
The extracting module is further specifically configured to:
extract, starting from the first zero row, the second preset row count of rows as the data to be operated on and send them to the chip cache.
Optionally, the updating module is specifically configured to:
while the first row of the data to be operated on is taking part in the convolution, delete the first row from the chip cache, extract the next row from the corresponding column region and send it to the chip cache as the last row of the data to be operated on, and update the data to be operated on;
or,
after any convolution operation has been performed, delete the first row from the chip cache, and, during any convolution operation performed after the first row has been deleted, extract the next row from the corresponding column region and send it to the chip cache as the last row of the data to be operated on, and update the data to be operated on.
Optionally, the first computing module further includes:
a third operation submodule, configured to perform convolution on the updated data to be operated on with the preset convolution kernel to obtain a convolution result;
a storing submodule, configured to store the convolution result into the column of the next convolutional layer corresponding to the column index of the column region;
a sending submodule, configured to send the next convolutional layer to the memory.
With the data processing method, device and apparatus provided by the embodiments of the present invention, when the data required for an operation are obtained from memory, a data column width is determined by the convolution kernel size, the chip cache capacity, the amount of data that needs to be cached and the first preset row count, and one column of data with that width is obtained each time from the data to be processed stored in memory. By exploiting the fact that a large amount of data is reused in convolution, each computation only needs to obtain from memory the data that actually have to take part in the operation, which reduces the bandwidth requirement on the off-chip memory, thereby reducing the power consumption generated by the chip during data processing and improving processing performance.
Description of the drawings
In order to describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description show only some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the data column dividing mode in an application example of an embodiment of the present invention;
Fig. 3 is a schematic flowchart of the convolution operation in an application example of an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a data processing device according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
In order to reduce the power consumption generated by the chip during data processing and improve processing performance, embodiments of the present invention provide a data processing method, device and apparatus.
The data processing method provided by the embodiments of the present invention is introduced first.
It should be noted that the execution subject of the data processing method provided by the embodiments of the present invention may be a chip for data processing, such as a DSP (Digital Signal Processor), an ARM (Advanced RISC Machine) microprocessor or an FPGA (Field-Programmable Gate Array), or a data processing controller; of course, it may also be any other device with data processing capability, which is not limited here. The data processing method provided by the embodiments of the present invention may be implemented as software, a hardware circuit and/or a logic circuit in the chip for data processing or in the data processing controller. The application scenario of the embodiments of the present invention may be image processing or radar scanning; of course, any other application scenario that uses convolutional neural networks is also applicable to the embodiments of the present invention.
As shown in Fig. 1, the data processing method provided by the embodiment of the present invention may include the following steps:
S101: obtaining a preset convolution kernel, and determining the convolution frame width of the preset convolution kernel.
It should be noted that, since this embodiment is directed at convolutional neural networks and convolution operations need to be performed, the preset convolution kernel may be set in advance, or may be determined according to a selected preset operation strategy. The preset operation strategy may be any operation strategy of a convolutional neural network, such as non-linear rectification (activation) or pooling; each operation strategy specifies the convolution kernel used for its convolution operations, so the preset convolution kernel can be determined from the selected preset operation strategy.
It should be emphasized that the key to a convolution operation is the choice of the convolution operator, i.e. the coefficient matrix; this coefficient matrix is the convolution kernel, and the width of this coefficient matrix is the convolution frame width. For example, for the commonly mentioned 3 × 3 convolution kernel, 3 is the convolution frame width.
S102: obtaining the chip cache capacity, the preset data size and the first preset row count, and determining the data column width from them.
The data column width is greater than or equal to the convolution frame width; a data column contains multiple data; the chip includes a cache for storing the data that take part in the operation; and the preset data size may be obtained from the characteristics of the data that need to take part in the operation, or may be set in advance, and characterizes the amount of data that needs to be cached for one convolution operation. It should be noted that, when the chip cache capacity is very large, the extracted data column may contain more rows of data to be cached for the convolution operation.
Optionally, the step of obtaining the chip cache capacity, the preset data size and the first preset row count and determining the data column width may include the following.
First, obtain the chip cache capacity and the preset data size, and divide the chip cache capacity by the preset data size to obtain the maximum number of data the chip cache can hold.
It should be noted that, in this embodiment, when data are extracted from the memory, the data to be processed need to be divided into multiple column regions with overlap regions between them, and only the data of one column region are read each time. The width of each column region is jointly determined by the chip cache capacity, the preset data size and the first preset row count, and the maximum number of data the chip cache can hold, that is, the largest number of data the chip can cache, can first be determined from the chip cache capacity and the preset data size. For example, if the chip cache capacity is 100kB and the data volume of each datum is 256B, the maximum number of data the chip can cache is 400.
Second, obtain the first preset row count, and divide the maximum number of data by the first preset row count to obtain the number of data in each row of the chip cache.
Finally, determine the number of data in each row of the chip cache as the data column width.
The data column width includes the width of the overlap region. The first preset row count may be the number of rows of the data stored in the memory, or a preset number of extractable rows, or it may be equal to the convolution frame width of the preset convolution kernel; specifically, the first preset row count may be determined according to the cache capacity of the chip. It should be noted that, after the maximum number of data the chip cache can hold has been determined, this maximum number can be divided by the first preset row count to obtain the number of data in each row of the chip cache, and this number can be used as the data column width. For example, if the maximum number of data the chip can cache is 400 and the first preset row count is 4, the data column width may be 100.
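A small sketch of this computation follows; the function name is illustrative, and the figures in the comments are the 100kB / 256B examples used in this paragraph and in the application example later in the description.

def data_column_width(cache_capacity_bytes, datum_bytes, first_preset_rows):
    """Divide the cache capacity by the datum size, then by the first preset row count."""
    max_data = cache_capacity_bytes // datum_bytes     # e.g. 100 kB / 256 B = 400 data
    return max_data // first_preset_rows               # data per cached row = data column width

print(data_column_width(100 * 1024, 256, 4))           # 100, as in this paragraph
print(data_column_width(100 * 1024, 256, 6))           # 66, as in the later 5x5 application example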
S103: dividing the data matrix to be processed into columns according to the data column width to obtain multiple column regions.
The data matrix to be processed is a matrix, stored in memory, that contains all of the data to be processed. Under normal circumstances, what is stored in the memory is raw data; for example, in an image processing system, what is stored in memory may be the captured original image. It may of course also be two-dimensional data obtained by converting the raw data. It should be noted that, when data are extracted from the memory, if the memory holds raw data, the raw data are first converted into two-dimensional form, then the data to be processed are divided into multiple column regions according to the data column width obtained above, and only the data of one column region are read each time. After the data column width has been obtained, the data matrix to be processed in memory can be divided according to the data column width, so as to reduce the amount of data cached each time. For example, if the obtained data column width is 6, the data matrix to be processed can be divided into one column region per 6 columns of data.
It should be emphasized that, in order to ensure that the result of performing convolution on the divided data is exactly the same as the result of performing convolution on the original data matrix to be processed, a certain overlap region can be kept between two adjacent column regions when the data matrix to be processed is divided.
Optionally, before the step of dividing the data matrix to be processed into columns according to the data column width to obtain multiple column regions, the data processing method may further include the following.
First, subtract a preset value from the convolution frame width to obtain the width of the overlap region.
Second, determine that the data column width includes the width of the overlap region.
The overlap region is the region in which any data column overlaps an adjacent data column. Since the overlap region is the region shared by two adjacent data columns, and the first data column of each column region takes part only in the convolution of that column region while the other data columns also need to take part in the convolution of other column regions, the preset value is normally taken as 1. For example, when the convolution frame width is 3, the width of the overlap region is 2; when the convolution frame width is 5, the width of the overlap region is 4; and so on. It should be noted that the overlap region ensures that, when the convolution reaches a boundary, the required data can still be obtained, so that the result is exactly the same as that of a normal convolution process.
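As a sketch of how the column regions could be laid out with this overlap, assuming the preset value of 1; the function name and the half-open index convention are illustrative assumptions.

def split_into_column_regions(matrix_width, data_column_width, kernel_width, preset_value=1):
    """Return (start, end) column indices of each region; adjacent regions overlap."""
    overlap = kernel_width - preset_value              # e.g. 3x3 kernel -> overlap width 2
    step = data_column_width - overlap
    regions, start = [], 0
    while start + overlap < matrix_width:
        regions.append((start, min(start + data_column_width, matrix_width)))
        start += step
    return regions

print(split_into_column_regions(12, 6, 3))             # [(0, 6), (4, 10), (8, 12)]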
S104: when a data processing instruction is received, for any column region among all the column regions, extracting a second preset row count of rows as the data to be operated on, sending them to the chip cache, and performing convolution on the cached data to be operated on with the preset convolution kernel.
The second preset row count is greater than or equal to the convolution frame width and less than or equal to the first preset row count. It should be noted that the data processing instruction is used to start the data processing operation. A data processing instruction sent by a user may be received to start data processing, which increases the interaction between the user and the data processing and improves the user experience. Of course, the data processing instruction may also be generated by an acquisition module after the raw data have been collected, so as to start data processing; for example, in image processing, when the original image has been collected, a notification that the memory has stored the original image is received and data processing begins.
It should be emphasized that, after the data processing instruction is received, data are extracted from the region-divided data to be processed in the memory. Since the cache space of the chip is limited, the first preset row count used when calculating the data column width corresponds to the maximum number of rows of the data column width that the chip can cache. Therefore, when data are extracted from the memory for the first time, the number of rows extracted from a column region is at most the first preset row count and, since convolution is to be performed, at least the convolution frame width of the preset convolution kernel; the data to be operated on can thus be extracted according to the second preset row count. After the first extraction, convolution can be performed directly on the data to be operated on with the preset convolution kernel; the convolution process itself belongs to the prior art and is not repeated here.
Optionally, before the step of, when a data processing instruction is received, extracting for any column region a second preset row count of rows as the data to be operated on and sending them to the chip cache, the data processing method may further include the following.
First, for any column region, add a first zero row before the first row of data and set the data of the first zero row to 0.
Second, add a second zero row after the last row of data and set the data of the second zero row to 0.
It should be noted that the first row and the last row of a data column are edge information, for example the edge pixel information of an image. When the edge information is extracted, directly extracting the first row or the last row of data is likely to cause data loss. Therefore, a zero row whose data are 0 is added before the first row of data and after the last row of data respectively, which ensures the integrity of the extracted data.
Optionally, the step of extracting the second preset row count of rows as the data to be operated on and sending them to the chip cache may include:
extracting, starting from the first zero row, the second preset row count of rows as the data to be operated on and sending them to the chip cache.
It should be noted that the convolutional neural network needs to perform convolution operations of the convolution frame width in sequence starting from the first row until all of the data in the extracted data column have completed the convolution; only then does the convolution of that data column end, and only after the operation of one data column has ended does the convolution of the next data column start.
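A sketch of this edge handling, adding one zero row before the first row and one after the last row of a column region before extraction; the single-row padding amount and the NumPy representation are assumptions for illustration.

import numpy as np

def pad_column_region(region):
    """Prepend and append a zero row so edge rows are not lost during extraction."""
    zero_row = np.zeros((1, region.shape[1]), dtype=region.dtype)
    return np.vstack([zero_row, region, zero_row])

region = np.arange(12).reshape(3, 4)       # a toy column region with 3 rows
padded = pad_column_region(region)         # 5 rows: zero row, the 3 data rows, zero row
first_batch = padded[:3]                   # extraction starts from the first zero row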
S105: after the first row of the data to be operated on has taken part in the convolution, deleting the first row from the chip cache, extracting the next row from the corresponding column region, sending it to the chip cache as the last row of the data to be operated on, and updating the data to be operated on.
The next row of the corresponding column region refers to the first not-yet-extracted row of that column region. It should be noted that, when convolution is performed on the data to be operated on extracted the first time, the number of rows of the data to be operated on may be equal to, or greater than, the convolution frame width. If the number of rows equals the convolution frame width, the first row can be deleted after the convolution has been performed, and the next row is then extracted from the corresponding column region in the memory. If the number of rows is greater than the convolution frame width, the first row that has taken part in the convolution can be deleted after the data to be operated on have taken part in the convolution and the next row then extracted from the corresponding column region in the memory; alternatively, the first row can be deleted while it is taking part in the convolution and the next row then extracted from the corresponding column region in the memory; this is not specifically limited here. In any case, the first row must have taken part in the convolution and been deleted before the next row can be extracted from the corresponding column region in the memory, otherwise the cache capacity of the chip would be exceeded. After a new row of data is received, the new row needs to be set, in row order, as the last row of the data to be operated on, and the data to be operated on are updated.
Optionally, the step of, after the first row of the data to be operated on has taken part in the convolution, deleting the first row from the chip cache, extracting the next row from the corresponding column region, sending it to the chip cache as the last row of the data to be operated on, and updating the data to be operated on, may include:
while the first row of the data to be operated on is taking part in the convolution, deleting the first row from the chip cache, extracting the next row from the corresponding column region, sending it to the chip cache as the last row of the data to be operated on, and updating the data to be operated on.
It should be noted that, in order to ensure the utilization of the chip cache, make the most of the cache capacity of the chip and ensure the efficiency of the data operation, one specific implementation of this embodiment may be: while the first row of the data to be operated on is taking part in the convolution, the first row is deleted and the next row is immediately extracted from the corresponding column region in the memory to update the data to be operated on. For example, 5 rows of data of a column region (rows 0 to 4) are extracted from the memory and stored in the chip cache, and a 3 × 3 convolution kernel is used to perform convolution on the data to be operated on; while row 0 is taking part in the convolution, row 0 is deleted from the chip cache, row 5 is extracted from the corresponding column region in the memory and sent to the chip cache, and row 5 is taken as the last row of the data to be operated on, updating the data to be operated on.
Alternatively,
after any convolution operation has been performed, deleting the first row from the chip cache, and, during any convolution operation performed after the first row has been deleted, extracting the next row from the corresponding column region, sending it to the chip cache as the last row of the data to be operated on, and updating the data to be operated on.
It should be noted that, when the second preset row count is greater than the convolution frame width, another implementation of this embodiment may be as follows. Every time a convolution operation has been performed, the first row is deleted, but a new row is not necessarily extracted immediately; the new row may be extracted during any subsequent convolution operation, for example only after all of the cached rows have taken part in the convolution. Alternatively, when a convolution operation has been performed, the first row is not deleted immediately; it may be deleted after any convolution operation following the first one, and the next row is extracted from the corresponding column region during any convolution operation performed after the first row has been deleted. For example, if the second preset row count is 5, rows 0 to 4 of a column region are extracted from the memory and stored in the chip cache, and a 3 × 3 convolution kernel is used. Convolution is first performed on rows 0 to 2; while this convolution is being performed, row 0 may be deleted and row 5 of the column region extracted, or row 0 may only be deleted without extracting a new row. Convolution is then performed on rows 1 to 3; row 0 and/or row 1 may be deleted and row 5 and/or row 6 of the column region extracted, or row 0 and/or row 1 may only be deleted without extracting a new row. Convolution is then performed on rows 2 to 4; row 0 and/or row 1 and/or row 2 are deleted, and row 5 and/or row 6 and/or row 7 of the column region are extracted.
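A compact sketch of the first of the two update strategies above, where the first cached row is dropped and the next row of the same column region is fetched as soon as the first row has taken part in a convolution; the deque-based buffer and the convolve_window callback are implementation assumptions, not part of the patent.

from collections import deque
from itertools import islice

def rolling_convolve(region_rows, convolve_window, kernel_height, second_preset_rows):
    """First strategy: drop the first cached row right after it takes part in a
    convolution and immediately fetch the next row of the same column region."""
    rows = iter(region_rows)                          # rows of one column region, in order
    buf = deque(islice(rows, second_preset_rows))     # initial fill of the chip cache
    results = []
    while len(buf) >= kernel_height:
        results.append(convolve_window(list(buf)[:kernel_height]))
        buf.popleft()                                 # the first row has taken part: delete it
        nxt = next(rows, None)
        if nxt is not None:
            buf.append(nxt)                           # update the data to be operated on
    return results

The second strategy described above differs only in that the deletion and the fetch may be deferred to a later convolution, provided the row has already taken part and has been deleted before its replacement is fetched, so the cache capacity is never exceeded.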
S106: performing convolution on the updated data to be operated on with the preset convolution kernel until every row of data in the region has taken part in the convolution, and sending all of the operation results obtained by the convolution to the memory.
It should be noted that a convolutional neural network performs convolution on the data step by step with the convolution kernel. The convolution result obtained by one convolution operation can be used as the input of the next convolution operation, or output as features to serve as a basis for judgment or comparison. For example, in image processing, what the convolution produces may be a set of image features and multiple attribute features of the image, which can be output so that target detection and other judgments can be made on the obtained image features. It should be emphasized that the operation results of the convolution are usually stored into the memory.
Optionally, the step of performing convolution on the updated data to be operated on with the preset convolution kernel and sending all of the operation results obtained by the convolution to the memory may include the following.
First, perform convolution on the updated data to be operated on with the preset convolution kernel to obtain a convolution result.
Second, store the convolution result into the column of the next convolutional layer corresponding to the column index of the column region.
Finally, send the next convolutional layer to the memory.
It should be noted that the data stored to the memory are the operation results of all of the data columns obtained by the convolution. An operation result matrix is therefore established according to the column index, within the data columns, of the data currently being convolved, and the operation results are stored into it; this operation result matrix is the next convolutional layer. For example, the operation results of the convolution of the data of the first column region are stored into the first column of the next convolutional layer. The convolution process itself is prior art and is not repeated here.
With this embodiment, when the data required for the operation are obtained from memory, a certain number of rows of data, with the data column width determined by the convolution kernel size, the chip cache capacity, the amount of data to be cached and the first preset row count, are obtained each time from the data to be processed stored in memory. By exploiting the fact that a large amount of data is reused in convolution, each computation only needs to obtain from memory the data that actually have to take part in the operation, which reduces the bandwidth requirement on the off-chip memory, thereby reducing the power consumption generated by the chip during data processing and improving processing performance. Since the data are divided into data columns and extracted into the cache for the operation, data of a large size can be operated on even with a small cache; and the overlap region ensures that, when the convolution reaches a boundary, the required data can still be obtained, so that the result is exactly the same as that of a normal convolution process.
The data processing method provided by the embodiments of the present invention is introduced below with reference to a specific application example.
An image with a resolution of 256 × 160 is stored in memory, and the data volume of each datum is 256B, so the total storage space occupied by the image is 256*256*160 = 100MB. The convolution operation of a convolutional neural network is selected, the preset convolution kernel is a 5 × 5 convolution kernel, and the chip cache capacity is assumed to be 100kB. The data columns are divided as shown in Fig. 2: since the width of the convolution frame is 5, the width of the overlap region 202 is 2; and since the chip cache capacity is assumed to be 100kB and the data volume of each datum is 256B, with a 5 × 5 convolution kernel the data column width 201 is set to 100*1024/256/(5+1) ≈ 66.
In order to reduce the number of data columns and thus the number of repeated loads, for the above image the amount of data that has to be loaded repeatedly is (⌈256/(66 - 2×2)⌉ - 1) × (2×2) × 160 × 256B = 4 × 4 × 160 × 256B = 640KB, where 256 is the image width, 66 is the data column width, 2 is the width of the overlap region on one side, 160 is the total number of rows of the image, and 256B is the data volume of one datum. Therefore, under this condition, every time one 100MB image with a resolution of 256 × 160 is processed, 640KB of data is loaded twice, and the proportion of invalidly loaded data is 640/(100*1024) ≈ 0.63%.
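A quick arithmetic check of these figures, under the assumption that the repeatedly loaded data are the 2×2 overlapping columns at each boundary between adjacent column regions; the variable names are illustrative.

image_w, image_h, datum_bytes = 256, 160, 256     # image size and bytes per datum
col_width, side_overlap = 66, 2                   # data column width and single-side overlap

regions = -(-image_w // (col_width - 2 * side_overlap))      # ceiling division -> 5 regions
reloaded = (regions - 1) * (2 * side_overlap) * image_h * datum_bytes
print(reloaded // 1024)                                      # 640 (KB) loaded twice
print(100 * (reloaded // 1024) / (100 * 1024))               # 0.625, i.e. the ~0.63% quoted above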
By analysis, the first extraction from the original two-dimensional data in memory covers columns 1 to 66, on which convolution is performed; the second extraction covers columns 65 to 130, on which convolution is performed; and so on, data columns are extracted with a width of 66 and convolved in turn.
Specifically, the process of extracting a data column and performing the convolution each time is shown in Fig. 3. Assume that the preset convolution kernel is a 5 × 5 convolution kernel, that the second preset row count is equal to the convolution frame width, and that the data matrix to be processed has 50 rows of data. The specific steps are as follows.
Step 1: read the data in row 0 (311), row 1 (312), row 2 (313), row 3 (314) and row 4 (315) of the first data column 310, where row 0 (311) is the zero row and its data are set to 0; perform convolution on rows 0 (311) to 4 (315), and store the result into row 1 of column 1 of the next convolution layer data. While the convolution is being performed, read the data in row 5 (316) and store them into the cache.
Step 2: read the data in row 1 (312), row 2 (313), row 3 (314), row 4 (315) and row 5 (316) of the first data column 310, perform convolution on rows 1 (312) to 5 (316), and store the result into row 2 of column 1 of the next convolution layer data; at the same time, read the data in row 6 (317) and store them into the cache.
Step 3: in the manner of step 2, continue to read the rows of the first data column 310 in turn and perform the convolutions.
Step 4: when the calculation reaches row 47, perform convolution on the data of row 47 (318), row 48 (319), row 49 (3110), row 50 (3111) and row 51 (3112), where row 51 (3112) is the zero row and its data are set to 0, and store the result into row 50 of column 1 of the next convolution layer data.
Step 5: load the data in row 0 (321), row 1 (322), row 2 (323), row 3 (324) and row 4 (325) of the second data column 320, where row 0 (321) is the zero row and its data are set to 0; perform convolution on rows 0 (321) to 4 (325), and store the result into row 1 of column 2 of the next convolution layer data; while the convolution is being performed, read the data in row 5 (326) and store them into the cache.
Step 6: repeat step 4 and step 5 until convolution has been performed on the data of row 47 (331), row 48 (332), row 49 (333), row 50 (334) and row 51 (335) of the Nth data column 330, where row 51 (335) is the zero row and its data are set to 0, and store the result into row 50 of column N of the next convolution layer data, where column N is the last column of the data stored in the memory.
Finally, the obtained next convolutional layer is sent to the memory.
Compared with the prior art, in this solution, when the data required for the operation are obtained from memory, one data column with the width determined by the convolution kernel size, the chip cache capacity, the amount of data to be cached and the first preset row count is obtained each time from the data to be processed stored in memory. By exploiting the fact that a large amount of data is reused in convolution, each computation only needs to obtain from memory the data that actually have to take part in the operation, which reduces the bandwidth requirement on the off-chip memory, thereby reducing the power consumption generated by the chip during data processing and improving processing performance. Since the data are divided into data columns and extracted into the cache for the operation, data of a large size can be operated on even with a small cache; and the overlap region ensures that, when the convolution reaches a boundary, the required data can still be obtained, so that the result is exactly the same as that of a normal convolution process.
Corresponding to the above embodiments, an embodiment of the present invention provides a data processing device. As shown in Fig. 4, the data processing device may include:
a main control unit 410, configured to receive a data processing instruction and send control commands to the rolling cache and the computing unit, so as to control the rolling cache to extract data from the memory and control the computing unit to perform convolution on the extracted data;
a rolling cache 420, configured to obtain a preset convolution kernel and determine the convolution frame width of the preset convolution kernel; obtain the chip cache capacity, a preset data size and a first preset row count and determine a data column width from them, wherein the data column width is greater than or equal to the convolution frame width; after receiving the control command sent by the main control unit, for any column region among the multiple column regions into which the data matrix to be processed in the memory is divided according to the data column width, extract a second preset row count of rows as the data to be operated on; and, after the first row of the data to be operated on has taken part in the convolution, delete the first row, extract the next row from the corresponding column region as the last row of the data to be operated on, and update the data to be operated on;
a computing unit 430, configured to, after receiving the data to be operated on sent by the rolling cache, perform convolution on the data to be operated on or the updated data to be operated on with the preset convolution kernel, until every row of data in the region has taken part in the convolution, and send all of the operation results obtained by the convolution to the memory.
With this embodiment, when the data required for the operation are obtained from memory, one data column with the width determined by the convolution kernel size, the chip cache capacity, the amount of data to be cached and the first preset row count is obtained each time from the data to be processed stored in memory. By exploiting the fact that a large amount of data is reused in convolution, each computation only needs to obtain from memory the data that actually have to take part in the operation, which reduces the bandwidth requirement on the off-chip memory, thereby reducing the power consumption generated by the chip during data processing and improving processing performance. Since the data are divided into data columns and extracted into the cache for the operation, data of a large size can be operated on even with a small cache; and the overlap region ensures that, when the convolution reaches a boundary, the required data can still be obtained, so that the result is exactly the same as that of a normal convolution process.
Optionally, the rolling caching 420, specifically can be also used for:
Chip buffer memory capacity and preset data amount are obtained, the chip buffer memory capacity and the preset data amount are divided by, Obtain the maximum value of the data amount check for rolling caching;
The first default line number is obtained, the maximum value of the data amount check and the described first default line number are divided by, institute is obtained State the number for each row of data for rolling caching;
Determine that the number of each row of data for rolling caching is data column width.
Optionally, the rolling caching 420, specifically can be also used for:
The convolution width of frame is subtracted into preset value, obtains the width of overlapping region, wherein the overlapping region is any Data are arranged arranges the region overlapped with adjacent data;
Determine the width for including the overlapping region in the data column width.
Optionally, the rolling caching 420, specifically can be also used for:
Add before the first row data before extracting the first row data for any column region in all areas The data for adding the first null, and first null being arranged are 0;
Before extracting last column data, the second null is added after last column data, and is arranged described The data of two nulls are 0;
That the second default line number is extracted since first null waits for operational data.
Optionally, the rolling cache 420 may specifically be further configured to:
when the first row of the data to be operated on has taken part in a convolution operation, delete the first row of data and extract the next row of data from the corresponding column region as the new last row of the data to be operated on, thereby updating the data to be operated on;
alternatively,
after any one convolution operation has been performed, delete the first row of data from the on-chip cache, and, while any subsequent convolution operation is being performed after the first row of data has been deleted, extract the next row of data from the corresponding column region and send it to the on-chip cache as the new last row of the data to be operated on, thereby updating the data to be operated on.
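The rolling update itself amounts to dropping the top row of the cached window and appending one more row from the column region, so only a constant-height window ever needs to reside in the on-chip cache. A Python sketch with illustrative names:

import numpy as np

def roll_window(window, column_region, next_row_index):
    """Drop the first cached row and append the next row of the column region, if any."""
    window = window[1:]                                   # the first row has participated: delete it
    if next_row_index < column_region.shape[0]:           # more rows remain in the memory
        next_row = column_region[next_row_index:next_row_index + 1]
        window = np.vstack([window, next_row])            # becomes the new last row
    return window

region = np.arange(20).reshape(5, 4)
window = region[:3].copy()
window = roll_window(window, region, next_row_index=3)
print(window[:, 0])  # -> [ 4  8 12]: rows 1..3 of the region now form the window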
Optionally, the computing unit 430 may specifically be further configured to:
perform a convolution operation on the data to be operated on, or on the updated data to be operated on, using the preset convolution kernel, to obtain a convolution result;
store the convolution result into the row of the next convolutional layer that matches the columns of the corresponding column region;
send the next convolutional layer to the memory.
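The translated wording of this storage step is ambiguous; one reading is that the outputs of a column region are written into the next convolutional layer at the column positions matching that region. A Python sketch under that assumption (all names are illustrative):

import numpy as np

def store_region_results(next_layer, region_results, region_start_col):
    """Write one column region's convolution outputs into the matching columns
    of the next convolutional layer assembled in memory."""
    rows, cols = region_results.shape
    next_layer[:rows, region_start_col:region_start_col + cols] = region_results
    return next_layer

# Example: a 4x8 next layer receiving a 4x4 block of results for the region starting at column 0.
next_layer = np.zeros((4, 8))
results = np.ones((4, 4))
print(store_region_results(next_layer, results, region_start_col=0))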
It should be noted that the data processing device of this embodiment of the present invention is a device that applies the data processing method described above; all embodiments of that data processing method therefore apply to this device and achieve the same or similar advantageous effects.
Corresponding to the above embodiments, an embodiment of the present invention provides a data processing apparatus. As shown in FIG. 5, the data processing apparatus may include:
a first determining module 510, configured to obtain a preset convolution kernel and determine the convolution frame width of the preset convolution kernel;
a second determining module 520, configured to obtain the chip cache capacity, the preset data amount and the first preset row count, and determine the data column width from them, where the data column width is greater than or equal to the convolution frame width;
a division module 530, configured to divide the to-be-processed data matrix into columns according to the data column width to obtain multiple column regions, where the to-be-processed data matrix is a matrix stored in the memory that contains all the to-be-processed data;
an extraction module 540, configured to, when a data processing instruction is received, for any column region among all the column regions, extract a second preset row count of data to be operated on and send it to the on-chip cache, and perform a convolution operation on the cached data to be operated on using the preset convolution kernel, where the second preset row count is greater than or equal to the convolution frame width and less than or equal to the first preset row count;
an update module 550, configured to, after the first row of the data to be operated on has taken part in a convolution operation, delete the first row of data from the on-chip cache, and extract the next row of data from the corresponding column region and send it to the on-chip cache as the new last row of the data to be operated on, thereby updating the data to be operated on;
a first computing module 560, configured to perform a convolution operation on the updated data to be operated on using the preset convolution kernel, until all rows of data in the region have taken part in a convolution operation, and send all operation results obtained from the convolution operations to the memory.
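Pulling these modules together, the following Python sketch walks one column region through the whole flow: pad the region with null rows, cache a small window of rows, convolve the top of the window, then roll the window down one row at a time until every row has taken part. The nested-loop convolution and all names are illustrative assumptions, not the claimed hardware:

import numpy as np

def convolve_window(window, kernel):
    """Plain 2-D valid convolution (correlation form) of a cached window with the kernel."""
    kh, kw = kernel.shape
    oh, ow = window.shape[0] - kh + 1, window.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(window[i:i + kh, j:j + kw] * kernel)
    return out

def process_column_region(region, kernel, cached_rows):
    """Process one column region with a rolling cache holding `cached_rows` rows."""
    kh = kernel.shape[0]
    zero = np.zeros((1, region.shape[1]))
    padded = np.vstack([zero, region, zero])      # first and second null rows
    window = padded[:cached_rows].copy()          # initial data to be operated on
    next_row, outputs = cached_rows, []
    while window.shape[0] >= kh:
        outputs.append(convolve_window(window[:kh], kernel))  # top rows fill one kernel height
        window = window[1:]                       # the first row has participated: delete it
        if next_row < padded.shape[0]:            # roll in the next row, if any remains
            window = np.vstack([window, padded[next_row:next_row + 1]])
            next_row += 1
    return np.vstack(outputs)                     # one output row per window position

region = np.arange(24, dtype=float).reshape(6, 4)
kernel = np.ones((3, 3))
print(process_column_region(region, kernel, cached_rows=3).shape)  # -> (6, 2)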
With this embodiment, when the data required for an operation are obtained from the memory, the data column width is determined from the convolution kernel size, the chip cache capacity, the preset data amount and the first preset row count, and each time only one row of data of that width is fetched from the to-be-processed data stored in the memory. This exploits the large amount of data reuse in convolution operations and ensures that each computation fetches from the memory only the data that actually take part in the operation, which lowers the bandwidth demand on the off-chip memory, thereby reducing the power consumed by the chip during data processing and improving processing performance. Because the data are divided into column regions and extracted into the cache for operation, a small cache can also handle a large amount of data; and the overlap regions ensure that the data required when the convolution reaches a region boundary are still available, so the result is exactly the same as that of a normal convolution.
Optionally, the second determining module 520 may further include:
a first operation submodule, configured to obtain the chip cache capacity and the preset data amount, and divide the chip cache capacity by the preset data amount to obtain the maximum number of data elements the on-chip cache can hold;
a second operation submodule, configured to obtain the first preset row count, and divide the maximum number of data elements by the first preset row count to obtain the number of data elements in each row of the on-chip cache;
a determination submodule, configured to determine that the number of data elements in each row of the on-chip cache is the data column width.
Optionally, the data processing apparatus may further include:
a second computing module, configured to subtract a preset value from the convolution frame width to obtain the width of the overlap region, where the overlap region is the region in which any data column region coincides with its adjacent data column region;
a third determining module, configured to determine that the data column width includes the width of the overlap region.
Optionally, the data processing apparatus may further include:
a first setup module, configured to, for any column region among all the column regions, add a first null row before the first row of data and set the data of the first null row to 0;
a second setup module, configured to add a second null row after the last row of data and set the data of the second null row to 0.
Optionally, the extraction module 540 may specifically be further configured to:
extract the second preset row count of data to be operated on, starting from the first null row, and send it to the on-chip cache.
Optionally, the update module 550 may specifically be configured to:
when the first row of the data to be operated on has taken part in a convolution operation, delete the first row of data from the on-chip cache, and extract the next row of data from the corresponding column region and send it to the on-chip cache as the new last row of the data to be operated on, thereby updating the data to be operated on;
alternatively,
after any one convolution operation has been performed, delete the first row of data from the on-chip cache, and, while any subsequent convolution operation is being performed after the first row of data has been deleted, extract the next row of data from the corresponding column region and send it to the on-chip cache as the new last row of the data to be operated on, thereby updating the data to be operated on.
Optionally, the first computing module 560 may further include:
a third operation submodule, configured to perform a convolution operation on the updated data to be operated on using the preset convolution kernel to obtain a convolution result;
a storage submodule, configured to store the convolution result into the row of the next convolutional layer that matches the columns of the corresponding column region;
a sending submodule, configured to send the next convolutional layer to the memory.
It should be noted that the data processing apparatus of this embodiment of the present invention is an apparatus that applies the data processing method described above; all embodiments of that data processing method therefore apply to this apparatus and achieve the same or similar advantageous effects.
It can be understood that, in another embodiment of the present invention, the data processing apparatus may simultaneously include: the first determining module 510, the second determining module 520, the division module 530, the extraction module 540, the update module 550, the first computing module 560, the second computing module, the third determining module, the first setup module and the second setup module.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Unless further limited, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes that element.
Each embodiment in this specification is described in a related and progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively brief, and the relevant parts may refer to the description of the method embodiments.
The above are merely preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (18)

1. A data processing method, characterized in that the method comprises:
obtaining a preset convolution kernel and determining a convolution frame width of the preset convolution kernel;
obtaining a chip cache capacity, a preset data amount and a first preset row count, and determining a data column width from them, wherein the data column width is greater than or equal to the convolution frame width;
dividing a to-be-processed data matrix into columns according to the data column width to obtain multiple column regions, wherein the to-be-processed data matrix is a matrix stored in a memory that contains all to-be-processed data;
when a data processing instruction is received, for any column region among all the column regions, extracting a second preset row count of data to be operated on and sending it to the on-chip cache, and performing a convolution operation on the cached data to be operated on using the preset convolution kernel, wherein the second preset row count is greater than or equal to the convolution frame width and less than or equal to the first preset row count;
after a first row of the data to be operated on has taken part in a convolution operation, deleting the first row of data from the on-chip cache, extracting a next row of data from the corresponding column region and sending it to the on-chip cache as a new last row of the data to be operated on, thereby updating the data to be operated on;
performing a convolution operation on the updated data to be operated on using the preset convolution kernel, until all rows of data in the region have taken part in a convolution operation, and sending all operation results obtained from the convolution operations to the memory.
2. The data processing method according to claim 1, characterized in that the step of obtaining the chip cache capacity, the preset data amount and the first preset row count and determining the data column width comprises:
obtaining the chip cache capacity and the preset data amount, and dividing the chip cache capacity by the preset data amount to obtain a maximum number of data elements of the on-chip cache;
obtaining the first preset row count, and dividing the maximum number of data elements by the first preset row count to obtain a number of data elements in each row of the on-chip cache;
determining that the number of data elements in each row of the on-chip cache is the data column width.
3. The data processing method according to claim 1, characterized in that, before the step of dividing the to-be-processed data matrix into columns according to the data column width to obtain the multiple column regions, the method further comprises:
subtracting a preset value from the convolution frame width to obtain a width of an overlap region, wherein the overlap region is a region in which any data column region coincides with its adjacent data column region;
determining that the data column width includes the width of the overlap region.
4. The data processing method according to claim 1, characterized in that, before the step of, when the data processing instruction is received, for any column region among all the column regions, extracting the second preset row count of data to be operated on and sending it to the on-chip cache, the method further comprises:
for any column region among all the column regions, adding a first null row before the first row of data and setting the data of the first null row to 0;
adding a second null row after the last row of data and setting the data of the second null row to 0;
wherein the step of extracting the second preset row count of data to be operated on and sending it to the on-chip cache comprises:
extracting the second preset row count of data to be operated on, starting from the first null row, and sending it to the on-chip cache.
5. The data processing method according to claim 1, characterized in that the step of, after the first row of the data to be operated on has taken part in a convolution operation, deleting the first row of data from the on-chip cache, extracting the next row of data from the corresponding column region and sending it to the on-chip cache as the new last row of the data to be operated on, thereby updating the data to be operated on, comprises:
when the first row of the data to be operated on has taken part in a convolution operation, deleting the first row of data from the on-chip cache, extracting the next row of data from the corresponding column region and sending it to the on-chip cache as the new last row of the data to be operated on, thereby updating the data to be operated on;
alternatively,
after any one convolution operation has been performed, deleting the first row of data from the on-chip cache, and, while any subsequent convolution operation is being performed after the first row of data has been deleted, extracting the next row of data from the corresponding column region and sending it to the on-chip cache as the new last row of the data to be operated on, thereby updating the data to be operated on.
6. The data processing method according to claim 1, characterized in that the step of performing a convolution operation on the updated data to be operated on using the preset convolution kernel, until all rows of data in the region have taken part in a convolution operation, and sending all operation results obtained from the convolution operations to the memory comprises:
performing a convolution operation on the updated data to be operated on using the preset convolution kernel to obtain a convolution result;
storing the convolution result into the row of the next convolutional layer that matches the columns of the corresponding column region;
sending the next convolutional layer to the memory.
7. A data processing device, characterized in that the device comprises:
a main control unit, configured to receive a data processing instruction and send control commands to a rolling cache and a computing unit, so as to control the rolling cache to extract data from a memory and control the computing unit to perform convolution operations on the extracted data;
the rolling cache, configured to: obtain a preset convolution kernel and determine a convolution frame width of the preset convolution kernel; obtain a chip cache capacity, a preset data amount and a first preset row count, and determine a data column width from them, wherein the data column width is greater than or equal to the convolution frame width; after receiving the control command sent by the main control unit, for any column region among the multiple column regions into which the to-be-processed data matrix in the memory is divided column-wise according to the data column width, extract a second preset row count of data to be operated on; and, after a first row of the data to be operated on has taken part in a convolution operation, delete the first row of data, extract a next row of data from the corresponding column region as a new last row of the data to be operated on, thereby updating the data to be operated on;
the computing unit, configured to: after receiving the data to be operated on sent by the rolling cache, perform convolution operations on the data to be operated on, or on the updated data to be operated on, using the preset convolution kernel, until all rows of data in the region have taken part in a convolution operation, and send all operation results obtained from the convolution operations to the memory.
8. The data processing device according to claim 7, characterized in that the rolling cache is specifically further configured to:
obtain the chip cache capacity and the preset data amount, and divide the chip cache capacity by the preset data amount to obtain a maximum number of data elements of the rolling cache;
obtain the first preset row count, and divide the maximum number of data elements by the first preset row count to obtain a number of data elements in each row of the rolling cache;
determine that the number of data elements in each row of the rolling cache is the data column width.
9. The data processing device according to claim 7, characterized in that the rolling cache is specifically further configured to:
subtract a preset value from the convolution frame width to obtain a width of an overlap region, wherein the overlap region is a region in which any data column region coincides with its adjacent data column region;
determine that the data column width includes the width of the overlap region.
10. The data processing device according to claim 7, characterized in that the rolling cache is specifically further configured to:
for any column region among all the column regions, before extracting the first row of data, add a first null row before the first row of data and set the data of the first null row to 0;
before extracting the last row of data, add a second null row after the last row of data and set the data of the second null row to 0;
extract the second preset row count of data to be operated on, starting from the first null row.
11. The data processing device according to claim 7, characterized in that the rolling cache is specifically further configured to:
when the first row of the data to be operated on has taken part in a convolution operation, delete the first row of data, and extract the next row of data from the corresponding column region as the new last row of the data to be operated on, thereby updating the data to be operated on;
alternatively,
after any one convolution operation has been performed, delete the first row of data from the on-chip cache, and, while any subsequent convolution operation is being performed after the first row of data has been deleted, extract the next row of data from the corresponding column region and send it to the on-chip cache as the new last row of the data to be operated on, thereby updating the data to be operated on.
12. The data processing device according to claim 7, characterized in that the computing unit is specifically further configured to:
perform a convolution operation on the data to be operated on, or on the updated data to be operated on, using the preset convolution kernel, to obtain a convolution result;
store the convolution result into the row of the next convolutional layer that matches the columns of the corresponding column region;
send the next convolutional layer to the memory.
13. A data processing apparatus, characterized in that the apparatus comprises:
a first determining module, configured to obtain a preset convolution kernel and determine a convolution frame width of the preset convolution kernel;
a second determining module, configured to obtain a chip cache capacity, a preset data amount and a first preset row count, and determine a data column width from them, wherein the data column width is greater than or equal to the convolution frame width;
a division module, configured to divide a to-be-processed data matrix into columns according to the data column width to obtain multiple column regions, wherein the to-be-processed data matrix is a matrix stored in a memory that contains all to-be-processed data;
an extraction module, configured to, when a data processing instruction is received, for any column region among all the column regions, extract a second preset row count of data to be operated on and send it to the on-chip cache, and perform a convolution operation on the cached data to be operated on using the preset convolution kernel, wherein the second preset row count is greater than or equal to the convolution frame width and less than or equal to the first preset row count;
an update module, configured to, after a first row of the data to be operated on has taken part in a convolution operation, delete the first row of data from the on-chip cache, and extract a next row of data from the corresponding column region and send it to the on-chip cache as a new last row of the data to be operated on, thereby updating the data to be operated on;
a first computing module, configured to perform a convolution operation on the updated data to be operated on using the preset convolution kernel, until all rows of data in the region have taken part in a convolution operation, and send all operation results obtained from the convolution operations to the memory.
14. The data processing apparatus according to claim 13, characterized in that the second determining module comprises:
a first operation submodule, configured to obtain the chip cache capacity and the preset data amount, and divide the chip cache capacity by the preset data amount to obtain a maximum number of data elements of the on-chip cache;
a second operation submodule, configured to obtain the first preset row count, and divide the maximum number of data elements by the first preset row count to obtain a number of data elements in each row of the on-chip cache;
a determination submodule, configured to determine that the number of data elements in each row of the on-chip cache is the data column width.
15. The data processing apparatus according to claim 13, characterized in that the data processing apparatus further comprises:
a second computing module, configured to subtract a preset value from the convolution frame width to obtain a width of an overlap region, wherein the overlap region is a region in which any data column region coincides with its adjacent data column region;
a third determining module, configured to determine that the data column width includes the width of the overlap region.
16. The data processing apparatus according to claim 13, characterized in that the data processing apparatus further comprises:
a first setup module, configured to, for any column region among all the column regions, add a first null row before the first row of data and set the data of the first null row to 0;
a second setup module, configured to add a second null row after the last row of data and set the data of the second null row to 0;
wherein the extraction module is specifically further configured to:
extract the second preset row count of data to be operated on, starting from the first null row, and send it to the on-chip cache.
17. The data processing apparatus according to claim 13, characterized in that the update module is specifically configured to:
when the first row of the data to be operated on has taken part in a convolution operation, delete the first row of data from the on-chip cache, and extract the next row of data from the corresponding column region and send it to the on-chip cache as the new last row of the data to be operated on, thereby updating the data to be operated on;
alternatively,
after any one convolution operation has been performed, delete the first row of data from the on-chip cache, and, while any subsequent convolution operation is being performed after the first row of data has been deleted, extract the next row of data from the corresponding column region and send it to the on-chip cache as the new last row of the data to be operated on, thereby updating the data to be operated on.
18. The data processing apparatus according to claim 13, characterized in that the first computing module further comprises:
a third operation submodule, configured to perform a convolution operation on the updated data to be operated on using the preset convolution kernel to obtain a convolution result;
a storage submodule, configured to store the convolution result into the row of the next convolutional layer that matches the columns of the corresponding column region;
a sending submodule, configured to send the next convolutional layer to the memory.
CN201710152660.5A 2017-03-15 2017-03-15 Data processing method, equipment and device Active CN108573305B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710152660.5A CN108573305B (en) 2017-03-15 2017-03-15 Data processing method, equipment and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710152660.5A CN108573305B (en) 2017-03-15 2017-03-15 Data processing method, equipment and device

Publications (2)

Publication Number Publication Date
CN108573305A true CN108573305A (en) 2018-09-25
CN108573305B CN108573305B (en) 2020-07-24

Family

ID=63575806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710152660.5A Active CN108573305B (en) 2017-03-15 2017-03-15 Data processing method, equipment and device

Country Status (1)

Country Link
CN (1) CN108573305B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110770740A (en) * 2018-09-30 2020-02-07 深圳市大疆创新科技有限公司 Image processing method and device based on convolutional neural network and unmanned aerial vehicle
WO2020062284A1 (en) * 2018-09-30 2020-04-02 深圳市大疆创新科技有限公司 Convolutional neural network-based image processing method and device, and unmanned aerial vehicle
CN111124626A (en) * 2018-11-01 2020-05-08 北京灵汐科技有限公司 Many-core system and data processing method and processing device thereof
CN109740732A (en) * 2018-12-27 2019-05-10 深圳云天励飞技术有限公司 Neural network processor, convolutional neural networks data multiplexing method and relevant device
CN109726798B (en) * 2018-12-27 2021-04-13 北京灵汐科技有限公司 Data processing method and device
CN109726798A (en) * 2018-12-27 2019-05-07 北京灵汐科技有限公司 A kind of data processing method and device
WO2020238843A1 (en) * 2019-05-24 2020-12-03 华为技术有限公司 Neural network computing device and method, and computing device
CN110866597A (en) * 2019-09-27 2020-03-06 珠海博雅科技有限公司 Data processing circuit and data processing method
CN110866597B (en) * 2019-09-27 2021-07-27 珠海博雅科技有限公司 Data processing circuit and data processing method
CN111177115A (en) * 2019-12-11 2020-05-19 中电普信(北京)科技发展有限公司 General flow method and system for data preprocessing
CN111125617A (en) * 2019-12-23 2020-05-08 中科寒武纪科技股份有限公司 Data processing method, data processing device, computer equipment and storage medium
CN111199273A (en) * 2019-12-31 2020-05-26 深圳云天励飞技术有限公司 Convolution calculation method, device, equipment and storage medium
CN111199273B (en) * 2019-12-31 2024-03-26 深圳云天励飞技术有限公司 Convolution calculation method, device, equipment and storage medium
CN112396165A (en) * 2020-11-30 2021-02-23 珠海零边界集成电路有限公司 Arithmetic device and method for convolutional neural network

Also Published As

Publication number Publication date
CN108573305B (en) 2020-07-24

Similar Documents

Publication Publication Date Title
CN108573305A (en) A kind of data processing method, equipment and device
CN107392309A (en) A kind of general fixed-point number neutral net convolution accelerator hardware structure based on FPGA
US20190164037A1 (en) Apparatus for processing convolutional neural network using systolic array and method thereof
CN107742150A (en) A kind of data processing method and device of convolutional neural networks
CN109871510B (en) Two-dimensional convolution operation processing method, system, equipment and computer storage medium
DE102020122174A1 (en) CALCULATE-IN / NEAR MEMORY (CIM) CIRCUIT ARCHITECTURE FOR UNIFIED MATRIX-MATRIX AND MATRIX-VECTOR CALCULATIONS
CN107957976A (en) A kind of computational methods and Related product
CN107315574A (en) A kind of apparatus and method for performing matrix multiplication
CN108416436A (en) The method and its system of neural network division are carried out using multi-core processing module
CN108121688A (en) A kind of computational methods and Related product
CN108073977A (en) Convolution algorithm device and convolution algorithm method
CN106874219A (en) A kind of data dispatching method of convolutional neural networks, system and computer equipment
CN105739951B (en) A kind of L1 minimization problem fast solution methods based on GPU
CN108629411A (en) A kind of convolution algorithm hardware realization apparatus and method
CN107633297A (en) A kind of convolutional neural networks hardware accelerator based on parallel quick FIR filter algorithm
CN108629406A (en) Arithmetic unit for convolutional neural networks
DE102013114351A1 (en) System and method for hardware scheduling conditional barriers and impatient barriers
CN108108190A (en) A kind of computational methods and Related product
CN107943756A (en) A kind of computational methods and Related product
CN110009644B (en) Method and device for segmenting line pixels of feature map
CN107479887B (en) A kind of data display method, device and storage device
CN106293953A (en) A kind of method and system accessing shared video data
CN109858622A (en) The data of deep learning neural network carry circuit and method
CN108108189A (en) A kind of computational methods and Related product
CN108090028A (en) A kind of computational methods and Related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant