CN108573305A - A kind of data processing method, equipment and device - Google Patents
- Publication number
- CN108573305A (application CN201710152660.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- row
- convolution
- caching
- chip
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
- G06F15/781—On-chip cache; Off-chip memory
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
An embodiment of the present invention provides a data processing method, equipment, and device. The data processing method includes: obtaining a preset convolution kernel and determining its convolution frame width; obtaining a chip cache capacity, a preset data amount, and a first preset row count, and determining a data column width accordingly; dividing a to-be-processed data matrix into multiple column regions according to the data column width; for any column region, extracting a second preset row count of to-be-computed data, sending it to the chip cache, and performing a convolution operation on the to-be-computed data using the preset convolution kernel; after the first row of the to-be-computed data has taken part in the convolution operation, deleting that first row, extracting the next row of the corresponding column region, and updating the to-be-computed data; and performing the convolution operation on the updated to-be-computed data until every row of data in the region has taken part in the convolution operation. The invention can reduce the power consumption generated by the chip during data processing and improve processing performance.
Description
Technical field
The present invention relates to the field of chip design, and in particular to a data processing method, equipment, and device.
Background technology
A CNN (Convolutional Neural Network) is a deep learning algorithm that extracts data information by simulating the working mode of the brain's neural network. The algorithm uses convolution calculations to complete a preliminary extraction of information and, combined with some nonlinear operations, achieves high-performance target detection. With the continued development of deep learning algorithms, CNNs are widely applied in image processing fields such as target detection, data classification, and information extraction and matching.
Owing to the characteristics of the CNN algorithm itself, a large amount of data must repeatedly take part in the computation, which places relatively high demands on the cache space inside the chip: a sufficiently large storage space is needed to hold all of the information required by the CNN computation, and most existing chips cannot meet the requirement of directly storing all of the required information.
To address the problem that most chips cannot directly store all of the required information on chip, the prior art proposes a CNN implementation method in which, before each convolution operation, all of the data needed for the computation is imported again from memory and then computed.
Because CNN computation reuses a large amount of data, the data blocks imported from memory each time by the prior art contain a large amount of duplicate data, so a great deal of bandwidth is wasted on reading, the chip generates considerable power consumption during data processing, and processing performance suffers.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a data processing method, equipment, and device, so as to reduce the power consumption generated by the chip during data processing and improve processing performance. The specific technical solutions are as follows:
In a first aspect, an embodiment of the present invention provides a data processing method. The method includes:
obtaining a preset convolution kernel, and determining the convolution frame width of the preset convolution kernel;
obtaining a chip cache capacity, a preset data amount, and a first preset row count, and determining a data column width accordingly, wherein the data column width is greater than or equal to the convolution frame width;
dividing a to-be-processed data matrix into columns according to the data column width to obtain multiple column regions, wherein the to-be-processed data matrix is a matrix stored in memory that contains all of the to-be-processed data;
when a data processing instruction is received, for any column region among all of the column regions, extracting a second preset row count of to-be-computed data and sending it to the chip cache, so as to perform a convolution operation on the cached to-be-computed data using the preset convolution kernel, wherein the second preset row count is greater than or equal to the convolution frame width and less than or equal to the first preset row count;
after the first row of the to-be-computed data has taken part in the convolution operation, deleting the first row from the chip cache, extracting the next row from the corresponding column region, and sending it to the chip cache as the last row of the to-be-computed data, thereby updating the to-be-computed data;
using the preset convolution kernel to perform the convolution operation on the updated to-be-computed data until every row of data in the region has taken part in the convolution operation, and sending all of the computation results obtained from the convolution operations to the memory.
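As an illustration only, and not part of the claimed method, the rolling-cache processing of one column region described above can be sketched in Python. The names `rolling_convolve`, `conv_rows`, `region`, and `cache_rows` are the editor's, and a stride-1 "valid" convolution is assumed:

```python
import numpy as np

def conv_rows(cache, kernel):
    """Convolve the kernel over the rows currently held in the cache
    (valid positions only, stride 1)."""
    kh, kw = kernel.shape
    rows = len(cache) - kh + 1
    cols = cache[0].shape[0] - kw + 1
    window = np.stack(cache)  # rows currently held on chip
    return np.array([[np.sum(window[r:r + kh, c:c + kw] * kernel)
                      for c in range(cols)] for r in range(rows)])

def rolling_convolve(region, kernel, cache_rows):
    """Process one column region while holding only cache_rows rows on chip."""
    kh = kernel.shape[0]
    assert kh <= cache_rows <= len(region)
    cache = [region[i] for i in range(cache_rows)]  # initial extraction
    results = [conv_rows(cache, kernel)]
    for nxt in range(cache_rows, len(region)):
        cache.pop(0)               # delete the first row after it took part
        cache.append(region[nxt])  # extract the next row of the region
        results.append(conv_rows(cache, kernel)[-1:])  # only the new output row
    return np.vstack(results)
```

Deleting the first cached row and appending the next row of the region reproduces the full valid convolution of the region while only one new row is fetched from memory per step.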
Optionally, the step of obtaining a chip cache capacity, a preset data amount, and a first preset row count and determining a data column width accordingly includes:
obtaining the chip cache capacity and the preset data amount, and dividing the chip cache capacity by the preset data amount to obtain the maximum number of data items that the chip cache can hold;
obtaining the first preset row count, and dividing the maximum number of data items by the first preset row count to obtain the number of data items in each row of the chip cache; and
determining the number of data items in each row of the chip cache as the data column width.
Optionally, before the step of dividing the to-be-processed data matrix into columns according to the data column width to obtain multiple column regions, the method further includes:
subtracting a preset value from the convolution frame width to obtain the width of an overlap region, wherein the overlap region is the region in which any data column overlaps an adjacent data column; and
determining that the data column width includes the width of the overlap region.
Optionally, before the step of, upon receiving a data processing instruction, extracting a second preset row count of to-be-computed data for any column region among all of the regions and sending it to the chip cache, the method further includes:
for any column region among all of the regions, adding a first empty row before the first row of data, and setting the data of the first empty row to 0; and
adding a second empty row after the last row of data, and setting the data of the second empty row to 0.
The step of extracting a second preset row count of to-be-computed data and sending it to the chip cache includes:
extracting, starting from the first empty row, a second preset row count of to-be-computed data and sending it to the chip cache.
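For illustration, the optional zero-row padding can be sketched as follows. This is an editor's sketch, not part of the patent; the function name and the use of NumPy are assumptions:

```python
import numpy as np

def pad_column_region(region):
    """Add an all-zero row before the first row and after the last row
    of a column region, as in the optional padding step."""
    zero_row = np.zeros((1, region.shape[1]), dtype=region.dtype)
    return np.vstack([zero_row, region, zero_row])
```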
Optionally, the step of, after the first row of the to-be-computed data has taken part in the convolution operation, deleting the first row from the chip cache, extracting the next row from the corresponding column region and sending it to the chip cache as the last row of the to-be-computed data, and thereby updating the to-be-computed data, includes:
when the first row of the to-be-computed data has taken part in the convolution operation, deleting the first row from the chip cache, extracting the next row from the corresponding column region, and sending it to the chip cache as the last row of the to-be-computed data, thereby updating the to-be-computed data;
or,
after any one convolution operation has been performed, deleting the first row of data from the chip cache, and, when any one convolution operation is performed after the first row of data has been deleted, extracting the next row from the corresponding column region and sending it to the chip cache as the last row of the to-be-computed data, thereby updating the to-be-computed data.
Optionally, the step of using the preset convolution kernel to perform the convolution operation on the updated to-be-computed data until every row of data in the region has taken part in the convolution operation, and sending all of the computation results obtained from the convolution operations to the memory, includes:
using the preset convolution kernel to perform the convolution operation on the updated to-be-computed data to obtain a convolution result;
storing the convolution result into the column of the next convolution layer with the same column number as the corresponding column region; and
sending the next convolution layer to the memory.
In a second aspect, an embodiment of the present invention provides a data processing equipment. The equipment includes:
a main control unit, configured to receive a data processing instruction and send control commands to a rolling cache and a computing unit, so as to control the rolling cache to extract data from memory and control the computing unit to perform convolution operations on the extracted data;
the rolling cache, configured to obtain a preset convolution kernel and determine the convolution frame width of the preset convolution kernel; obtain a chip cache capacity, a preset data amount, and a first preset row count, and determine a data column width accordingly, wherein the data column width is greater than or equal to the convolution frame width; after receiving the control command sent by the main control unit, extract a second preset row count of to-be-computed data for any column region among the multiple column regions into which the to-be-processed data matrix in memory is divided according to the data column width; and, after the first row of the to-be-computed data has taken part in the convolution operation, delete that row of data and extract the next row from the corresponding column region as the last row of the to-be-computed data, thereby updating the to-be-computed data; and
the computing unit, configured to, after receiving the to-be-computed data sent by the rolling cache, use the preset convolution kernel to perform convolution operations on the to-be-computed data or the updated to-be-computed data until every row of data in the region has taken part in the convolution operation, and send all of the computation results obtained from the convolution operations to the memory.
Optionally, the rolling cache is specifically further configured to:
obtain the chip cache capacity and the preset data amount, and divide the chip cache capacity by the preset data amount to obtain the maximum number of data items that the rolling cache can hold;
obtain the first preset row count, and divide the maximum number of data items by the first preset row count to obtain the number of data items in each row of the rolling cache; and
determine the number of data items in each row of the rolling cache as the data column width.
Optionally, the rolling cache is specifically further configured to:
subtract a preset value from the convolution frame width to obtain the width of an overlap region, wherein the overlap region is the region in which any data column overlaps an adjacent data column; and
determine that the data column width includes the width of the overlap region.
Optionally, the rolling cache is specifically further configured to:
for any column region among all of the regions, before extracting the first row of data, add a first empty row before the first row of data and set the data of the first empty row to 0;
before extracting the last row of data, add a second empty row after the last row of data and set the data of the second empty row to 0; and
extract, starting from the first empty row, a second preset row count of to-be-computed data.
Optionally, the rolling cache is specifically further configured to:
when the first row of the to-be-computed data has taken part in the convolution operation, delete the first row of data, and extract the next row from the corresponding column region as the last row of the to-be-computed data, thereby updating the to-be-computed data;
or,
after any one convolution operation has been performed, delete the first row of data from the chip cache, and, when any one convolution operation is performed after the first row of data has been deleted, extract the next row from the corresponding column region and send it to the chip cache as the last row of the to-be-computed data, thereby updating the to-be-computed data.
Optionally, the computing unit is specifically further configured to:
use the preset convolution kernel to perform the convolution operation on the to-be-computed data or the updated to-be-computed data to obtain a convolution result;
store the convolution result into the column of the next convolution layer with the same column number as the corresponding column region; and
send the next convolution layer to the memory.
In a third aspect, an embodiment of the present invention provides a data processing device. The device includes:
a first determining module, configured to obtain a preset convolution kernel and determine the convolution frame width of the preset convolution kernel;
a second determining module, configured to obtain a chip cache capacity, a preset data amount, and a first preset row count, and determine a data column width accordingly, wherein the data column width is greater than or equal to the convolution frame width;
a division module, configured to divide a to-be-processed data matrix into columns according to the data column width to obtain multiple column regions, wherein the to-be-processed data matrix is a matrix stored in memory that contains all of the to-be-processed data;
an extraction module, configured to, when a data processing instruction is received, for any column region among all of the column regions, extract a second preset row count of to-be-computed data and send it to the chip cache, so as to perform a convolution operation on the cached to-be-computed data using the preset convolution kernel, wherein the second preset row count is greater than or equal to the convolution frame width and less than or equal to the first preset row count;
an update module, configured to, after the first row of the to-be-computed data has taken part in the convolution operation, delete that row of data from the chip cache, extract the next row from the corresponding column region, and send it to the chip cache as the last row of the to-be-computed data, thereby updating the to-be-computed data; and
a first computing module, configured to use the preset convolution kernel to perform the convolution operation on the updated to-be-computed data until every row of data in the region has taken part in the convolution operation, and send all of the computation results obtained from the convolution operations to the memory.
Optionally, the second determining module includes:
a first operation submodule, configured to obtain the chip cache capacity and the preset data amount, and divide the chip cache capacity by the preset data amount to obtain the maximum number of data items that the chip cache can hold;
a second operation submodule, configured to obtain the first preset row count, and divide the maximum number of data items by the first preset row count to obtain the number of data items in each row of the chip cache; and
a determination submodule, configured to determine the number of data items in each row of the chip cache as the data column width.
Optionally, the data processing device further includes:
a second computing module, configured to subtract a preset value from the convolution frame width to obtain the width of an overlap region, wherein the overlap region is the region in which any data column overlaps an adjacent data column; and
a third determining module, configured to determine that the data column width includes the width of the overlap region.
Optionally, the data processing device further includes:
a first setup module, configured to, for any column region among all of the regions, add a first empty row before the first row of data and set the data of the first empty row to 0; and
a second setup module, configured to add a second empty row after the last row of data and set the data of the second empty row to 0.
The extraction module is specifically further configured to:
extract, starting from the first empty row, a second preset row count of to-be-computed data and send it to the chip cache.
Optionally, the update module is specifically configured to:
when the first row of the to-be-computed data has taken part in the convolution operation, delete the first row of data from the chip cache, extract the next row from the corresponding column region, and send it to the chip cache as the last row of the to-be-computed data, thereby updating the to-be-computed data;
or,
after any one convolution operation has been performed, delete the first row of data from the chip cache, and, when any one convolution operation is performed after the first row of data has been deleted, extract the next row from the corresponding column region and send it to the chip cache as the last row of the to-be-computed data, thereby updating the to-be-computed data.
Optionally, the first computing module further includes:
a third operation submodule, configured to use the preset convolution kernel to perform the convolution operation on the updated to-be-computed data to obtain a convolution result;
a storage submodule, configured to store the convolution result into the column of the next convolution layer with the same column number as the corresponding column region; and
a sending submodule, configured to send the next convolution layer to the memory.
When the data processing method, equipment, and device provided by the embodiments of the present invention obtain the data required for computation from memory, the data column width is determined by the convolution kernel size, the chip cache capacity, the data amount that needs to be cached, and the first preset row count, and one column of data with that width is obtained each time from the to-be-processed data stored in memory. By exploiting the characteristic that a large amount of data is repeated in convolution operations, it is ensured that each computation only needs to obtain from memory the data that newly takes part in the computation, which reduces the bandwidth requirements on the off-chip memory, thereby reducing the power consumption generated by the chip during data processing and improving processing performance.
Description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the accompanying drawings in the following description are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the data column division mode in an application example of an embodiment of the present invention;
Fig. 3 is a schematic flowchart of the convolution operation in an application example of an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a data processing equipment according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a data processing device according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below in combination with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
To reduce the power consumption generated by the chip during data processing and improve processing performance, the embodiments of the present invention provide a data processing method, equipment, and device.
The data processing method provided by the embodiments of the present invention is introduced first below.
It should be noted that the execution subject of the data processing method provided by the embodiments of the present invention may be a chip used for data processing, such as a DSP (Digital Signal Processor), an ARM (Advanced RISC Machines) microprocessor, or an FPGA (Field-Programmable Gate Array), or may be a data processing controller; of course, it may also be other equipment with data processing capability, which is not limited here. The data processing method provided by the embodiments of the present invention may be implemented as software, a hardware circuit, and/or a logic circuit in the chip or the data processing controller used for data processing. The application scenario of the embodiments of the present invention may be image processing or radar scanning; of course, other application scenarios that use convolutional neural networks are all applicable to the embodiments of the present invention.
As shown in Fig. 1, a data processing method provided by an embodiment of the present invention may include the following steps:
S101: obtain a preset convolution kernel, and determine the convolution frame width of the preset convolution kernel.
It should be noted that, since the present embodiment is directed at convolutional neural networks, convolution operations need to be performed. The preset convolution kernel may be set in advance, or may be determined according to a selected preset operation strategy. The preset operation strategy may be any operation strategy of a convolutional neural network, such as nonlinear rectification activation or pooling; each operation strategy specifies the convolution kernel used for its convolution operation, so the preset convolution kernel can be determined according to the selected preset operation strategy.
It should be emphasized that the key to performing a convolution operation is the choice of the convolution operator, i.e. the coefficient matrix. This coefficient matrix is the convolution kernel, and the width of this coefficient matrix is the convolution frame width. For example, one often speaks of a 3 × 3 convolution kernel, where 3 is precisely the convolution frame width.
S102: obtain a chip cache capacity, a preset data amount, and a first preset row count, and determine a data column width accordingly.
Here, the data column width is greater than or equal to the convolution frame width; a data column contains multiple data items; the chip contains a cache for storing the data that takes part in the computation; and the preset data amount, which may be obtained from the characteristics of the data that needs to take part in the computation or may be set in advance, characterizes the data amount that needs to be cached for one convolution operation. It should be noted that when the chip cache capacity is very large, the extracted column of data can be cached with more rows of data for the convolution operation.
Optionally, the step of obtaining a chip cache capacity, a preset data amount, and a first preset row count and determining a data column width accordingly may include the following.
First, obtain the chip cache capacity and the preset data amount, and divide the chip cache capacity by the preset data amount to obtain the maximum number of data items that the chip cache can hold.
It should be noted that, in the present embodiment, when data is extracted from memory, the to-be-processed data to be extracted needs to be divided into multiple column regions with overlap regions, and only the data of one column region is read each time. The width of each column region is jointly determined by the chip cache capacity, the preset data amount, and the first preset row count, and the maximum number of data items that the chip cache can hold, that is, the largest number of items the chip can cache, can first be determined from the chip cache capacity and the preset data amount. For example, if the chip cache capacity is 100 kB and the data amount of each item is 256 B, then the chip can cache at most 400 items.
Secondly, obtain the first preset row count, and divide the maximum number of data items by the first preset row count to obtain the number of data items in each row of the chip cache.
Finally, determine the number of data items in each row of the chip cache as the data column width.
Here, the data column width includes the width of the overlap region; the first preset row count may be the row count of the data stored in memory, or a preset extractable row count, or it may be equal to the convolution frame width of the preset convolution kernel. Specifically, the first preset row count may be determined according to the cache capacity of the chip. It should be noted that, after the maximum number of data items that the chip cache can hold has been determined, that maximum number can be divided by the first preset row count to obtain the number of data items in each row of the chip cache, and this number can be taken as the data column width. For example, if the chip can cache at most 400 items and the first preset row count is 4 rows, then the data column width can be 100.
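The arithmetic of this optional step can be illustrated with the example figures given above (100 kB cache, 256 B per data item, 4 preset rows); the function below is the editor's sketch, not part of the patent:

```python
def data_column_width(cache_capacity_bytes, item_bytes, first_preset_rows):
    """Maximum items the cache holds, divided by the preset row count."""
    max_items = cache_capacity_bytes // item_bytes   # e.g. 100 kB / 256 B = 400
    return max_items // first_preset_rows            # e.g. 400 / 4 = 100

print(data_column_width(100 * 1024, 256, 4))  # -> 100
```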
S103: divide the to-be-processed data matrix into columns according to the data column width to obtain multiple column regions.
Here, the to-be-processed data matrix is a matrix stored in memory that contains all of the to-be-processed data. Under normal circumstances, what is stored in memory is raw data; in an image processing system, for example, what is stored in memory may be a captured original image. Of course, it may also be data in two-dimensional form obtained by converting the raw data. It should be noted that, when data is extracted from memory, if what is in memory is raw data, the raw data is first converted into data in two-dimensional form, and then the to-be-processed data to be extracted is divided into multiple column regions according to the data column width obtained above, and only the data of one column region is read each time. After the data column width has been obtained, the to-be-processed data matrix in memory can be divided according to the data column width in order to reduce the data amount cached each time. For example, if the obtained data column width is 6, the to-be-processed data matrix can be divided with every 6 columns of data as one column region.
It should be emphasized that, in order to ensure that the result of performing the convolution operation on the divided data is completely consistent with the result of performing the convolution operation on the original to-be-processed data matrix, a certain overlap region can be left between two adjacent column regions when the to-be-processed data matrix is divided.
Optionally, before the step of dividing the pending data matrix into columns according to the data column width to obtain multiple column regions, the data processing method may further include:

First, subtracting a preset value from the convolution frame width to obtain the width of the overlapping region.

Second, determining that the data column width includes the width of the overlapping region.

Here, the overlapping region is the region in which any data column coincides with an adjacent data column. Since the overlapping region is shared by two adjacent data columns, and since only the first column of data in each column region participates solely in the convolution operations of its own region while the other boundary columns must also participate in the convolution operations of other column regions, the preset value is normally taken as 1. For example, when the convolution frame width is 3, the width of the overlapping region is 2; when the convolution frame width is 5, the width of the overlapping region is 4; and so on. It should be noted that the overlapping region ensures that, when the convolution operation reaches a region boundary, the required data can still be obtained, so the result is exactly the same as that of a normal convolution process.
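The division rule just described can be sketched as follows. This is a minimal illustration, not part of the patent; the function name, parameters and half-open column indices are my own.

```python
def column_regions(total_cols, col_width, kernel_width, preset=1):
    """Split total_cols columns into regions of at most col_width columns.
    Adjacent regions share kernel_width - preset columns, so convolving
    each region separately reproduces the full-matrix result."""
    overlap = kernel_width - preset      # e.g. frame width 3 -> overlap 2
    step = col_width - overlap           # fresh columns added per region
    assert step > 0, "column width must exceed the overlap"
    regions, start = [], 0
    while True:
        end = min(start + col_width, total_cols)
        regions.append((start, end))     # half-open interval [start, end)
        if end == total_cols:
            break
        start += step
    return regions
```

For a 12-column matrix, a column width of 6 and a 3-wide convolution frame this yields the regions (0, 6), (4, 10) and (8, 12), each neighbouring pair sharing two columns, as the text requires.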
S104, when a data processing instruction is received, for any column region among all the column regions, extracting a second preset number of rows of data to be operated on and sending them to the chip cache, and performing the convolution operation on the cached data to be operated on using the preset convolution kernel.

Here, the second preset row count is greater than or equal to the convolution frame width and less than or equal to the first preset row count. It should be noted that the data processing instruction is used to start the data processing operation. The data processing instruction sent by a user may be received to start executing the data processing, which adds interaction between the user and the data processing and improves the user experience. Of course, the data processing instruction may also be generated by an acquisition module after the raw data has been collected, thereby starting the data processing. For example, in image processing, when an original image has been collected and the storage state indicating that the memory has stored the original image is received, the data processing begins.
It should be emphasized that after the data processing instruction is received, data is extracted from the region-divided pending data in the memory. Since the cache space of the chip is limited, the first preset row count used in computing the data column width corresponds to the maximum number of rows of that column width the chip can cache; therefore, when data is first extracted from the memory, the number of rows extracted from a column region is at most the first preset row count. And since a convolution operation is to be performed, the number of rows extracted is at least the convolution frame width of the preset convolution kernel. The data to be operated on can thus be extracted according to the second preset row count. After the first extraction, the preset convolution kernel can be applied directly to the data to be operated on; the convolution operation itself belongs to the prior art and is not described further here.
Optionally, before the step of, upon receiving the data processing instruction, extracting for any column region among all the regions a second preset number of rows of data to be operated on and sending them to the chip cache, the data processing method may further include:

First, for any column region among all the regions, adding a first null row before the first row of data, and setting the data of the first null row to 0;

Second, adding a second null row after the last row of data, and setting the data of the second null row to 0.

It should be noted that the first and last rows of a data column carry, in the case of an image for example, the edge point information of the image. When that edge information is extracted, directly extracting the first or last row may cause data loss. Therefore, a null row whose data is 0 is added before the first row and after the last row, which guarantees the integrity of the extracted data.
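The zero-row padding can be sketched as follows; this is illustrative only, and the function name is my own choosing.

```python
import numpy as np

def pad_rows(region):
    """Add an all-zero row before the first row and after the last row of
    a column region, so the edge rows keep a full convolution
    neighborhood and no boundary information is lost."""
    zeros = np.zeros((1, region.shape[1]), dtype=region.dtype)
    return np.vstack([zeros, region, zeros])
```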
Optionally, the step of extracting the second preset number of rows of data to be operated on and sending them to the chip cache may include: extracting, starting from the first null row, the second preset number of rows of data to be operated on and sending them to the chip cache.

It should be noted that a convolutional neural network must perform the convolution operations in order, starting from the first row and proceeding by the convolution frame width, until all the data in an extracted data column have completed the convolution operation; only then does the convolution of that data column end, and the convolution operation of the next data column begins.
S105, after the first row of the data to be operated on has participated in the convolution operation, deleting the first row from the chip cache, extracting the next row of data from the corresponding column region and sending it to the chip cache as the last row of the data to be operated on, updating the data to be operated on.

Here, the next row of data in the corresponding column region refers to the first not-yet-extracted row of that column region. It should be noted that when the data to be operated on from the first extraction undergoes the convolution operation, the number of rows extracted the first time may be equal to the convolution frame width, or may be greater than it. If the number of rows of the data to be operated on equals the convolution frame width, the first row can be deleted after the convolution operation has been performed, and the next row then extracted from the corresponding column region of the memory. If the number of rows of the data to be operated on exceeds the convolution frame width, the first row that has participated in the convolution may be deleted after the whole data to be operated on has been involved in the convolution and the next row then extracted from the corresponding column region of the memory; or the first row may be deleted while it is participating in the convolution, and the next row then extracted from the corresponding column region of the memory; this is not specifically limited here. What is required is that the first row must have participated in the convolution operation and have been deleted before the next row can be extracted from the corresponding column region of the memory; otherwise the cache capacity of the chip would be exceeded. After a new row of data is received, the new row is set, in row order, as the last row of the data to be operated on, updating the data to be operated on.
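The delete-then-fetch rule can be modelled with a fixed-capacity queue. This is a sketch under the assumption that deletion and fetch happen as one step (the first variant described above); all names are mine, not from the patent.

```python
from collections import deque

class RowCache:
    """Chip-cache model: holds at most `capacity` rows of one column
    region. advance() drops the oldest row (it has finished its
    convolutions) and appends the next row fetched from memory, so the
    capacity is never exceeded."""
    def __init__(self, initial_rows, capacity):
        assert len(initial_rows) <= capacity
        self.rows = deque(initial_rows, maxlen=capacity)

    def advance(self, next_row):
        self.rows.popleft()          # delete the used first row ...
        self.rows.append(next_row)   # ... then cache the next row

    def window(self, frame_width):
        # the rows currently fed to the convolution kernel
        return list(self.rows)[:frame_width]
```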
Optionally, the step of, after the first row of the data to be operated on has participated in the convolution operation, deleting the first row from the chip cache, extracting the next row from the corresponding column region and sending it to the chip cache as the last row of the data to be operated on, and updating the data to be operated on, may include:

while the first row of the data to be operated on is participating in the convolution operation, deleting the first row from the chip cache, extracting the next row from the corresponding column region and sending it to the chip cache as the last row of the data to be operated on, and updating the data to be operated on.

It should be noted that, to ensure the utilization of the chip cache, to make the fullest use of the cache capacity of the chip, and to ensure the efficiency of the data operation, one specific implementation of this embodiment can be: while the first row of the data to be operated on is participating in the convolution operation, delete the first row and immediately extract the next row from the corresponding column region of the memory, updating the data to be operated on. For example, 5 rows of data of a column region (rows 0 through 4) are extracted from the memory and stored in the chip cache, and the convolution operation is performed on the data to be operated on using a 3×3 convolution kernel; while row 0 is participating in the convolution operation, row 0 is deleted from the chip cache, and row 5 is extracted from the corresponding column region of the memory and sent to the chip cache as the last row of data, updating the data to be operated on.
Alternatively,

after any one of the convolution operations has been performed, deleting the first row of data from the chip cache; and when any convolution operation is performed after the first row has been deleted, extracting the next row of data from the corresponding column region and sending it to the chip cache as the last row of the data to be operated on, updating the data to be operated on.

It should be noted that when the second preset row count is greater than the convolution frame width, another implementation of this embodiment can be: each time a convolution operation has been performed, delete the first row but do not immediately fetch a new row of data; the new row can be fetched during any subsequent convolution operation, for example only after all rows of data have been involved in the convolution. It can also be: when a convolution operation has been performed, do not delete the first row immediately; the first row can be deleted after any convolution operation following the first one, and during any convolution operation after the first row has been deleted, the next row is extracted from the corresponding column region. For example, if the second preset row count is 5, the data of rows 0 through 4 of a column region are extracted from the memory and stored in the chip cache. Using a 3×3 convolution kernel, the convolution operation is first performed on the data of rows 0 through 2; while this convolution is performed, row 0 can be deleted and row 5 of the column region extracted, or row 0 can merely be deleted without extracting a new row. The convolution operation is then performed on the data of rows 1 through 3; row 0 and/or row 1 can be deleted and row 5 and/or row 6 of the column region extracted, or row 0 and/or row 1 can merely be deleted without extracting new rows. The convolution operation is then performed on the data of rows 2 through 4, deleting row 0 and/or row 1 and/or row 2 and extracting row 5 and/or row 6 and/or row 7 of the column region.
S106, performing the convolution operation on the updated data to be operated on using the preset convolution kernel, until all rows of data in the region have participated in the convolution operation, and sending all the operation results obtained from the convolution to the memory.

It should be noted that a convolutional neural network progressively applies convolution operations to the data using convolution kernels; the convolution results obtained can serve as the input of the next convolution operation, or be output as features serving as a basis for judgment or comparison. For example, in image processing, the convolution operation may yield image features and a set of attribute features of the image, which can be output so that judgments such as target detection are made from the image features. It should be emphasized that the operation results of the convolution are usually stored into the memory.
Optionally, the step of performing the convolution operation on the updated data to be operated on using the preset convolution kernel, and sending all the operation results obtained from the convolution to the memory, may include:

First, performing the convolution operation on the updated data to be operated on using the preset convolution kernel, obtaining convolution results;

Second, storing the convolution results into the column of the next convolutional layer whose column number matches that of the corresponding column region;

Finally, sending the next convolutional layer to the memory.

It should be noted that the data stored to the memory are the operation results obtained by the convolution of all the data columns. An operation result matrix is therefore built according to the column number, among the data columns, of the data currently undergoing the convolution operation, and the operation results are stored in it; this operation result matrix is the next convolutional layer. For example, the convolution results of the data of the 1st data column are stored into the first column of the next convolutional layer. The convolution process itself is prior art and is not described again here.
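In this bookkeeping, each window over a column region contributes one entry to the matching column of the next layer. A sketch, under the simplifying assumption (mine, not the patent's) that the region is exactly as wide as the kernel so each window yields a single value:

```python
import numpy as np

def convolve_region_into_layer(region, kernel, next_layer, col_index):
    """Slide the kernel down one column region; the result of the i-th
    window is written to row i of the next layer, in the column whose
    number matches the region's column number."""
    kh, kw = kernel.shape
    for i in range(region.shape[0] - kh + 1):
        next_layer[i, col_index] = np.sum(region[i:i + kh, :kw] * kernel)
```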
With this embodiment, when the data required for the operation is obtained from the memory, it is fetched according to the data column width determined by the convolution kernel size, the chip cache capacity, the amount of data to be cached and the first preset row count: each fetch obtains a certain number of rows of that width from the pending data stored in the memory. This exploits the extensive data reuse in convolution operations and guarantees that each computation fetches from the memory only the data that must participate in the operation, reducing the bandwidth requirement on the off-chip memory, thereby reducing the power consumption of the chip during data processing and improving the processing performance. Since the data is divided into data columns that are extracted into the cache for the operation, even a small cache can operate on a large amount of data. Moreover, the overlapping regions ensure that the required data can still be obtained when the convolution reaches a boundary, so the result is exactly the same as that of a normal convolution process.
The data processing method provided by the embodiment of the present invention is now introduced with reference to a specific application example.

An image with a resolution of 256×160 is stored in the memory, and the data volume of each data item is 256 B, so the image occupies 256×160×256 B = 10 MB of storage space in total. The convolution operation of a convolutional neural network is selected, the preset convolution kernel is a kernel of size 5×5, and the chip cache capacity is assumed to be 100 kB. The data columns are divided as shown in Fig. 2: since the width of the convolution frame is 5, the width of the overlapping region 202 is 2 on each side. And since the chip cache capacity is assumed to be 100 kB, the data volume of each data item is 256 B and the convolution kernel size is 5×5, the data column width 201 is set to 100×1024/256/(5+1) ≈ 66.
In order to reduce the number of data columns and thereby the number of repeated loads, the amount of data that must be loaded repeatedly for the above image is ⌈256/(66−2)⌉ × 2 × 2 × 160 × 256 B = 640 KB, where 256 is the width of the image, 66 is the data column width, 2 is the width of the single-side overlapping region, 160 is the total number of rows of the image, and 256 B is the data volume of one data item. Therefore, under these conditions, every time one 10 MB image with a resolution of 256×160 is processed, 640 KB of data is loaded twice, and the proportion of invalidly loaded data is 640/(10×1024) = 6.25%.
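The figures of this example can be checked numerically. In the sketch below the grouping of factors in the repeated-load formula is my reconstruction (the original expression did not survive extraction), chosen to reproduce the stated 640 KB; all variable names are mine.

```python
import math

datum = 256                      # bytes per data item
cache = 100 * 1024               # chip cache capacity in bytes
kernel = 5                       # convolution frame width
first_rows = kernel + 1          # first preset row count
col_width = cache // datum // first_rows    # data column width 201
regions = math.ceil(256 / (col_width - 2))  # data columns across a 256-wide image
reloaded = regions * 2 * 2 * 160 * datum    # bytes loaded twice in the overlaps
total = 256 * 160 * datum                   # the whole 256x160 image in bytes
```

This reproduces col_width = 66, 4 data columns and 640 KB of doubly loaded data; note that 256 × 160 × 256 B works out to 10 MB, which makes the invalid-load share 640/10240 = 6.25%.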
By this analysis, the first extraction from the memory takes the data of columns 1 through 66 of the original two-dimensional data and performs the convolution operation; the second extraction from the memory takes the data of columns 65 through 130 of the original two-dimensional data and performs the convolution operation; and so on at a width of 66, extracting data columns and performing convolution operations.
Specifically, each extraction of a data column and its convolution operation proceeds as shown in Fig. 3. Assume that the preset convolution kernel is a kernel of size 5×5, that the first preset row count equals the convolution frame width, and that the pending data matrix has 50 rows of data in total. The specific steps are as follows:

In the first step, the data in row 0 (311), row 1 (312), row 2 (313), row 3 (314) and row 4 (315) of the first data column 310 are read, where row 0 (311) is the null row whose data is set to 0. The convolution operation is performed on rows 0 (311) through 4 (315), and the result is stored to row 1 of column 1 of the next-convolutional-layer data. While the convolution operation is being performed, the data in row 5 (316) is read and stored into the cache.

In the second step, the data in row 1 (312), row 2 (313), row 3 (314), row 4 (315) and row 5 (316) of the first data column 310 are read, the convolution operation is performed on rows 1 (312) through 5 (316), and the result is stored to row 2 of column 1 of the next-convolutional-layer data. Meanwhile, the data in row 6 (317) is read and stored into the cache.

In the third step, following the second step, one further row of data is read for each convolution, so that the rows of the first data column 310 are read in turn.

In the fourth step, after 47 convolutions have been computed, the convolution operation is performed on the data of row 47 (318), row 48 (319), row 49 (3110), row 50 (3111) and row 51 (3112), where row 51 (3112) is the null row whose data is set to 0, and the result is stored to row 50 of column 1 of the next-convolutional-layer data.

In the fifth step, the data in row 0 (321), row 1 (322), row 2 (323), row 3 (324) and row 4 (325) of the second data column 320 are loaded, where row 0 (321) is the null row whose data is set to 0. The convolution operation is performed on rows 0 (321) through 4 (325), and the result is stored to row 1 of column 2 of the next-convolutional-layer data. While the convolution operation is being performed, the data in row 5 (326) is read and stored into the cache.

In the sixth step, the fourth and fifth steps are repeated until the convolution operation has been performed on the data of row 47 (331), row 48 (332), row 49 (333), row 50 (334) and row 51 (335) of the N-th data column 330, where row 51 (335) is the null row whose data is set to 0, and the result is stored to row 50 of column N of the next-convolutional-layer data, N being the last column of the data stored in the memory.

Finally, the next convolutional layer thus obtained is sent to the memory.
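The walkthrough above can be simulated end to end. The sketch below (all names are mine) computes a "valid" convolution column region by column region with a rolling row cache and checks it against a direct full-image convolution; the zero-row padding of the optional step is omitted here, but padding the input rows first would reproduce the patent's full-height output.

```python
import numpy as np

def conv2d_valid(x, k):
    """Plain 'valid' 2-D convolution used as the reference result."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def rolling_conv(image, k, col_width, cache_rows):
    """Steps S104-S106: process one column region at a time; inside a
    region cache at most cache_rows rows, dropping the top row after its
    last use and fetching one new row from 'memory' (the image)."""
    kh, kw = k.shape
    overlap = kw - 1                 # columns shared by adjacent regions
    step = col_width - overlap
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    start = 0
    while start < W - overlap:
        end = min(start + col_width, W)
        region = image[:, start:end]
        rows = [region[r] for r in range(min(cache_rows, H))]  # initial load
        nxt = len(rows)
        for i in range(H - kh + 1):
            window = np.vstack(rows[:kh])
            out[i, start:start + window.shape[1] - kw + 1] = conv2d_valid(window, k)
            rows.pop(0)              # the first cached row has finished
            if nxt < H:
                rows.append(region[nxt])   # fetch the next row
                nxt += 1
        start += step
    return out
```

Overlapping output columns are simply recomputed identically by the neighbouring region, which is what guarantees the region-wise result matches the normal convolution.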
Compared with the prior art, in this solution, when the data required for the operation is obtained from the memory, it is fetched according to the data column width determined by the convolution kernel size, the chip cache capacity, the amount of data to be cached and the first preset row count: each fetch obtains a certain number of rows of that width from the pending data stored in the memory. This exploits the extensive data reuse in convolution operations and guarantees that each computation fetches from the memory only the data that must participate in the operation, reducing the bandwidth requirement on the off-chip memory, thereby reducing the power consumption of the chip during data processing and improving the processing performance. Since the data is divided into data columns that are extracted into the cache for the operation, even a small cache can operate on a large amount of data. Moreover, the overlapping regions ensure that the required data can still be obtained when the convolution reaches a boundary, so the result is exactly the same as that of a normal convolution process.
Corresponding to the above embodiments, an embodiment of the present invention provides a data processing device. As shown in Fig. 4, the data processing device may include:

a main control unit 410 for receiving a data processing instruction and sending control commands to the rolling cache and the computing unit, so as to control the rolling cache to extract data from the memory and control the computing unit to perform the convolution operation on the extracted data;

a rolling cache 420 for obtaining a preset convolution kernel and determining the convolution frame width of the preset convolution kernel; for obtaining a chip cache capacity, a preset data amount and a first preset row count and determining a data column width accordingly, where the data column width is greater than or equal to the convolution frame width; for, after receiving the control command sent by the main control unit, extracting, for any column region among the multiple column regions into which the pending data matrix in the memory is divided by columns according to the data column width, a second preset number of rows of data to be operated on; and for, after the first row of the data to be operated on has participated in the convolution operation, deleting the first row and extracting the next row of data from the corresponding column region as the last row of the data to be operated on, updating the data to be operated on;

a computing unit 430 for, after receiving the data to be operated on sent by the rolling cache, performing the convolution operation on the data to be operated on or on the updated data to be operated on using the preset convolution kernel, until all rows of data in the region have participated in the convolution operation, and sending all the operation results obtained from the convolution to the memory.
With this embodiment, when the data required for the operation is obtained from the memory, it is fetched according to the data column width determined by the convolution kernel size, the chip cache capacity, the amount of data to be cached and the first preset row count: each fetch obtains a certain number of rows of that width from the pending data stored in the memory. This exploits the extensive data reuse in convolution operations and guarantees that each computation fetches from the memory only the data that must participate in the operation, reducing the bandwidth requirement on the off-chip memory, thereby reducing the power consumption of the chip during data processing and improving the processing performance. Since the data is divided into data columns that are extracted into the cache for the operation, even a small cache can operate on a large amount of data. Moreover, the overlapping regions ensure that the required data can still be obtained when the convolution reaches a boundary, so the result is exactly the same as that of a normal convolution process.
Optionally, the rolling cache 420 may specifically be further configured to:

obtain the chip cache capacity and the preset data amount, and divide the chip cache capacity by the preset data amount to obtain the maximum number of data items the rolling cache can hold;

obtain the first preset row count, and divide the maximum number of data items by the first preset row count to obtain the number of data items in each row of the rolling cache;

determine the number of data items in each row of the rolling cache as the data column width.
Optionally, the rolling cache 420 may specifically be further configured to:

subtract a preset value from the convolution frame width to obtain the width of the overlapping region, where the overlapping region is the region in which any data column coincides with an adjacent data column;

determine that the data column width includes the width of the overlapping region.
Optionally, the rolling cache 420 may specifically be further configured to:

for any column region among all the regions, before extracting the first row of data, add a first null row before the first row and set the data of the first null row to 0;

before extracting the last row of data, add a second null row after the last row and set the data of the second null row to 0;

extract, starting from the first null row, the second preset number of rows of data to be operated on.
Optionally, the rolling cache 420 may specifically be further configured to:

while the first row of the data to be operated on is participating in the convolution operation, delete the first row and extract the next row of data from the corresponding column region as the last row of the data to be operated on, updating the data to be operated on;

or, after any one of the convolution operations has been performed, delete the first row of data from the chip cache, and, when any convolution operation is performed after the first row has been deleted, extract the next row of data from the corresponding column region and send it to the chip cache as the last row of the data to be operated on, updating the data to be operated on.
Optionally, the computing unit 430 may specifically be further configured to:

perform the convolution operation on the data to be operated on or on the updated data to be operated on using the preset convolution kernel, obtaining convolution results;

store the convolution results into the column of the next convolutional layer whose column number matches that of the corresponding column region;

send the next convolutional layer to the memory.
It should be noted that the data processing device of the embodiment of the present invention is a device that applies the above data processing method; all embodiments of the above data processing method therefore apply to this device and can achieve the same or similar advantageous effects.
Corresponding to the above embodiments, an embodiment of the present invention provides a data processing apparatus. As shown in Fig. 5, the data processing apparatus may include:

a first determining module 510 for obtaining a preset convolution kernel and determining the convolution frame width of the preset convolution kernel;

a second determining module 520 for obtaining a chip cache capacity, a preset data amount and a first preset row count and determining a data column width accordingly, where the data column width is greater than or equal to the convolution frame width;

a division module 530 for dividing a pending data matrix into columns according to the data column width to obtain multiple column regions, where the pending data matrix is the matrix stored in the memory that contains all the pending data;

an extraction module 540 for, upon receiving a data processing instruction, for any column region among all the column regions, extracting a second preset number of rows of data to be operated on, sending them to the chip cache, and performing the convolution operation on the cached data to be operated on using the preset convolution kernel, where the second preset row count is greater than or equal to the convolution frame width and less than or equal to the first preset row count;

an update module 550 for, after the first row of the data to be operated on has participated in the convolution operation, deleting the first row from the chip cache, extracting the next row of data from the corresponding column region, and sending it to the chip cache as the last row of the data to be operated on, updating the data to be operated on;

a first computing module 560 for performing the convolution operation on the updated data to be operated on using the preset convolution kernel until all rows of data in the region have participated in the convolution operation, and sending all the operation results obtained from the convolution to the memory.
With this embodiment, when the data required for the operation is obtained from the memory, it is fetched according to the data column width determined by the convolution kernel size, the chip cache capacity, the amount of data to be cached and the first preset row count: each fetch obtains a certain number of rows of that width from the pending data stored in the memory. This exploits the extensive data reuse in convolution operations and guarantees that each computation fetches from the memory only the data that must participate in the operation, reducing the bandwidth requirement on the off-chip memory, thereby reducing the power consumption of the chip during data processing and improving the processing performance. Since the data is divided into data columns that are extracted into the cache for the operation, even a small cache can operate on a large amount of data. Moreover, the overlapping regions ensure that the required data can still be obtained when the convolution reaches a boundary, so the result is exactly the same as that of a normal convolution process.
Optionally, the second determining module 520 may further include:

a first operation submodule for obtaining the chip cache capacity and the preset data amount and dividing the chip cache capacity by the preset data amount to obtain the maximum number of data items the chip cache can hold;

a second operation submodule for obtaining the first preset row count and dividing the maximum number of data items by the first preset row count to obtain the number of data items in each row of the chip cache;

a determination submodule for determining the number of data items in each row of the chip cache as the data column width.
Optionally, the data processing apparatus may further include:

a second computing module for subtracting a preset value from the convolution frame width to obtain the width of the overlapping region, where the overlapping region is the region in which any data column coincides with an adjacent data column;

a third determining module for determining that the data column width includes the width of the overlapping region.
Optionally, the data processing apparatus may further include:

a first setting module for, for any column region among all the regions, adding a first null row before the first row of data and setting the data of the first null row to 0;

a second setting module for adding a second null row after the last row of data and setting the data of the second null row to 0.
Optionally, the extraction module 540 may specifically be further configured to: extract, starting from the first null row, the second preset number of rows of data to be operated on and send them to the chip cache.
Optionally, the update module 550 may specifically be configured to:

while the first row of the data to be operated on is participating in the convolution operation, delete the first row from the chip cache, extract the next row of data from the corresponding column region, and send it to the chip cache as the last row of the data to be operated on, updating the data to be operated on;

or, after any one of the convolution operations has been performed, delete the first row of data from the chip cache, and, when any convolution operation is performed after the first row has been deleted, extract the next row of data from the corresponding column region and send it to the chip cache as the last row of the data to be operated on, updating the data to be operated on.
Optionally, the first computing module 560 may further include:

a third operation submodule for performing the convolution operation on the updated data to be operated on using the preset convolution kernel to obtain convolution results;

a storage submodule for storing the convolution results into the column of the next convolutional layer whose column number matches that of the corresponding column region;

a sending submodule for sending the next convolutional layer to the memory.
It should be noted that the data processing apparatus of the embodiment of the present invention is an apparatus that applies the above data processing method; all embodiments of the above data processing method therefore apply to this apparatus and can achieve the same or similar advantageous effects.
It can be understood that, in another embodiment of the present invention, the data processing apparatus may simultaneously include: the first determining module 510, the second determining module 520, the division module 530, the extraction module 540, the update module 550, the first computing module 560, the second computing module, the third determining module, the first setting module and the second setting module.
It should be noted that, herein, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner: identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for relevant details, refer to the description of the method embodiment.
The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit its protection scope. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (18)
1. A data processing method, characterized in that the method comprises:
obtaining a preset convolution kernel, and determining a convolution window width of the preset convolution kernel;
obtaining an on-chip cache capacity, a preset data volume, and a first preset row count, and determining a data column width accordingly, wherein the data column width is greater than or equal to the convolution window width;
dividing a to-be-processed data matrix into columns according to the data column width to obtain a plurality of column regions, wherein the to-be-processed data matrix is a matrix stored in a memory that contains all the to-be-processed data;
when a data processing instruction is received, for any column region among all the column regions, extracting a second preset row count of data awaiting computation and sending it to the on-chip cache, and performing a convolution operation on the cached data awaiting computation using the preset convolution kernel, wherein the second preset row count is greater than or equal to the convolution window width and less than or equal to the first preset row count;
after the first row of the data awaiting computation has participated in the convolution operation, deleting the first row from the on-chip cache, and extracting the next row of data from the corresponding column region and sending it to the on-chip cache as the last row of the data awaiting computation, thereby updating the data awaiting computation;
performing the convolution operation on the updated data awaiting computation using the preset convolution kernel until every row of data in the region has participated in the convolution operation, and sending all operation results obtained from the convolution operations to the memory.
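The row-rolling loop recited in claim 1 can be sketched in a few lines of pure Python. This is a minimal, non-limiting illustration, not the patented implementation: the list-of-lists data layout, the function names, and the stride-1, no-padding convolution are all assumptions made for the example.

```python
def conv_row(rows, kernel):
    """Convolve the top k cached rows with a k x k kernel, producing one
    output row (pure Python, stride 1, no padding)."""
    k = len(kernel)
    width = len(rows[0])
    return [sum(rows[di][j + dj] * kernel[di][dj]
                for di in range(k) for dj in range(k))
            for j in range(width - k + 1)]

def convolve_column_region(region, kernel, cache_rows=None):
    """Sketch of the claim-1 loop: keep at most `cache_rows` rows of one
    column region in the on-chip cache, convolve, delete the first row,
    then fetch the next row until every row has participated."""
    k = len(kernel)
    cache_rows = cache_rows or k          # second preset row count >= k
    window = list(region[:cache_rows])    # rows currently held on chip
    next_idx = cache_rows
    results = []
    while len(window) >= k:
        results.append(conv_row(window, kernel))  # first row participates
        window.pop(0)                     # delete it from the cache
        if next_idx < len(region):
            window.append(region[next_idx])  # next row becomes the last row
            next_idx += 1
    return results
```

The point of the scheme is visible in `window`: only `cache_rows` rows of the region are resident at any time, so the on-chip footprint is independent of the region's height.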
2. The data processing method according to claim 1, characterized in that the step of obtaining the on-chip cache capacity, the preset data volume, and the first preset row count and determining the data column width comprises:
obtaining the on-chip cache capacity and the preset data volume, and dividing the on-chip cache capacity by the preset data volume to obtain the maximum number of data elements the on-chip cache can hold;
obtaining the first preset row count, and dividing the maximum number of data elements by the first preset row count to obtain the number of data elements in each row of the on-chip cache;
taking the number of data elements in each row of the on-chip cache as the data column width.
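The arithmetic of claim 2 is two integer divisions. A sketch, reading the "preset data volume" as the size of one data element (an interpretation implied by the first division, not stated explicitly in the claim):

```python
def data_column_width(cache_bytes, bytes_per_element, first_preset_rows):
    """Claim-2 arithmetic: capacity / element size gives the maximum
    element count; dividing by the preset row count gives elements per
    row, which is taken as the data column width."""
    max_elements = cache_bytes // bytes_per_element   # cache capacity in elements
    return max_elements // first_preset_rows          # elements per cached row
```

For example, a 64 KiB cache holding 4-byte elements across 32 rows yields a data column width of 512 elements.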
3. The data processing method according to claim 1, characterized in that before the step of dividing the to-be-processed data matrix into columns according to the data column width to obtain a plurality of column regions, the method further comprises:
subtracting a preset value from the convolution window width to obtain a width of an overlap region, wherein the overlap region is a region in which any data column overlaps an adjacent data column;
including the width of the overlap region in the data column width.
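Claim 3's overlap keeps convolution outputs seamless across the column-region boundary. A sketch of how a matrix might be split into overlapping column regions, taking the preset value to be 1 (the usual stride-1 halo) purely for illustration; the claim itself leaves the preset value open:

```python
def split_with_overlap(total_cols, column_width, overlap):
    """Split `total_cols` columns into regions of `column_width` columns,
    where adjacent regions share `overlap` columns
    (overlap = convolution window width - preset value)."""
    assert column_width > overlap, "regions must advance by at least one column"
    regions, start = [], 0
    step = column_width - overlap         # fresh columns per region
    while start + column_width < total_cols:
        regions.append((start, start + column_width))
        start += step
    regions.append((start, total_cols))   # final, possibly narrower region
    return regions
```

With a 3x3 kernel the overlap would be 3 - 1 = 2 columns, so each region contributes `column_width - 2` fresh columns.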
4. The data processing method according to claim 1, characterized in that before the step of, when a data processing instruction is received, for any column region among all the column regions, extracting the second preset row count of data awaiting computation and sending it to the on-chip cache, the method further comprises:
for any column region among all the column regions, adding a first empty row before the first row of data, and setting the data of the first empty row to 0;
adding a second empty row after the last row of data, and setting the data of the second empty row to 0;
and the step of extracting the second preset row count of data awaiting computation and sending it to the on-chip cache comprises:
extracting the second preset row count of data awaiting computation starting from the first empty row and sending it to the on-chip cache.
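The zero-row padding of claim 4 lets the convolution window cover the region's edge rows. A minimal sketch (function name and list-of-lists layout are illustrative assumptions):

```python
def pad_column_region(region):
    """Claim 4: prepend a first empty (all-zero) row and append a second
    empty row to one column region; extraction then starts from the
    first empty row."""
    width = len(region[0])
    zero_row = [0] * width
    return [zero_row] + region + [zero_row]
```

For instance, padding a 2x2 region produces a 4x2 region whose first and last rows are zeros.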
5. The data processing method according to claim 1, characterized in that the step of, after the first row of the data awaiting computation has participated in the convolution operation, deleting the first row from the on-chip cache, extracting the next row of data from the corresponding column region and sending it to the on-chip cache as the last row of the data awaiting computation, and updating the data awaiting computation, comprises:
when the first row of the data awaiting computation has participated in the convolution operation, deleting the first row from the on-chip cache, and extracting the next row of data from the corresponding column region and sending it to the on-chip cache as the last row of the data awaiting computation, thereby updating the data awaiting computation;
or,
after any one convolution operation has been performed, deleting the first row of data from the on-chip cache, and when a further convolution operation is performed after the first row has been deleted, extracting the next row of data from the corresponding column region and sending it to the on-chip cache as the last row of the data awaiting computation, thereby updating the data awaiting computation.
6. The data processing method according to claim 1, characterized in that the step of performing the convolution operation on the updated data awaiting computation using the preset convolution kernel until every row of data in the region has participated in the convolution operation, and sending all operation results obtained from the convolution operations to the memory, comprises:
performing the convolution operation on the updated data awaiting computation using the preset convolution kernel to obtain a convolution result;
storing the convolution result into the row of the next convolutional layer, at the columns corresponding to the column region;
sending the next convolutional layer to the memory.
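Claim 6's write-back step places each column region's results into the matching columns of the next layer before the whole layer goes to memory. A sketch under assumed data layouts (per-region result lists and half-open column slices, neither of which is specified by the claim):

```python
def assemble_next_layer(results_by_region, region_col_slices, out_shape):
    """Write each column region's convolution results into the matching
    columns of the next convolutional layer (rows x cols list-of-lists)."""
    rows, cols = out_shape
    layer = [[0] * cols for _ in range(rows)]
    for result, (c0, c1) in zip(results_by_region, region_col_slices):
        for r, row in enumerate(result):
            layer[r][c0:c1] = row         # same row index, region's columns
    return layer
```

The layer is only sent to memory once, after every region's results have been placed.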
7. A data processing device, characterized in that the device comprises:
a main control unit, configured to receive a data processing instruction and send control commands to a rolling cache and a computing unit, so as to control the rolling cache to extract data from a memory and control the computing unit to perform a convolution operation on the extracted data;
the rolling cache, configured to obtain a preset convolution kernel and determine a convolution window width of the preset convolution kernel; obtain an on-chip cache capacity, a preset data volume, and a first preset row count, and determine a data column width accordingly, wherein the data column width is greater than or equal to the convolution window width; after receiving the control command sent by the main control unit, for any column region among the plurality of column regions into which the to-be-processed data matrix in the memory is divided according to the data column width, extract a second preset row count of data awaiting computation; and, after the first row of the data awaiting computation has participated in the convolution operation, delete the first row of data, and extract the next row of data from the corresponding column region as the last row of the data awaiting computation, thereby updating the data awaiting computation;
the computing unit, configured to, after receiving the data awaiting computation sent by the rolling cache, perform the convolution operation on the data awaiting computation or on the updated data awaiting computation using the preset convolution kernel until every row of data in the region has participated in the convolution operation, and send all operation results obtained from the convolution operations to the memory.
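Claim 7 splits the method of claim 1 across three cooperating units. The toy model below mirrors that split; the class and function names, and the depth-bounded `deque` as the rolling cache, are illustrative assumptions, not the claimed hardware:

```python
from collections import deque

class RollingCache:
    """Toy model of claim 7's rolling cache: holds `depth` rows of one
    column region at a time and rolls forward one row per step."""
    def __init__(self, region, depth):
        self._region, self._next = region, depth
        self.rows = deque(region[:depth])

    def roll(self):
        # delete the first (already convolved) row, then fetch the next one
        self.rows.popleft()
        if self._next < len(self._region):
            self.rows.append(self._region[self._next])
            self._next += 1

def compute_unit(rows, kernel):
    """One stride-1 convolution step over the kernel-height window."""
    k, w = len(kernel), len(rows[0])
    return [sum(rows[di][j + dj] * kernel[di][dj]
                for di in range(k) for dj in range(k))
            for j in range(w - k + 1)]

def main_control(region, kernel, depth):
    """Plays the main control unit: drives the cache and compute unit,
    collecting the results that would be written back to memory."""
    cache, results = RollingCache(region, depth), []
    while len(cache.rows) >= len(kernel):
        results.append(compute_unit(list(cache.rows), kernel))
        cache.roll()
    return results
```

The division of labour is the point: the cache alone touches the memory-resident region, and the compute unit only ever sees the currently cached rows.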
8. The data processing device according to claim 7, characterized in that the rolling cache is further configured specifically to:
obtain the on-chip cache capacity and the preset data volume, and divide the on-chip cache capacity by the preset data volume to obtain the maximum number of data elements the rolling cache can hold;
obtain the first preset row count, and divide the maximum number of data elements by the first preset row count to obtain the number of data elements in each row of the rolling cache;
take the number of data elements in each row of the rolling cache as the data column width.
9. The data processing device according to claim 7, characterized in that the rolling cache is further configured specifically to:
subtract a preset value from the convolution window width to obtain a width of an overlap region, wherein the overlap region is a region in which any data column overlaps an adjacent data column;
include the width of the overlap region in the data column width.
10. The data processing device according to claim 7, characterized in that the rolling cache is further configured specifically to:
for any column region among all the column regions, before extracting the first row of data, add a first empty row before the first row of data, and set the data of the first empty row to 0;
before extracting the last row of data, add a second empty row after the last row of data, and set the data of the second empty row to 0;
extract the second preset row count of data awaiting computation starting from the first empty row.
11. The data processing device according to claim 7, characterized in that the rolling cache is further configured specifically to:
when the first row of the data awaiting computation has participated in the convolution operation, delete the first row of data, and extract the next row of data from the corresponding column region as the last row of the data awaiting computation, thereby updating the data awaiting computation;
or,
after any one convolution operation has been performed, delete the first row of data from the on-chip cache, and when a further convolution operation is performed after the first row has been deleted, extract the next row of data from the corresponding column region and send it to the on-chip cache as the last row of the data awaiting computation, thereby updating the data awaiting computation.
12. The data processing device according to claim 7, characterized in that the computing unit is further configured specifically to:
perform the convolution operation on the data awaiting computation or on the updated data awaiting computation using the preset convolution kernel to obtain a convolution result;
store the convolution result into the row of the next convolutional layer, at the columns corresponding to the column region;
send the next convolutional layer to the memory.
13. A data processing apparatus, characterized in that the apparatus comprises:
a first determining module, configured to obtain a preset convolution kernel and determine a convolution window width of the preset convolution kernel;
a second determining module, configured to obtain an on-chip cache capacity, a preset data volume, and a first preset row count, and determine a data column width accordingly, wherein the data column width is greater than or equal to the convolution window width;
a division module, configured to divide a to-be-processed data matrix into columns according to the data column width to obtain a plurality of column regions, wherein the to-be-processed data matrix is a matrix stored in a memory that contains all the to-be-processed data;
an extraction module, configured to, when a data processing instruction is received, for any column region among all the column regions, extract a second preset row count of data awaiting computation and send it to the on-chip cache, and perform a convolution operation on the cached data awaiting computation using the preset convolution kernel, wherein the second preset row count is greater than or equal to the convolution window width and less than or equal to the first preset row count;
an update module, configured to, after the first row of the data awaiting computation has participated in the convolution operation, delete the first row from the on-chip cache, and extract the next row of data from the corresponding column region and send it to the on-chip cache as the last row of the data awaiting computation, thereby updating the data awaiting computation;
a first computing module, configured to perform the convolution operation on the updated data awaiting computation using the preset convolution kernel until every row of data in the region has participated in the convolution operation, and send all operation results obtained from the convolution operations to the memory.
14. The data processing apparatus according to claim 13, characterized in that the second determining module comprises:
a first operation submodule, configured to obtain the on-chip cache capacity and the preset data volume, and divide the on-chip cache capacity by the preset data volume to obtain the maximum number of data elements the on-chip cache can hold;
a second operation submodule, configured to obtain the first preset row count, and divide the maximum number of data elements by the first preset row count to obtain the number of data elements in each row of the on-chip cache;
a determination submodule, configured to take the number of data elements in each row of the on-chip cache as the data column width.
15. The data processing apparatus according to claim 13, characterized in that the data processing apparatus further comprises:
a second computing module, configured to subtract a preset value from the convolution window width to obtain a width of an overlap region, wherein the overlap region is a region in which any data column overlaps an adjacent data column;
a third determining module, configured to include the width of the overlap region in the data column width.
16. The data processing apparatus according to claim 13, characterized in that the data processing apparatus further comprises:
a first setup module, configured to, for any column region among all the column regions, add a first empty row before the first row of data, and set the data of the first empty row to 0;
a second setup module, configured to add a second empty row after the last row of data, and set the data of the second empty row to 0;
and the extraction module is further configured specifically to:
extract the second preset row count of data awaiting computation starting from the first empty row and send it to the on-chip cache.
17. The data processing apparatus according to claim 13, characterized in that the update module is specifically configured to:
when the first row of the data awaiting computation has participated in the convolution operation, delete the first row from the on-chip cache, and extract the next row of data from the corresponding column region and send it to the on-chip cache as the last row of the data awaiting computation, thereby updating the data awaiting computation;
or,
after any one convolution operation has been performed, delete the first row of data from the on-chip cache, and when a further convolution operation is performed after the first row has been deleted, extract the next row of data from the corresponding column region and send it to the on-chip cache as the last row of the data awaiting computation, thereby updating the data awaiting computation.
18. The data processing apparatus according to claim 13, characterized in that the first computing module further comprises:
a third operation submodule, configured to perform the convolution operation on the updated data awaiting computation using the preset convolution kernel to obtain a convolution result;
a storage submodule, configured to store the convolution result into the row of the next convolutional layer, at the columns corresponding to the column region;
a sending submodule, configured to send the next convolutional layer to the memory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710152660.5A CN108573305B (en) | 2017-03-15 | 2017-03-15 | Data processing method, equipment and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108573305A true CN108573305A (en) | 2018-09-25 |
CN108573305B CN108573305B (en) | 2020-07-24 |
Family
ID=63575806
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710152660.5A Active CN108573305B (en) | 2017-03-15 | 2017-03-15 | Data processing method, equipment and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108573305B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109726798A (en) * | 2018-12-27 | 2019-05-07 | 北京灵汐科技有限公司 | A kind of data processing method and device |
CN109740732A (en) * | 2018-12-27 | 2019-05-10 | 深圳云天励飞技术有限公司 | Neural network processor, convolutional neural networks data multiplexing method and relevant device |
CN110770740A (en) * | 2018-09-30 | 2020-02-07 | 深圳市大疆创新科技有限公司 | Image processing method and device based on convolutional neural network and unmanned aerial vehicle |
CN110866597A (en) * | 2019-09-27 | 2020-03-06 | 珠海博雅科技有限公司 | Data processing circuit and data processing method |
CN111125617A (en) * | 2019-12-23 | 2020-05-08 | 中科寒武纪科技股份有限公司 | Data processing method, data processing device, computer equipment and storage medium |
CN111124626A (en) * | 2018-11-01 | 2020-05-08 | 北京灵汐科技有限公司 | Many-core system and data processing method and processing device thereof |
CN111177115A (en) * | 2019-12-11 | 2020-05-19 | 中电普信(北京)科技发展有限公司 | General flow method and system for data preprocessing |
CN111199273A (en) * | 2019-12-31 | 2020-05-26 | 深圳云天励飞技术有限公司 | Convolution calculation method, device, equipment and storage medium |
WO2020238843A1 (en) * | 2019-05-24 | 2020-12-03 | 华为技术有限公司 | Neural network computing device and method, and computing device |
CN112396165A (en) * | 2020-11-30 | 2021-02-23 | 珠海零边界集成电路有限公司 | Arithmetic device and method for convolutional neural network |
Legal Events

Date | Code | Title | Description
---|---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |