CN110399972A - Data processing method, device and electronic equipment - Google Patents
- Publication number
- CN110399972A CN110399972A CN201910661953.5A CN201910661953A CN110399972A CN 110399972 A CN110399972 A CN 110399972A CN 201910661953 A CN201910661953 A CN 201910661953A CN 110399972 A CN110399972 A CN 110399972A
- Authority
- CN
- China
- Prior art keywords
- convolution
- item
- operator
- information
- mask
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Algebra (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Complex Calculations (AREA)
Abstract
This application provides a data processing method and apparatus. The method includes: obtaining mask information of a first convolution item, where the first convolution item includes a sparse convolution operation between to-be-processed data and a convolution kernel, and the mask information of the first convolution item marks the non-zero elements in the first convolution item; determining, according to the mask information of the first convolution item, information of a first convolution operator corresponding to the first convolution item among multiple preset convolution operators; and performing the sparse convolution operation on the first convolution item based on the information of the first convolution operator. With the data processing method provided by this application, the amount of computation a computing device performs when executing convolution operations in a deep neural network can be reduced, and the efficiency of those convolution operations improved.
Description
Technical field
This application relates to the field of deep learning technology, and in particular to a data processing method, apparatus and electronic device.
Background
With the rapid development of artificial intelligence (AI) technology, deep learning techniques based on deep neural networks, such as convolutional neural networks, can perform image recognition and detection, speech recognition and the like with high accuracy, and are widely used in fields such as security monitoring, intelligent driving, human-computer interaction and intelligent healthcare.
The amount of convolution computation in a deep neural network is usually very large. How to execute the convolution operations in a deep neural network efficiently is an urgent problem for those skilled in the art.
Summary of the invention
The embodiments of the present application provide a data processing method, apparatus and electronic device.
According to a first aspect of the embodiments of the present application, a data processing method is provided, applied to a computing device. The method includes: obtaining mask information of a first convolution item, where the first convolution item includes a sparse convolution operation between to-be-processed data and a convolution kernel, and the mask information of the first convolution item marks the non-zero elements in the first convolution item; determining, according to the mask information of the first convolution item, information of a first convolution operator corresponding to the first convolution item among multiple preset convolution operators; and performing the sparse convolution operation on the first convolution item based on the information of the first convolution operator.
In one possible implementation, the computing device stores the multiple preset convolution operators.
In one possible implementation, determining, according to the mask information of the first convolution item, the information of the first convolution operator corresponding to the first convolution item among the multiple preset convolution operators includes: determining a target operator address of the first convolution item according to the mask information of the first convolution item. Performing the sparse convolution operation on the first convolution item based on the information of the first convolution operator includes: obtaining the first convolution operator from the target operator address, and performing the sparse convolution operation on the first convolution item using the obtained first convolution operator.
In one possible implementation, determining, according to the mask information of the first convolution item, the information of the first convolution operator corresponding to the first convolution item among the multiple preset convolution operators includes: determining the information of the first convolution operator corresponding to the first convolution item according to the mask information of the first convolution item and a mapping relationship between preset convolution operator information and preset mask information.
In one possible implementation, the mapping relationship between the preset convolution operator information and the preset mask information is stored in a mapping table containing multiple entries.
In one possible implementation, determining the information of the first convolution operator corresponding to the first convolution item according to the mask information of the first convolution item and the mapping relationship between preset convolution operator information and preset mask information includes: looking up the mapping table based on the mask information of the first convolution item, and taking the preset convolution operator information in the entry whose preset mask information matches the mask information of the first convolution item as the information of the first convolution operator.
In one possible implementation, the mask information includes a mask of the sparse data in the first convolution item.
In one possible implementation, the mask information includes any one of the following: a data mask corresponding to the to-be-processed data; a convolution kernel mask corresponding to the convolution kernel; or a combined mask corresponding to both the to-be-processed data and the convolution kernel.
In one possible implementation, determining, according to the mask information of the first convolution item, the information of the first convolution operator corresponding to the first convolution item among the multiple preset convolution operators includes: determining, in a first iteration, the information of the first convolution operator corresponding to the first convolution item among the multiple preset convolution operators according to the mask information of the first convolution item.
Performing the sparse convolution operation on the first convolution item based on the information of the first convolution operator includes: performing, in a second iteration, the sparse convolution operation on the first convolution item based on the information of the first convolution operator, where the second iteration is a subsequent iteration of the first iteration.
In one possible implementation, a convolution operation is performed, in the first iteration, on a second convolution item that is different from the first convolution item.
In one possible implementation, the convolution operation on the second convolution item and the determination of the information of the first convolution operator are executed in parallel.
In this way, by determining the information of the first convolution operator corresponding to the first convolution item in advance, based on the mask information of the first convolution item, in an earlier iteration, and then directly performing the convolution operation on the first convolution item with the first convolution operator in a subsequent iteration, the efficiency of the convolution operation is further improved.
In one possible implementation, the second iteration is the next iteration after the first iteration.
In one possible implementation, the method further includes storing the information of the first convolution operator.
In one possible implementation, the method further includes: obtaining, in the second iteration, the first convolution operator stored in a memory based on the information of the first convolution operator.
In one possible implementation, the method further includes: performing, in the first iteration, a sparse convolution operation on a second convolution item based on information of a second convolution operator corresponding to the second convolution item, where the first convolution item is the next convolution item after the second convolution item.
In one possible implementation, the method further includes: obtaining, in the first iteration, the mask information of the second convolution item or the information of the second convolution operator corresponding to the second convolution item.
In one possible implementation, if the first iteration is the first iteration of the current data processing procedure, then in the first iteration the second convolution operator corresponding to the second convolution item is determined based on the mask information of the second convolution item and the convolution operation is performed on the second convolution item with the second convolution operator; in addition, in the first iteration the first convolution operator corresponding to the first convolution item is determined based on the mask information of the first convolution item, and the information of the first convolution operator is stored.
In one possible implementation, if the first iteration is not the first iteration of the current data processing procedure, then in the first iteration the information of the second convolution operator can be obtained from a memory and the second convolution operator obtained based on that information; alternatively, in the first iteration the mask information of the second convolution item can be obtained, and the second convolution operator corresponding to that mask information obtained from the memory.
In one possible implementation, before obtaining the mask information of the first convolution item, the method further includes: determining the mask information of the first convolution item according to the non-zero elements contained in the sparse data of the first convolution item.
In one possible implementation, the mask information of the first convolution item includes a preset number of mask elements, and the number M of preset convolution operators and the number n of mask elements satisfy either of the following formulas: M = 2^n, or M = 2^n - 1.
According to a second aspect of the embodiments of the present application, a data processing apparatus is provided, including: a mask obtaining module, configured to obtain mask information of a first convolution item, where the first convolution item includes a sparse convolution operation between to-be-processed data and a convolution kernel, and the mask information of the first convolution item marks the non-zero elements in the first convolution item; an operator information determining module, configured to determine, according to the mask information of the first convolution item, information of a first convolution operator corresponding to the first convolution item among multiple preset convolution operators; and a computing module, configured to perform the sparse convolution operation on the first convolution item based on the information of the first convolution operator.
Optionally, the operator information determining module is configured to determine a target operator address of the first convolution item according to the mask information of the first convolution item; and the computing module is configured to obtain the first convolution operator from the target operator address and to perform the sparse convolution operation on the first convolution item using the obtained first convolution operator.
Optionally, the operator information determining module is configured to determine the information of the first convolution operator corresponding to the first convolution item according to the mask information of the first convolution item and a mapping relationship between preset convolution operator information and preset mask information.
Optionally, the mapping relationship between the preset convolution operator information and the preset mask information is stored in a mapping table containing multiple entries; and the operator information determining module is configured to look up the mapping table based on the mask information of the first convolution item, and to take the preset convolution operator information in the entry whose preset mask information matches the mask information of the first convolution item as the information of the first convolution operator.
Optionally, the mask information includes any one of the following:
a data mask corresponding to the to-be-processed data;
a convolution kernel mask corresponding to the convolution kernel;
a combined mask corresponding to both the to-be-processed data and the convolution kernel.
Optionally, the operator information determining module is configured to determine, in a first iteration, the information of the first convolution operator corresponding to the first convolution item among the multiple preset convolution operators according to the mask information of the first convolution item; and the computing module is configured to perform, in a second iteration, the sparse convolution operation on the first convolution item based on the information of the first convolution operator, where the second iteration is the next iteration after the first iteration.
Optionally, the computing module is configured to perform, in the first iteration, a sparse convolution operation on a second convolution item based on information of a second convolution operator corresponding to the second convolution item, where the first convolution item is the next convolution item after the second convolution item.
Optionally, the mask obtaining module is configured to obtain the mask information of the second convolution item in the first iteration.
Optionally, the apparatus further includes: a mask information determining module, configured to determine the mask information of the first convolution item according to the non-zero elements contained in the sparse data of the first convolution item.
Optionally, the mask information of the first convolution item includes a preset number of mask elements, and the number M of preset convolution operators and the number n of mask elements satisfy either of the following formulas: M = 2^n, or M = 2^n - 1.
Optionally, the apparatus further includes: a storage module, configured to store at least one of the mask information of the first convolution item and the information of the first convolution operator.
According to a third aspect of the embodiments of the present application, a convolution operation accelerator is provided, including:
an operator memory for storing multiple preset convolution operators;
a controller for determining, according to mask information of a first convolution item, information of a first convolution operator corresponding to the first convolution item among the multiple preset convolution operators;
a register cache area for providing input data for the convolution operation; and
a computing unit for performing a sparse convolution operation on the first convolution item according to the first convolution operator output by the operator memory.
Optionally, a branch multiplex predictor is provided in the controller; the branch multiplex predictor is used to determine, in a first iteration and according to the mask information of the first convolution item, the information of the first convolution operator corresponding to the first convolution item among the multiple preset convolution operators, so that a computing device can perform the sparse convolution operation on the first convolution item in a second iteration based on the information of the first convolution operator, where the second iteration is the next iteration after the first iteration.
According to a fourth aspect of the embodiments of the present application, a computer-readable storage medium is provided. The storage medium stores a computer program which, when executed by a processor, implements the method of any one of the implementations of the first aspect.
According to a fifth aspect of the embodiments of the present application, a chip is provided, including a processor configured to execute the method of any possible implementation of the first aspect.
According to a sixth aspect of the embodiments of the present application, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the method of any one of the implementations of the first aspect.
With the data processing method provided by the embodiments of the present application, the computing device determines the information of the corresponding first convolution operator based on the mask information of the first convolution item, then determines the first convolution operator from that information and performs the sparse convolution operation on the first convolution item with the first convolution operator. This reduces the amount of computation the computing device performs when executing the convolution operations in a deep neural network and improves the efficiency of the convolution operations.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present application.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the present application.
Fig. 1 is a flowchart of a data processing method according to an exemplary embodiment of the present application;
Fig. 2 is a flowchart of another data processing method according to an exemplary embodiment of the present application;
Fig. 3 is a flowchart of another data processing method according to an exemplary embodiment of the present application;
Fig. 4 is a flowchart of another data processing method according to an exemplary embodiment of the present application;
Fig. 5 is a schematic diagram of accelerating convolution operations according to an exemplary embodiment of the present application;
Fig. 6 is a schematic diagram of another data processing method according to an exemplary embodiment of the present application;
Fig. 7 is a block diagram of a data processing apparatus according to an exemplary embodiment of the present application;
Fig. 8 is a block diagram of another data processing apparatus according to an exemplary embodiment of the present application;
Fig. 9 is a block diagram of another data processing apparatus according to an exemplary embodiment of the present application;
Fig. 10 is a schematic structural diagram of a convolution operation accelerator according to an exemplary embodiment of the present application;
Fig. 11 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application.
Detailed description of the embodiments
Exemplary embodiments are described in detail here, with examples illustrated in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present application as detailed in the appended claims.
The terms used in this application are for the purpose of describing particular embodiments only and are not intended to limit the application. The singular forms "a", "said" and "the" used in this application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this application to describe various pieces of information, the information should not be limited by these terms. These terms are only used to distinguish pieces of information of the same type from one another. For example, without departing from the scope of the application, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon" or "in response to determining".
Before introducing the application, the artificial neural network concepts involved in the embodiments of the present application are briefly introduced.
A neural network may include network units such as convolutional layers, pooling layers, activation layers and fully connected layers, stacked in a certain manner.
Regarding the convolutional layer: a convolutional layer in an artificial neural network is a layer that transforms the original input image, or the feature map output by the previous layer, by means of a convolution operation. The convolutional layer is a type of layer commonly used when a deep neural network (DNN) processes images; a deep neural network built on convolutional layers is called a convolutional neural network (CNN).
Regarding the convolution operation: in a convolutional layer, in order to extract features of various forms from the image, multiple convolution kernels are usually used to perform different convolution operations on the input image.
Regarding the input data of a convolutional layer: in general, the first convolutional layer of a deep neural network takes an image as input data, while later convolutional layers take the feature maps output by the preceding convolutional layers as input data. In the embodiments of the present application, the input image of the first convolutional layer and the input feature maps of the other convolutional layers may both be referred to as to-be-processed data.
In practical application scenarios, an intelligent task may be composed of multiple neural networks such as convolutional neural networks. A convolutional neural network is computation-intensive, and a single data pass may involve one or more convolutional layers performing dozens or even hundreds of convolution operations. The amount of convolution computation involved in an intelligent task is therefore usually very large and requires substantial computing resources, and the deeper and more complex the network, the greater the demand on computing resources.
In order to reduce the amount of convolution computation in a neural network, so that devices with limited computing capability, such as edge devices, can also execute intelligent tasks well using neural networks, the embodiments of the present application provide a data processing method to improve the efficiency with which a computing device executes convolution operations. The computing device may be a terminal device with an embedded artificial intelligence (AI) chip that performs edge computing.
Referring to Fig. 1, a flowchart of a data processing method according to an exemplary embodiment, the method can be applied in a computing device and includes the following steps.
In step 11, the mask information of a first convolution item is obtained.
In a deep neural network such as a convolutional neural network (CNN), for a convolutional layer, the process of performing a convolution operation between the input to-be-processed data and a convolution kernel is called the computation of one convolution item. That is, in the embodiments of the present application, the to-be-processed data and the convolution kernel prepared to participate in a convolution operation are collectively referred to as a convolution item.
The first convolution item in the embodiments of the present application may be the convolution item the computing device is currently about to operate on, or it may be a convolution item to be processed after the computing device has completed the current round of convolution operations; this will be described in detail later with examples.
The mask information of the first convolution item is used to determine the convolution operator that performs the sparsified convolution operation on the first convolution item; the mask information marks the non-zero elements in the first convolution item.
In step 12, the information of the first convolution operator corresponding to the first convolution item among multiple preset convolution operators is determined according to the mask information of the first convolution item.
In the embodiments of the present application, the computing device can use the mask information of the first convolution item to determine the information of the first convolution operator, where the first convolution operator is the convolution operator used when the computing device performs the sparse convolution operation on the first convolution item. The information of the first convolution operator may be the first convolution operator itself, or index information used to look up the first convolution operator, such as operator address information.
In the embodiments of the present application, a preset storage location of the computing device stores multiple preset convolution operators, and the computing device can determine, from these preset convolution operators, the information of the first convolution operator corresponding to the first convolution item according to the mask information of the first convolution item.
In step 13, the sparse convolution operation is performed on the first convolution item based on the information of the first convolution operator.
After determining the information of the first convolution operator, the computing device can determine the first convolution operator and then use it to perform the sparse convolution operation on the first convolution item.
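As an illustration of steps 11 to 13, the following is a minimal Python sketch under the assumption that each preset convolution operator is a callable kept in a dictionary keyed by its mask information; the operator bodies and data are made up for illustration and are not taken from the patent.

```python
def operator_dense(data, kernel):
    # Mask 11: both elements are non-zero, so every product is computed.
    return sum(d * k for d, k in zip(data, kernel))

def operator_sparse_10(data, kernel):
    # Mask 10: only the first element of `data` is non-zero, the rest is skipped.
    return data[0] * kernel[0]

# Step 12 relies on a table of preset operators keyed by mask information.
PRESET_OPERATORS = {0b11: operator_dense, 0b10: operator_sparse_10}

def process_item(data, kernel, mask):
    operator = PRESET_OPERATORS[mask]   # step 12: determine the operator from the mask
    return operator(data, kernel)       # step 13: perform the sparse convolution

print(process_item([1.5, 0.0], [2.0, 3.0], 0b10))  # 3.0, computed with a single multiply
```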
In the embodiments of the present application, before performing the sparse convolution operation on the first convolution item, the computing device can first determine the information of the corresponding first convolution operator based on the mask information of the first convolution item, then determine the first convolution operator based on that information and perform the sparse convolution operation on the first convolution item with it. This effectively saves computing resources and reduces the amount of computation the computing device performs when executing the convolution operations in a deep neural network, so that devices with limited computing capability, such as edge devices like mobile phones, security cameras, automobiles, smart home devices, and various Internet of Things (IoT) devices that perform edge computing, can also efficiently execute the convolution operations of a deep neural network, improving the utilization of computing resources in the computing device and the artificial intelligence experience of the device.
Regarding the implementation of steps 12 and 13, depending on the position of the first convolution item in the timing of the convolution operations, there are two cases:
Case one: the first convolution item is the convolution item the computing device executes immediately; alternatively, the determination of the information of the first convolution operator is unrelated to the computation of the first convolution item.
In case one, the first convolution item is the convolution item the computing device is currently about to process. In step 12, the computing device uses the mask information of the first convolution item to determine the information of the first convolution operator. Correspondingly, step 13 is: determine the first convolution operator using its information, and perform the sparse convolution operation on the first convolution item with the first convolution operator.
Case two: the determination of the information of the first convolution operator is related to the execution of the previous round of convolution operations.
For case two, referring to Fig. 2, a flowchart of another data processing method according to an exemplary embodiment, step 12 may include:
Step 121: in a first iteration, determine, according to the mask information of the first convolution item, the information of the first convolution operator corresponding to the first convolution item among the multiple preset convolution operators.
Correspondingly, step 13 may include:
Step 131: in a second iteration, perform the sparse convolution operation on the first convolution item based on the information of the first convolution operator, where the second iteration is the next iteration after the first iteration.
In the embodiments of the present application, one iteration can be defined as the procedure from inputting data to the computing device until the computing device has executed the sparse convolution operation for the to-be-processed data in that input.
If the first iteration involves the computation related to the first convolution item, then in the first iteration the input data obtained by the computing device contains not only the to-be-processed data of the current convolution operation but also the mask information of the convolution item involved in the next convolution operation, i.e., the mask information of the first convolution item. The first convolution item is the convolution item on which the computing device performs the convolution operation in the second iteration, and in timing the second iteration is the next iteration after the first iteration.
For ease of understanding, the relationship among the first iteration, the second iteration, the first convolution item and the mask information of the first convolution item can be as shown in Table 1:
Table 1
Iteration | Input mask information | Operand |
First iteration | Mask information of the first convolution item | Second convolution item |
Second iteration | (not listed here) | First convolution item |
As can be seen from Table 1, in the embodiments of the present application the second convolution item is the operand of the convolution operation in the first iteration, and the first convolution item is the operand of the convolution operation in the second iteration, where in timing the second iteration follows the first iteration. Correspondingly, the convolution operation on the first convolution item follows the convolution operation on the second convolution item in timing.
In the embodiments of the present application, the information of the first convolution operator corresponding to the first convolution item can be determined in the first iteration according to the mask information of the first convolution item. In the second iteration, when the computing device performs the sparse convolution operation on the first convolution item, it can quickly determine the first convolution operator from the information determined in advance in the first iteration, and then perform the sparse convolution operation on the first convolution item with that operator.
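The following is a minimal sketch of this two-iteration arrangement, under the assumption that determining the next item's operator is cheap enough to be done alongside the current item's convolution; all names are illustrative.

```python
def pipelined_convolutions(items, preset_operators):
    """items: list of dicts with keys 'data', 'kernel' and 'mask';
    preset_operators: dict mapping mask information to an operator callable
    (for example, the PRESET_OPERATORS dictionary from the earlier sketch)."""
    outputs = []
    next_operator = None
    for i, item in enumerate(items):
        # Operator for the current item: either determined in the previous
        # iteration, or (for the very first item) looked up from its own mask.
        operator = next_operator if next_operator is not None else preset_operators[item["mask"]]

        # "Step 121" for item i + 1: determine its operator from its mask now,
        # so that the next iteration ("step 131") can use it directly.
        if i + 1 < len(items):
            next_operator = preset_operators[items[i + 1]["mask"]]

        outputs.append(operator(item["data"], item["kernel"]))
    return outputs
```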
Correspondingly, in the embodiments of the present application the method further includes: in the first iteration, performing a sparse convolution operation on a second convolution item based on the information of the second convolution operator corresponding to the second convolution item, where the first convolution item is the next convolution item after the second convolution item.
It should be noted that this step can be carried out at the same time as step 121.
For case two above, the computing device can determine the information of the first convolution operator based on the mask information of the first convolution item either before performing the convolution operation on the second convolution item or while performing the convolution operation on the second convolution item.
Regarding step 121, in some embodiments of the present application its implementation can specifically be: in the first iteration, determine the target operator address of the first convolution item according to the mask information of the first convolution item.
Correspondingly, step 131 may include: in the second iteration, obtain the first convolution operator from the target operator address, and perform the sparse convolution operation on the first convolution item using the obtained first convolution operator.
Regarding the implementation of step 13: in the embodiments of the present application, the preset storage location of the computing device stores the preset convolution operators. After determining the target operator address, the computing device can use the target operator address to obtain the corresponding target convolution operator from the preset storage location and perform the sparse convolution operation on the convolution item with the target convolution operator.
In some embodiments of the present application, a preset convolution operator set can be configured in the computing device in advance. The set contains multiple preset convolution operators, each of which performs one kind of convolution operation, and it includes sparse convolution operators for performing sparsified convolution operations. A sparsified convolution operation sets certain elements of the convolution item to 0, turning them into zero-valued elements; when the computing device performs the convolution operation on the convolution item with the convolution operator, the operations related to these zero-valued elements can be skipped, and the convolution operation is performed only on the data corresponding to the non-zero elements.
After the computing device determines the target operator address from the preset mask information, it looks up the corresponding target convolution operator in the preset convolution operator set and calls the program code corresponding to the target convolution operator to perform the sparse convolution operation on the convolution item.
Referring to Fig. 3, a flowchart of another data processing method according to an exemplary embodiment, before step 11 the method may further include:
Step 10: determine the mask information of the first convolution item according to the non-zero elements contained in the sparse data of the first convolution item.
Regarding how to determine the mask information of a convolution item: in the embodiments of the present application, some of the elements of a convolution item can first be set to zero by a preset sparsification method such as pruning, and the convolution item after sparsification may be called sparse data in the embodiments of the present application. The mask information is then determined from the position information of the non-zero elements in the sparse data. In this way, in the embodiments of the present application the mask information of the convolution item can mark the non-zero elements in the sparse data.
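A minimal sketch of step 10 follows, under the bit-order assumption that the first mask element corresponds to the first data element (which matches the 0010 example given later in this description):

```python
def mask_from_sparse_data(sparse_tile):
    # One mask element per data element: '1' marks a non-zero element.
    return "".join("1" if value != 0.0 else "0" for value in sparse_tile)

print(mask_from_sparse_data([0.0, 0.0, 1.5, 0.0]))  # "0010"
```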
In the embodiments of the present application, depending on the object of sparsification, the mask information may fall into the following three categories:
Data mask: mask information corresponding to the to-be-processed data; its role is to mark the non-zero elements in the to-be-processed data.
The sparse convolution implementation corresponding to the data mask may be called the Dense-Sparse convolution implementation. That is, the non-zero elements in the convolution kernel are dense, while the non-zero elements in the to-be-processed data are sparse, and the mask information marks the non-zero elements in the to-be-processed data after sparsification.
Convolution kernel mask: mask information corresponding to the convolution kernel; its role is to mark the non-zero elements in the convolution kernel.
The sparse convolution implementation corresponding to the convolution kernel mask may be called the Sparse-Dense convolution implementation. That is, the non-zero elements in the convolution kernel are sparse, while the non-zero elements in the input to-be-processed data, such as a feature map tile, are dense, and the mask information marks the non-zero elements in the convolution kernel after sparsification.
Combined mask: mask information corresponding to both the to-be-processed data and the convolution kernel; its role is to mark the non-zero elements in both the to-be-processed data and the convolution kernel.
The sparse convolution implementation corresponding to the combined mask may be called the Sparse-Sparse convolution implementation. That is, the non-zero elements in the convolution kernel are sparse and the non-zero elements in the input to-be-processed data are also sparse; the mask information marks both the non-zero elements of the to-be-processed data after sparsification and those of the convolution kernel.
In practical application scenarios, one of these convolution implementations can be selected according to the actual application requirements to perform the sparse convolution operation; correspondingly, the mask information of the corresponding category is included in the mask information of the convolution item input to the computing device and is used to mark the non-zero elements in the convolution item.
The embodiments of the present application involve determining the sparse convolution operator indirectly from the mask information. To show that the method provided by the present application can improve the efficiency of convolution operations, the following describes the relationship between mask information and convolution operators by example:
Once the mask category is determined, each preset mask information corresponds to a specific sparse convolution operator.
Illustratively, taking a feature map tile of the to-be-processed data as the example, the corresponding mask information is the data mask described above; the following content describes the correspondence between mask information and sparse convolution operators with a concrete example.
In practical application scenarios, a feature map f and a convolution kernel w are generally 4-dimensional tensors whose data format can be NCHW. Illustratively, for a feature map f in NCHW format, N denotes the number of input feature maps; C denotes the number of channels of the feature map data (for example, for image data in RGB format the number of channels is 3, corresponding to the R, G and B channels); H denotes the height component of the pixel coordinate system of the feature map; and W denotes the width component of the pixel coordinate system of the feature map.
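As an illustration of the NCHW layout, using plain nested lists rather than a tensor library and with made-up sizes:

```python
# 1 feature map, 3 channels (R, G, B), 4 x 4 pixels; sizes are illustrative.
N, C, H, W = 1, 3, 4, 4
feature_map = [[[[0.0 for _ in range(W)] for _ in range(H)]
                for _ in range(C)] for _ in range(N)]

# feature_map[n][c][h][w] addresses the pixel at row h, column w of channel c
# of the n-th map, e.g. the G-channel value at row 2, column 1 of the first map:
feature_map[0][1][2][1] = 1.5
```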
The present application uses the convolution of a one-dimensional convolution kernel with a one-dimensional feature map tile as an example to illustrate the correspondence between mask information and convolution operators.
Assume the kernel size is 1 × 3 and the stride is 1. The one-dimensional convolution kernel contains three elements, labeled w0, w1 and w2, whose values are shown in Table 2:
Table 2
w0 | w1 | w2 |
2.0f | 3.0f | 1.0f |
Assume the one-dimensional feature map tile (Tile) contains four elements, labeled f0, f1, f2 and f3; the correspondence between the feature map elements and their values can be as shown in Table 3:
Table 3
f0 | f1 | f2 | f3 |
0.0f | 0.0f | 1.5f | 0.0f |
Without sparsification, the computing device computes the following according to the default (dense) convolution operator:
Output0=w2 × f0;
Output1=w1 × f0+w2 × f1;
Output2=w0 × f0+w1 × f1+w2 × f2;
Output3=w0 × f1+w1 × f2+w2 × f3;
Output4=w0 × f2+w1 × f3;
Output5=w0 × f3.
It can be seen that during the entire convolution operation the computing device needs to execute 12 multiplication terms: w2×f0, w1×f0, w2×f1, w0×f0, w1×f1, w2×f2, w0×f1, w1×f2, w2×f3, w0×f2, w1×f3 and w0×f3.
From Table 3, since f2 is the only non-zero element of the feature map tile, only the following three outputs are non-zero: Output2 = w2 × f2, Output3 = w1 × f2 and Output4 = w0 × f2. That is, of the 12 multiplication terms executed by the computing device, only the three terms w2 × f2, w1 × f2 and w0 × f2 have non-zero results.
Based on this, the embodiments of the present application design a sparse convolution operator; by calling it, the computing device can directly compute the three terms w2 × f2, w1 × f2 and w0 × f2 without executing all 12 terms one by one, reducing the amount of computation to 3/12 of the original.
This sparse convolution operator corresponds to the element pattern of the feature map tile shown in Table 3. From Table 3, the values of f0, f1 and f3 are zero. Assuming each element of the originally input feature map tile is non-zero, in the embodiments of the present application the feature map tile data shown in Table 3 can be marked by mask information (m0, m1, m2, m3).
Illustratively, assuming a mask element of 0 indicates that the data at the corresponding position is set to 0, the mask information for Table 3 is determined to be 0010, and from the computation above the sparse convolution operator corresponding to mask information 0010 performs: Output2 = w2 × f2; Output3 = w1 × f2; Output4 = w0 × f2.
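A runnable sketch of this example follows, showing that the sparse operator for mask 0010 reproduces the dense result with 3 multiplications instead of 12 (values taken from Tables 2 and 3):

```python
w0, w1, w2 = 2.0, 3.0, 1.0            # Table 2
f0, f1, f2, f3 = 0.0, 0.0, 1.5, 0.0   # Table 3

# Dense computation: 12 multiplication terms.
dense = [w2 * f0,
         w1 * f0 + w2 * f1,
         w0 * f0 + w1 * f1 + w2 * f2,
         w0 * f1 + w1 * f2 + w2 * f3,
         w0 * f2 + w1 * f3,
         w0 * f3]

# Sparse operator for mask 0010: only the 3 terms involving f2.
sparse = [0.0, 0.0, w2 * f2, w1 * f2, w0 * f2, 0.0]

assert dense == sparse                # both are [0.0, 0.0, 1.5, 4.5, 3.0, 0.0]
```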
The four mask elements m0, m1, m2, m3 can therefore form 16 mask combinations, which can correspond to 16 independent convolution operators. The correspondence between mask information and convolution operators can be as shown in Table 4:
Table 4
Mask information | Convolution operator |
0000 | Operator 0 |
0001 | Operator 1 |
0010 | Operator 2 |
0011 | Operator 3 |
0100 | Operator 4 |
0101 | Operator 5 |
0110 | Operator 6 |
0111 | Operator 7 |
1000 | Operator 8 |
1001 | Operator 9 |
1010 | Operator 10 |
1011 | Operator 11 |
1100 | Operator 12 |
1101 | Operator 13 |
1110 | Operator 14 |
1111 | Operator 15 |
The list shown in Table 4 contains 16 convolution operators in total, Operator 0 to Operator 15, and each mask information corresponds to one convolution operator.
Among them, mask 0000 corresponds to the empty operator, Operator 0, and mask information 1111 corresponds to the dense operator, Operator 15; the remaining operators, Operator 1 to Operator 14, can be called sparse convolution operators and are used to perform sparse convolution operations on the convolution item to be operated on.
In the embodiments of the present application, the sparse convolution operator corresponding to a mask information can be understood as the program code that performs the sparse convolution operation corresponding to that specific mask combination.
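As an illustration only (in the patent each operator is separate compiled program code), the 16 operators of Table 4 for this 1 × 3 kernel and 4-element tile could be modelled as closures generated from the mask bits:

```python
def make_operator(mask_bits):
    """mask_bits: string such as '0010'; '1' marks a non-zero tile element."""
    live = [i for i, bit in enumerate(mask_bits) if bit == "1"]

    def operator(w, f):
        w0, w1, w2 = w
        out = [0.0] * 6
        for i in live:             # only the tile positions marked non-zero
            out[i] += w2 * f[i]
            out[i + 1] += w1 * f[i]
            out[i + 2] += w0 * f[i]
        return out

    return operator

# Operator 0 .. Operator 15, keyed by the 4-bit mask information of Table 4.
OPERATOR_TABLE = {format(m, "04b"): make_operator(format(m, "04b")) for m in range(16)}

# Operator 2 (mask 0010) computes only w2*f2, w1*f2 and w0*f2:
print(OPERATOR_TABLE["0010"]([2.0, 3.0, 1.0], [0.0, 0.0, 1.5, 0.0]))
# [0.0, 0.0, 1.5, 4.5, 3.0, 0.0]
```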
The above example illustrates the correspondence between mask information and convolution operators using the data mask as an example. The same correspondence also applies between the convolution kernel mask and convolution operators, and between the combined mask and convolution operators.
For example, for the combined mask, still taking the convolution of the above one-dimensional convolution kernel and one-dimensional feature map tile as an example, the sparsified convolution operator can be determined using a combination of 7 mask elements.
Still taking the above convolution item as an example, assume the mask of the convolution kernel is 010; this mask marks the convolution kernel elements obtained after pruning the kernel above, as shown in Table 5:
Table 5
w0 | w1 | w2 |
0.0f | 3.0f | 0.0f |
Assume the mask information marking the non-zero elements in the feature map tile is still 0010, as shown in Table 3. Then, for the Sparse-Sparse convolution operation performed on the above convolution item, the operation executed by the sparse convolution operator corresponding to the preset mask 0100010 is: Output3 = w1 × f2.
In this example, the upper three bits "010" of the preset mask information 0100010 are the mask marking the non-zero elements in the convolution kernel, and the lower four bits "0010" are the mask marking the non-zero elements in the feature map tile. Provided that the image features extracted by the Sparse-Sparse convolution operation of the convolution operator corresponding to preset mask information 0100010 satisfy the requirements of the intelligent task, the computing device only needs to compute one product term when executing this convolution item; the amount of computation is reduced from the original 12 product terms to 1, effectively reducing the convolution workload of the computing device.
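A minimal sketch of this combined-mask case follows, assuming the bit layout described above (three kernel mask bits followed by four tile mask bits):

```python
def sparse_sparse_operator_0100010(w, f):
    # Kernel mask 010 and tile mask 0010: w1 and f2 are the only non-zero
    # elements, so Output3 = w1 * f2 is the only non-zero product term.
    out = [0.0] * 6
    out[3] = w[1] * f[2]
    return out

kernel = [0.0, 3.0, 0.0]        # Table 5: kernel after pruning (mask 010)
tile = [0.0, 0.0, 1.5, 0.0]     # Table 3: feature map tile (mask 0010)
print(sparse_sparse_operator_0100010(kernel, tile))  # [0.0, 0.0, 0.0, 4.5, 0.0, 0.0]
```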
It should be understood that in some embodiments of the present application the number M of convolution operators and the number n of mask elements can satisfy the relationship M = 2^n; in other embodiments of the present application they can instead satisfy M = 2^n - 1.
In practical applications, a sparse convolution operator can be selected according to the requirements of the intelligent task to perform the sparse convolution operation on the convolution item, thereby reducing the amount of computation of the convolutional layers in the neural network and, in turn, the amount of computation of the computing device when executing the intelligent task.
In some embodiments of the present application, step 12 may include: determining the information of the first convolution operator corresponding to the first convolution item according to the mask information of the first convolution item and the mapping relationship between preset convolution operator information and preset mask information.
In some embodiments, the mapping relationship between the preset convolution operators and the preset mask information can be stored in a mapping table containing multiple entries. That is, the mapping table contains multiple entries, each of which represents the mapping relationship between one preset mask information and the information of one preset convolution operator.
Correspondingly, step 12 may include: looking up the mapping table based on the mask information of the first convolution item, and taking the preset convolution operator information in the entry of the mapping table whose preset mask information matches the mask information of the first convolution item as the information of the first convolution operator.
The above process matches the mask information of the first convolution item against the entries of the mapping table and takes the convolution operator information in the target entry as the information of the first convolution operator, where the target entry contains the mapping relationship between the mask information of the first convolution item and the information of the first convolution operator. In some embodiments of the present application, the information of the first convolution operator may be the address of the first convolution operator.
Based on this, in the data processing method provided by the present application, the address of the sparse convolution operator, i.e., the target operator address, can first be determined according to the mask information of the convolution item; the program then branches to the target convolution operator pointed to by the target operator address and performs the convolution operation on the convolution item with that target convolution operator.
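A minimal sketch of the mapping-table form of step 12 follows, where each entry pairs a preset mask with an operator address; the base address and block size are illustrative values, not taken from the patent:

```python
BASE_ADDRESS = 0x8000_0000      # illustrative base address
OPERATOR_BLOCK_SIZE = 0x100     # illustrative operator block size

# One entry per possible 4-bit mask: preset mask -> preset operator information.
MAPPING_TABLE = [
    {"mask": m, "operator_address": BASE_ADDRESS + m * OPERATOR_BLOCK_SIZE}
    for m in range(16)
]

def target_operator_address(item_mask):
    """Return the operator address from the entry whose mask matches item_mask."""
    for entry in MAPPING_TABLE:
        if entry["mask"] == item_mask:
            return entry["operator_address"]
    raise KeyError(item_mask)

print(hex(target_operator_address(0b0101)))  # 0x80000500
```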
In some embodiments of the present application, the computing device can determine the target operator address by querying a preset operator address list according to the preset mask information.
For case one above, referring to Fig. 4, a flowchart of another data processing method according to an exemplary embodiment, step 12 may include:
Step 1211: match a preset operator address list according to the mask information of the first convolution item and determine the target operator address of the first convolution item, where the preset operator address list contains the correspondence between mask information and convolution operator addresses.
In some embodiments of the present application, the computing device can store a preset operator address list containing the correspondence between mask information and convolution operator addresses.
The mask information can be a combination of a preset number of mask elements. Each mask element occupies 1 bit and takes a binary value, for example 0 or 1. In the embodiments of the present application, the meaning of the binary value of a mask element can be agreed upon; for example, a mask element with value 0 marks a pixel in the convolution item, such as a feature map tile, that has been set to 0, while a mask element with value 1 marks a non-zero pixel value in the convolution item.
Step 13 may include:
Step 1311: obtain the first convolution operator based on the target operator address of the first convolution item.
In some embodiments of the present application, a preset convolution operator set can be pre-stored in the computing device; for example, the instruction sets corresponding to the convolution operators are stored in the cache of the processor of the electronic device. Step 1311 can then specifically be: look up the first convolution operator in the preset convolution operator set based on the target operator address of the first convolution item.
Step 1312: perform the sparse convolution operation on the first convolution item with the first convolution operator.
For case one above, illustratively, referring to Fig. 5, a schematic diagram of accelerating convolution operations according to an exemplary embodiment: still taking the 16 convolution operators shown in Table 4 as an example, at program compile time 16 operator addresses are defined for the 16 convolution operators corresponding to the 16 kinds of mask information. The address of each convolution operator can be determined from the base address, the mask information and the operator block size, where the base address (Base Address) is the starting address of all convolution operators and the operator block size indicates the amount of storage occupied by the operator code; in the embodiments of the present application each operator block can have size 0x100. The operator address corresponding to a mask information is then: base address + mask × 0x100. For example, the binary mask information 0101 corresponds to the hexadecimal value 5, so the operator address (Operator Address) corresponding to mask information 0101 is Base + 0x500, and Operator 5 can be found at that address. By analogy, binary mask information 0110 corresponds to the hexadecimal value 6, so the operator address corresponding to mask information 0110 can be expressed as Base + 0x600, at which Operator 6 can be found; binary mask information 1010 corresponds to the hexadecimal value A, so the operator address corresponding to mask information 1010 can be expressed as Base + 0xA00, at which Operator 10 can be found. In this way, each mask information corresponds to the address of a convolution operator.
While the processor of the computing device executes the program, the control circuit (Control Circuit) controls the target address of the program counter (Program Counter, PC) based on the current mask information. For example, if the sparse convolution mask of the convolution item currently to be operated on, such as a feature map tile, is 0101, the computing device controls the PC to jump to address Base + 0x500, performs the sparse convolution operation on the current convolution item with the convolution operator at that address, and returns to the normal program flow after execution. No iterative process is involved here.
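A software sketch of the address scheme of Fig. 5 follows; in hardware the control circuit sets the program counter, which is modelled here by indexing a list of operator placeholders, and the base address is an illustrative value:

```python
OPERATOR_BLOCK_SIZE = 0x100
BASE = 0x4000_0000                       # illustrative base address

def operator_address(mask):
    return BASE + mask * OPERATOR_BLOCK_SIZE

assert operator_address(0b0101) == BASE + 0x500   # Operator 5
assert operator_address(0b0110) == BASE + 0x600   # Operator 6
assert operator_address(0b1010) == BASE + 0xA00   # Operator 10

# The PC jump is modelled by mapping the address back to an operator index
# and calling a placeholder in its place.
operators = [lambda i=i: f"Operator {i}" for i in range(16)]

def jump_and_execute(address):
    index = (address - BASE) // OPERATOR_BLOCK_SIZE
    return operators[index]()

print(jump_and_execute(operator_address(0b0101)))  # Operator 5
```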
Corresponding to the above case two, in some embodiments, before the computing device obtains the mask information of the first convolution item in the first iteration, the method may further include: obtaining the mask information of the second convolution item in the first iteration. The second convolution item is the object on which the sparse convolution operation is performed in the first iteration.
Based on this, in some embodiments of the application, a branch multiplex predictor (Branch Multiplex Predictor) can be designed in the computing device.
When the sparse convolution operation is to be performed on the second convolution item, the mask information of the second convolution item is first input to the above branch multiplex predictor.
After obtaining the mask information of the second convolution item, the branch multiplex predictor determines the second target operator address corresponding to the second convolution item according to the mask information of the second convolution item, and accurately predicts the first target operator address corresponding to the first convolution item according to the mask information of the first convolution item. In this way, after the processor has performed the sparse convolution operation on the second convolution item using the second target convolution operator found via the second target operator address, it can automatically jump, according to the first target operator address predicted in advance, to the first target convolution operator and perform the sparse convolution operation on the first convolution item.
Illustratively, taking the Dense-Sparse convolution implementation as an example, refer to Fig. 6, a schematic diagram of accelerating a convolution operation according to an exemplary embodiment. Assume that there are three consecutive feature-map tiles, denoted Tile 1, Tile 2 and Tile 3, whose corresponding mask information is 0101, 1001 and 1100, respectively. The correspondence between the above feature-map tiles and their mask information can be as shown in Table 6:
Table 6

Feature-map tile | Mask information
---|---
Tile 1 | 0101
Tile 2 | 1001
Tile 3 | 1100
In the first iteration, the mask information 0101 corresponding to Tile 1 and the mask information 1001 corresponding to Tile 2 are input to the branch multiplex predictor; the branch multiplex predictor determines, from masks 0101 and 1001 in turn, the corresponding target operator addresses base + 0x500 and base + 0x900. Here, the mask information 1001 corresponding to Tile 2 is the mask information of the first convolution item in the embodiment of the present application, and the mask information 0101 corresponding to Tile 1 is the mask information of the second convolution item in the embodiment of the present application.
The processor first finds Operator 5 according to base + 0x500 and uses Operator 5 to perform the sparse convolution operation on the to-be-operated convolution item belonging to Tile 1; it can then automatically find Operator 9 according to base + 0x900, as shown in the lower part of Fig. 6.
In the second (2nd) iteration, the sparse mask 1001 corresponding to Tile 2 and the sparse mask 1100 corresponding to Tile 3 are input to the branch multiplex predictor. The processor performs the sparse convolution operation on the to-be-operated convolution item belonging to Tile 2 using Operator 9, which was determined in the first iteration; moreover, while the sparse convolution operation is being performed on the convolution item belonging to Tile 2 (i.e., the second convolution item), the sparse convolution operator address of Tile 3 is predicted according to the mask information 1100 corresponding to Tile 3, so that in the third (3rd) iteration the processor can, according to the operator address base + 0xC00 predicted in advance by the branch multiplex predictor, find the target convolution operator Operator 12 and use Operator 12 to perform the sparse convolution operation on the to-be-operated convolution item belonging to Tile 3.
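The iteration scheme of this example can be summarized by the following Python sketch; the way an operator is "executed" is a placeholder, and only the look-ahead structure (predict the next tile's operator address while processing the current tile) mirrors the description above.

```python
def run_tiles(tiles, masks, base=0x8000, block=0x100):
    """Process tiles with one-iteration look-ahead of the operator address.

    tiles: list of tile data; masks: matching binary mask strings.
    In iteration k the current tile is processed with the address obtained
    earlier, while the operator address for tile k+1 is predicted.
    """
    predicted = base + int(masks[0], 2) * block          # address for the first tile
    for k, tile in enumerate(tiles):
        current_addr = predicted
        if k + 1 < len(tiles):                            # predict the next tile's operator
            predicted = base + int(masks[k + 1], 2) * block
        execute_operator(current_addr, tile)              # placeholder for the sparse convolution

def execute_operator(addr, tile):
    print(f"running operator at {hex(addr)} on {tile}")

run_tiles(["Tile 1", "Tile 2", "Tile 3"], ["0101", "1001", "1100"])
# operators at base+0x500, base+0x900, base+0xC00, matching the example above
```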
In the embodiment of the present application, before the processor finishes the sparse convolution operation on the second convolution item, the branch multiplex predictor can accurately predict the next jump address according to the mask information of the first convolution item, so that after the sparse convolution operation on the second convolution item is completed, the processor can immediately and accurately find the next-hop convolution operator, i.e., the first convolution operator, according to the operator address of the first convolution item predicted in advance, and perform the sparse convolution operation on the first convolution item.
Processors generally use very long pipelines, and a branch misprediction may waste several to tens of clock cycles; therefore, the longer the processor pipeline, the more it needs accurate branch prediction.
In the embodiment of the present application, since the branch multiplex predictor can accurately predict the next jump address according to the mask information of the first convolution item, branch misprediction is effectively avoided. This prevents the processor from rolling back and flushing the pipeline after the branch predictor mispredicts the next jump address, effectively improving the speed and performance with which a pipelined processor executes convolution operations, so that even devices with limited computing capability can efficiently execute the convolution operations of a deep neural network.
The data processing method provided by the present application can be applied in artificial intelligence application scenarios such as intelligent driving, human-computer interaction and security monitoring.
It should be noted that the neural network involved in the embodiments of the present application may include a deep neural network (deep neural network, DNN), such as a convolutional neural network (convolutional neural network, CNN); the embodiments of the present application do not limit the concrete form of the above neural network.
The above examples illustrate, taking one-dimensional convolution as an example, how the computing device realizes fast convolution operations. It should be understood that the above method of realizing fast convolution using the convolution operator corresponding to a sparse mask can also be applied in higher-dimensional convolution scenarios, and the present application places no restriction on this.
For simplicity of description, the various method embodiments described above are stated as a series of action combinations; however, those skilled in the art should understand that the disclosure is not limited by the described order of actions, because according to the disclosure, certain steps can be performed in other orders or simultaneously.
Secondly, those skilled in the art should also know that the embodiments described in this description are alternative embodiments, and the actions and modules involved are not necessarily required by the disclosure.
Corresponding to the foregoing method embodiments, the disclosure further provides embodiments of an apparatus implementing the corresponding functions.
Referring to Fig. 7, a block diagram of a data processing apparatus according to an exemplary embodiment, the embodiment of the present application provides a data processing apparatus which can be set in a computing device, and the apparatus may include:
a mask obtaining module 21, configured to obtain the mask information of the first convolution item, where the first convolution item includes a sparse convolution operation between to-be-processed data and a convolution kernel, and the mask information of the first convolution item is configured to mark the non-zero elements in the first convolution item;
an operator information determination module 23, configured to determine, according to the mask information of the first convolution item, the information of the first convolution operator corresponding to the first convolution item among multiple preset convolution operators;
a computing module 25, configured to perform the sparse convolution operation on the first convolution item based on the information of the first convolution operator.
In other embodiments of the application, the operator information determination module 23 can be configured to determine the target operator address of the first convolution item according to the mask information of the first convolution item; correspondingly, the computing module 25 can be configured to obtain the first convolution operator from the target operator address and to perform the sparse convolution operation on the first convolution item using the obtained first convolution operator.
In other embodiments of the application, the operator information determination module 23 can be configured to determine the information of the first convolution operator corresponding to the first convolution item according to the mask information of the first convolution item and a mapping relationship between preset convolution operator information and preset mask information.
In some embodiments of the application, the mapping relationship between the preset convolution operator information and the preset mask information is stored in a mapping table comprising multiple entries; correspondingly, the operator information determination module 23 can be configured to look up the mapping table based on the mask information of the first convolution item, and to use, as the information of the first convolution operator, the preset convolution operator information in the entry of the mapping table whose preset mask information matches the mask information of the first convolution item.
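A minimal sketch of such a mapping table, assuming each entry simply pairs a preset mask with the name of its preset convolution operator (the entries and names below are hypothetical):

```python
# Hypothetical mapping table: preset mask information -> preset operator information.
MAPPING_TABLE = {
    "0101": "Operator 5",
    "0110": "Operator 6",
    "1010": "Operator 10",
    # ... one entry per preset mask
}

def lookup_operator(mask_info: str) -> str:
    """Return the operator info whose preset mask matches the convolution item's mask."""
    return MAPPING_TABLE[mask_info]

print(lookup_operator("0101"))  # Operator 5
```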
In any of the above data processing apparatus embodiments of the application, the mask information may include any one of the following:
a data mask corresponding to the to-be-processed data;
a convolution kernel mask corresponding to the convolution kernel;
a combined mask corresponding to the to-be-processed data and the convolution kernel (one illustrative way of forming such a combined mask is sketched after this list).
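For illustration only: one plausible way to form such a combined mask, assuming it marks the positions where both the data element and the kernel element are non-zero (this interpretation is an assumption of the sketch, not a definition given by the application):

```python
def combined_mask(data_mask: str, kernel_mask: str) -> str:
    """Bitwise AND of a data mask and a kernel mask of equal length.

    A product term of the sparse convolution can only be non-zero where both
    operands are non-zero, so the combined mask keeps exactly those positions.
    """
    return ''.join('1' if d == '1' and k == '1' else '0'
                   for d, k in zip(data_mask, kernel_mask))

print(combined_mask("0111", "1101"))  # "0101"
```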
In one data processing apparatus embodiment of the application, the operator information determination module 23 can be configured to determine, according to the mask information of the first convolution item in a first iteration, the information of the first convolution operator corresponding to the first convolution item among the multiple preset convolution operators;
correspondingly, the computing module 25 can be configured to perform the sparse convolution operation on the first convolution item in a second iteration based on the information of the first convolution operator, where the second iteration is the next iteration after the first iteration.
In another apparatus embodiment of the application, the computing module 25 can be further configured to perform a sparse convolution operation on the second convolution item in the first iteration based on the information of the second convolution operator corresponding to the second convolution item, where the first convolution item is the next convolution item after the second convolution item.
Correspondingly, in another apparatus embodiment of the application, the mask obtaining module 21 can be further configured to obtain the mask information of the second convolution item in the first iteration.
Referring to Fig. 8, a structural block diagram of another data processing apparatus according to an exemplary embodiment, on the basis of the apparatus embodiment shown in Fig. 7 the apparatus may further include:
a mask information determining module 20, configured to determine the mask information of the first convolution item according to the non-zero elements included in the sparse data of the first convolution item.
In the embodiment of the present application, the mask information of the first convolution item may include a preset number of mask elements; the number M of preset convolution operators and the number n of mask elements may satisfy either of the following formulas: M = 2^n, or M = 2^n - 1.
The mask obtaining module 21 can be further configured to obtain the mask information of the second convolution item in the first iteration.
Referring to Fig. 9, a structural block diagram of yet another data processing apparatus according to an exemplary embodiment, on the basis of the apparatus embodiment shown in Fig. 7 the apparatus may further include:
a storage module 22, configured to store at least one of the mask information of the first convolution item and the information of the first convolution operator.
Since the apparatus embodiments essentially correspond to the method embodiments, reference may be made to the relevant descriptions of the method embodiments. The apparatus embodiments described above are merely exemplary; the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the application. Those of ordinary skill in the art can understand and implement this without creative effort.
Correspondingly, the present invention also provides a convolution operation accelerator. Referring to Fig. 10, a structural schematic diagram of a convolution operation accelerator according to an exemplary embodiment, the accelerator comprises:
an operator memory 100, for storing a preset convolution operator set;
In the embodiment of the present application, the above operator memory 100 can be a normal memory or a large-capacity instruction cache (Instruction Cache), for storing multiple preset convolution operators, such as the 16 convolution operators in the above example. If the above operator memory is a large-capacity instruction cache, the processor of the computing device can quickly read the target convolution operator from the large-capacity instruction cache, thereby reducing the instruction-fetch overhead incurred by the computing device in obtaining the target convolution operator.
a controller 200, for determining, according to the mask information of the first convolution item, the information of the first convolution operator corresponding to the first convolution item among the multiple preset convolution operators;
A branch multiplex predictor 201 is provided in the controller 200.
The branch multiplex predictor 201 is configured to determine, in the first iteration and according to the mask information of the first convolution item, the information of the first convolution operator corresponding to the first convolution item among the multiple preset convolution operators, so that the computing device can perform the sparse convolution operation on the first convolution item in the second iteration based on the information of the first convolution operator.
As shown in Fig. 10, the controller 200 can obtain the mask information of the convolution items and the base address of all convolution operators from a register (not shown) that stores mask information, and input them into the branch multiplex predictor 201. In the illustrated example, the branch multiplex predictor 201 can determine the operator address of the second convolution item based on mask 1, i.e., 0101, and predict the operator address of the next convolution item, i.e., the first convolution item, based on mask 2, i.e., 1001.
a register cache area, for providing input data for the convolution operation;
In the embodiment of the present application, the above register cache area can be a storage region composed of multiple registers and caches, for storing the data and/or mask information of the to-be-operated convolution items.
As shown in Fig. 10, the above register cache area may include:
a convolution kernel memory 401, for storing the convolution kernels of the neural network; the above convolution kernels include sparse convolution kernels obtained after sparsification, or dense convolution kernels that have not been sparsified; for example, the sparse convolution kernels in the Sparse-Sparse convolution implementation, or the dense convolution kernels in the Dense-Sparse convolution implementation;
a convolution kernel cache 402, for caching the convolution kernel of the second convolution item, such as the dense convolution kernel in the Dense-Sparse convolution implementation;
a data memory 501, for storing the to-be-processed data; the above to-be-processed data may include to-be-processed data obtained after sparsification, and may also include to-be-processed data that has not been sparsified;
a data cache 502, for caching the to-be-processed data involved in the second convolution item, such as the sparsified to-be-processed data in the Dense-Sparse convolution implementation.
a computing unit 300, for performing the sparse convolution operation on the second convolution item according to the target convolution operator output by the operator memory.
In the embodiment of the present application, the register cache area is connected to the computing unit and provides the computing unit 300 with the input data for performing the sparse convolution operation.
The computing unit 300 performs the sparse convolution operation on the second convolution item according to the target convolution operator output by the operator memory 100, obtains a convolution calculation result, outputs the convolution calculation result to an output cache 601, and then stores the convolution calculation result in an output memory 602.
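The data flow through the accelerator components described above can be sketched as follows; the component names echo the reference numerals in Fig. 10, while the operator representation and the convolution itself are simplified placeholders, not the actual hardware behaviour.

```python
class ConvAccelerator:
    """Toy model of the accelerator's data flow (operator memory -> computing unit -> output)."""

    def __init__(self, operators, base=0x8000, block=0x100):
        self.operator_memory = operators          # 100: maps operator address -> callable
        self.base, self.block = base, block
        self.output_cache = []                    # 601
        self.output_memory = []                   # 602

    def controller(self, mask: str) -> int:
        # 200/201: derive the operator address from the mask information
        return self.base + int(mask, 2) * self.block

    def compute(self, mask: str, data, kernel):
        # 300: fetch the target operator and run it on the inputs from the register cache area
        operator = self.operator_memory[self.controller(mask)]
        result = operator(data, kernel)
        self.output_cache.append(result)
        self.output_memory.append(result)
        return result

# Hypothetical operator for mask "0101": multiply-accumulate stand-in
# (zero positions contribute nothing, so it equals the sparse result).
ops = {0x8000 + 0x500: lambda d, k: sum(di * ki for di, ki in zip(d, k))}
acc = ConvAccelerator(ops)
print(acc.compute("0101", [0, 3, 0, 7], [1, 2, 1, 2]))  # 3*2 + 7*2 = 20
```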
In the embodiment of the present application, each time before the computing device performs a convolution operation on a convolution item, it can first determine the sparse convolution operator adapted to that convolution item and then use the determined sparse convolution operator to perform the sparse convolution operation on the convolution item. This effectively reduces the amount of convolution computation, thereby effectively saving the computing resources occupied by the convolution operation and lowering the requirement that the convolution operations of a deep neural network place on the computing capability of the device.
In addition, this specification also provides a data processing device, and the device may include a memory and a processor, where the memory is used for storing computer instructions runnable on the processor, and the processor is used for calling the executable instructions stored in the memory to implement the data processing method described in any embodiment of this specification.
Corresponding to the above data processing method, the embodiment of the present application also provides a schematic structural diagram of an electronic device according to an exemplary embodiment of the application, as shown in Fig. 11. Referring to Fig. 11, at the hardware level the electronic device includes a processor, an internal bus, a network interface, a memory and a non-volatile memory, and of course may also include other hardware required by the service. The processor reads the corresponding computer program from the non-volatile memory into the memory and then runs it, forming, at the logical level, the data processing apparatus provided by the present application for accelerating convolution operations. Of course, besides software implementations, the present application does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the executing subject of the following processing flow is not limited to individual logic units, and may also be hardware or a logic device.
Those skilled in the art will understand that one or more embodiments of this specification can be provided as a method, a system, or a computer program product. Therefore, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk memory, CD-ROM, optical memory, etc.) containing computer-usable program code.
Embodiments of this specification also provide a computer-readable storage medium on which a computer program can be stored; when the program is executed by a processor, the steps of the convolution-acceleration method provided by any of the embodiments of Figs. 1 to 6 of this specification are implemented.
Embodiments of this specification further provide a chip, comprising:
a processor, configured to execute the data processing method provided by any of the embodiments of Figs. 1 to 6 of this specification.
The embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in this specification and their structural equivalents, or in a combination of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier, to be executed by a data processing apparatus or to control the operation of a data processing apparatus. Alternatively or additionally, the program instructions can be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical or electromagnetic signal, which is generated to encode information and transmit it to a suitable receiver apparatus for execution by a data processing apparatus. A computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs, to perform the corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by special-purpose logic circuitry, such as an FPGA (field programmable gate array), or by a processor, and an apparatus can also be implemented as special-purpose logic circuitry.
Computers suitable for executing a computer program include, for example, general-purpose and/or special-purpose microprocessors. The basic components of a computer include a processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks or optical disks, or will be operatively coupled with such mass storage devices to receive data from them, transfer data to them, or both. However, a computer is not required to have such devices. In addition, a computer can be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name just a few.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM and flash memory devices), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special-purpose logic circuitry.
Although this specification contains many specific implementation details, these should not be construed as limiting the scope of any invention or of what may be claimed, but rather as describing the features of specific embodiments of particular inventions. Certain features described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be removed from the combination, and the claimed combination may be directed to a sub-combination or a variation of a sub-combination.
Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the above embodiments should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.
The foregoing descriptions are merely preferred embodiments of one or more embodiments of this specification and are not intended to limit one or more embodiments of this specification; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of one or more embodiments of this specification shall be included within the protection scope of one or more embodiments of this specification.
Claims (10)
1. A data processing method, characterized in that it is applied in a computing device, the method comprising:
obtaining mask information of a first convolution item, wherein the first convolution item comprises a sparse convolution operation between to-be-processed data and a convolution kernel, and the mask information of the first convolution item is used to mark non-zero elements in the first convolution item;
determining, according to the mask information of the first convolution item, information of a first convolution operator corresponding to the first convolution item among multiple preset convolution operators; and
performing the sparse convolution operation on the first convolution item based on the information of the first convolution operator.
2. The method according to claim 1, wherein the determining, according to the mask information of the first convolution item, information of a first convolution operator corresponding to the first convolution item among multiple preset convolution operators comprises:
determining a target operator address of the first convolution item according to the mask information of the first convolution item; and
the performing the sparse convolution operation on the first convolution item based on the information of the first convolution operator comprises:
obtaining the first convolution operator from the target operator address, and performing the sparse convolution operation on the first convolution item using the obtained first convolution operator.
3. The method according to claim 1 or 2, wherein the mask information comprises any one of the following:
a data mask corresponding to the to-be-processed data;
a convolution kernel mask corresponding to the convolution kernel; and
a combined mask corresponding to the to-be-processed data and the convolution kernel.
4. The method according to any one of claims 1 to 3, wherein the determining, according to the mask information of the first convolution item, information of the first convolution operator corresponding to the first convolution item among multiple preset convolution operators comprises:
determining, according to the mask information of the first convolution item in a first iteration, the information of the first convolution operator corresponding to the first convolution item among the multiple preset convolution operators; and
the performing the sparse convolution operation on the first convolution item based on the information of the first convolution operator comprises:
performing the sparse convolution operation on the first convolution item in a second iteration based on the information of the first convolution operator, wherein the second iteration is the next iteration after the first iteration.
5. The method according to claim 4, wherein the method further comprises:
performing a sparse convolution operation on a second convolution item in the first iteration based on information of a second convolution operator corresponding to the second convolution item, wherein the first convolution item is the next convolution item after the second convolution item.
6. A data processing apparatus, characterized in that the apparatus comprises:
a mask obtaining module, configured to obtain mask information of a first convolution item, wherein the first convolution item comprises a sparse convolution operation between to-be-processed data and a convolution kernel, and the mask information of the first convolution item is used to mark non-zero elements in the first convolution item;
an operator information determination module, configured to determine, according to the mask information of the first convolution item, information of a first convolution operator corresponding to the first convolution item among multiple preset convolution operators; and
a computing module, configured to perform the sparse convolution operation on the first convolution item based on the information of the first convolution operator.
7. A convolution operation accelerator, characterized by comprising:
an operator memory, for storing multiple preset convolution operators;
a controller, for determining, according to mask information of a first convolution item, information of a first convolution operator corresponding to the first convolution item among the multiple preset convolution operators;
a register cache area, for providing input data for the convolution operation; and
a computing unit, for performing a sparse convolution operation on the first convolution item according to the first convolution operator output by the operator memory.
8. A computer-readable storage medium, characterized in that the storage medium stores a computer program, and the computer program, when executed by a processor, implements the method according to any one of claims 1 to 5.
9. A chip, characterized by comprising:
a processor, configured to execute the data processing method according to any one of claims 1 to 5.
10. An electronic device, characterized in that the device comprises: a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910661953.5A CN110399972B (en) | 2019-07-22 | 2019-07-22 | Data processing method and device and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110399972A true CN110399972A (en) | 2019-11-01 |
CN110399972B CN110399972B (en) | 2021-05-25 |
Family
ID=68324793
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910661953.5A Active CN110399972B (en) | 2019-07-22 | 2019-07-22 | Data processing method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110399972B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112883982A (en) * | 2021-01-08 | 2021-06-01 | 西北工业大学 | Data zero-removing coding and packaging method for neural network sparse features |
CN113032843A (en) * | 2021-03-30 | 2021-06-25 | 北京地平线信息技术有限公司 | Method and apparatus for obtaining and processing tensor data with digitally signed information |
CN113269316A (en) * | 2021-03-26 | 2021-08-17 | 复旦大学 | Sparse data selection logic module supporting sparse neural network computing accelerator |
CN114092708A (en) * | 2021-11-12 | 2022-02-25 | 北京百度网讯科技有限公司 | Characteristic image processing method and device and storage medium |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105095902A (en) * | 2014-05-23 | 2015-11-25 | 华为技术有限公司 | Method and apparatus for extracting image features |
CN106487939A (en) * | 2015-08-26 | 2017-03-08 | 阿里巴巴集团控股有限公司 | A kind of method and apparatus determining User IP subnet, a kind of electronic equipment |
CN107729999A (en) * | 2016-08-12 | 2018-02-23 | 北京深鉴科技有限公司 | Consider the deep neural network compression method of matrix correlation |
CN107239825A (en) * | 2016-08-22 | 2017-10-10 | 北京深鉴智能科技有限公司 | Consider the deep neural network compression method of load balancing |
WO2018084974A1 (en) * | 2016-11-04 | 2018-05-11 | Google Llc | Convolutional neural network |
US20180129935A1 (en) * | 2016-11-07 | 2018-05-10 | Electronics And Telecommunications Research Institute | Convolutional neural network system and operation method thereof |
EP3396524A1 (en) * | 2017-04-28 | 2018-10-31 | INTEL Corporation | Instructions and logic to perform floating-point and integer operations for machine learning |
CN107527358A (en) * | 2017-08-23 | 2017-12-29 | 北京图森未来科技有限公司 | A kind of dense optical flow method of estimation and device |
CN107886164A (en) * | 2017-12-20 | 2018-04-06 | 东软集团股份有限公司 | A kind of convolutional neural networks training, method of testing and training, test device |
CN109840585A (en) * | 2018-01-10 | 2019-06-04 | 中国科学院计算技术研究所 | A kind of operation method and system towards sparse two-dimensional convolution |
CN108805889A (en) * | 2018-05-07 | 2018-11-13 | 中国科学院自动化研究所 | The fining conspicuousness method for segmenting objects of margin guide and system, equipment |
Non-Patent Citations (2)
Title |
---|
BIPIN B ET AL: "Image Convolution Optimization Using Sparse Matrix Vector Multiplication Technique", 《2016 INTL. CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI)》 * |
连自锋: "Research on Image Recognition Algorithms Based on Deep Neural Networks", China Doctoral Dissertations Full-text Database, Information Science and Technology Series *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112883982A (en) * | 2021-01-08 | 2021-06-01 | 西北工业大学 | Data zero-removing coding and packaging method for neural network sparse features |
CN112883982B (en) * | 2021-01-08 | 2023-04-18 | 西北工业大学 | Data zero-removing coding and packaging method for neural network sparse features |
CN113269316A (en) * | 2021-03-26 | 2021-08-17 | 复旦大学 | Sparse data selection logic module supporting sparse neural network computing accelerator |
CN113269316B (en) * | 2021-03-26 | 2022-10-11 | 复旦大学 | Sparse data selection logic module supporting sparse neural network computing accelerator |
CN113032843A (en) * | 2021-03-30 | 2021-06-25 | 北京地平线信息技术有限公司 | Method and apparatus for obtaining and processing tensor data with digitally signed information |
CN113032843B (en) * | 2021-03-30 | 2023-09-15 | 北京地平线信息技术有限公司 | Method and apparatus for obtaining and processing tensor data with digital signature information |
CN114092708A (en) * | 2021-11-12 | 2022-02-25 | 北京百度网讯科技有限公司 | Characteristic image processing method and device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110399972B (en) | 2021-05-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110399972A (en) | Data processing method, device and electronic equipment | |
US9021241B2 (en) | Combined branch target and predicate prediction for instruction blocks | |
CN102460420B (en) | Conditional operation in an internal processor of a memory device | |
CN104040492B (en) | Microprocessor accelerated code optimizer and dependency reordering method | |
CN111656367A (en) | System and architecture for neural network accelerator | |
TWI287747B (en) | Instruction processing method, apparatus and system, and storage medium having stored thereon instructions | |
WO2020147410A1 (en) | Pedestrian detection method and system, computer device, and computer readable storage medium | |
EP3398113B1 (en) | Loop code processor optimizations | |
KR102378887B1 (en) | Method and Apparatus of Bounding Box Regression by a Perimeter-based IoU Loss Function in Object Detection | |
WO2020076392A1 (en) | Modifying machine learning models to improve locality | |
CN103250131A (en) | Single cycle multi-ranch prediction including shadow cache for early far branch prediction | |
CN111400868B (en) | Distributed workshop scheduling optimization method and system with order and robot carrying functions | |
WO2021042763A1 (en) | Image searches based on word vectors and image vectors | |
JP2021505978A (en) | Storage and loading methods, devices, systems and storage media for visual self-location estimation maps | |
CN103189853A (en) | Method and apparatus for providing efficient context classification | |
CN104050710A (en) | 3-d graphics rendering with implicit geometry | |
TW202324209A (en) | Data processing method and non-transitory computer program product for neural network sequential inputs | |
US11714992B1 (en) | Neural network processing based on subgraph recognition | |
CN108875914B (en) | Method and device for preprocessing and post-processing neural network data | |
JP2022516549A (en) | Chip operating frequency setting | |
CN110083433A (en) | Embedded software running method and device, terminal and computer readable storage medium | |
WO2020186518A1 (en) | Method and apparatus for debugging, and system on chip | |
CN112947932A (en) | Method and device for optimizing vectorization in compiling process and electronic equipment | |
WO2017116927A1 (en) | Zero cache memory system extension | |
CN114443174A (en) | Code loading method, code loading device, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |