Detailed Description of Embodiments
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with one another. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which an embodiment of the method for storing data or the device for storing data of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as image processing applications, video playback applications and web browser applications.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices, including but not limited to smartphones, tablet computers, laptop computers and desktop computers. When they are software, they may be installed in the electronic devices listed above, and may be implemented either as multiple pieces of software or software modules (for example, software or software modules for providing distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may be a server providing various services, for example a back-end data processing server that processes data such as images uploaded by the terminal devices 101, 102, 103. The back-end data processing server may process data such as images with a convolutional neural network, extract the weight data used in the processing and store them in a first target register, and extract the feature data used in the processing and store them in a second target register.
It should be noted that the method for storing data provided by the embodiments of the present application may be executed by the server 105 or by the terminal devices 101, 102, 103. Correspondingly, the device for storing data may be provided in the server 105 or in the terminal devices 101, 102, 103.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, software or software modules for providing distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks and servers, as the implementation requires. Where the data processed by the convolutional neural network do not need to be obtained remotely, the system architecture may omit the network and include only a terminal device or a server.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for storing data according to the present application is shown. The method for storing data includes the following steps:
Step 201, determining a preset number of target feature matrices and a preset number of target weight matrices from a preset convolutional neural network.
In this embodiment, the execution body of the method for storing data (for example, the server or electronic device shown in Fig. 1) may determine a preset number of target feature matrices and a preset number of target weight matrices from a preset convolutional neural network. A target feature matrix includes feature data, and a target weight matrix includes weight data. The convolutional neural network may be set in the execution body in advance and is used to process raw data (such as pictures or word vectors). In general, a convolutional neural network that processes raw data may include convolutional layers, and a convolutional layer in turn includes feature matrices (that is, feature maps in matrix form) and weight matrices (that is, convolution kernels, also called filters). The feature data included in a feature matrix may be data extracted from the raw data (for example, the R (red), G (green) and B (blue) values of a pixel), or may be data output by some layer (for example, a convolutional layer or a pooling layer) of the convolutional neural network that processes the raw data. The weight data included in a weight matrix are data determined by training the convolutional neural network. In general, convolving a feature matrix with a weight matrix yields new feature data.
In this embodiment, the execution body may determine the preset number of target feature matrices from the feature matrices in the convolutional neural network in various ways. A target feature matrix may be a feature matrix that is to be convolved with its corresponding target weight matrix. As an example, the execution body may randomly extract the preset number of feature matrices from the feature matrices as the target feature matrices.
In some optional implementations of this embodiment, the preset number of target feature matrices belong to the feature matrix set included in a target layer of the convolutional neural network. The feature matrix set is divided in advance into at least one subset, and each subset includes the preset number of feature matrices. The target layer may be a layer that performs convolution on the feature matrix set it includes, for example a convolutional layer. The execution body may determine the preset number of target feature matrices and the preset number of target weight matrices from the preset convolutional neural network according to the following steps:
First, a subset is selected from the at least one subset, and the feature matrices included in the selected subset are determined as the target feature matrices. As an example, the execution body may select a subset at random from the at least one subset. Alternatively, the feature matrices in the convolutional neural network may have corresponding channel numbers, and the execution body may select, from the at least one subset, the subset that is to undergo convolution in the order of the channel numbers of the feature matrices. In general, a layer in a convolutional neural network (for example, a convolutional layer or a pooling layer) may include multiple channels, and each channel may correspond to one kind of feature (for example, a shape feature or a color feature of a picture).
As an example, suppose the channel numbers of the feature matrices in the feature matrix set are, in order, 1, 2, ..., 8, and the set is divided in advance into two subsets A and B, where the feature matrices in subset A have channel numbers 1, 2, 3, 4 and those in subset B have channel numbers 5, 6, 7, 8. The execution body may first determine the feature matrices included in subset A as the target feature matrices.
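The channel-ordered selection in the example above can be sketched as follows. This is only an illustration: the function name `partition_by_channel` is hypothetical, and the requirement that the number of channels be a multiple of the preset number is an assumption made for simplicity rather than something the text states.

```python
def partition_by_channel(channel_numbers, preset_quantity):
    """Split channel numbers, in ascending order, into subsets of equal size.

    Hypothetical helper: assumes the channel count is a multiple of
    preset_quantity, which the description does not require.
    """
    ordered = sorted(channel_numbers)
    if preset_quantity <= 0 or len(ordered) % preset_quantity != 0:
        raise ValueError("channel count must be a multiple of preset_quantity")
    return [ordered[i:i + preset_quantity]
            for i in range(0, len(ordered), preset_quantity)]


# Channels 1..8 with a preset number of 4 give subsets A and B of the example.
subsets = partition_by_channel(range(1, 9), 4)
subset_a, subset_b = subsets
```

Processing `subset_a` first and `subset_b` next corresponds to selecting subsets in channel-number order.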
Then, for each target feature matrix among the determined preset number of target feature matrices, the weight matrix that corresponds to that target feature matrix and is used for the convolution is determined as a target weight matrix. The correspondence between target feature matrices and target weight matrices is set in advance.
In some optional implementations of this embodiment, the preset number of target feature matrices and the preset number of target weight matrices are stored in a preset cache in advance. The preset cache may be a cache included in the CPU (Central Processing Unit) of the execution body, for example a level-1 (L1) cache or a level-2 (L2) cache. When an electronic device performs data operations, reading data from a cache is more efficient than fetching data from other storage devices such as memory or a hard disk. The execution body may therefore load the feature matrices in the convolutional neural network into the preset cache in advance, which can improve the efficiency of data access.
In some optional implementations of this embodiment, the preset number is the quotient of the number of bits of data that a preset single-instruction multiple-data (SIMD) instruction extracts in a single operation and the number of bits of each feature datum included in the feature matrices of the convolutional neural network. As an example, suppose the SIMD instruction is a 64-bit instruction, that is, it extracts 64 bits of data in a single operation, and each feature datum in the feature matrices of the convolutional neural network has 16 bits. The preset number is then 64/16 = 4.
Optionally, the SIMD instruction may be a NEON instruction. NEON instructions are a kind of SIMD instruction suited to embedded microprocessors; their dedicated design simplifies porting software between platforms, speeds up data processing and reduces hardware power consumption. It should be understood that, besides NEON instructions, the execution body may also use other SIMD instructions, such as SSE (Streaming SIMD Extensions) instructions.
Step 202, for each target feature matrix among the preset number of target feature matrices, determining, from that target feature matrix, the sub-feature matrix that is to be convolved with the target weight matrix corresponding to that target feature matrix.
In this embodiment, for each target feature matrix among the preset number of target feature matrices, the execution body may determine, from that target feature matrix, the sub-feature matrix that is to be convolved with the corresponding target weight matrix.
In general, when a convolutional neural network performs convolution, it needs to extract from the target feature matrix a sub-feature matrix whose numbers of rows and columns both equal those of the corresponding target weight matrix, and to multiply the data at the same positions in the sub-feature matrix and the target weight matrix. The correspondence between target feature matrices and target weight matrices is set in advance. Performing this step yields the preset number of sub-feature matrices. It should be noted that, since convolutional neural networks are a well-known technique that is widely studied and applied at present, the method of determining the sub-feature matrix to be convolved with a weight matrix is not described further here.
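As a minimal sketch of the general idea, a sub-feature matrix with the same shape as the weight matrix can be sliced out of a feature matrix and multiplied position by position. The helper names and the `top`/`left` window offsets are illustrative assumptions; the text does not prescribe how the window position is chosen.

```python
def sub_feature_matrix(feature, top, left, kernel_rows, kernel_cols):
    """Slice a kernel-sized window out of a feature matrix (list of rows).

    top/left are hypothetical window offsets chosen by the caller.
    """
    if (top < 0 or left < 0
            or top + kernel_rows > len(feature)
            or left + kernel_cols > len(feature[0])):
        raise ValueError("window does not fit inside the feature matrix")
    return [row[left:left + kernel_cols]
            for row in feature[top:top + kernel_rows]]


def multiply_same_position(sub, weight):
    """Multiply data at identical positions of two equally sized matrices."""
    return [[s * w for s, w in zip(sub_row, weight_row)]
            for sub_row, weight_row in zip(sub, weight)]


feature = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
weight = [[1, 0],
          [0, 1]]
window = sub_feature_matrix(feature, 0, 1, 2, 2)
products = multiply_same_position(window, weight)
```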
In some optional implementations of this embodiment, the feature data included in the feature matrices and the weight data included in the weight matrices of the convolutional neural network are fixed-point numbers with a preset number of bits. Because electronic devices operate on fixed-point numbers more efficiently than on floating-point numbers, the feature data and weight data in the convolutional neural network may be set to fixed-point numbers to improve operation efficiency in settings where the required precision of the network's results is not high (for example, when the convolutional neural network runs on terminal devices such as mobile phones and tablet computers). Setting the bit width of the feature data and weight data to the preset number of bits helps make full use of the number of bits a register can store, which improves register access efficiency.
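One common way to obtain fixed-point data is to scale a value by a power of two and saturate it to the available range. The sketch below assumes a signed 16-bit format with 8 fractional bits; both the format and the function names are illustrative assumptions, since the text only says that the data are fixed-point numbers with a preset number of bits.

```python
def to_fixed_point(value, total_bits=16, frac_bits=8):
    """Convert a float to a signed fixed-point integer, saturating on overflow.

    The 16-bit width and 8 fractional bits are assumed defaults.
    """
    scaled = round(value * (1 << frac_bits))
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, scaled))


def from_fixed_point(fixed, frac_bits=8):
    """Convert a fixed-point integer back to a float."""
    return fixed / (1 << frac_bits)


weight_fixed = to_fixed_point(1.5)
weight_back = from_fixed_point(weight_fixed)
```

With 16-bit data, four such values fill a 64-bit register exactly, which matches the 64/16 = 4 example given for the preset number.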
Step 203, performing the following storing step: for each target weight matrix among the preset number of target weight matrices, extracting weight data that have not been extracted from that target weight matrix and storing them in a first target register; for each sub-feature matrix among the determined sub-feature matrices, extracting feature data that have not been extracted from that sub-feature matrix and storing them in a second target register; and determining whether the preset number of target weight matrices and the determined sub-feature matrices contain data that have not been extracted.
In this embodiment, the execution body may perform the following storing step:
Step 2031, for each target weight matrix among the preset number of target weight matrices, extracting weight data that have not been extracted from that target weight matrix and storing them in the first target register.
Specifically, the execution body may extract weight data from a target weight matrix in the order of the positions of the weight data in that matrix, and store the extracted weight data in the first target register. As an example, each weight datum may have a corresponding row number and column number that characterize its position in the target weight matrix. From the weight data of each target weight matrix that have not been extracted, the execution body may extract, within the row (or column) having the smallest row number (or column number), the weight datum having the smallest column number (or row number).
The first target register may be a register set in advance for storing weight data. It may be one of at least one register allocated in advance by the execution body, and the at least one register may be registers included in the CPU of the execution body. The execution body may select a register from the at least one register as the first target register in various ways, for example in the order of the register numbers, or according to a preconfigured correspondence between extracted weight data and registers.
In some optional implementations of this embodiment, the execution body may use the SIMD instruction to extract weight data that have not been extracted from each of the preset number of target weight matrices and store them in the first target register. As an example, suppose the preset number is 4 and the SIMD instruction can extract 4 data items at a time. The execution body may then extract one weight datum from each of the 4 target weight matrices, where the extracted weight data occupy the same position in their respective target weight matrices. With conventional SISD (Single Instruction, Single Data stream), each instruction can extract only one datum, whereas with SIMD one instruction can extract multiple data. Because the multiple data are processed in parallel, one SISD instruction and one SIMD instruction take roughly the same time to execute. Since SIMD can process N data items at a time (N being a positive integer), its processing time is shortened to about 1/N of that of SISD. By using the SIMD instruction, the execution body can extract weight data more efficiently.
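The gathering of same-position weight data from several matrices can be emulated in plain Python as below. This is a software illustration only: the list stands in for a SIMD register, the names `load_lanes` and `load_count` are hypothetical, and no real NEON or SSE instruction is invoked.

```python
def load_lanes(matrices, row, col):
    """Emulate one SIMD load: take the datum at (row, col) from every matrix.

    The returned list stands in for one register with len(matrices) lanes.
    """
    return [matrix[row][col] for matrix in matrices]


def load_count(rows, cols, num_matrices, lanes):
    """Number of load instructions needed when each load fills `lanes` lanes."""
    total_items = rows * cols * num_matrices
    return -(-total_items // lanes)  # ceiling division


target_weights = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
    [[13, 14], [15, 16]],
]
first_target_register = load_lanes(target_weights, 0, 0)
sisd_loads = load_count(2, 2, 4, lanes=1)
simd_loads = load_count(2, 2, 4, lanes=4)
```

Under these assumptions the four 2x2 matrices need 16 single-datum loads but only 4 four-lane loads, which is the 1/N relationship described above.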
Step 2032, for each sub-feature matrix among the determined sub-feature matrices, extracting feature data that have not been extracted from that sub-feature matrix and storing them in the second target register.
Specifically, the execution body may extract feature data from a sub-feature matrix in the order of the positions of the feature data in that matrix, and store the extracted feature data in the second target register. As an example, each feature datum may have a corresponding row number and column number that characterize its position in the sub-feature matrix. From the feature data of each sub-feature matrix that have not been extracted, the execution body may extract, within the row (or column) having the smallest row number (or column number), the feature datum having the smallest column number (or row number).
The second target register may be a register set in advance for storing feature data. It may be one of at least one register allocated in advance by the execution body, and the at least one register may be registers included in the CPU of the execution body. The execution body may select a register from the at least one register as the second target register in various ways, for example in the order of the register numbers, or according to a preconfigured correspondence between extracted feature data and registers.
In some optional implementations of this embodiment, the execution body may use the SIMD instruction to extract feature data that have not been extracted from each of the determined sub-feature matrices and store them in the second target register. As an example, suppose the preset number is 4 and the SIMD instruction can extract 4 data items at a time. The execution body may then extract one feature datum from each of the 4 sub-feature matrices and store them in the second target register, where the extracted feature data occupy the same position in their respective sub-feature matrices.
Step 2033, determining whether the preset number of target weight matrices and the determined sub-feature matrices contain data that have not been extracted.
Specifically, the execution body may determine whether the preset number of target weight matrices contain weight data that have not been extracted, and whether the determined sub-feature matrices contain feature data that have not been extracted. If it determines that the preset number of target weight matrices contain weight data that have not been extracted and that the determined sub-feature matrices contain feature data that have not been extracted, it determines that the preset number of target weight matrices and the determined sub-feature matrices contain data that have not been extracted.
Step 204, in response to determining that such data exist, continuing to perform the storing step.
In this embodiment, in response to determining that the preset number of target weight matrices and the determined sub-feature matrices contain data that have not been extracted, the execution body may continue to perform the storing step.
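Steps 2031 to 2033 together with step 204 amount to a loop that runs until every position has been extracted. The following is a minimal sketch under stated assumptions: matrices are nested lists of equal shape, extraction proceeds in row-major order (one of the orders the text permits), two Python lists stand in for the target registers, and the name `storing_steps` is hypothetical.

```python
def storing_steps(weight_matrices, sub_feature_matrices):
    """Yield (first_target_register, second_target_register) per storing step.

    Assumes all matrices share one shape and uses row-major extraction order.
    """
    rows = len(weight_matrices[0])
    cols = len(weight_matrices[0][0])
    positions = [(r, c) for r in range(rows) for c in range(cols)]
    extracted = 0
    # Steps 2033 and 204: repeat while unextracted data remain.
    while extracted < len(positions):
        r, c = positions[extracted]
        # Step 2031: one unextracted weight datum from each weight matrix.
        first_target_register = [w[r][c] for w in weight_matrices]
        # Step 2032: one unextracted feature datum from each sub-feature matrix.
        second_target_register = [s[r][c] for s in sub_feature_matrices]
        extracted += 1
        yield first_target_register, second_target_register


weights = [[[1, 2]], [[3, 4]]]
sub_features = [[[5, 6]], [[7, 8]]]
register_pairs = list(storing_steps(weights, sub_features))
```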
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for storing data according to this embodiment. In the application scenario of Fig. 3, a convolutional neural network 302 is provided on a terminal device 301. The convolutional neural network 302 processes an input image and, during the processing, generates the feature matrices that the network includes. The terminal device 301 first selects, from the feature matrices included in the layer of the convolutional neural network 302 that is to undergo convolution, four (that is, the preset number of) target feature matrices 303, 304, 305, 306 and the four corresponding target weight matrices 307, 308, 309, 310. Next, the terminal device 301 determines the sub-feature matrices 3031, 3041, 3051, 3061, included in the respective target feature matrices, that are to be convolved with the target weight matrices. The terminal device 301 then performs the following storing step: extracting from each target weight matrix one weight datum that has not been extracted and storing it in the first target register; extracting from each sub-feature matrix one feature datum that has not been extracted and storing it in the second target register; and determining whether the four target weight matrices and the four sub-feature matrices contain data that have not been extracted and, if so, continuing to perform the storing step. By performing the storing step repeatedly, the terminal device 301 stores all of the weight data in the four target weight matrices and all of the feature data in the four sub-feature matrices into registers. For example, when performing the storing step for the first time, the terminal device 301 extracts from the target weight matrices 307, 308, 309, 310 the weight data 3071, 3081, 3091, 3101 located in the first row and first column, respectively, and stores them in the first target register D1; it extracts from the sub-feature matrices 3031, 3041, 3051, 3061 the feature data 30311, 30411, 30511, 30611 located in the first row and first column, respectively, and stores them in the second target register D2.
The method provided by the above embodiment of the present application first determines a preset number of target feature matrices and a preset number of target weight matrices from a preset convolutional neural network. Then, for each target feature matrix among the preset number of target feature matrices, it determines from that target feature matrix the sub-feature matrix to be convolved with the corresponding target weight matrix. Finally, it repeatedly extracts weight data that have not been extracted from each of the preset number of target weight matrices and stores them in the first target register, and extracts feature data that have not been extracted from each of the determined sub-feature matrices and stores them in the second target register. In this way the preset number of target feature matrices and the preset number of target weight matrices can all be stored into registers in batches, which helps exploit the fast access speed of registers and improves the operation efficiency of the convolutional neural network.
With further reference to Fig. 4, a flow 400 of another embodiment of the method for storing data is shown. The flow 400 of the method for storing data includes the following steps:
Step 401, determining a preset number of target feature matrices and a preset number of target weight matrices from a preset convolutional neural network.
In this embodiment, step 401 is substantially the same as step 201 in the embodiment corresponding to Fig. 2 and is not described again here.
Step 402, for each target feature matrix among the preset number of target feature matrices, determining, from that target feature matrix, the sub-feature matrix that is to be convolved with the target weight matrix corresponding to that target feature matrix.
In this embodiment, step 402 is substantially the same as step 202 in the embodiment corresponding to Fig. 2 and is not described again here.
In this embodiment, after performing step 401, the execution body may go on to perform the following storing step, namely steps 403 to 407:
Step 403, for each target weight matrix among the preset number of target weight matrices, extracting weight data that have not been extracted from that target weight matrix and storing them in the first target register.
In this embodiment, step 403 is substantially the same as step 2031 in the embodiment corresponding to Fig. 2 and is not described again here.
Step 404, for each sub-feature matrix among the determined sub-feature matrices, extracting feature data that have not been extracted from that sub-feature matrix and storing them in the second target register.
In this embodiment, step 404 is substantially the same as step 2032 in the embodiment corresponding to Fig. 2 and is not described again here.
Step 405, for each weight datum stored in the first target register, multiplying that weight datum by the corresponding feature datum stored in the second target register to obtain a product.
In this embodiment, for each weight datum stored in the first target register, the execution body of the method for storing data (for example, the server or terminal device shown in Fig. 1) may multiply that weight datum by the corresponding feature datum stored in the second target register to obtain a product.
Specifically, as an example, suppose the data stored in the first target register are A, B, C, D and the data stored in the second target register are E, F, G, H, where the positions of A, B, C, D in their weight matrices are the same as the positions of E, F, G, H in their sub-feature matrices, respectively; that is, A, B, C, D correspond to E, F, G, H, respectively. The products obtained are then A × E, B × F, C × G and D × H.
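The lane-by-lane multiplication in this example can be sketched as follows, with Python lists standing in for the two target registers. The function name `multiply_registers` and the numeric values are illustrative assumptions.

```python
def multiply_registers(first_target_register, second_target_register):
    """Multiply each weight datum by the feature datum in the same lane."""
    if len(first_target_register) != len(second_target_register):
        raise ValueError("registers must hold the same number of data items")
    return [w * f
            for w, f in zip(first_target_register, second_target_register)]


# A, B, C, D = 1, 2, 3, 4 and E, F, G, H = 5, 6, 7, 8 (illustrative values).
lane_products = multiply_registers([1, 2, 3, 4], [5, 6, 7, 8])
```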
Step 406, storing the obtained products in a preset storage area.
In this embodiment, the execution body may store the obtained products in a preset storage area. The preset storage area may be a storage area with a relatively fast data access speed, for example a cache included in the CPU of the execution body (such as a level-1 cache or a level-2 cache) or a register included in the CPU of the execution body (a data register different from the registers that store the feature data and weight data). Because the preset storage area offers fast data access, storing the obtained products there can help further improve operation efficiency when the convolutional neural network performs subsequent calculations.
Step 407, determining whether the preset number of target weight matrices and the determined sub-feature matrices contain data that have not been extracted.
In this embodiment, step 407 is substantially the same as step 2033 in the embodiment corresponding to Fig. 2 and is not described again here.
Step 408, in response to determining that such data exist, continuing to perform the storing step.
In this embodiment, step 408 is substantially the same as step 204 in the embodiment corresponding to Fig. 2 and is not described again here.
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of the method for storing data in this embodiment highlights the steps of multiplying each weight datum in the first target register by the corresponding feature datum in the second target register and storing the result. The scheme described in this embodiment thus stores the obtained products in the preset storage area, which can help further improve operation efficiency when the convolutional neural network performs subsequent calculations.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides one embodiment of a device for storing data. This device embodiment corresponds to the method embodiment shown in Fig. 2, and the device may be applied in various electronic devices.
As shown in Fig. 5, the device 500 for storing data of this embodiment includes: a first determination unit 501, configured to determine a preset number of target feature matrices and a preset number of target weight matrices from a preset convolutional neural network; a second determination unit 502, configured to determine, for each target feature matrix among the preset number of target feature matrices, the sub-feature matrix in that target feature matrix that is to be convolved with the target weight matrix corresponding to that target feature matrix; a storage unit 503, configured to perform the following storing step: for each target weight matrix among the preset number of target weight matrices, extracting weight data that have not been extracted from that target weight matrix and storing them in a first target register; for each sub-feature matrix among the determined sub-feature matrices, extracting feature data that have not been extracted from that sub-feature matrix and storing them in a second target register; and determining whether the preset number of target weight matrices and the determined sub-feature matrices contain data that have not been extracted; and a third determination unit 504, configured to continue performing the storing step in response to determining that such data exist.
In this embodiment, the first determination unit 501 may determine a preset number of target feature matrices and a preset number of target weight matrices from a preset convolutional neural network. A target feature matrix includes feature data, and a target weight matrix includes weight data. The convolutional neural network may be set in the device 500 in advance and is used to process raw data (such as pictures or word vectors). In general, a convolutional neural network that processes raw data may include convolutional layers, and a convolutional layer in turn includes feature matrices (that is, feature maps in matrix form) and weight matrices (that is, convolution kernels, also called filters). The feature data included in a feature matrix may be data extracted from the raw data (for example, the R (red), G (green) and B (blue) values of a pixel), or may be data output by some layer (for example, a convolutional layer or a pooling layer) of the convolutional neural network that processes the raw data. The weight data included in a weight matrix are data determined by training the convolutional neural network. In general, convolving a feature matrix with a weight matrix yields new feature data.
In this embodiment, the first determination unit 501 may determine the preset number of target feature matrices from the feature matrices in the convolutional neural network in various ways. A target feature matrix may be a feature matrix that is to be convolved with its corresponding target weight matrix. As an example, the execution body may randomly extract the preset number of feature matrices from the feature matrices as the target feature matrices.
In the present embodiment, for the target signature matrix in above-mentioned preset quantity target signature matrix, above-mentioned second
Determination unit 502 can from the target signature matrix, determine to target weight matrix corresponding with the target signature matrix into
The subcharacter matrix of row convolution algorithm.
In general, when a convolutional neural network performs a convolution operation, a sub-feature matrix
having the same number of rows and columns as the corresponding target weight matrix needs to be extracted
from the target feature matrix, and the data at the same positions in the sub-feature matrix and the target
weight matrix are multiplied. The correspondence between target feature matrices and target weight matrices
is set in advance. By performing this step, a preset number of sub-feature matrices can be obtained. It
should be noted that, because convolutional neural networks are a well-known technology that is currently
widely studied and applied, the method of determining the sub-feature matrix that undergoes the convolution
operation with the weight matrix is not described further here.
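As a small illustration of the general description above, the sketch below cuts a sub-feature matrix of the
same size as a weight matrix out of a feature matrix and multiplies the data at the same positions. The
function names, the window position arguments `top` and `left`, and the sample values are assumptions made
for this example.

```python
def sub_feature_matrix(feature, top, left, rows, cols):
    """Cut a rows x cols window out of `feature`, starting at (top, left)."""
    return [row[left:left + cols] for row in feature[top:top + rows]]


def multiply_same_positions(sub, weight):
    """Multiply the data located at the same positions in both matrices."""
    return [[s * w for s, w in zip(sub_row, weight_row)]
            for sub_row, weight_row in zip(sub, weight)]


feature = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
weight = [[1, 0],
          [0, 1]]

# The window has as many rows and columns as the weight matrix.
sub = sub_feature_matrix(feature, 0, 1, len(weight), len(weight[0]))
products = multiply_same_positions(sub, weight)
# Summing the products gives one value of the new feature data.
new_feature_value = sum(sum(row) for row in products)
```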
In the present embodiment, the storage unit 503 may perform the following storing step:
Step 5031: for each target weight matrix among the preset number of target weight matrices, extract weight
data that has not yet been extracted from that target weight matrix and store it into a first target register.
Specifically, the above storage unit 503 may extract weight data from a target weight matrix according to
the positional order of the weight data included in that matrix, and store the extracted weight data into
the first target register. As an example, each item of weight data may have a corresponding row number and
column number that characterize its position in the target weight matrix. From the weight data not yet
extracted in each target weight matrix, the above execution body may extract, within the row (or column) of
weight data corresponding to the smallest row number (or smallest column number), the weight data
corresponding to the smallest column number (or smallest row number).
The above first target register may be a preset register for storing weight data. It may be one of at
least one register allocated in advance to the above apparatus 500. The at least one register may be a
register included in the CPU of the above apparatus 500. The apparatus 500 may select a register from the
at least one register as the first target register in various manners (for example, in the order of the
register numbers, or according to a preconfigured correspondence between extracted weight data and
registers).
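One pass of step 5031 can be sketched as follows, assuming row-major extraction order (smallest row number
first, then smallest column number) and assuming all target weight matrices have the same shape. The
register is modeled as a plain Python list with one slot per weight matrix; the `cursor` argument and the
function name are hypothetical.

```python
def extract_weight_lane(weight_matrices, cursor):
    """Take the item at flat row-major position `cursor` from every matrix.

    `cursor` counts how many items were already extracted from each matrix,
    so the item it points at is the first one not yet extracted.
    """
    cols = len(weight_matrices[0][0])
    row, col = divmod(cursor, cols)
    return [w[row][col] for w in weight_matrices]


weights = [[[1, 2], [3, 4]],
           [[5, 6], [7, 8]]]

# First pass: one weight from each matrix goes into the modeled register.
first_target_register = extract_weight_lane(weights, 0)
# A later pass with cursor 3 reaches the last item of each 2x2 matrix.
last_pass = extract_weight_lane(weights, 3)
```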
Step 5032: for each sub-feature matrix among the determined sub-feature matrices, extract feature data that
has not yet been extracted from that sub-feature matrix and store it into a second target register.
Specifically, the above storage unit 503 may extract feature data from a sub-feature matrix according to
the positional order of the feature data included in that matrix, and store the extracted feature data into
the second target register. As an example, each item of feature data may have a corresponding row number
and column number that characterize its position in the sub-feature matrix. From the feature data not yet
extracted in each sub-feature matrix, the above execution body may extract, within the row (or column) of
feature data corresponding to the smallest row number (or smallest column number), the feature data
corresponding to the smallest column number (or smallest row number).
The above second target register may be a preset register for storing feature data. It may be one of at
least one register allocated in advance to the above apparatus 500. The at least one register may be a
register included in the CPU of the above execution body. The apparatus 500 may select a register from the
at least one register as the second target register in various manners (for example, in the order of the
register numbers, or according to a preconfigured correspondence between extracted feature data and
registers).
Step 5033: determine whether data that has not been extracted exists in the preset number of target weight
matrices and the determined sub-feature matrices.
Specifically, the above storage unit 503 may determine whether weight data that has not been extracted
exists in the preset number of target weight matrices, and determine whether feature data that has not been
extracted exists in the determined sub-feature matrices. If it is determined that unextracted weight data
exists in the preset number of target weight matrices and unextracted feature data exists in the determined
sub-feature matrices, it is determined that unextracted data exists in the preset number of target weight
matrices and the determined sub-feature matrices.
In the present embodiment, the third determination unit 504 may, in response to determining that unextracted
data exists in the preset number of target weight matrices and the determined sub-feature matrices,
continue to perform the above storing step.
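Taken together, steps 5031 to 5033 and the repetition just described form a loop, which the following
sketch illustrates. It assumes that every target weight matrix and every sub-feature matrix is non-empty
and has the same shape, and that data is extracted in row-major order. Registers are modeled as Python
lists, and the `snapshots` list exists only so that the example can be inspected.

```python
def storing_loop(weight_matrices, sub_feature_matrices):
    """Repeat the storing step until no unextracted data remains."""
    rows = len(weight_matrices[0])
    cols = len(weight_matrices[0][0])
    total = rows * cols
    cursor = 0          # items already extracted from each matrix
    snapshots = []      # register contents after each pass (for inspection)
    while True:
        row, col = divmod(cursor, cols)
        # Step 5031: one unextracted weight from every target weight matrix.
        first_target_register = [w[row][col] for w in weight_matrices]
        # Step 5032: one unextracted feature from every sub-feature matrix.
        second_target_register = [s[row][col] for s in sub_feature_matrices]
        snapshots.append((first_target_register, second_target_register))
        cursor += 1
        # Step 5033: stop once nothing is left to extract.
        if cursor >= total:
            break
    return snapshots


weights = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
subs = [[[10, 20], [30, 40]], [[50, 60], [70, 80]]]
passes = storing_loop(weights, subs)
```

Because the matrices share one shape in this sketch, the weight data and the feature data run out at the
same time, so a single cursor is enough to decide whether unextracted data remains.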
In some optional implementations of the present embodiment, the storage unit 503 may include: a computing
module (not shown in the figure), configured to, for each item of weight data stored in the first target
register, multiply that weight data by the corresponding feature data stored in the second target register
to obtain a product; and a storage module (not shown in the figure), configured to store the obtained product.
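The computing module and storage module can be pictured with the sketch below. It assumes that the weight
data in slot i of the first target register corresponds to the feature data in slot i of the second target
register; the function name and the `storage` list are hypothetical stand-ins for whatever storage the
implementation actually uses.

```python
def multiply_and_store(first_target_register, second_target_register, storage):
    """Multiply corresponding register slots and keep the products."""
    products = [w * f for w, f in
                zip(first_target_register, second_target_register)]
    storage.append(products)  # storage module: keep the obtained products
    return products


storage = []
products = multiply_and_store([1, 5], [10, 50], storage)
```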
In some optional implementations of the present embodiment, the feature data included in the feature
matrices and the weight data included in the weight matrices of the convolutional neural network are
fixed-point numbers with a preset number of bits.
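For readers unfamiliar with fixed-point numbers, the sketch below shows one common way to represent a real
value as a signed fixed-point integer. The 8-bit width, the 4 fractional bits, and the saturating behaviour
are assumptions chosen for illustration; the text above only requires a preset number of bits.

```python
def to_fixed_point(value, total_bits=8, frac_bits=4):
    """Quantize `value` to a signed fixed-point integer, saturating on overflow."""
    scale = 1 << frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, round(value * scale)))


def from_fixed_point(quantized, frac_bits=4):
    """Recover the real value represented by a fixed-point integer."""
    return quantized / (1 << frac_bits)


q = to_fixed_point(1.5)           # 1.5 * 16 = 24
restored = from_fixed_point(q)    # 24 / 16 = 1.5
```

Narrow fixed-point data matters here because more items of a given width fit into one register.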
In some optional implementations of the present embodiment, the above preset number of target feature
matrices and preset number of target weight matrices are stored in advance in a preset cache.
In some optional implementations of the present embodiment, the above preset number of target feature
matrices are contained in a feature matrix set included in a target layer of the convolutional neural
network, and the feature matrix set is divided in advance into at least one subset, where each subset
includes the preset number of feature matrices; and the first determination unit 501 may include: a
selection module (not shown in the figure), configured to select a subset from the at least one subset and
determine the feature matrices included in the selected subset as the target feature matrices; and a
determination module (not shown in the figure), configured to, for each target feature matrix among the
determined preset number of target feature matrices, determine the weight matrix that corresponds to that
target feature matrix and is used for the convolution operation as the target weight matrix.
In some optional implementations of the present embodiment, the preset number is the quotient of the number
of bits of data extracted at one time by a preset single-instruction multiple-data (SIMD) instruction and
the number of bits of the feature data included in the feature matrices of the convolutional neural network.
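As a worked example of this quotient: a SIMD instruction that extracts 128 bits at a time, combined with
8-bit feature data, gives a preset number of 128 / 8 = 16. The sketch below computes the same value; the
function name and the check for an uneven division are assumptions added for the example.

```python
def preset_quantity(simd_bits, feature_bits):
    """Number of feature data items that fit into one SIMD extraction."""
    if feature_bits <= 0 or simd_bits % feature_bits != 0:
        raise ValueError("SIMD width must be a positive multiple of the data width")
    return simd_bits // feature_bits


lanes_128_by_8 = preset_quantity(128, 8)
lanes_256_by_16 = preset_quantity(256, 16)
```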
In some optional implementations of the present embodiment, the storage unit 503 may be further configured
to: based on a SIMD instruction, extract weight data that has not been extracted from each of the preset
number of target weight matrices and store it into the first target register.
In some optional implementations of the present embodiment, the storage unit 503 may be further configured
to: based on a SIMD instruction, extract feature data that has not been extracted from each of the
determined sub-feature matrices and store it into the second target register.
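Python has no SIMD instructions, so the following sketch only emulates the resulting register layout: one
data item from each matrix occupies one fixed-width lane of a single wide integer that stands in for the
register. The lane order (lane 0 in the lowest bits), the unsigned 8-bit lanes, and the function names are
assumptions made for illustration; a real implementation would rely on the target CPU's SIMD load
instructions instead.

```python
def pack_lanes(values, lane_bits=8):
    """Pack unsigned lane values into one integer, lane 0 in the lowest bits."""
    mask = (1 << lane_bits) - 1
    word = 0
    for index, value in enumerate(values):
        word |= (value & mask) << (index * lane_bits)
    return word


def unpack_lanes(word, lanes, lane_bits=8):
    """Split a packed integer back into its individual lane values."""
    mask = (1 << lane_bits) - 1
    return [(word >> (index * lane_bits)) & mask for index in range(lanes)]


# Four 1x1 sub-feature matrices; one item from each fills one 8-bit lane.
sub_feature_matrices = [[[1]], [[2]], [[3]], [[4]]]
second_target_register = pack_lanes([m[0][0] for m in sub_feature_matrices])
lanes = unpack_lanes(second_target_register, 4)
```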
The apparatus provided by the above embodiment of the present application first determines a preset number
of target feature matrices and a preset number of target weight matrices from a preset convolutional neural
network. Then, for each target feature matrix among the preset number of target feature matrices, it
determines, from that target feature matrix, the sub-feature matrix on which a convolution operation is to
be performed with the target weight matrix corresponding to that target feature matrix. Finally, it
repeatedly extracts weight data that has not been extracted from each of the preset number of target weight
matrices and stores it into the first target register, and extracts feature data that has not been
extracted from each of the determined sub-feature matrices and stores it into the second target register.
In this way, the data of the preset number of target feature matrices and the preset number of target
weight matrices can be stored into registers in batches, which helps exploit the fast access speed of
registers and improves the operation efficiency of the convolutional neural network.
Referring now to Fig. 6, it shows a schematic structural diagram of a computer system 600 of an electronic
device (for example, the server or terminal device shown in Fig. 1) suitable for implementing the
embodiments of the present application. The electronic device shown in Fig. 6 is merely an example and
should not impose any limitation on the functions and scope of use of the embodiments of the present
application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform
various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or
a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores
various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the
RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also
connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard,
a mouse, and the like; an output section 607 including a liquid crystal display (LCD), a speaker, and the
like; a storage section 608 including a hard disk and the like; and a communication section 609 including a
network interface card such as a LAN card or a modem. The communication section 609 performs communication
processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as
needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disk, or a
semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be
installed into the storage section 608 as needed.
In particular, according to an embodiment of the present disclosure, the process described above with
reference to the flowchart may be implemented as a computer software program. For example, an embodiment of
the present disclosure includes a computer program product, which includes a computer program carried on a
computer-readable medium, and the computer program contains program code for performing the method shown in
the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network
through the communication section 609, and/or installed from the removable medium 611. When the computer
program is executed by the central processing unit (CPU) 601, the above functions defined in the method of
the present application are performed.
It should be noted that the computer-readable medium described in the present application may be a
computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A
computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic,
optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of
the above. More specific examples of computer-readable storage media may include, but are not limited to:
an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random
access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash
memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a
magnetic storage device, or any suitable combination of the above. In the present application, a
computer-readable storage medium may be any tangible medium that contains or stores a program, and the
program may be used by or in combination with an instruction execution system, apparatus, or device. Also
in the present application, a computer-readable signal medium may include a data signal propagated in
baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a
propagated data signal may take various forms, including but not limited to an electromagnetic signal, an
optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any
computer-readable medium other than a computer-readable storage medium, and that computer-readable medium
can send, propagate, or transmit a program for use by or in combination with an instruction execution
system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using
any appropriate medium, including but not limited to: wireless, wire, optical cable, RF, and the like, or
any suitable combination of the above.
Computer program code for performing the operations of the present application may be written in one or
more programming languages or a combination thereof. The programming languages include object-oriented
programming languages such as Java, Smalltalk, and C++, and also include conventional procedural
programming languages such as the "C" language or similar programming languages. The program code may be
executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package,
partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
In cases involving a remote computer, the remote computer may be connected to the user's computer through
any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be
connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures,
functions, and operations of systems, methods, and computer program products according to various
embodiments of the present application. In this regard, each block in a flowchart or block diagram may
represent a module, a program segment, or a portion of code, and that module, program segment, or portion
of code contains one or more executable instructions for implementing the specified logical function. It
should also be noted that, in some alternative implementations, the functions marked in the blocks may
occur in an order different from that marked in the drawings. For example, two blocks shown in succession
may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order,
depending on the functions involved. It should also be noted that each block in the block diagrams and/or
flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a
dedicated hardware-based system that performs the specified functions or operations, or by a combination of
dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in
hardware. The described units may also be provided in a processor, which may, for example, be described as:
a processor including a first determination unit, a second determination unit, a storage unit, and a third
determination unit. The names of these units do not, in some cases, constitute a limitation on the units
themselves. For example, the first determination unit may also be described as "a unit that determines a
preset number of target feature matrices and a preset number of target weight matrices from a preset
convolutional neural network".
In another aspect, the present application further provides a computer-readable medium. The
computer-readable medium may be included in the electronic device described in the above embodiments, or it
may exist separately without being assembled into the electronic device. The computer-readable medium
carries one or more programs, and when the one or more programs are executed by the electronic device, they
cause the electronic device to: determine a preset number of target feature matrices and a preset number of
target weight matrices from a preset convolutional neural network; for each target feature matrix among the
preset number of target feature matrices, determine, from that target feature matrix, a sub-feature matrix
on which a convolution operation is to be performed with the target weight matrix corresponding to that
target feature matrix; perform the following storing step: for each target weight matrix among the preset
number of target weight matrices, extract weight data that has not been extracted from that target weight
matrix and store it into a first target register; for each sub-feature matrix among the determined
sub-feature matrices, extract feature data that has not been extracted from that sub-feature matrix and
store it into a second target register; determine whether data that has not been extracted exists in the
preset number of target weight matrices and the determined sub-feature matrices; and, in response to
determining that such data exists, continue to perform the storing step.
The above description is only a preferred embodiment of the present application and an explanation of the
technical principles applied. Those skilled in the art should understand that the scope of the invention
involved in the present application is not limited to technical solutions formed by the specific
combination of the above technical features, and should also cover other technical solutions formed by any
combination of the above technical features or their equivalent features without departing from the above
inventive concept, for example, technical solutions formed by replacing the above features with (but not
limited to) technical features having similar functions disclosed in the present application.