CN109375952A

CN109375952A - Method and apparatus for storing data

Info

Publication number: CN109375952A
Application number: CN201811149864.4A
Authority: CN
Inventors: 胡耀全
Original assignee: Beijing ByteDance Network Technology Co Ltd
Current assignee: Douyin Vision Co Ltd; Douyin Vision Beijing Co Ltd
Priority date: 2018-09-29
Filing date: 2018-09-29
Publication date: 2019-02-22
Anticipated expiration: 2038-09-29
Also published as: CN109375952B

Abstract

Embodiments of the present application disclose a method and apparatus for storing data. One specific embodiment of the method includes: determining a preset number of target feature matrices and a preset number of target weight matrices from a preset convolutional neural network; determining a sub-feature matrix from each target feature matrix; and performing the following storing step: for each of the preset number of target weight matrices, extracting weight data that has not yet been extracted from that target weight matrix and storing it in a first target register; for each of the determined sub-feature matrices, extracting feature data that has not yet been extracted from that sub-feature matrix and storing it in a second target register; and, in response to determining that unextracted data remains in the preset number of target weight matrices and the determined sub-feature matrices, continuing to perform the storing step. This embodiment helps improve the computational efficiency of convolutional neural networks.

Description

Method and apparatus for storing data

Technical field

The embodiments of the present application relate to the field of computer technology, and in particular to a method and apparatus for storing data.

Background

A convolutional neural network (CNN) is a feedforward neural network whose artificial neurons respond to surrounding units within part of their coverage area, and it performs well on large-scale image processing. A CNN includes convolutional layers, pooling layers, and so on. When a convolution operation is performed on the data in these layers, the feature data contained in a feature matrix (that is, a feature map in matrix form) usually needs to be multiplied by the weight data contained in a weight matrix (that is, a convolution kernel, also called a filter, in matrix form).

Summary of the invention

The embodiments of the present application propose a method and apparatus for storing data.

In a first aspect, an embodiment of the present application provides a method for storing data. The method includes: determining a preset number of target feature matrices and a preset number of target weight matrices from a preset convolutional neural network; for each of the preset number of target feature matrices, determining, from that target feature matrix, the sub-feature matrix on which a convolution operation is to be performed with the target weight matrix corresponding to that target feature matrix; performing the following storing step: for each of the preset number of target weight matrices, extracting weight data that has not yet been extracted from that target weight matrix and storing it in a first target register; for each of the determined sub-feature matrices, extracting feature data that has not yet been extracted from that sub-feature matrix and storing it in a second target register; determining whether unextracted data remains in the preset number of target weight matrices and the determined sub-feature matrices; and, in response to determining that such data remains, continuing to perform the storing step.

In some embodiments, the storing step further includes: for each item of weight data stored in the first target register, multiplying that weight data by the corresponding feature data stored in the second target register to obtain a product; and storing the obtained product.

In some embodiments, the feature data contained in the feature matrices of the convolutional neural network and the weight data contained in the weight matrices are fixed-point numbers with a preset number of bits.

In some embodiments, the preset number of target feature matrices and the preset number of target weight matrices are stored in advance in a preset cache.

In some embodiments, the preset number of target feature matrices belong to a feature matrix set contained in a target layer of the convolutional neural network, and the feature matrix set is divided in advance into at least one subset, where each subset contains the preset number of feature matrices. Determining the preset number of target feature matrices and the preset number of target weight matrices from the preset convolutional neural network then includes: selecting a subset from the at least one subset and determining the feature matrices contained in the selected subset as the target feature matrices; and, for each of the determined preset number of target feature matrices, determining the weight matrix that corresponds to that target feature matrix and is used for the convolution operation as a target weight matrix.

In some embodiments, the preset number is the quotient of the number of bits of data extracted at one time by a preset single instruction, multiple data (SIMD) instruction and the number of bits of the feature data contained in the feature matrices of the convolutional neural network.

In some embodiments, extracting, for each of the preset number of target weight matrices, weight data that has not yet been extracted from that target weight matrix and storing it in the first target register includes: extracting, based on the SIMD instruction, unextracted weight data from each of the preset number of target weight matrices and storing it in the first target register.

In some embodiments, extracting, for each of the determined sub-feature matrices, feature data that has not yet been extracted from that sub-feature matrix and storing it in the second target register includes: extracting, based on the SIMD instruction, unextracted feature data from each of the determined sub-feature matrices and storing it in the second target register.

In a second aspect, an embodiment of the present application provides an apparatus for storing data. The apparatus includes: a first determination unit configured to determine a preset number of target feature matrices and a preset number of target weight matrices from a preset convolutional neural network; a second determination unit configured to determine, for each of the preset number of target feature matrices and from that target feature matrix, the sub-feature matrix on which a convolution operation is to be performed with the target weight matrix corresponding to that target feature matrix; a storage unit configured to perform the following storing step: for each of the preset number of target weight matrices, extracting weight data that has not yet been extracted from that target weight matrix and storing it in a first target register; for each of the determined sub-feature matrices, extracting feature data that has not yet been extracted from that sub-feature matrix and storing it in a second target register; and determining whether unextracted data remains in the preset number of target weight matrices and the determined sub-feature matrices; and a third determination unit configured to continue performing the storing step in response to determining that such data remains.

In some embodiments, the storage unit includes: a computing module configured to multiply each item of weight data stored in the first target register by the corresponding feature data stored in the second target register to obtain a product; and a storage module configured to store the obtained product.

In some embodiments, the preset number of target feature matrices belong to a feature matrix set contained in a target layer of the convolutional neural network, and the feature matrix set is divided in advance into at least one subset, where each subset contains the preset number of feature matrices. The first determination unit includes: a selection module configured to select a subset from the at least one subset and determine the feature matrices contained in the selected subset as the target feature matrices; and a determination module configured to determine, for each of the determined preset number of target feature matrices, the weight matrix that corresponds to that target feature matrix and is used for the convolution operation as a target weight matrix.

In some embodiments, the storage unit is further configured to: extract, based on the SIMD instruction, unextracted weight data from each of the preset number of target weight matrices and store it in the first target register.

In some embodiments, the storage unit is further configured to: extract, based on the SIMD instruction, unextracted feature data from each of the determined sub-feature matrices and store it in the second target register.

In a third aspect, an embodiment of the present application provides an electronic device. The electronic device includes: one or more processors, where each processor includes registers; and a storage device on which one or more programs are stored. When the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any implementation of the first aspect.

In a fourth aspect, an embodiment of the present application provides a computer-readable medium on which a computer program is stored. When executed by a processor, the computer program implements the method described in any implementation of the first aspect.

The method and apparatus for storing data provided by the embodiments of the present application first determine a preset number of target feature matrices and a preset number of target weight matrices from a preset convolutional neural network. Then, for each of the preset number of target feature matrices, the sub-feature matrix on which a convolution operation is to be performed with the corresponding target weight matrix is determined from that target feature matrix. Finally, unextracted weight data is repeatedly extracted from each of the preset number of target weight matrices and stored in the first target register, and unextracted feature data is repeatedly extracted from each of the determined sub-feature matrices and stored in the second target register. In this way, the preset number of target feature matrices and the preset number of target weight matrices are stored into registers in batches, which takes advantage of the fast access speed of registers and improves the computational efficiency of the convolutional neural network.

Brief description of the drawings

Other features, objects, and advantages of the present application will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:

Fig. 1 is a diagram of an exemplary system architecture to which an embodiment of the present application can be applied;

Fig. 2 is a flowchart of one embodiment of the method for storing data according to the embodiments of the present application;

Fig. 3 is a schematic diagram of an application scenario of the method for storing data according to the embodiments of the present application;

Fig. 4 is a flowchart of another embodiment of the method for storing data according to the embodiments of the present application;

Fig. 5 is a schematic structural diagram of one embodiment of the apparatus for storing data according to the embodiments of the present application;

Fig. 6 is a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of the present application.

Detailed description of the embodiments

The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the related invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the related invention.

It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

Fig. 1 shows an exemplary system architecture 100 to which the method for storing data or the apparatus for storing data of the embodiments of the present application can be applied.

As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber-optic cables.

A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as image processing applications, video playback applications, and web browser applications.

The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers. When they are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, software or software modules for providing distributed services) or as a single piece of software or software module. No specific limitation is imposed here.

The server 105 may be a server that provides various services, such as a back-end data processing server that processes data such as images uploaded by the terminal devices 101, 102, 103. The back-end data processing server may process data such as images with a convolutional neural network and, during that processing, extract the weight data that is used and store it in the first target register, and extract the feature data that is used and store it in the second target register.

It should be noted that the method for storing data provided by the embodiments of the present application may be performed by the server 105 or by the terminal devices 101, 102, 103. Accordingly, the apparatus for storing data may be provided in the server 105 or in the terminal devices 101, 102, 103.

It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, software or software modules for providing distributed services) or as a single piece of software or software module. No specific limitation is imposed here.

It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. Any number of terminal devices, networks, and servers may be provided as the implementation requires. Where the data processed by the convolutional neural network does not need to be obtained remotely, the system architecture may omit the network and include only a terminal device or a server.

Referring further to Fig. 2, a flow 200 of one embodiment of the method for storing data according to the present application is shown. The method for storing data includes the following steps:

Step 201: determine a preset number of target feature matrices and a preset number of target weight matrices from a preset convolutional neural network.

In this embodiment, the execution body of the method for storing data (for example, the server shown in Fig. 1 or an electronic device) may determine a preset number of target feature matrices and a preset number of target weight matrices from a preset convolutional neural network. A target feature matrix contains feature data, and a target weight matrix contains weight data. The convolutional neural network may be set up in advance in the execution body and is used to process raw data (for example, pictures or word vectors). In general, a convolutional neural network that processes raw data includes convolutional layers, and a convolutional layer in turn includes feature matrices (that is, feature maps in matrix form) and weight matrices (that is, convolution kernels, also called filters). The feature data contained in a feature matrix may be data extracted from the raw data (for example, the RGB (red, green, blue) values of pixels), or data output by some layer (for example, a convolutional layer or a pooling layer) of the convolutional neural network that processes the raw data. The weight data contained in a weight matrix is data determined by training the convolutional neural network. In general, performing a convolution operation on a feature matrix and a weight matrix yields new feature data.

In this embodiment, the execution body may determine the preset number of target feature matrices from the feature matrices of the convolutional neural network in various ways. A target feature matrix is a feature matrix on which a convolution operation is to be performed with its corresponding target weight matrix. As an example, the execution body may randomly extract the preset number of feature matrices from the feature matrices as the target feature matrices.

In some optional implementations of this embodiment, the preset number of target feature matrices belong to a feature matrix set contained in a target layer of the convolutional neural network. The feature matrix set is divided in advance into at least one subset, and each subset contains the preset number of feature matrices. The target layer is a layer, such as a convolutional layer, on whose feature matrix set a convolution operation is to be performed. The execution body may determine the preset number of target feature matrices and the preset number of target weight matrices from the preset convolutional neural network in the following steps:

First, a subset is selected from the at least one subset, and the feature matrices contained in the selected subset are determined as the target feature matrices. As an example, the execution body may select a subset at random from the at least one subset. Alternatively, the feature matrices in the convolutional neural network may have corresponding channel numbers, and the execution body may select the subset on which the convolution operation is to be performed from the at least one subset in the order of the channel numbers of the feature matrices. In general, a layer of a convolutional neural network (for example, a convolutional layer or a pooling layer) may include multiple channels, and each channel may correspond to one kind of feature (for example, a shape feature or a color feature of a picture).

As an example, suppose the channel numbers of the feature matrices in the feature matrix set are 1, 2, ..., 8, and the set is divided in advance into two subsets A and B, where the feature matrices in subset A have channel numbers 1, 2, 3, 4 and those in subset B have channel numbers 5, 6, 7, 8. The execution body may first determine the feature matrices contained in subset A as the target feature matrices.

Then, for each of the determined preset number of target feature matrices, the weight matrix that corresponds to that target feature matrix and is used for the convolution operation is determined as a target weight matrix. The correspondence between target feature matrices and target weight matrices is set in advance.
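The subset selection and weight pairing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `split_into_subsets`, the divisibility check, and the `weight_for_channel` mapping (standing in for the preset feature-to-weight correspondence) are all introduced here for the example.

```python
def split_into_subsets(channel_ids, preset_number):
    """Split an ordered list of channel numbers into subsets of preset_number each."""
    # Assumption for this sketch: the channel count is a multiple of the preset number.
    if len(channel_ids) % preset_number != 0:
        raise ValueError("channel count must be a multiple of the preset number")
    return [channel_ids[i:i + preset_number]
            for i in range(0, len(channel_ids), preset_number)]


# Channels 1..8 split into subsets A and B, as in the example in the text.
subsets = split_into_subsets(list(range(1, 9)), 4)

# Hypothetical preset correspondence: feature channel -> weight matrix label.
weight_for_channel = {ch: f"W{ch}" for ch in range(1, 9)}

# Select subsets in channel-number order and pair each feature with its weight.
target_features = subsets[0]
target_weights = [weight_for_channel[ch] for ch in target_features]
```

Processing subset A first and subset B afterwards corresponds to selecting subsets in channel-number order.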

In some optional implementations of this embodiment, the preset number of target feature matrices and the preset number of target weight matrices are stored in advance in a preset cache. The preset cache may be a cache contained in the CPU (Central Processing Unit) of the execution body, for example the level-1 (L1) cache or the level-2 (L2) cache. Because an electronic device performing computation reads data from a cache more efficiently than it fetches data from other storage such as main memory or a hard disk, the execution body may load the feature matrices of the convolutional neural network into the preset cache in advance and thereby improve data access efficiency.

In some optional implementations of this embodiment, the preset number is the quotient of the number of bits of data extracted at one time by a preset single instruction, multiple data (SIMD) instruction and the number of bits of the feature data contained in the feature matrices of the convolutional neural network. As an example, suppose the SIMD instruction is a 64-bit instruction, that is, it extracts 64 bits of data at a time, and the feature data contained in the feature matrices of the convolutional neural network has 16 bits; the preset number is then 64/16 = 4.
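The quotient above is simple enough to check in code. The sketch below assumes a hypothetical helper name, `preset_number`, and adds a divisibility check that the source does not mention.

```python
def preset_number(simd_load_bits: int, feature_bits: int) -> int:
    """Preset number = bits fetched by one SIMD load / bits per feature value."""
    # Assumption for this sketch: the element width divides the load width evenly.
    if feature_bits <= 0 or simd_load_bits % feature_bits != 0:
        raise ValueError("feature width must evenly divide the SIMD load width")
    return simd_load_bits // feature_bits


# The example from the text: a 64-bit load and 16-bit feature data.
n = preset_number(64, 16)
```

With these inputs `n` is 4, matching the 64/16 = 4 example.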

Optionally, the SIMD instruction may be a NEON instruction. NEON instructions are a kind of SIMD instruction suited to embedded microprocessors; through a purpose-built design they simplify porting software between platforms, speed up data processing, and reduce hardware power consumption. It should be understood that the execution body may also use SIMD instructions other than NEON, such as SSE (Streaming SIMD Extensions) instructions.

Step 202: for each of the preset number of target feature matrices, determine, from that target feature matrix, the sub-feature matrix on which a convolution operation is to be performed with the target weight matrix corresponding to that target feature matrix.

In this embodiment, for each of the preset number of target feature matrices, the execution body may determine, from that target feature matrix, the sub-feature matrix on which a convolution operation is to be performed with the corresponding target weight matrix.

In general, when a convolutional neural network performs a convolution operation, it extracts from the target feature matrix a sub-feature matrix with the same numbers of rows and columns as the corresponding target weight matrix, and multiplies the data at the same positions in the sub-feature matrix and the target weight matrix. The correspondence between target feature matrices and target weight matrices is set in advance. Performing this step yields the preset number of sub-feature matrices. It should be noted that, because convolutional neural networks are a well-known technology that is widely studied and applied, the method of determining the sub-feature matrix to be convolved with a weight matrix is not described further here.
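As a small illustration of the window extraction and same-position multiplication just described, the sketch below uses plain nested lists. The function name `sub_feature_matrix`, the `(top, left)` offset parameters, and the sample values are assumptions made for the example; the source does not specify how the window position is chosen.

```python
def sub_feature_matrix(feature, top, left, weight):
    """Return the window of `feature` at (top, left) with the same shape as `weight`."""
    k_rows, k_cols = len(weight), len(weight[0])
    if top + k_rows > len(feature) or left + k_cols > len(feature[0]):
        raise ValueError("window does not fit inside the feature matrix")
    return [row[left:left + k_cols] for row in feature[top:top + k_rows]]


feature = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
weight = [[1, 0],
          [0, 1]]

# Window whose top-left corner is at row 1, column 1 (zero-based).
window = sub_feature_matrix(feature, 1, 1, weight)

# Multiply the data at the same positions in the window and the weight matrix.
products = [[f * w for f, w in zip(f_row, w_row)]
            for f_row, w_row in zip(window, weight)]
```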

In some optional implementations of this embodiment, the feature data contained in the feature matrices of the convolutional neural network and the weight data contained in the weight matrices are fixed-point numbers with a preset number of bits. Because an electronic device operates on fixed-point numbers more efficiently than on floating-point numbers, the feature data and weight data of the convolutional neural network may be set to fixed-point numbers to improve computational efficiency in situations where the precision required of the network's results is not high (for example, when the network runs on terminal devices such as mobile phones and tablet computers). Setting the number of bits of the feature data and weight data to a preset value helps make full use of the number of data bits a register can hold, which improves register access efficiency.
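To make the fixed-point idea concrete, here is a minimal sketch that assumes a 16-bit signed format with 8 fractional bits. The source only says "a preset number of bits"; the specific layout, the saturation behavior, and the helper names are assumptions for illustration.

```python
TOTAL_BITS = 16      # assumed preset bit width
FRACTION_BITS = 8    # assumed number of fractional bits


def to_fixed(value: float) -> int:
    """Convert a float to the assumed signed fixed-point format, saturating on overflow."""
    scaled = round(value * (1 << FRACTION_BITS))
    lowest = -(1 << (TOTAL_BITS - 1))
    highest = (1 << (TOTAL_BITS - 1)) - 1
    return max(lowest, min(highest, scaled))


def fixed_product_to_float(a: int, b: int) -> float:
    """Multiply two fixed-point values; the raw product carries 2 * FRACTION_BITS fractional bits."""
    return (a * b) / (1 << (2 * FRACTION_BITS))


weight = to_fixed(1.5)      # integer representation of the weight
feature = to_fixed(-0.25)   # integer representation of the feature value
result = fixed_product_to_float(weight, feature)
```

The multiplication itself is a plain integer multiply, which is the efficiency argument made in the text.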

Step 203: perform the following storing step: for each of the preset number of target weight matrices, extract weight data that has not yet been extracted from that target weight matrix and store it in a first target register; for each of the determined sub-feature matrices, extract feature data that has not yet been extracted from that sub-feature matrix and store it in a second target register; and determine whether unextracted data remains in the preset number of target weight matrices and the determined sub-feature matrices.

In this embodiment, the execution body may perform the following storing step:

Step 2031: for each of the preset number of target weight matrices, extract weight data that has not yet been extracted from that target weight matrix and store it in the first target register.

Specifically, the execution body may extract weight data from a target weight matrix in the order in which the weight data is positioned in that matrix, and store the extracted weight data in the first target register. As an example, each item of weight data may have a row number and a column number that identify its position in the target weight matrix. From the weight data in each target weight matrix that has not yet been extracted, the execution body may take the row (or column) with the smallest row number (or column number) and extract from it the item with the smallest column number (or row number).

The first target register may be a register set in advance for storing weight data. It may be one of at least one register allocated in advance by the execution body, and the at least one register may be registers contained in the CPU of the execution body. The execution body may select a register from the at least one register as the first target register in various ways, for example in the order of the register numbers or according to a preconfigured correspondence between extracted weight data and registers.

In some optional implementations of this embodiment, the execution body may, based on the SIMD instruction, extract unextracted weight data from each of the preset number of target weight matrices and store it in the first target register. As an example, suppose the preset number is 4 and the SIMD instruction can extract 4 items of data at a time; the execution body may then extract one item of weight data from each of the 4 target weight matrices, where the extracted items occupy the same position in their respective target weight matrices. With conventional SISD (single instruction stream, single data stream) processing, each instruction extracts only one item of data, whereas with SIMD one instruction extracts multiple items. Because the multiple items are processed in parallel, one SISD instruction and one SIMD instruction take roughly the same time to execute. Since SIMD handles N items at a time (N being a positive integer), its processing time is reduced to about 1/N of the SISD processing time. Using SIMD instructions therefore improves the efficiency of extracting weight data.
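The 1/N estimate can be illustrated by counting load instructions. The sketch below is an idealized count only: the 3x3 matrix size and the helper name `load_count` are assumptions, and real speedups depend on the hardware and memory layout.

```python
import math


def load_count(num_matrices: int, rows: int, cols: int, lanes: int) -> int:
    """Loads needed to read every element when one load covers `lanes` matrices at once."""
    return math.ceil(num_matrices / lanes) * rows * cols


# Four 3x3 weight matrices (size assumed for illustration).
sisd_loads = load_count(4, 3, 3, lanes=1)  # one item per instruction
simd_loads = load_count(4, 3, 3, lanes=4)  # four items per instruction
ratio = simd_loads / sisd_loads
```

Under these assumptions the SIMD version issues one quarter of the loads, consistent with the 1/N figure for N = 4.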

Step 2032: for each of the determined sub-feature matrices, extract feature data that has not yet been extracted from that sub-feature matrix and store it in the second target register.

Specifically, the execution body may extract feature data from a sub-feature matrix in the order in which the feature data is positioned in that matrix, and store the extracted feature data in the second target register. As an example, each item of feature data may have a row number and a column number that identify its position in the sub-feature matrix. From the feature data in each sub-feature matrix that has not yet been extracted, the execution body may take the row (or column) with the smallest row number (or column number) and extract from it the item with the smallest column number (or row number).

The second target register may be a register set in advance for storing feature data. It may be one of at least one register allocated in advance by the execution body, and the at least one register may be registers contained in the CPU of the execution body. The execution body may select a register from the at least one register as the second target register in various ways, for example in the order of the register numbers or according to a preconfigured correspondence between extracted feature data and registers.

In some optional implementations of this embodiment, the execution body may, based on the SIMD instruction, extract unextracted feature data from each of the determined sub-feature matrices and store it in the second target register. As an example, suppose the preset number is 4 and the SIMD instruction can extract 4 items of data at a time; the execution body may then extract one item of feature data from each of the 4 sub-feature matrices and store the items in the second target register, where the extracted items occupy the same position in their respective sub-feature matrices.

Step 2033: determine whether unextracted data remains in the preset number of target weight matrices and the determined sub-feature matrices.

Specifically, the execution body may determine whether unextracted weight data remains in the preset number of target weight matrices and whether unextracted feature data remains in the determined sub-feature matrices. If unextracted weight data remains in the preset number of target weight matrices and unextracted feature data remains in the determined sub-feature matrices, the execution body determines that unextracted data remains in the preset number of target weight matrices and the determined sub-feature matrices.

Step 204: in response to determining that unextracted data remains, continue to perform the storing step.

In this embodiment, the execution body may continue to perform the storing step in response to determining that unextracted data remains in the preset number of target weight matrices and the determined sub-feature matrices.
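Steps 2031 to 204 together form a loop, which can be sketched as follows. This is a plain-Python model under stated assumptions, not the patented implementation: ordinary lists stand in for the first and second target registers, two 2x2 matrices are used instead of four to keep the example short, and the name `run_storing_steps` is introduced here. The loop tests for remaining data before each pass rather than after it, which is equivalent for non-empty matrices.

```python
def run_storing_steps(weight_mats, sub_feature_mats):
    """Yield (first_register, second_register) once per execution of the storing step."""
    rows, cols = len(weight_mats[0]), len(weight_mats[0][0])
    # Unextracted positions, ordered by smallest row number, then smallest column number.
    pending = [(r, c) for r in range(rows) for c in range(cols)]
    while pending:  # steps 2033 and 204: repeat while unextracted data remains
        r, c = pending.pop(0)
        # Step 2031: one weight value per target weight matrix, same position in each.
        first_register = [w[r][c] for w in weight_mats]
        # Step 2032: one feature value per sub-feature matrix, same position in each.
        second_register = [f[r][c] for f in sub_feature_mats]
        yield first_register, second_register


weights = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
features = [[[10, 20], [30, 40]], [[50, 60], [70, 80]]]
steps = list(run_storing_steps(weights, features))
```

Each element of `steps` pairs the weight values and feature values taken from the same position, so a per-lane multiplication of the two lists would give the products mentioned in the summary.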

With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for storing data according to this embodiment. In the application scenario of Fig. 3, a convolutional neural network 302 is provided on a terminal device 301. The convolutional neural network 302 processes an input image and, during processing, generates the feature matrices that the convolutional neural network includes. The terminal device 301 first selects, from the feature matrices included in a layer of the convolutional neural network 302 on which a convolution operation is to be performed, four (that is, the preset number of) target feature matrices 303, 304, 305, 306 and the four corresponding target weight matrices 307, 308, 309, 310. The terminal device 301 then determines the sub-feature matrices 3031, 3041, 3051, 3061 that the respective target feature matrices include and that are to be convolved with the target weight matrices. Next, the terminal device 301 performs the following storing step: extract one piece of weight data that has not been extracted from each target weight matrix and store it into the first target register; extract one piece of feature data that has not been extracted from each sub-feature matrix and store it into the second target register; and determine whether data that has not been extracted exists in the four target weight matrices and the four sub-feature matrices, and if so, continue to perform the storing step. By performing the storing step repeatedly, the terminal device 301 stores all of the weight data in the four target weight matrices and all of the feature data in the four sub-feature matrices into registers. For example, when performing the storing step for the first time, the terminal device 301 extracts the weight data 3071, 3081, 3091, 3101 located in the first row and first column of the target weight matrices 307, 308, 309, 310, respectively, and stores it into the first target register D1; it extracts the feature data 30311, 30411, 30511, 30611 located in the first row and first column of the sub-feature matrices 3031, 3041, 3051, 3061, respectively, and stores it into the second target register D2.
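The repeated storing step of this scenario can be sketched in Python as below. This is an illustration under stated assumptions rather than the described implementation: the registers D1 and D2 are modeled as lists, all matrices are assumed to have the same shape, elements are visited row by row, and the function name and sample values are hypothetical.

```python
def run_storing_steps(target_weight_matrices, sub_feature_matrices):
    """Yield the (D1, D2) register contents for each pass of the storing step.

    Assumes every weight matrix and sub-feature matrix has the same shape
    and that elements are visited row by row (one possible order).
    """
    rows = len(target_weight_matrices[0])
    cols = len(target_weight_matrices[0][0])
    for r in range(rows):
        for c in range(cols):
            d1 = [w[r][c] for w in target_weight_matrices]  # first target register
            d2 = [s[r][c] for s in sub_feature_matrices]    # second target register
            yield d1, d2
    # Leaving the loops corresponds to "no unextracted data remains".


# Four hypothetical 2x2 weight matrices and four 2x2 sub-feature matrices.
weights = [[[10 * k + 1, 10 * k + 2], [10 * k + 3, 10 * k + 4]] for k in range(4)]
features = [[[100 * k + 1, 100 * k + 2], [100 * k + 3, 100 * k + 4]] for k in range(4)]
steps = list(run_storing_steps(weights, features))
```

The first yielded pair holds the first-row, first-column elements of all four matrices, matching the first execution of the storing step described above.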

In the method provided by the above embodiment of the present application, a preset number of target feature matrices and a preset number of target weight matrices are first determined from a preset convolutional neural network. Then, for each target feature matrix in the preset number of target feature matrices, the sub-feature matrix to be convolved with the corresponding target weight matrix is determined from that target feature matrix. Finally, weight data that has not been extracted is repeatedly extracted from the preset number of target weight matrices and stored into the first target register, and feature data that has not been extracted is repeatedly extracted from the determined sub-feature matrices and stored into the second target register. The preset number of target feature matrices and the preset number of target weight matrices can thus be stored into registers in batches, which helps to exploit the fast access speed of registers and to improve the operation efficiency of the convolutional neural network.

With further reference to Fig. 4, a flow 400 of another embodiment of the method for storing data is shown. The flow 400 of the method for storing data includes the following steps:

Step 401: determine a preset number of target feature matrices and a preset number of target weight matrices from a preset convolutional neural network.

In this embodiment, step 401 is substantially the same as step 201 in the embodiment corresponding to Fig. 2, and is not described again here.

Step 402: for each target feature matrix in the preset number of target feature matrices, determine, from the target feature matrix, the sub-feature matrix to be convolved with the target weight matrix corresponding to the target feature matrix.

In this embodiment, step 402 is substantially the same as step 202 in the embodiment corresponding to Fig. 2, and is not described again here.

In this embodiment, after performing step 402, the execution body may go on to perform the following storing step, namely steps 403 to 407:

Step 403: for each target weight matrix in the preset number of target weight matrices, extract weight data that has not been extracted from the target weight matrix and store it into the first target register.

In this embodiment, step 403 is substantially the same as step 2031 in the embodiment corresponding to Fig. 2, and is not described again here.

Step 404: for each sub-feature matrix in the determined sub-feature matrices, extract feature data that has not been extracted from the sub-feature matrix and store it into the second target register.

In this embodiment, step 404 is substantially the same as step 2032 in the embodiment corresponding to Fig. 2, and is not described again here.

Step 405: for each piece of weight data stored in the first target register, multiply the weight data by the corresponding feature data stored in the second target register to obtain a product.

In this embodiment, for each piece of weight data stored in the first target register, the execution body of the method for storing data (for example, the server or terminal device shown in Fig. 1) may multiply the weight data by the corresponding feature data stored in the second target register to obtain a product.

Specifically, as an example, assume that the data stored in the first target register includes A, B, C, D and that the data stored in the second target register includes E, F, G, H, where the positions of A, B, C, D in their weight matrices are the same as the positions of E, F, G, H in their sub-feature matrices, respectively; that is, A, B, C, D correspond to E, F, G, H, respectively. The products obtained then include A × E, B × F, C × G, D × H.
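The lane-wise multiplication in this example can be sketched as follows. The function name `multiply_registers` and the numeric values assigned to A through H are hypothetical and serve only to make the example executable.

```python
def multiply_registers(first_target_register, second_target_register):
    """Multiply corresponding lanes of the two registers (modeled as lists)."""
    if len(first_target_register) != len(second_target_register):
        raise ValueError("registers must hold the same number of lanes")
    return [w * f for w, f in zip(first_target_register, second_target_register)]


# Hypothetical values for the weight data A..D and the feature data E..H.
A, B, C, D = 1, 2, 3, 4
E, F, G, H = 5, 6, 7, 8

# products holds A*E, B*F, C*G, D*H in that order.
products = multiply_registers([A, B, C, D], [E, F, G, H])
```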

Step 406: store the obtained products into a preset storage region.

In this embodiment, the execution body may store the obtained products into a preset storage region. The preset storage region may be a storage region with a fast data access speed, for example a cache included in the CPU of the execution body (such as a level-1 cache or a level-2 cache), or a register included in the CPU of the execution body (a register different from the registers that store the feature data and the weight data). Because the preset storage region allows fast data access, storing the obtained products into it helps to further improve operation efficiency when the convolutional neural network performs subsequent calculations.
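A rough sketch of step 406 is given below. The class `ProductStore` is a hypothetical stand-in for the preset storage region, and the per-lane summation is only an assumed example of a "subsequent calculation": summing the element-wise products of one sub-feature matrix and its weight matrix gives the usual convolution output value for that pair, but the description does not state that this is the calculation performed.

```python
class ProductStore:
    """Stand-in for the preset storage region (e.g. a cache or a spare register)."""

    def __init__(self):
        self.products = []  # one list of lane products per storing step

    def store(self, lane_products):
        """Keep a copy of the products obtained in one storing step."""
        self.products.append(list(lane_products))

    def lane_sums(self):
        """Hypothetical follow-up: sum the stored products lane by lane."""
        return [sum(lane) for lane in zip(*self.products)]


store = ProductStore()
store.store([5, 12, 21, 32])  # products from a first storing step
store.store([1, 1, 1, 1])     # products from a second storing step
```

Here lane i of `lane_sums()` accumulates the products belonging to the i-th pair of sub-feature matrix and weight matrix.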

Step 407: determine whether data that has not been extracted exists in the preset number of target weight matrices and the determined sub-feature matrices.

In this embodiment, step 407 is substantially the same as step 2033 in the embodiment corresponding to Fig. 2, and is not described again here.

Step 408: in response to determining that such data exists, continue to perform the storing step.

In this embodiment, step 408 is substantially the same as step 204 in the embodiment corresponding to Fig. 2, and is not described again here.

As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of the method for storing data in this embodiment highlights the steps of multiplying each piece of weight data in the first target register by the corresponding feature data in the second target register and storing the result. The scheme described in this embodiment thus stores the obtained products into the preset storage region, which helps to further improve operation efficiency when the convolutional neural network performs subsequent calculations.

With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for storing data. The apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus may specifically be applied to various electronic devices.

As shown in Fig. 5, the apparatus 500 for storing data of this embodiment includes: a first determination unit 501, configured to determine a preset number of target feature matrices and a preset number of target weight matrices from a preset convolutional neural network; a second determination unit 502, configured to, for each target feature matrix in the preset number of target feature matrices, determine from the target feature matrix the sub-feature matrix to be convolved with the target weight matrix corresponding to the target feature matrix; a storage unit 503, configured to perform the following storing step: for each target weight matrix in the preset number of target weight matrices, extracting weight data that has not been extracted from the target weight matrix and storing it into a first target register; for each sub-feature matrix in the determined sub-feature matrices, extracting feature data that has not been extracted from the sub-feature matrix and storing it into a second target register; and determining whether data that has not been extracted exists in the preset number of target weight matrices and the determined sub-feature matrices; and a third determination unit 504, configured to continue to perform the storing step in response to determining that such data exists.

In this embodiment, the first determination unit 501 may determine a preset number of target feature matrices and a preset number of target weight matrices from a preset convolutional neural network. A target feature matrix includes feature data, and a target weight matrix includes weight data. The convolutional neural network may be provided in advance in the apparatus 500 and used to process raw data (for example, pictures or word vectors). In general, a convolutional neural network that processes raw data may include convolutional layers, and a convolutional layer in turn includes feature matrices (that is, feature maps in matrix form) and weight matrices (that is, convolution kernels, also called filters). The feature data included in a feature matrix may be data extracted from the raw data (for example, the R (red), G (green), B (blue) values of pixels), or data output by a layer (for example, a convolutional layer or a pooling layer) of the convolutional neural network that processes the raw data. The weight data included in a weight matrix is data determined by training the convolutional neural network. In general, performing a convolution operation on a feature matrix and a weight matrix yields new feature data.

In this embodiment, the first determination unit 501 may determine the preset number of target feature matrices from the feature matrices in the convolutional neural network in various manners. A target feature matrix may be a feature matrix on which a convolution operation with its corresponding target weight matrix is to be performed. As an example, the execution body may randomly extract a preset number of feature matrices from the feature matrices as the target feature matrices.

In this embodiment, for each target feature matrix in the preset number of target feature matrices, the second determination unit 502 may determine, from the target feature matrix, the sub-feature matrix to be convolved with the target weight matrix corresponding to the target feature matrix.

In this embodiment, the storage unit 503 may perform the following storing step:

Step 5031: for each target weight matrix in the preset number of target weight matrices, extract weight data that has not been extracted from the target weight matrix and store it into the first target register.

Specifically, the storage unit 503 may extract weight data from a target weight matrix in the order in which the positions of the weight data included in the target weight matrix are arranged, and store the extracted weight data into the first target register. As an example, each piece of weight data may have a corresponding row number and column number that characterize its position in the target weight matrix. From the weight data that each target weight matrix includes and that has not been extracted, the execution body may extract the weight data with the smallest column number (or smallest row number) within the row (or column) of weight data that has the smallest row number (or smallest column number).
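The row-first variant of this extraction order (smallest row number, then smallest column number) can be sketched as follows. Tracking extracted positions in a set of `(row, col)` tuples and the function name `next_unextracted_position` are assumptions made for illustration.

```python
def next_unextracted_position(rows, cols, extracted):
    """Return the unextracted (row, col) with the smallest row number and,
    within that row, the smallest column number; None if nothing is left.

    `extracted` is a set of (row, col) positions already extracted
    (hypothetical bookkeeping, not taken from the description).
    """
    remaining = [
        (r, c) for r in range(rows) for c in range(cols) if (r, c) not in extracted
    ]
    # Tuples compare lexicographically, so min() picks the row first, then the column.
    return min(remaining) if remaining else None
```

For example, with a 2x2 matrix whose first row has been fully extracted, the next position returned is `(1, 0)`. The column-first variant mentioned in the text would swap the roles of the row and column numbers.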

The first target register may be a preset register for storing weight data. It may be one of at least one register allocated in advance to the apparatus 500, and the at least one register may be a register included in the CPU of the apparatus 500. The apparatus 500 may select a register from the at least one register as the first target register in various manners (for example, in order of register number, or according to a preconfigured correspondence between extracted weight data and registers).

Step 5032: for each sub-feature matrix in the determined sub-feature matrices, extract feature data that has not been extracted from the sub-feature matrix and store it into the second target register.

Specifically, the storage unit 503 may extract feature data from a sub-feature matrix in the order in which the positions of the feature data included in the sub-feature matrix are arranged, and store the extracted feature data into the second target register. As an example, each piece of feature data may have a corresponding row number and column number that characterize its position in the sub-feature matrix. From the feature data in each sub-feature matrix that has not been extracted, the execution body may extract the feature data with the smallest column number (or smallest row number) within the row (or column) of feature data that has the smallest row number (or smallest column number).

The second target register may be a preset register for storing feature data. It may be one of at least one register allocated in advance to the apparatus 500, and the at least one register may be a register included in the CPU of the execution body. The apparatus 500 may select a register from the at least one register as the second target register in various manners (for example, in order of register number, or according to a preconfigured correspondence between extracted feature data and registers).

Step 5033: determine whether data that has not been extracted exists in the preset number of target weight matrices and the determined sub-feature matrices.

Specifically, the storage unit 503 may determine whether weight data that has not been extracted exists in the preset number of target weight matrices, and whether feature data that has not been extracted exists in the determined sub-feature matrices. If unextracted weight data exists in the preset number of target weight matrices and unextracted feature data exists in the determined sub-feature matrices, the storage unit 503 determines that unextracted data exists in the preset number of target weight matrices and the determined sub-feature matrices.

In this embodiment, the third determination unit 504 may continue to perform the storing step in response to determining that data that has not been extracted exists in the preset number of target weight matrices and the determined sub-feature matrices.

In some optional implementations of this embodiment, the storage unit 503 may include: a calculation module (not shown in the figure), configured to, for each piece of weight data stored in the first target register, multiply the weight data by the corresponding feature data stored in the second target register to obtain a product; and a storage module (not shown in the figure), configured to store the obtained product.

In some optional implementations of this embodiment, the feature data included in the feature matrices and the weight data included in the weight matrices in the convolutional neural network are fixed-point numbers with a preset number of bits.

In some optional implementations of this embodiment, the preset number of target feature matrices and the preset number of target weight matrices are stored in advance in a preset cache.

In some optional implementations of this embodiment, the preset number of target feature matrices belong to a feature matrix set included in a target layer of the convolutional neural network, and the feature matrix set is divided in advance into at least one subset, where each subset includes a preset number of feature matrices. The first determination unit 501 may include: a selection module (not shown in the figure), configured to select a subset from the at least one subset and determine the feature matrices included in the selected subset as the target feature matrices; and a determination module (not shown in the figure), configured to, for each target feature matrix in the determined preset number of target feature matrices, determine the weight matrix that corresponds to the target feature matrix and is used for the convolution operation as the target weight matrix.
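The division into subsets and the selection performed by the selection module and the determination module can be sketched as follows. The identifiers (`f0`, `w0`, and so on), the dictionary that maps each feature matrix to its weight matrix, and both function names are hypothetical; the sketch also assumes that the size of the feature matrix set is a multiple of the preset number.

```python
def split_into_subsets(feature_matrix_ids, preset_number):
    """Divide the feature matrix set into subsets of `preset_number` items each."""
    if len(feature_matrix_ids) % preset_number != 0:
        raise ValueError("set size is assumed to be a multiple of the preset number")
    return [
        feature_matrix_ids[i:i + preset_number]
        for i in range(0, len(feature_matrix_ids), preset_number)
    ]


def select_targets(subsets, subset_index, weight_for_feature):
    """Pick one subset as the target feature matrices and look up their weights."""
    target_features = subsets[subset_index]
    target_weights = [weight_for_feature[f] for f in target_features]
    return target_features, target_weights


# Hypothetical layer with eight feature matrices and one weight matrix per feature matrix.
feature_ids = [f"f{i}" for i in range(8)]
weight_for_feature = {f"f{i}": f"w{i}" for i in range(8)}

subsets = split_into_subsets(feature_ids, 4)
targets = select_targets(subsets, 1, weight_for_feature)
```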

In some optional implementations of this embodiment, the preset number is the quotient of the number of bits of data that a preset single instruction, multiple data (SIMD) instruction extracts at a time and the number of bits of the feature data included in the feature matrices in the convolutional neural network.
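As a worked illustration of this quotient, the sketch below computes the preset number from two bit widths. The widths used (128, 32 and 8 bits) are illustrative assumptions and are not stated in this passage, and requiring an exact division is likewise an assumption of the sketch.

```python
def preset_number(simd_bits, element_bits):
    """Number of elements one SIMD instruction can extract at a time."""
    if element_bits <= 0 or simd_bits % element_bits != 0:
        raise ValueError("simd_bits is assumed to be a positive multiple of element_bits")
    return simd_bits // element_bits


# Illustrative widths only: a 128-bit SIMD extraction of 32-bit or 8-bit data.
lanes_32bit = preset_number(128, 32)
lanes_8bit = preset_number(128, 8)
```

With these assumed widths, 32-bit feature data gives a preset number of 4, which matches the example in which one SIMD instruction extracts 4 pieces of data at a time.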

In some optional implementations of this embodiment, the storage unit 503 may be further configured to: extract, based on the SIMD instruction, weight data that has not been extracted from each of the preset number of target weight matrices and store it into the first target register.

In some optional implementations of this embodiment, the storage unit 503 may be further configured to: extract, based on the SIMD instruction, feature data that has not been extracted from each of the determined sub-feature matrices and store it into the second target register.

In the apparatus provided by the above embodiment of the present application, a preset number of target feature matrices and a preset number of target weight matrices are first determined from a preset convolutional neural network. Then, for each target feature matrix in the preset number of target feature matrices, the sub-feature matrix to be convolved with the corresponding target weight matrix is determined from that target feature matrix. Finally, weight data that has not been extracted is repeatedly extracted from the preset number of target weight matrices and stored into the first target register, and feature data that has not been extracted is repeatedly extracted from the determined sub-feature matrices and stored into the second target register. The preset number of target feature matrices and the preset number of target weight matrices can thus be stored into registers in batches, which helps to exploit the fast access speed of registers and to improve the operation efficiency of the convolutional neural network.

Referring now to Fig. 6, a schematic structural diagram of a computer system 600 of an electronic device (for example, the server or terminal device shown in Fig. 1) suitable for implementing embodiments of the present application is shown. The electronic device shown in Fig. 6 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.

As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a liquid crystal display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.

In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the method of the present application are performed.

It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable medium, or any combination of the two. A computer-readable medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable medium may be any tangible medium that contains or stores a program, and the program may be used by or in connection with an instruction execution system, apparatus, or device. Also in the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than the computer-readable medium described above, and that medium can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium may be transmitted over any appropriate medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.

Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, and the module, program segment, or portion of code contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, it may be described as: a processor including a first determination unit, a second determination unit, a storage unit, and a third determination unit. The names of these units do not, in some cases, limit the units themselves; for example, the first determination unit may also be described as "a unit that determines a preset number of target feature matrices and a preset number of target weight matrices from a preset convolutional neural network".

In another aspect, the present application further provides a computer-readable medium. The computer-readable medium may be included in the electronic device described in the above embodiments, or it may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: determine a preset number of target feature matrices and a preset number of target weight matrices from a preset convolutional neural network; for each target feature matrix in the preset number of target feature matrices, determine, from the target feature matrix, the sub-feature matrix to be convolved with the target weight matrix corresponding to the target feature matrix; perform the following storing step: for each target weight matrix in the preset number of target weight matrices, extract weight data that has not been extracted from the target weight matrix and store it into a first target register; for each sub-feature matrix in the determined sub-feature matrices, extract feature data that has not been extracted from the sub-feature matrix and store it into a second target register; determine whether data that has not been extracted exists in the preset number of target weight matrices and the determined sub-feature matrices; and, in response to determining that such data exists, continue to perform the storing step.

The above description is merely of preferred embodiments of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims

1. A method for storing data, comprising:

determining a preset number of target feature matrices and a preset number of target weight matrices from a preset convolutional neural network;

for each target feature matrix in the preset number of target feature matrices, determining, from the target feature matrix, a sub-feature matrix to be convolved with the target weight matrix corresponding to the target feature matrix;

performing the following storing step: for each target weight matrix in the preset number of target weight matrices, extracting weight data that has not been extracted from the target weight matrix and storing it into a first target register; for each sub-feature matrix in the determined sub-feature matrices, extracting feature data that has not been extracted from the sub-feature matrix and storing it into a second target register; and determining whether data that has not been extracted exists in the preset number of target weight matrices and the determined sub-feature matrices; and

in response to determining that such data exists, continuing to perform the storing step.

2. The method according to claim 1, wherein the storing step further comprises:

for each piece of weight data stored in the first target register, multiplying the weight data by the corresponding feature data stored in the second target register to obtain a product; and

storing the obtained product.

3. The method according to claim 1, wherein the feature data included in the feature matrices and the weight data included in the weight matrices in the convolutional neural network are fixed-point numbers with a preset number of bits.

4. The method according to claim 1, wherein the preset number of target feature matrices and the preset number of target weight matrices are stored in advance in a preset cache.

5. The method according to claim 1, wherein the preset number of target feature matrices are included in a feature matrix set of a target layer of the convolutional neural network, the feature matrix set being divided in advance into at least one subset, each subset including the preset number of feature matrices; and

The determining, from the preset convolutional neural network, of the preset number of target feature matrices and the preset number of target weight matrices comprises:

Selecting a subset from the at least one subset, and determining the feature matrices included in the selected subset as the target feature matrices; and

For each target feature matrix of the determined preset number of target feature matrices, determining the weight matrix that corresponds to the target feature matrix and is used for the convolution operation as a target weight matrix.
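
The subset selection described above can be sketched as follows. This is an illustration under assumptions: matrices are represented by string identifiers, the feature-to-weight correspondence is given as a dictionary, and the helper names `split_into_subsets` and `select_targets` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def split_into_subsets(feature_ids: Sequence[str], preset_number: int) -> List[List[str]]:
    """Divide the feature matrix set into subsets of exactly preset_number items."""
    if preset_number <= 0:
        raise ValueError("preset_number must be positive")
    if len(feature_ids) % preset_number != 0:
        raise ValueError("the set must divide evenly into subsets")
    return [
        list(feature_ids[start:start + preset_number])
        for start in range(0, len(feature_ids), preset_number)
    ]


def select_targets(
    subsets: List[List[str]],
    index: int,
    weight_for: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Pick one subset as the target feature matrices and look up their weights."""
    target_features = subsets[index]
    target_weights = [weight_for[feature] for feature in target_features]
    return target_features, target_weights


# Hypothetical layer with four feature matrices and one weight matrix for each.
feature_ids = ["f0", "f1", "f2", "f3"]
weight_for = {"f0": "w0", "f1": "w1", "f2": "w2", "f3": "w3"}
subsets = split_into_subsets(feature_ids, preset_number=2)
targets = select_targets(subsets, 1, weight_for)
```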

6. The method according to any one of claims 1-5, wherein the preset number is the quotient obtained by dividing the number of bits of data extracted at a time by a preset single instruction, multiple data (SIMD) instruction by the number of bits of the feature data included in the feature matrices of the convolutional neural network.
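
The quotient can be illustrated with a short calculation. The figures used here, a 128-bit SIMD load and 8-bit fixed-point feature data, are assumptions chosen for the example and are not taken from the claims; the function name `preset_number` is hypothetical.

```python
def preset_number(simd_bits: int, feature_bits: int) -> int:
    """Number of matrices handled per step: SIMD load width / feature data width."""
    if simd_bits <= 0 or feature_bits <= 0:
        raise ValueError("bit widths must be positive")
    if simd_bits % feature_bits != 0:
        raise ValueError("the SIMD width must be a multiple of the feature width")
    return simd_bits // feature_bits


# Assumed example: one 128-bit load holds sixteen 8-bit values.
lanes = preset_number(128, 8)
```

Under these assumed widths, sixteen target feature matrices and sixteen target weight matrices would be processed together.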

7. The method according to claim 6, wherein the extracting, for each target weight matrix of the preset number of target weight matrices, of weight data that has not yet been extracted from the target weight matrix and storing it in the first target register comprises:

Extracting, based on the SIMD instruction, weight data that has not yet been extracted from each of the preset number of target weight matrices, and storing it in the first target register.

8. The method according to claim 6, wherein the extracting, for each sub-feature matrix of the determined sub-feature matrices, of feature data that has not yet been extracted from the sub-feature matrix and storing it in the second target register comprises:

Extracting, based on the SIMD instruction, feature data that has not yet been extracted from each of the determined sub-feature matrices, and storing it in the second target register.
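
To show how one value from each matrix can fill a single register, the following sketch emulates a register as a 16-byte value using Python's standard `struct` module. It is not a real SIMD instruction; the 128-bit width, the signed 8-bit data format, and the names `LANES` and `load_lanes` are assumptions made for illustration.

```python
import struct
from typing import List

LANES = 16  # assumed: 128-bit register / 8-bit fixed-point data


def load_lanes(flat_matrices: List[List[int]], position: int) -> bytes:
    """Pack the element at `position` of every matrix into one 16-byte value."""
    if len(flat_matrices) != LANES:
        raise ValueError(f"expected exactly {LANES} matrices")
    # "b" is a signed 8-bit integer, so each lane must lie in [-128, 127].
    return struct.pack(f"{LANES}b", *(m[position] for m in flat_matrices))


# Sixteen hypothetical flattened matrices with two elements each.
flat_matrices = [[i, -i] for i in range(LANES)]
register = load_lanes(flat_matrices, 1)
lanes = struct.unpack(f"{LANES}b", register)
```

Each of the sixteen matrices contributes exactly one byte, so the packed value is 128 bits wide, matching the assumed register width.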

9. An apparatus for storing data, comprising:

A first determination unit, configured to determine, from a preset convolutional neural network, a preset number of target feature matrices and a preset number of target weight matrices;

A second determination unit, configured to determine, for each target feature matrix of the preset number of target feature matrices and from that target feature matrix, a sub-feature matrix on which a convolution operation is to be performed with the target weight matrix corresponding to that target feature matrix;

A storage unit, configured to perform the following storing step: for each target weight matrix of the preset number of target weight matrices, extracting weight data that has not yet been extracted from the target weight matrix and storing it in a first target register; for each sub-feature matrix of the determined sub-feature matrices, extracting feature data that has not yet been extracted from the sub-feature matrix and storing it in a second target register; and determining whether data that has not yet been extracted remains in the preset number of target weight matrices and the determined sub-feature matrices;

A third determination unit, configured to continue performing the storing step in response to determining that such data remains.

10. The apparatus according to claim 9, wherein the storage unit comprises:

A computing module, configured to multiply, for each item of weight data stored in the first target register, the weight data by the corresponding feature data stored in the second target register to obtain a product;

A storage module, configured to store the obtained product.

11. The apparatus according to claim 9, wherein the feature data included in the feature matrices of the convolutional neural network and the weight data included in the weight matrices are fixed-point numbers with a preset number of bits.

12. The apparatus according to claim 9, wherein the preset number of target feature matrices and the preset number of target weight matrices are stored in a preset cache in advance.

13. The apparatus according to claim 9, wherein the preset number of target feature matrices are included in a feature matrix set of a target layer of the convolutional neural network, the feature matrix set being divided in advance into at least one subset, each subset including the preset number of feature matrices; and

The first determination unit comprises:

A selecting module, configured to select a subset from the at least one subset and determine the feature matrices included in the selected subset as the target feature matrices;

A determining module, configured to determine, for each target feature matrix of the determined preset number of target feature matrices, the weight matrix that corresponds to the target feature matrix and is used for the convolution operation as a target weight matrix.

14. The apparatus according to any one of claims 9-13, wherein the preset number is the quotient obtained by dividing the number of bits of data extracted at a time by a preset single instruction, multiple data (SIMD) instruction by the number of bits of the feature data included in the feature matrices of the convolutional neural network.

15. The apparatus according to claim 14, wherein the storage unit is further configured to:

16. The apparatus according to claim 14, wherein the storage unit is further configured to:

17. An electronic device, comprising:

One or more processors, wherein each processor includes a register;

A storage device storing one or more programs,

Wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-8.

18. A computer-readable medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1-8.