CN109308194B - Method and apparatus for storing data


Info

Publication number
CN109308194B
Authority
CN
China
Prior art keywords
data
feature
weight
matrix
storing
Prior art date
Legal status
Active
Application number
CN201811149876.7A
Other languages
Chinese (zh)
Other versions
CN109308194A (en)
Inventor
胡耀全
Current Assignee
Douyin Vision Co Ltd
Douyin Vision Beijing Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN201811149876.7A
Publication of CN109308194A
Application granted
Publication of CN109308194B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F 9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F 9/3887 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks


Abstract

The embodiment of the application discloses a method and an apparatus for storing data. One embodiment of the method comprises: determining, from a target feature matrix in a preset convolutional neural network, a sub-feature matrix to be subjected to convolution operation with the weight matrix corresponding to the target feature matrix; performing the following storing steps: extracting a preset number of weight data from the unextracted weight data in the weight matrix and storing them into a first target register; extracting a preset number of feature data from the unextracted feature data in the sub-feature matrix and storing them into a second target register; determining whether the quantity of unextracted weight data in the weight matrix and the quantity of unextracted feature data in the sub-feature matrix are both greater than or equal to the preset number; and in response to determining that both are greater than or equal to the preset number, continuing to execute the storing steps. This implementation makes use of the high access speed of registers and thereby helps improve the operation efficiency of the convolutional neural network.

Description

Method and apparatus for storing data
Technical Field
Embodiments of the present application relate to the field of computer technology, and in particular to a method and apparatus for storing data.
Background
A Convolutional Neural Network (CNN) is a feed-forward neural network whose artificial neurons respond to a portion of the surrounding cells within a coverage range; CNNs perform well on large-scale image processing. A CNN includes convolutional layers, pooling layers, and the like. When performing convolution operations on data in these layers, it is generally necessary to multiply the feature data included in a feature matrix (i.e., a feature map in matrix form) by the weight data included in a weight matrix (i.e., a convolution kernel, also referred to as a filter, in matrix form).
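For illustration only (the patent itself contains no code), the elementwise multiplication described above can be sketched in C as follows; the function name, the 16-bit element type, and the 32-bit accumulator are assumptions made for this example:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch: multiply-accumulate one k x k window of a feature
 * map with a k x k weight matrix (convolution kernel). The names, the
 * int16_t element type, and the int32_t accumulator are assumptions. */
int32_t convolve_window(const int16_t *feature, size_t row_stride,
                        const int16_t *weight, size_t k)
{
    int32_t acc = 0;
    for (size_t r = 0; r < k; r++)
        for (size_t c = 0; c < k; c++)
            acc += (int32_t)feature[r * row_stride + c] * weight[r * k + c];
    return acc;
}
```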
Disclosure of Invention
Embodiments of the present application provide a method and an apparatus for storing data.
In a first aspect, an embodiment of the present application provides a method for storing data, where the method includes: determining, from a target feature matrix in a preset convolutional neural network, a sub-feature matrix to be subjected to convolution operation with the weight matrix corresponding to the target feature matrix; performing the following storing steps: extracting a preset number of weight data from the unextracted weight data in the weight matrix and storing them into a first target register; extracting a preset number of feature data from the unextracted feature data in the sub-feature matrix and storing them into a second target register; determining whether the quantity of unextracted weight data in the weight matrix and the quantity of unextracted feature data in the sub-feature matrix are both greater than or equal to the preset number; and in response to determining that both are greater than or equal to the preset number, continuing to execute the storing steps.
In some embodiments, the storing step further comprises: in response to determining that the number of unextracted weight data in the weight matrix and the number of unextracted feature data in the sub-feature matrix are both greater than zero and less than a preset number, storing the unextracted weight data in the weight matrix into a first target register and storing the unextracted feature data in the sub-feature matrix into a second target register.
In some embodiments, the feature data included in the feature matrix and the weight data included in the weight matrix in the convolutional neural network are fixed-point numbers with a preset number of bits.
In some embodiments, the storing step further comprises: for each weight datum stored in the first target register, multiplying the weight datum by the corresponding feature datum stored in the second target register to obtain a product; and storing the obtained products into a preset storage area.
In some embodiments, the feature data included in the target feature matrix and the weight data included in the weight matrix are stored in a preset cache in advance.
In some embodiments, the preset number is the quotient of the number of bits of data extracted in a single Single Instruction, Multiple Data (SIMD) instruction and the number of bits of each feature datum included in a feature matrix of the convolutional neural network.
In some embodiments, extracting a preset number of weight data from the unextracted weight data in the weight matrix and storing them into the first target register includes: extracting a preset number of weight data from the unextracted weight data in the weight matrix based on the SIMD instruction, and storing the extracted weight data into the first target register.
In some embodiments, extracting a preset number of feature data from the unextracted feature data in the sub-feature matrix and storing them into the second target register includes: extracting a preset number of feature data from the unextracted feature data in the sub-feature matrix based on the SIMD instruction, and storing the extracted feature data into the second target register.
In a second aspect, an embodiment of the present application provides an apparatus for storing data, the apparatus including: a first determining unit configured to determine, from a target feature matrix in a preset convolutional neural network, a sub-feature matrix to be subjected to convolution operation with the weight matrix corresponding to the target feature matrix; a storage unit configured to perform the following storing steps: extracting a preset number of weight data from the unextracted weight data in the weight matrix and storing them into a first target register; extracting a preset number of feature data from the unextracted feature data in the sub-feature matrix and storing them into a second target register; and determining whether the quantity of unextracted weight data in the weight matrix and the quantity of unextracted feature data in the sub-feature matrix are both greater than or equal to the preset number; and a second determining unit configured to continue to perform the storing steps in response to determining that both are greater than or equal to the preset number.
In some embodiments, the storage unit is further configured to: in response to determining that the number of unextracted weight data in the weight matrix and the number of unextracted feature data in the sub-feature matrix are both greater than zero and less than a preset number, storing the unextracted weight data in the weight matrix into a first target register and storing the unextracted feature data in the sub-feature matrix into a second target register.
In some embodiments, the feature data included in the feature matrix and the weight data included in the weight matrix in the convolutional neural network are fixed-point numbers with a preset number of bits.
In some embodiments, the storage unit comprises: a calculation module configured to multiply, for each weight datum stored in the first target register, the weight datum by the corresponding feature datum stored in the second target register to obtain a product; and a storage module configured to store the obtained products into a preset storage area.
In some embodiments, the feature data included in the target feature matrix and the weight data included in the weight matrix are stored in a preset cache in advance.
In some embodiments, the preset number is the quotient of the number of bits of data extracted in a single Single Instruction, Multiple Data (SIMD) instruction and the number of bits of each feature datum included in a feature matrix of the convolutional neural network.
In some embodiments, the storage unit is further configured to: extract a preset number of weight data from the unextracted weight data in the weight matrix based on the SIMD instruction, and store the extracted weight data into the first target register.
In some embodiments, the storage unit is further configured to: extract a preset number of feature data from the unextracted feature data in the sub-feature matrix based on the SIMD instruction, and store the extracted feature data into the second target register.
In a third aspect, an embodiment of the present application provides an electronic device, where the electronic device includes: one or more processors, where the processors include registers; and a storage apparatus having one or more programs stored thereon; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any implementation of the first aspect.
In a fourth aspect, the present application provides a computer-readable medium, on which a computer program is stored, which, when executed by a processor, implements the method as described in any implementation manner of the first aspect.
According to the method and apparatus for storing data provided by the embodiments of the application, a sub-feature matrix to be subjected to convolution operation with the weight matrix corresponding to a target feature matrix is determined from the target feature matrix in a preset convolutional neural network. Then, a preset number of weight data are repeatedly extracted from the unextracted weight data in the weight matrix and stored into the first target register, and a preset number of feature data are repeatedly extracted from the unextracted feature data in the sub-feature matrix and stored into the second target register. In this way, the weight data included in the weight matrix and the feature data included in the sub-feature matrix can be stored in registers, which makes use of the high access speed of registers and helps improve the operation efficiency of the convolutional neural network.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for storing data, according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an application scenario of a method for storing data according to an embodiment of the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method for storing data according to an embodiment of the present application;
FIG. 5 is a schematic block diagram illustrating one embodiment of an apparatus for storing data according to an embodiment of the present application;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which the method for storing data or the apparatus for storing data of the embodiments of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as an image processing application, a video playing application, a web browser application, etc., may be installed on the terminal devices 101, 102, 103.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may be a server that provides various services, such as a background data processing server that processes data such as images uploaded by the terminal apparatuses 101, 102, 103. The background data processing server can process data such as images by using a convolutional neural network, extract weight data used in the processing process and store the weight data into a first target register, and extract feature data used in the processing process and store the feature data into a second target register.
It should be noted that the method for storing data provided in the embodiment of the present application may be executed by the server 105, or may be executed by the terminal devices 101, 102, and 103, and accordingly, the apparatus for storing data may be disposed in the server 105, or may be disposed in the terminal devices 101, 102, and 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. In the case where data processed using a convolutional neural network does not need to be acquired from a remote location, the system architecture described above may not include a network, but only a terminal device or a server.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for storing data in accordance with the present application is shown. The method for storing data comprises the following steps:
Step 201, determining, from a target feature matrix in a preset convolutional neural network, a sub-feature matrix to be subjected to convolution operation with the weight matrix corresponding to the target feature matrix.
In this embodiment, an execution body of the method for storing data (for example, the server or terminal device shown in fig. 1) may determine, from a target feature matrix in a preset convolutional neural network, a sub-feature matrix to be subjected to convolution operation with the weight matrix corresponding to the target feature matrix. The convolutional neural network may be preset in the execution body and used for processing raw data (e.g., pictures, word vectors, etc.). In general, a convolutional neural network that processes raw data may include convolutional layers, which in turn include a feature matrix (i.e., a feature map in matrix form) and a weight matrix (i.e., a convolution kernel, also known as a filter). The feature matrix includes feature data and the weight matrix includes weight data. The feature data may be extracted from the raw data (e.g., the R (Red), G (Green), and B (Blue) values of pixels) or output from a layer (e.g., a convolutional layer, a pooling layer, etc.) of the convolutional neural network. The weight matrix includes weight data determined by training the convolutional neural network. In general, new feature data can be obtained after a convolution operation is performed on the feature matrix and the weight matrix.
The target feature matrix may be selected in advance by a technician from the feature matrices included in the convolutional neural network, or may be a feature matrix to be subjected to convolution operation that the execution body selects (for example, according to the order of the channel numbers corresponding to the feature matrices).
In general, when a convolutional neural network performs a convolution operation, it is necessary to extract from the feature matrix a sub-feature matrix having the same number of rows and columns as the corresponding weight matrix, and to multiply the data at the same positions in the sub-feature matrix and the weight matrix. The correspondence between feature matrices and weight matrices is preset. It should be noted that, since convolutional neural networks are a well-known technology that is widely studied and applied, details of the method for determining the sub-feature matrix to be convolved with the weight matrix are not repeated here.
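As a hedged illustration of the sub-feature-matrix extraction just described (the patent leaves the mechanism unspecified), a minimal C sketch might read:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch: copy the k x k sub-feature matrix whose top-left
 * element is at (row, col) of a feature map with the given row stride.
 * The function name and signature are assumptions for this example. */
void extract_sub_feature(const int16_t *feature, size_t row_stride,
                         size_t row, size_t col, size_t k, int16_t *out)
{
    for (size_t r = 0; r < k; r++)
        memcpy(&out[r * k], &feature[(row + r) * row_stride + col],
               k * sizeof(int16_t));
}
```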
In some optional implementations of this embodiment, the feature data included in the feature matrix and the weight data included in the weight matrix in the convolutional neural network are fixed-point numbers with a preset number of bits. Because electronic devices operate on fixed-point numbers more efficiently than on floating-point numbers, in scenarios where the precision requirement on the processing result of the convolutional neural network is not high (for example, when the convolutional neural network runs on a terminal device such as a mobile phone or a tablet computer), the feature data and weight data in the convolutional neural network can be set as fixed-point numbers to improve operation efficiency. Setting the bit widths of the feature data and the weight data to a preset number of bits also makes full use of the number of bits a register can store, improving the access efficiency of the register.
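To make the fixed-point idea concrete, here is a minimal sketch assuming a Q8.8 split of a 16-bit word; the patent only requires a preset bit width, so the choice of 8 integer and 8 fractional bits is purely illustrative:

```c
#include <stdint.h>

/* Illustrative Q8.8 fixed-point encoding in a 16-bit integer: 8 integer
 * bits and 8 fractional bits. The patent only requires a preset bit
 * width; the Q8.8 split is an assumption made for this example. */
static inline int16_t to_q8_8(float x)    { return (int16_t)(x * 256.0f); }
static inline float   from_q8_8(int16_t q){ return (float)q / 256.0f; }

/* The product of two Q8.8 values is Q16.16 in 32 bits; shifting right
 * by 8 returns it to Q8.8. */
static inline int16_t mul_q8_8(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * (int32_t)b) >> 8);
}
```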
In some optional implementations of this embodiment, the feature data included in the target feature matrix and the weight data included in the weight matrix may be stored in a preset cache in advance. The preset cache may be a cache included in the Central Processing Unit (CPU) of the execution body (e.g., a level-1 (L1) cache, a level-2 (L2) cache, etc.). When an electronic device performs data operations, reading data from the cache is more efficient than reading from other storage devices such as main memory or a hard disk; by loading the feature matrices of the convolutional neural network into the preset cache in advance, the execution body can therefore improve the efficiency of data access.
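As a non-authoritative sketch of such preloading (the patent does not say how the cache is populated), software prefetch hints on GCC/Clang could look like this:

```c
#include <stdint.h>

/* Illustrative sketch: software prefetch hints (GCC/Clang builtin) that
 * pull the target feature matrix and weight matrix toward the CPU cache
 * before the storing steps run. The stride of 32 int16_t elements
 * (64 bytes) assumes a 64-byte cache line; hardware may ignore hints. */
void preload_to_cache(const int16_t *feature, unsigned n_feature,
                      const int16_t *weight, unsigned n_weight)
{
    for (unsigned i = 0; i < n_feature; i += 32)
        __builtin_prefetch(&feature[i], 0 /* read */, 3 /* keep cached */);
    for (unsigned i = 0; i < n_weight; i += 32)
        __builtin_prefetch(&weight[i], 0, 3);
}
```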
Step 202, the following storage steps are performed: extracting a preset number of weight data from the weight data which are not extracted in the weight matrix and storing the weight data in a first target register; extracting a preset number of feature data from the feature data which are not extracted in the sub-feature matrix and storing the feature data in a second target register; and determining whether the quantity of the weight data which are not extracted in the weight matrix and the quantity of the feature data which are not extracted in the sub-feature matrix are both larger than or equal to a preset quantity.
In this embodiment, the execution body may execute the following storing steps:
step 2021, extracting a predetermined number of weight data from the weight data not extracted in the weight matrix and storing the weight data in the first target register.
Specifically, when the execution body executes step 2021 for the first time, the unextracted weight data in the weight matrix is all the weight data in the weight matrix. In general, the execution body may extract the weight data in the order of the positions of the weight data in the weight matrix. For example, the weight data may have corresponding row and column numbers representing positions in the weight matrix, and the execution body may extract a preset number of weight data in order of column number from small to large, starting from the first row. The first target register may be a register set in advance to store weight data, and may be a register among at least one register pre-allocated by the execution body. The at least one register may be a register included in the CPU of the execution body. The execution body may select a register as the first target register from the at least one register in various ways (for example, in the order of register numbers, or according to a pre-configured correspondence between extracted weight data and registers).
In some optional implementations of this embodiment, the preset number is the quotient of the number of bits of data extracted in a single Single Instruction, Multiple Data (SIMD) instruction and the number of bits of each feature datum included in a feature matrix of the convolutional neural network. As an example, assuming that the SIMD instruction is a 64-bit instruction, that is, the number of bits of data extracted at a single time is 64, and the number of bits of each feature datum included in the feature matrix of the convolutional neural network is 16, the preset number is 64/16 = 4.
Optionally, the SIMD instruction may be a NEON instruction. The NEON instruction is a SIMD instruction designed for embedded microprocessors; it simplifies the migration of software between different platforms, can increase data processing speed, and reduces hardware power consumption. It should be understood that the execution body may employ other SIMD instructions besides NEON, such as SSE (Streaming SIMD Extensions) instructions.
In some optional implementations of this embodiment, the execution body may extract a preset number of weight data from the unextracted weight data in the weight matrix based on the SIMD instruction, and store the extracted weight data into the first target register. With conventional SISD (Single Instruction, Single Data), each instruction can extract only one datum, whereas with SIMD one instruction can extract multiple data. In terms of the execution time of a single instruction, SIMD takes roughly as long as SISD, because the multiple data are processed in parallel. Since SIMD can process N (N is a positive integer) data at a time, its processing time is shortened to about 1/N of that of SISD. By using SIMD instructions, the execution body can therefore improve the efficiency of extracting weight data.
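As an illustrative sketch under the 64-bit/16-bit assumption of the example above (the patent names NEON but gives no code), a single NEON load pulls four 16-bit weights into a vector register at once:

```c
#include <arm_neon.h>

/* Sketch: one vld1_s16 loads four 16-bit weight values (64 bits) from
 * the unextracted portion of the weight matrix into a 64-bit vector
 * register in a single instruction, matching the preset number
 * 64 / 16 = 4. The function name is an assumption for illustration. */
int16x4_t load_four_weights(const int16_t *unextracted_weights)
{
    return vld1_s16(unextracted_weights);
}
```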
Step 2022, extracting a preset number of feature data from the feature data not extracted in the sub-feature matrix and storing the feature data in the second target register.
Specifically, when the execution body executes step 2022 for the first time, the unextracted feature data in the sub-feature matrix is all the feature data in the sub-feature matrix. The second target register may be a register set in advance to store feature data, and may be a register among at least one register pre-allocated by the execution body. The at least one register may be a register included in the CPU of the execution body. The execution body may select a register as the second target register from the at least one register in various ways (for example, in the order of register numbers, or according to a pre-configured correspondence between extracted feature data and registers). It should be noted that the method for extracting feature data in step 2022 is substantially the same as the method for extracting weight data in step 2021, and details are not repeated here.
In some optional implementations of this embodiment, the execution body may extract a preset number of feature data from the unextracted feature data in the sub-feature matrix based on the SIMD instruction, and store the extracted feature data into the second target register, thereby improving the efficiency of extracting feature data.
Since the data stored in the register is processed faster than the data stored in other storage devices (e.g., a memory, a hard disk, etc.), the efficiency of data processing using the convolutional neural network can be improved by storing the weight data and the feature data in the register. In practice, technicians can adjust the quantity of the weight data included in the weight matrix and the quantity of the feature data included in the sub-feature matrix in advance according to the number of the registers and the number of bits of data stored in a single register, so that the utilization rate of the registers is maximized, and the operation efficiency of the convolutional neural network is improved to the maximum extent.
Step 2023, determining whether the number of the weight data not extracted in the weight matrix and the number of the feature data not extracted in the sub-feature matrix are both greater than or equal to a preset number.
Step 203, in response to determining that both are greater than or equal to the preset number, continuing to execute the storing steps.
In this embodiment, the execution body may continue to execute the storing steps in response to determining that the quantity of unextracted weight data in the weight matrix and the quantity of unextracted feature data in the sub-feature matrix are both greater than or equal to the preset number.
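Pulling steps 2021 to 2023 and step 203 together, a minimal C sketch of this control flow, under the same NEON and 16-bit assumptions, might look as follows; all names are illustrative:

```c
#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>

#define PRESET 4  /* 64-bit SIMD load / 16-bit data, per the example above */

/* Sketch of the storing steps: while enough unextracted data remains,
 * repeatedly pull PRESET weights and PRESET features into registers.
 * What is done with the loaded vectors (e.g., the multiplication of the
 * embodiment in Fig. 4) is omitted here. */
void storing_steps(const int16_t *weights, size_t n_weights,
                   const int16_t *features, size_t n_features)
{
    size_t w = 0, f = 0;
    while (n_weights - w >= PRESET && n_features - f >= PRESET) {
        int16x4_t w_reg = vld1_s16(&weights[w]);   /* first target register */
        int16x4_t f_reg = vld1_s16(&features[f]);  /* second target register */
        (void)w_reg; (void)f_reg;                  /* consumed by later steps */
        w += PRESET;
        f += PRESET;
    }
}
```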
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for storing data according to the present embodiment. In the application scenario of fig. 3, a convolutional neural network 302 is provided on a terminal device 301. The terminal device 301 first determines, from a target feature matrix 303 in the convolutional neural network 302, a sub-feature matrix 305 to be subjected to convolution operation with the weight matrix 304 corresponding to the target feature matrix 303. The sub-feature matrix 305 and the weight matrix 304 both have 4 rows and 4 columns. Then, the terminal device 301 performs the following storing steps: extracting a preset number (for example, four) of weight data from the unextracted weight data in the weight matrix 304 and storing them into a first target register; extracting a preset number of feature data from the unextracted feature data in the sub-feature matrix 305 and storing them into a second target register; and determining whether the quantity of unextracted weight data in the weight matrix 304 and the quantity of unextracted feature data in the sub-feature matrix 305 are both greater than or equal to the preset number. If both are greater than or equal to the preset number, the storing steps continue. By repeatedly performing the above storing steps, the terminal device 301 finally stores the four rows of weight data included in the weight matrix 304 (i.e., 3041-3044 in the figure) into the corresponding first target registers D1, D2, D3, and D4, respectively, and stores the four rows of feature data included in the sub-feature matrix 305 (i.e., 3051-3054 in the figure) into the corresponding second target registers D5, D6, D7, and D8, respectively.
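A hedged NEON rendering of this scenario might look as follows; the figure's register labels D1-D8 are kept only as comments, since a compiler assigns physical registers itself:

```c
#include <arm_neon.h>
#include <stdint.h>

/* Sketch of the Fig. 3 scenario: the four rows of a 4 x 4 weight matrix
 * and a 4 x 4 sub-feature matrix, stored as 16-bit values, are loaded
 * row by row into eight 64-bit vector registers. */
void load_4x4_pair(const int16_t weight[16], const int16_t feature[16],
                   int16x4_t w_regs[4], int16x4_t f_regs[4])
{
    for (int row = 0; row < 4; row++) {
        w_regs[row] = vld1_s16(&weight[row * 4]);   /* D1..D4 in the figure */
        f_regs[row] = vld1_s16(&feature[row * 4]);  /* D5..D8 in the figure */
    }
}
```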
According to the method provided by the embodiment of the application, the weight data included in the weight matrix and the feature data included in the sub-feature matrix in the convolutional neural network are stored in the register, so that the characteristic of high access speed of the register is favorably utilized, and the operation efficiency of the convolutional neural network is improved.
With further reference to FIG. 4, a flow 400 of yet another embodiment of a method for storing data is shown. The process 400 of the method for storing data includes the steps of:
Step 401, determining, from a target feature matrix in a preset convolutional neural network, a sub-feature matrix to be subjected to convolution operation with the weight matrix corresponding to the target feature matrix.
In this embodiment, step 401 is substantially the same as step 201 in the corresponding embodiment of fig. 2, and is not described here again.
In this embodiment, after executing step 401, the execution body may continue to execute the following storing steps, i.e., step 402 to step 407:
step 402, extracting a preset number of weight data from the weight data which are not extracted in the weight matrix and storing the weight data in a first target register.
In this embodiment, step 402 is substantially the same as step 2021 in the corresponding embodiment of fig. 2, and is not described herein again.
In step 403, a preset number of feature data are extracted from the feature data that are not extracted in the sub-feature matrix and stored in the second target register.
In this embodiment, step 403 is substantially the same as step 2022 in the corresponding embodiment of fig. 2, and is not described herein again.
In step 404, for each weight datum stored in the first target register, the weight datum is multiplied by the corresponding feature datum stored in the second target register to obtain a product.
In the present embodiment, for each weight datum stored in the first target register, an execution body of the method for storing data (e.g., the server or terminal device shown in fig. 1) may multiply the weight datum by the corresponding feature datum stored in the second target register to obtain a product.
Specifically, as an example, assume that the data stored in the first target register includes A, B, C, and D, and the data stored in the second target register includes E, F, G, and H, where the positions of A, B, C, and D in the weight matrix are the same as the positions of E, F, G, and H in the sub-feature matrix, respectively; that is, A, B, C, and D correspond to E, F, G, and H, respectively. The resulting products are then: A×E, B×F, C×G, D×H.
Step 405, storing the obtained product into a preset storage area.
In this embodiment, the execution body may store the obtained products in a preset storage area. The preset storage area may be a storage area with fast data access, such as a cache included in the CPU of the execution body (e.g., a level-1 cache, a level-2 cache, etc.), or a register included in the CPU of the execution body (different from the registers storing the feature data and weight data). Because the preset storage area offers fast data access, storing the obtained products there can further improve computation efficiency when the convolutional neural network performs subsequent calculations.
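A minimal NEON sketch of steps 404 and 405 under the 16-bit fixed-point assumption follows; the widening multiply vmull_s16 is our choice to keep the full 32-bit products, a detail the patent does not specify:

```c
#include <arm_neon.h>
#include <stdint.h>

/* Sketch of steps 404-405: multiply four weights by the four features at
 * the same positions and write the products to a preset storage area.
 * vmull_s16 widens to 32 bits so the products A*E, B*F, C*G, D*H are
 * kept without overflow; the widening choice is an assumption. */
void multiply_and_store(int16x4_t w_reg, int16x4_t f_reg, int32_t *out)
{
    int32x4_t products = vmull_s16(w_reg, f_reg);
    vst1q_s32(out, products);  /* preset storage area, e.g. a cached buffer */
}
```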
Step 406, determining whether the number of the weight data which are not extracted in the weight matrix and the number of the feature data which are not extracted in the sub-feature matrix are both greater than or equal to a preset number.
In this embodiment, step 406 is substantially the same as step 2023 in the corresponding embodiment of fig. 2, and is not described herein again.
Step 407, in response to determining that both are greater than zero and less than the preset number, storing the unextracted weight data in the weight matrix into the first target register, and storing the unextracted feature data in the sub-feature matrix into the second target register.
In this embodiment, the execution body may, in response to determining that the quantity of unextracted weight data in the weight matrix and the quantity of unextracted feature data in the sub-feature matrix are both greater than zero and less than the preset number, store the unextracted weight data in the weight matrix into the first target register and store the unextracted feature data in the sub-feature matrix into the second target register. In this way, all of the weight data in the weight matrix and all of the feature data in the sub-feature matrix can be stored in registers, so that the register's faster data access can be exploited to improve the operation efficiency of the convolutional neural network.
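As a hedged sketch of this remainder case (the patent does not mandate how partially filled registers are handled), the leftover data can be copied into zero-padded staging buffers that are then loaded exactly like a full group:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of step 407: when 0 < remaining < PRESET (here PRESET = 4),
 * copy the leftover weights/features into zero-padded staging buffers
 * that can then be loaded into the target registers with vld1_s16 as in
 * the main loop. Zero padding is an assumption; it makes the padded
 * lanes contribute nothing to later products. */
void store_tail(const int16_t *weights, size_t w_left,
                const int16_t *features, size_t f_left,
                int16_t w_buf[4], int16_t f_buf[4])
{
    memset(w_buf, 0, 4 * sizeof(int16_t));
    memset(f_buf, 0, 4 * sizeof(int16_t));
    memcpy(w_buf, weights, w_left * sizeof(int16_t));
    memcpy(f_buf, features, f_left * sizeof(int16_t));
}
```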
Step 408, in response to determining that both are greater than or equal to the preset number, continuing to execute the storing steps.
In this embodiment, step 408 is substantially the same as step 203 in the corresponding embodiment of fig. 2, and is not described herein again.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for storing data in this embodiment highlights the step of multiplying each weight datum in the first target register by the corresponding feature datum in the second target register and storing the products, as well as the step of extracting and storing the remaining weight data and feature data when the quantities of unextracted weight data in the weight matrix and unextracted feature data in the sub-feature matrix are both greater than zero and less than the preset number. The scheme described in this embodiment can therefore make fuller use of the register's faster data access to improve the operation efficiency of the convolutional neural network. Moreover, storing the obtained products in a preset storage area can further improve computation efficiency when the convolutional neural network performs subsequent calculations.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for storing data, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for storing data of the present embodiment includes: a first determining unit 501 configured to determine, from a target feature matrix in a preset convolutional neural network, a sub-feature matrix to be subjected to convolution operation with the weight matrix corresponding to the target feature matrix; a storage unit 502 configured to perform the following storing steps: extracting a preset number of weight data from the unextracted weight data in the weight matrix and storing them into a first target register; extracting a preset number of feature data from the unextracted feature data in the sub-feature matrix and storing them into a second target register; and determining whether the quantity of unextracted weight data in the weight matrix and the quantity of unextracted feature data in the sub-feature matrix are both greater than or equal to a preset number; and a second determining unit 503 configured to continue to perform the storing steps in response to determining that both are greater than or equal to the preset number.
In this embodiment, the first determining unit 501 may determine, from a target feature matrix in a preset convolutional neural network, a sub-feature matrix to be subjected to convolution operation with the weight matrix corresponding to the target feature matrix. The convolutional neural network may be preset in the apparatus 500. In general, a convolutional neural network may include convolutional layers, which in turn include a feature matrix (i.e., a feature map in matrix form) and a weight matrix (i.e., a convolution kernel, also known as a filter). The feature matrix includes feature data and the weight matrix includes weight data. The feature data included in the feature matrix may be extracted from raw data (e.g., the R (Red), G (Green), and B (Blue) values of pixels) or output from a layer in the convolutional neural network (e.g., a convolutional layer, a pooling layer, etc.). The weight matrix includes weight data determined by training the convolutional neural network. In general, new feature data can be obtained after a convolution operation is performed on the feature matrix and the weight matrix.
The target feature matrix may be selected in advance by a technician from feature matrices included in a convolutional neural network, or may be a feature matrix to be subjected to convolution operation selected by the first determining unit 501 (for example, selected according to the order of the channel numbers corresponding to the feature matrices).
In general, when a convolutional neural network performs a convolution operation, it is necessary to extract a sub-feature matrix having the same number of rows and columns as that of a corresponding weight matrix from a feature matrix, and multiply data at the same position in the sub-feature matrix and the weight matrix. Wherein, the corresponding relation between the characteristic matrix and the weight matrix is preset. It should be noted that, since the convolutional neural network is a well-known technology widely studied and applied at present, details about a method for determining the sub-feature matrix convolved with the weight matrix are not repeated herein.
In this embodiment, the storage unit 502 may perform the following storage steps:
step 5021, extracting a preset number of weight data from the weight data which are not extracted in the weight matrix and storing the weight data in a first target register.
Specifically, when step 5021 is performed for the first time, the unextracted weight data in the weight matrix is all the weight data in the weight matrix. In general, the storage unit 502 may extract the weight data in the order of the positions of the weight data in the weight matrix. For example, the weight data may have corresponding row and column numbers representing positions in the weight matrix, and the storage unit 502 may extract a preset number of weight data in order of column number from small to large, starting from the first row. The first target register may be a register among at least one register pre-allocated by the apparatus 500. The at least one register may be a register included in the CPU of the apparatus 500. The storage unit 502 may select a register as the first target register from the at least one register in various ways (for example, in the order of register numbers, or according to a pre-configured correspondence between extracted weight data and registers).
Step 5022, extracting a preset number of feature data from the feature data which are not extracted in the sub-feature matrix and storing the feature data in a second target register.
Specifically, when step 5022 is executed for the first time, the unextracted feature data in the sub-feature matrix is all the feature data in the sub-feature matrix. The second target register may be a register among at least one register pre-allocated by the apparatus 500. The at least one register may be a register included in the CPU of the apparatus 500. The storage unit 502 may select a register as the second target register from the at least one register in various ways (for example, in the order of register numbers, or according to a pre-configured correspondence between extracted feature data and registers). It should be noted that the method for extracting feature data in step 5022 is basically the same as the method for extracting weight data in step 5021, and is not described here again.
Step 5023, determining whether the quantity of the weight data which are not extracted in the weight matrix and the quantity of the feature data which are not extracted in the sub-feature matrix are both larger than or equal to a preset quantity.
In this embodiment, the second determining unit 503 may continue to perform the storing step in response to determining that the number of unextracted weight data in the weight matrix and the number of unextracted feature data in the sub-feature matrix are both greater than or equal to a preset number.
In some optional implementations of this embodiment, the storage unit 502 may be further configured to: in response to determining that the number of unextracted weight data in the weight matrix and the number of unextracted feature data in the sub-feature matrix are both greater than zero and less than a preset number, storing the unextracted weight data in the weight matrix into a first target register and storing the unextracted feature data in the sub-feature matrix into a second target register.
In some optional implementations of the embodiment, the feature data included in the feature matrix and the weight data included in the weight matrix in the convolutional neural network are fixed point numbers with preset number of bits.
In some optional implementations of this embodiment, the storage unit 502 may include: a calculation module (not shown in the figure) configured to multiply, for each weight datum stored in the first target register, the weight datum by the corresponding feature datum stored in the second target register to obtain a product; and a storage module (not shown in the figure) configured to store the obtained products into a preset storage area.
In some optional implementations of this embodiment, the feature data included in the target feature matrix and the weight data included in the weight matrix are stored in a preset cache in advance.
In some optional implementations of this embodiment, the preset number is the quotient of the number of bits of data extracted in a single Single Instruction, Multiple Data (SIMD) instruction and the number of bits of each feature datum included in a feature matrix of the convolutional neural network.
In some optional implementations of this embodiment, the storage unit 502 may be further configured to: and extracting a preset number of weight data from the weight data which are not extracted in the weight matrix based on the SIMD instruction, and storing the weight data in a first target register.
In some optional implementations of this embodiment, the storage unit 502 may be further configured to: and extracting a preset number of feature data from the feature data which are not extracted in the sub-feature matrix based on the SIMD instruction, and storing the feature data in a second target register.
According to the device provided by the embodiment of the application, the weight data included in the weight matrix and the feature data included in the sub-feature matrix in the convolutional neural network are stored in the register, so that the characteristic of high access speed of the register is utilized, and the operation efficiency of the convolutional neural network is improved.
Referring now to FIG. 6, a block diagram of a computer system 600 suitable for use in implementing an electronic device (e.g., the server or terminal device shown in FIG. 1) of an embodiment of the present application is shown. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU)601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is mounted in the storage section 608 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium described herein may be a computer readable signal medium, a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium, by contrast, may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a first determination unit, a storage unit, and a second determination unit. The names of the units do not form a limitation on the units themselves in some cases, for example, the first determination unit may also be described as a unit that determines a sub-feature matrix to be subjected to convolution operation on a weight matrix corresponding to a target feature matrix from the target feature matrix in a preset convolutional neural network.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: determine, from a target feature matrix in a preset convolutional neural network, a sub-feature matrix to be subjected to convolution operation with the weight matrix corresponding to the target feature matrix; perform the following storing steps: extracting a preset number of weight data from the unextracted weight data in the weight matrix and storing them into a first target register; extracting a preset number of feature data from the unextracted feature data in the sub-feature matrix and storing them into a second target register; determining whether the quantity of unextracted weight data in the weight matrix and the quantity of unextracted feature data in the sub-feature matrix are both greater than or equal to the preset number; and in response to determining that both are greater than or equal to the preset number, continuing to execute the storing steps.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (16)

1. A method for storing data, comprising:
determining, from a target feature matrix in a preset convolutional neural network, a sub-feature matrix on which a convolution operation is to be performed with a weight matrix corresponding to the target feature matrix;
performing the following storing step: extracting a preset number of weight data from the unextracted weight data in the weight matrix and storing them in a first target register; extracting a preset number of feature data from the unextracted feature data in the sub-feature matrix and storing them in a second target register; and determining whether the number of unextracted weight data in the weight matrix and the number of unextracted feature data in the sub-feature matrix are both greater than or equal to the preset number, wherein the feature data included in the target feature matrix and the weight data included in the weight matrix are stored in advance in a preset cache, the preset cache being a cache included in a CPU; and
in response to determining that both numbers are greater than or equal to the preset number, continuing to perform the storing step.
2. The method of claim 1, wherein the storing step further comprises:
in response to determining that the number of unextracted weight data in the weight matrix and the number of unextracted feature data in the sub-feature matrix are both greater than zero and less than the preset number, storing the unextracted weight data in the weight matrix into the first target register and storing the unextracted feature data in the sub-feature matrix into the second target register.
3. The method of claim 1, wherein the feature data included in the feature matrix in the convolutional neural network and the weight data included in the weight matrix are fixed-point numbers of a preset number of bits.
4. The method of claim 1, wherein the storing step further comprises:
for each weight datum stored in the first target register, multiplying the weight datum by the corresponding feature datum stored in the second target register to obtain a product;
and storing the obtained products in a preset storage area.
5. The method of any one of claims 1-4, wherein the preset number is the quotient of the number of bits of data extracted at a time by a single instruction multiple data (SIMD) instruction divided by the number of bits of each feature datum included in a feature matrix in the convolutional neural network.
6. The method of claim 5, wherein the extracting a preset number of weight data from the unextracted weight data in the weight matrix and storing them in a first target register comprises:
extracting, based on the SIMD instruction, a preset number of weight data from the unextracted weight data in the weight matrix and storing them in the first target register.
7. The method of claim 5, wherein the extracting a preset number of feature data from the unextracted feature data in the sub-feature matrix and storing them in a second target register comprises:
extracting, based on the SIMD instruction, a preset number of feature data from the unextracted feature data in the sub-feature matrix and storing them in the second target register.
8. An apparatus for storing data, comprising:
a first determination unit configured to determine, from a target feature matrix in a preset convolutional neural network, a sub-feature matrix on which a convolution operation is to be performed with a weight matrix corresponding to the target feature matrix;
a storage unit configured to perform the following storing step: extracting a preset number of weight data from the unextracted weight data in the weight matrix and storing them in a first target register; extracting a preset number of feature data from the unextracted feature data in the sub-feature matrix and storing them in a second target register; and determining whether the number of unextracted weight data in the weight matrix and the number of unextracted feature data in the sub-feature matrix are both greater than or equal to the preset number, wherein the feature data included in the target feature matrix and the weight data included in the weight matrix are stored in advance in a preset cache, the preset cache being a cache included in a CPU; and
a second determination unit configured to continue to perform the storing step in response to determining that both numbers are greater than or equal to the preset number.
9. The apparatus of claim 8, wherein the storage unit is further configured to:
in response to determining that the number of unextracted weight data in the weight matrix and the number of unextracted feature data in the sub-feature matrix are both greater than zero and less than the preset number, storing the unextracted weight data in the weight matrix into the first target register and storing the unextracted feature data in the sub-feature matrix into the second target register.
10. The apparatus of claim 8, wherein the feature data included in the feature matrix in the convolutional neural network and the weight data included in the weight matrix are fixed-point numbers of a preset number of bits.
11. The apparatus of claim 8, wherein the storage unit comprises:
a calculation module configured to multiply, for each weight datum stored in the first target register, the weight datum by the corresponding feature datum stored in the second target register to obtain a product;
and a storage module configured to store the obtained products in a preset storage area.
12. The apparatus of any one of claims 8-11, wherein the preset number is the quotient of the number of bits of data extracted at a time by a single instruction multiple data (SIMD) instruction divided by the number of bits of each feature datum included in a feature matrix in the convolutional neural network.
13. The apparatus of claim 12, wherein the storage unit is further configured to:
extracting, based on the SIMD instruction, a preset number of weight data from the unextracted weight data in the weight matrix and storing them in the first target register.
14. The apparatus of claim 12, wherein the storage unit is further configured to:
extracting, based on the SIMD instruction, a preset number of feature data from the unextracted feature data in the sub-feature matrix and storing them in the second target register.
15. An electronic device, comprising:
one or more processors, wherein the processors comprise registers;
a storage device having one or more programs stored thereon, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
16. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the method of any one of claims 1-7.
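As a worked illustration of the quotient defined in claims 5 and 12 (an editorial note, not part of the claims): if a single SIMD instruction extracts 128 bits at a time and each feature datum is an 8-bit fixed-point number, the preset number is 128 / 8 = 16 data per extraction. The short C fragment below computes exactly this; both bit widths are assumed values chosen for illustration.

#include <stdio.h>

int main(void)
{
    /* Assumed values: one SIMD instruction extracts 128 bits, and each
       feature datum is an 8-bit fixed-point number (cf. claims 3, 5, 12). */
    const unsigned simd_bits_per_extraction = 128;
    const unsigned bits_per_feature_datum = 8;

    /* The preset number is the quotient of the two bit widths. */
    unsigned preset_number = simd_bits_per_extraction / bits_per_feature_datum;
    printf("preset number = %u data per extraction\n", preset_number); /* 16 */
    return 0;
}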
CN201811149876.7A 2018-09-29 2018-09-29 Method and apparatus for storing data Active CN109308194B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811149876.7A CN109308194B (en) 2018-09-29 2018-09-29 Method and apparatus for storing data

Publications (2)

Publication Number Publication Date
CN109308194A CN109308194A (en) 2019-02-05
CN109308194B (en) 2021-08-10

Family

ID=65225384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811149876.7A Active CN109308194B (en) 2018-09-29 2018-09-29 Method and apparatus for storing data

Country Status (1)

Country Link
CN (1) CN109308194B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117908994A (en) * 2024-03-20 2024-04-19 腾讯科技(深圳)有限公司 Method, device and equipment for processing media information and readable storage medium

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US10380064B2 (en) * 2015-10-08 2019-08-13 Via Alliance Semiconductor Co., Ltd. Neural network unit employing user-supplied reciprocal for normalizing an accumulated value
US10565494B2 (en) * 2016-12-31 2020-02-18 Via Alliance Semiconductor Co., Ltd. Neural network unit with segmentable array width rotator
US11210584B2 (en) * 2017-01-31 2021-12-28 International Business Machines Corporation Memory efficient convolution operations in deep learning neural networks
CN107578055B (en) * 2017-06-20 2020-04-14 北京陌上花科技有限公司 Image prediction method and device
CN108288089A (en) * 2018-01-29 2018-07-17 百度在线网络技术(北京)有限公司 Method and apparatus for generating convolutional neural networks

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN105786448A (en) * 2014-12-26 2016-07-20 深圳市中兴微电子技术有限公司 Instruction scheduling method and device
US9977744B2 (en) * 2015-07-30 2018-05-22 SK Hynix Inc. Memory system and operating method thereof
CN108416434A (en) * 2018-02-07 2018-08-17 复旦大学 The circuit structure accelerated with full articulamentum for the convolutional layer of neural network

Non-Patent Citations (1)

Title
Energy-Efficient Reconfigurable Neural Network Array Architecture and System Scheduling for Media Applications; Zhang Dongming; China Masters' Theses Full-text Database (Electronic Journal); 2018-04-15 (No. 4); pp. I140-64 *

Similar Documents

Publication Publication Date Title
CN110582785B (en) Power efficient deep neural network module configured for executing layer descriptor lists
US11216726B2 (en) Batch processing in a neural network processor
CN110929865B (en) Network quantification method, service processing method and related product
US20200117981A1 (en) Data representation for dynamic precision in neural network cores
EP3564863B1 (en) Apparatus for executing lstm neural network operation, and operational method
US20220083857A1 (en) Convolutional neural network operation method and device
CN110825436B (en) Calculation method applied to artificial intelligence chip and artificial intelligence chip
CN108108190B (en) Calculation method and related product
CN115880132B (en) Graphics processor, matrix multiplication task processing method, device and storage medium
CN108595211B (en) Method and apparatus for outputting data
CN110826706B (en) Data processing method and device for neural network
CN107957977B (en) Calculation method and related product
EP4260174A1 (en) Data-type-aware clock-gating
CN116385328A (en) Image data enhancement method and device based on noise addition to image
CN109165723B (en) Method and apparatus for processing data
CN109308194B (en) Method and apparatus for storing data
CN108108189B (en) Calculation method and related product
CN109375952B (en) Method and apparatus for storing data
US11435941B1 (en) Matrix transpose hardware acceleration
CN112348182A (en) Neural network maxout layer computing device
US20230196086A1 (en) Increased precision neural processing element
CN111813721A (en) Neural network data processing method, device, equipment and storage medium
US11636569B1 (en) Matrix transpose hardware acceleration
CN111723917B (en) Operation method, device and related product
CN110825311B (en) Method and apparatus for storing data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder
Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.
Patentee after: Tiktok vision (Beijing) Co.,Ltd.
Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.
Patentee before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.
CP01 Change in the name or title of a patent holder
Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.
Patentee after: Douyin Vision Co.,Ltd.
Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.
Patentee before: Tiktok vision (Beijing) Co.,Ltd.