CN109308194B - Method and apparatus for storing data


Info

Publication number
CN109308194B
Authority
CN
China
Prior art keywords
data
feature
weight
matrix
storing
Prior art date
Legal status
Active
Application number
CN201811149876.7A
Other languages
Chinese (zh)
Other versions
CN109308194A (en)
Inventor
胡耀全
Current Assignee
Douyin Vision Co Ltd
Douyin Vision Beijing Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN201811149876.7A
Publication of CN109308194A
Application granted
Publication of CN109308194B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F 9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F 9/3887 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks


Abstract

The embodiment of the application discloses a method and an apparatus for storing data. One embodiment of the method comprises: determining, from a target feature matrix in a preset convolutional neural network, a sub-feature matrix to be subjected to convolution operation with the weight matrix corresponding to the target feature matrix; performing the following storing steps: extracting a preset number of weight data from the unextracted weight data in the weight matrix and storing them into a first target register; extracting a preset number of feature data from the unextracted feature data in the sub-feature matrix and storing them into a second target register; determining whether the quantity of unextracted weight data in the weight matrix and the quantity of unextracted feature data in the sub-feature matrix are both greater than or equal to the preset number; and in response to determining that both are greater than or equal to the preset number, continuing to execute the storing steps. This implementation makes use of the high access speed of registers and thereby helps improve the operation efficiency of the convolutional neural network.

Description

Method and apparatus for storing data
Technical Field
Embodiments of the present application relate to the field of computer technology, and in particular to a method and apparatus for storing data.
Background
A Convolutional Neural Network (CNN) is a feed-forward neural network whose artificial neurons respond to a portion of the surrounding cells within a coverage range; CNNs perform well on large-scale image processing. A CNN includes convolutional layers, pooling layers, and the like. When performing convolution operations on data in these layers, it is generally necessary to multiply the feature data included in a feature matrix (i.e., a feature map in matrix form) by the weight data included in a weight matrix (i.e., a convolution kernel, also referred to as a filter, in matrix form).
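For illustration only (the patent itself contains no code), the elementwise multiplication described above can be sketched in C as follows; the function name, the 16-bit element type, and the 32-bit accumulator are assumptions made for this example:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch: multiply-accumulate one k x k window of a feature
 * map with a k x k weight matrix (convolution kernel). The names, the
 * int16_t element type, and the int32_t accumulator are assumptions. */
int32_t convolve_window(const int16_t *feature, size_t row_stride,
                        const int16_t *weight, size_t k)
{
    int32_t acc = 0;
    for (size_t r = 0; r < k; r++)
        for (size_t c = 0; c < k; c++)
            acc += (int32_t)feature[r * row_stride + c] * weight[r * k + c];
    return acc;
}
```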
Disclosure of Invention
Embodiments of the present application provide a method and an apparatus for storing data.
In a first aspect, an embodiment of the present application provides a method for storing data, where the method includes: determining, from a target feature matrix in a preset convolutional neural network, a sub-feature matrix to be subjected to convolution operation with the weight matrix corresponding to the target feature matrix; performing the following storing steps: extracting a preset number of weight data from the unextracted weight data in the weight matrix and storing them into a first target register; extracting a preset number of feature data from the unextracted feature data in the sub-feature matrix and storing them into a second target register; determining whether the quantity of unextracted weight data in the weight matrix and the quantity of unextracted feature data in the sub-feature matrix are both greater than or equal to the preset number; and in response to determining that both are greater than or equal to the preset number, continuing to execute the storing steps.
In some embodiments, the storing step further comprises: in response to determining that the number of unextracted weight data in the weight matrix and the number of unextracted feature data in the sub-feature matrix are both greater than zero and less than a preset number, storing the unextracted weight data in the weight matrix into a first target register and storing the unextracted feature data in the sub-feature matrix into a second target register.
In some embodiments, the feature data included in the feature matrix and the weight data included in the weight matrix in the convolutional neural network are fixed-point numbers with a preset number of bits.
In some embodiments, the storing step further comprises: for each weight datum stored in the first target register, multiplying the weight datum by the corresponding feature datum stored in the second target register to obtain a product; and storing the obtained products into a preset storage area.
In some embodiments, the feature data included in the target feature matrix and the weight data included in the weight matrix are stored in a preset cache in advance.
In some embodiments, the preset number is the quotient of the number of bits of data extracted in a single Single Instruction, Multiple Data (SIMD) instruction and the number of bits of each feature datum included in a feature matrix of the convolutional neural network.
In some embodiments, extracting a preset number of weight data from the unextracted weight data in the weight matrix and storing them into the first target register includes: extracting a preset number of weight data from the unextracted weight data in the weight matrix based on the SIMD instruction, and storing the extracted weight data into the first target register.
In some embodiments, extracting a preset number of feature data from the unextracted feature data in the sub-feature matrix and storing them into the second target register includes: extracting a preset number of feature data from the unextracted feature data in the sub-feature matrix based on the SIMD instruction, and storing the extracted feature data into the second target register.
In a second aspect, an embodiment of the present application provides an apparatus for storing data, the apparatus including: a first determining unit configured to determine, from a target feature matrix in a preset convolutional neural network, a sub-feature matrix to be subjected to convolution operation with the weight matrix corresponding to the target feature matrix; a storage unit configured to perform the following storing steps: extracting a preset number of weight data from the unextracted weight data in the weight matrix and storing them into a first target register; extracting a preset number of feature data from the unextracted feature data in the sub-feature matrix and storing them into a second target register; and determining whether the quantity of unextracted weight data in the weight matrix and the quantity of unextracted feature data in the sub-feature matrix are both greater than or equal to the preset number; and a second determining unit configured to continue to perform the storing steps in response to determining that both are greater than or equal to the preset number.
In some embodiments, the storage unit is further configured to: in response to determining that the number of unextracted weight data in the weight matrix and the number of unextracted feature data in the sub-feature matrix are both greater than zero and less than a preset number, storing the unextracted weight data in the weight matrix into a first target register and storing the unextracted feature data in the sub-feature matrix into a second target register.
In some embodiments, the feature data included in the feature matrix and the weight data included in the weight matrix in the convolutional neural network are fixed-point numbers with a preset number of bits.
In some embodiments, the storage unit comprises: a calculation module configured to multiply, for each weight datum stored in the first target register, the weight datum by the corresponding feature datum stored in the second target register to obtain a product; and a storage module configured to store the obtained products into a preset storage area.
In some embodiments, the feature data included in the target feature matrix and the weight data included in the weight matrix are stored in a preset cache in advance.
In some embodiments, the preset number is the quotient of the number of bits of data extracted in a single Single Instruction, Multiple Data (SIMD) instruction and the number of bits of each feature datum included in a feature matrix of the convolutional neural network.
In some embodiments, the storage unit is further configured to: extract a preset number of weight data from the unextracted weight data in the weight matrix based on the SIMD instruction, and store the extracted weight data into the first target register.
In some embodiments, the storage unit is further configured to: extract a preset number of feature data from the unextracted feature data in the sub-feature matrix based on the SIMD instruction, and store the extracted feature data into the second target register.
In a third aspect, an embodiment of the present application provides an electronic device, where the electronic device includes: one or more processors, where the processors include registers; and a storage apparatus having one or more programs stored thereon; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any implementation of the first aspect.
In a fourth aspect, the present application provides a computer-readable medium, on which a computer program is stored, which, when executed by a processor, implements the method as described in any implementation manner of the first aspect.
According to the method and apparatus for storing data provided by the embodiments of the application, a sub-feature matrix to be subjected to convolution operation with the weight matrix corresponding to a target feature matrix is determined from the target feature matrix in a preset convolutional neural network. Then, a preset number of weight data are repeatedly extracted from the unextracted weight data in the weight matrix and stored into the first target register, and a preset number of feature data are repeatedly extracted from the unextracted feature data in the sub-feature matrix and stored into the second target register. In this way, the weight data included in the weight matrix and the feature data included in the sub-feature matrix can be stored in registers, which makes use of the high access speed of registers and helps improve the operation efficiency of the convolutional neural network.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for storing data, according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an application scenario of a method for storing data according to an embodiment of the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method for storing data according to an embodiment of the present application;
FIG. 5 is a schematic block diagram illustrating one embodiment of an apparatus for storing data according to an embodiment of the present application;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which the method for storing data or the apparatus for storing data of the embodiments of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as an image processing application, a video playing application, a web browser application, etc., may be installed on the terminal devices 101, 102, 103.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may be a server that provides various services, such as a background data processing server that processes data such as images uploaded by the terminal apparatuses 101, 102, 103. The background data processing server can process data such as images by using a convolutional neural network, extract weight data used in the processing process and store the weight data into a first target register, and extract feature data used in the processing process and store the feature data into a second target register.
It should be noted that the method for storing data provided in the embodiment of the present application may be executed by the server 105, or may be executed by the terminal devices 101, 102, and 103, and accordingly, the apparatus for storing data may be disposed in the server 105, or may be disposed in the terminal devices 101, 102, and 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. In the case where data processed using a convolutional neural network does not need to be acquired from a remote location, the system architecture described above may not include a network, but only a terminal device or a server.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for storing data in accordance with the present application is shown. The method for storing data comprises the following steps:
Step 201, determining, from a target feature matrix in a preset convolutional neural network, a sub-feature matrix to be subjected to convolution operation with the weight matrix corresponding to the target feature matrix.
In this embodiment, an execution body of the method for storing data (for example, the server or terminal device shown in fig. 1) may determine, from a target feature matrix in a preset convolutional neural network, a sub-feature matrix to be subjected to convolution operation with the weight matrix corresponding to the target feature matrix. The convolutional neural network may be preset in the execution body and used for processing raw data (e.g., pictures, word vectors, etc.). In general, a convolutional neural network that processes raw data may include convolutional layers, which in turn include a feature matrix (i.e., a feature map in matrix form) and a weight matrix (i.e., a convolution kernel, also known as a filter). The feature matrix includes feature data and the weight matrix includes weight data. The feature data may be extracted from the raw data (e.g., the R (Red), G (Green), and B (Blue) values of pixels) or output from a layer (e.g., a convolutional layer, a pooling layer, etc.) of the convolutional neural network. The weight matrix includes weight data determined by training the convolutional neural network. In general, new feature data can be obtained after a convolution operation is performed on the feature matrix and the weight matrix.
The target feature matrix may be selected in advance by a technician from the feature matrices included in the convolutional neural network, or may be a feature matrix to be subjected to convolution operation that the execution body selects (for example, according to the order of the channel numbers corresponding to the feature matrices).
In general, when a convolutional neural network performs a convolution operation, it is necessary to extract from the feature matrix a sub-feature matrix having the same number of rows and columns as the corresponding weight matrix, and to multiply the data at the same positions in the sub-feature matrix and the weight matrix. The correspondence between feature matrices and weight matrices is preset. It should be noted that, since convolutional neural networks are a well-known technology that is widely studied and applied, details of the method for determining the sub-feature matrix to be convolved with the weight matrix are not repeated here.
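As a hedged illustration of the sub-feature-matrix extraction just described (the patent leaves the mechanism unspecified), a minimal C sketch might read:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch: copy the k x k sub-feature matrix whose top-left
 * element is at (row, col) of a feature map with the given row stride.
 * The function name and signature are assumptions for this example. */
void extract_sub_feature(const int16_t *feature, size_t row_stride,
                         size_t row, size_t col, size_t k, int16_t *out)
{
    for (size_t r = 0; r < k; r++)
        memcpy(&out[r * k], &feature[(row + r) * row_stride + col],
               k * sizeof(int16_t));
}
```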
In some optional implementations of this embodiment, the feature data included in the feature matrix and the weight data included in the weight matrix in the convolutional neural network are fixed-point numbers with a preset number of bits. Because electronic devices operate on fixed-point numbers more efficiently than on floating-point numbers, in scenarios where the precision requirement on the processing result of the convolutional neural network is not high (for example, when the convolutional neural network runs on a terminal device such as a mobile phone or a tablet computer), the feature data and weight data in the convolutional neural network can be set as fixed-point numbers to improve operation efficiency. Setting the bit widths of the feature data and the weight data to a preset number of bits also makes full use of the number of bits a register can store, improving the access efficiency of the register.
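To make the fixed-point idea concrete, here is a minimal sketch assuming a Q8.8 split of a 16-bit word; the patent only requires a preset bit width, so the choice of 8 integer and 8 fractional bits is purely illustrative:

```c
#include <stdint.h>

/* Illustrative Q8.8 fixed-point encoding in a 16-bit integer: 8 integer
 * bits and 8 fractional bits. The patent only requires a preset bit
 * width; the Q8.8 split is an assumption made for this example. */
static inline int16_t to_q8_8(float x)    { return (int16_t)(x * 256.0f); }
static inline float   from_q8_8(int16_t q){ return (float)q / 256.0f; }

/* The product of two Q8.8 values is Q16.16 in 32 bits; shifting right
 * by 8 returns it to Q8.8. */
static inline int16_t mul_q8_8(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * (int32_t)b) >> 8);
}
```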
In some optional implementations of this embodiment, the feature data included in the target feature matrix and the weight data included in the weight matrix may be stored in a preset cache in advance. The preset cache may be a cache included in the Central Processing Unit (CPU) of the execution body (e.g., a level-1 (L1) cache, a level-2 (L2) cache, etc.). When an electronic device performs data operations, reading data from the cache is more efficient than reading from other storage devices such as main memory or a hard disk; by loading the feature matrices of the convolutional neural network into the preset cache in advance, the execution body can therefore improve the efficiency of data access.
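As a non-authoritative sketch of such preloading (the patent does not say how the cache is populated), software prefetch hints on GCC/Clang could look like this:

```c
#include <stdint.h>

/* Illustrative sketch: software prefetch hints (GCC/Clang builtin) that
 * pull the target feature matrix and weight matrix toward the CPU cache
 * before the storing steps run. The stride of 32 int16_t elements
 * (64 bytes) assumes a 64-byte cache line; hardware may ignore hints. */
void preload_to_cache(const int16_t *feature, unsigned n_feature,
                      const int16_t *weight, unsigned n_weight)
{
    for (unsigned i = 0; i < n_feature; i += 32)
        __builtin_prefetch(&feature[i], 0 /* read */, 3 /* keep cached */);
    for (unsigned i = 0; i < n_weight; i += 32)
        __builtin_prefetch(&weight[i], 0, 3);
}
```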
Step 202, the following storage steps are performed: extracting a preset number of weight data from the weight data which are not extracted in the weight matrix and storing the weight data in a first target register; extracting a preset number of feature data from the feature data which are not extracted in the sub-feature matrix and storing the feature data in a second target register; and determining whether the quantity of the weight data which are not extracted in the weight matrix and the quantity of the feature data which are not extracted in the sub-feature matrix are both larger than or equal to a preset quantity.
In this embodiment, the execution body may execute the following storing steps:
step 2021, extracting a predetermined number of weight data from the weight data not extracted in the weight matrix and storing the weight data in the first target register.
Specifically, when the execution body executes step 2021 for the first time, the unextracted weight data in the weight matrix is all the weight data in the weight matrix. In general, the execution body may extract the weight data in the order of the positions of the weight data in the weight matrix. For example, the weight data may have corresponding row and column numbers representing positions in the weight matrix, and the execution body may extract a preset number of weight data in order of column number from small to large, starting from the first row. The first target register may be a register set in advance to store weight data, and may be a register among at least one register pre-allocated by the execution body. The at least one register may be a register included in the CPU of the execution body. The execution body may select a register as the first target register from the at least one register in various ways (for example, in the order of register numbers, or according to a pre-configured correspondence between extracted weight data and registers).
In some optional implementations of this embodiment, the preset number is the quotient of the number of bits of data extracted in a single Single Instruction, Multiple Data (SIMD) instruction and the number of bits of each feature datum included in a feature matrix of the convolutional neural network. As an example, assuming that the SIMD instruction is a 64-bit instruction, that is, the number of bits of data extracted at a single time is 64, and the number of bits of each feature datum included in the feature matrix of the convolutional neural network is 16, the preset number is 64/16 = 4.
Optionally, the SIMD instruction may be a NEON instruction. The NEON instruction is a SIMD instruction designed for embedded microprocessors; it simplifies the migration of software between different platforms, can increase data processing speed, and reduces hardware power consumption. It should be understood that the execution body may employ other SIMD instructions besides NEON, such as SSE (Streaming SIMD Extensions) instructions.
In some optional implementations of this embodiment, the execution body may extract a preset number of weight data from the unextracted weight data in the weight matrix based on the SIMD instruction, and store the extracted weight data into the first target register. With conventional SISD (Single Instruction, Single Data), each instruction can extract only one datum, whereas with SIMD one instruction can extract multiple data. In terms of the execution time of a single instruction, SIMD takes roughly as long as SISD, because the multiple data are processed in parallel. Since SIMD can process N (N is a positive integer) data at a time, its processing time is shortened to about 1/N of that of SISD. By using SIMD instructions, the execution body can therefore improve the efficiency of extracting weight data.
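As an illustrative sketch under the 64-bit/16-bit assumption of the example above (the patent names NEON but gives no code), a single NEON load pulls four 16-bit weights into a vector register at once:

```c
#include <arm_neon.h>

/* Sketch: one vld1_s16 loads four 16-bit weight values (64 bits) from
 * the unextracted portion of the weight matrix into a 64-bit vector
 * register in a single instruction, matching the preset number
 * 64 / 16 = 4. The function name is an assumption for illustration. */
int16x4_t load_four_weights(const int16_t *unextracted_weights)
{
    return vld1_s16(unextracted_weights);
}
```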
Step 2022, extracting a preset number of feature data from the feature data not extracted in the sub-feature matrix and storing the feature data in the second target register.
Specifically, when the execution body executes step 2022 for the first time, the unextracted feature data in the sub-feature matrix is all the feature data in the sub-feature matrix. The second target register may be a register set in advance to store feature data, and may be a register among at least one register pre-allocated by the execution body. The at least one register may be a register included in the CPU of the execution body. The execution body may select a register as the second target register from the at least one register in various ways (for example, in the order of register numbers, or according to a pre-configured correspondence between extracted feature data and registers). It should be noted that the method for extracting feature data in step 2022 is substantially the same as the method for extracting weight data in step 2021, and details are not repeated here.
In some optional implementations of this embodiment, the execution body may extract a preset number of feature data from the unextracted feature data in the sub-feature matrix based on the SIMD instruction, and store the extracted feature data into the second target register, thereby improving the efficiency of extracting feature data.
Since the data stored in the register is processed faster than the data stored in other storage devices (e.g., a memory, a hard disk, etc.), the efficiency of data processing using the convolutional neural network can be improved by storing the weight data and the feature data in the register. In practice, technicians can adjust the quantity of the weight data included in the weight matrix and the quantity of the feature data included in the sub-feature matrix in advance according to the number of the registers and the number of bits of data stored in a single register, so that the utilization rate of the registers is maximized, and the operation efficiency of the convolutional neural network is improved to the maximum extent.
Step 2023, determining whether the number of the weight data not extracted in the weight matrix and the number of the feature data not extracted in the sub-feature matrix are both greater than or equal to a preset number.
Step 203, in response to determining that both are greater than or equal to the preset number, continuing to execute the storing steps.
In this embodiment, the execution body may continue to execute the storing steps in response to determining that the quantity of unextracted weight data in the weight matrix and the quantity of unextracted feature data in the sub-feature matrix are both greater than or equal to the preset number.
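Pulling steps 2021 to 2023 and step 203 together, a minimal C sketch of this control flow, under the same NEON and 16-bit assumptions, might look as follows; all names are illustrative:

```c
#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>

#define PRESET 4  /* 64-bit SIMD load / 16-bit data, per the example above */

/* Sketch of the storing steps: while enough unextracted data remains,
 * repeatedly pull PRESET weights and PRESET features into registers.
 * What is done with the loaded vectors (e.g., the multiplication of the
 * embodiment in Fig. 4) is omitted here. */
void storing_steps(const int16_t *weights, size_t n_weights,
                   const int16_t *features, size_t n_features)
{
    size_t w = 0, f = 0;
    while (n_weights - w >= PRESET && n_features - f >= PRESET) {
        int16x4_t w_reg = vld1_s16(&weights[w]);   /* first target register */
        int16x4_t f_reg = vld1_s16(&features[f]);  /* second target register */
        (void)w_reg; (void)f_reg;                  /* consumed by later steps */
        w += PRESET;
        f += PRESET;
    }
}
```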
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for storing data according to the present embodiment. In the application scenario of fig. 3, a convolutional neural network 302 is provided on a terminal device 301. The terminal device 301 first determines, from a target feature matrix 303 in the convolutional neural network 302, a sub-feature matrix 305 to be subjected to convolution operation with the weight matrix 304 corresponding to the target feature matrix 303. The sub-feature matrix 305 and the weight matrix 304 both have 4 rows and 4 columns. Then, the terminal device 301 performs the following storing steps: extracting a preset number (for example, four) of weight data from the unextracted weight data in the weight matrix 304 and storing them into a first target register; extracting a preset number of feature data from the unextracted feature data in the sub-feature matrix 305 and storing them into a second target register; and determining whether the quantity of unextracted weight data in the weight matrix 304 and the quantity of unextracted feature data in the sub-feature matrix 305 are both greater than or equal to the preset number. If both are greater than or equal to the preset number, the storing steps continue. By repeatedly performing the above storing steps, the terminal device 301 finally stores the four rows of weight data included in the weight matrix 304 (i.e., 3041-3044 in the figure) into the corresponding first target registers D1, D2, D3, and D4, respectively, and stores the four rows of feature data included in the sub-feature matrix 305 (i.e., 3051-3054 in the figure) into the corresponding second target registers D5, D6, D7, and D8, respectively.
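A hedged NEON rendering of this scenario might look as follows; the figure's register labels D1-D8 are kept only as comments, since a compiler assigns physical registers itself:

```c
#include <arm_neon.h>
#include <stdint.h>

/* Sketch of the Fig. 3 scenario: the four rows of a 4 x 4 weight matrix
 * and a 4 x 4 sub-feature matrix, stored as 16-bit values, are loaded
 * row by row into eight 64-bit vector registers. */
void load_4x4_pair(const int16_t weight[16], const int16_t feature[16],
                   int16x4_t w_regs[4], int16x4_t f_regs[4])
{
    for (int row = 0; row < 4; row++) {
        w_regs[row] = vld1_s16(&weight[row * 4]);   /* D1..D4 in the figure */
        f_regs[row] = vld1_s16(&feature[row * 4]);  /* D5..D8 in the figure */
    }
}
```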
According to the method provided by the embodiment of the application, the weight data included in the weight matrix and the feature data included in the sub-feature matrix in the convolutional neural network are stored in the register, so that the characteristic of high access speed of the register is favorably utilized, and the operation efficiency of the convolutional neural network is improved.
With further reference to FIG. 4, a flow 400 of yet another embodiment of a method for storing data is shown. The process 400 of the method for storing data includes the steps of:
Step 401, determining, from a target feature matrix in a preset convolutional neural network, a sub-feature matrix to be subjected to convolution operation with the weight matrix corresponding to the target feature matrix.
In this embodiment, step 401 is substantially the same as step 201 in the corresponding embodiment of fig. 2, and is not described here again.
In this embodiment, after executing step 401, the execution body may continue to execute the following storing steps, i.e., step 402 to step 407:
step 402, extracting a preset number of weight data from the weight data which are not extracted in the weight matrix and storing the weight data in a first target register.
In this embodiment, step 402 is substantially the same as step 2021 in the corresponding embodiment of fig. 2, and is not described herein again.
In step 403, a preset number of feature data are extracted from the feature data that are not extracted in the sub-feature matrix and stored in the second target register.
In this embodiment, step 403 is substantially the same as step 2022 in the corresponding embodiment of fig. 2, and is not described herein again.
In step 404, for each weight datum stored in the first target register, the weight datum is multiplied by the corresponding feature datum stored in the second target register to obtain a product.
In the present embodiment, for each weight datum stored in the first target register, an execution body of the method for storing data (e.g., the server or terminal device shown in fig. 1) may multiply the weight datum by the corresponding feature datum stored in the second target register to obtain a product.
Specifically, as an example, assume that the data stored in the first target register includes A, B, C, and D, and the data stored in the second target register includes E, F, G, and H, where the positions of A, B, C, and D in the weight matrix are the same as the positions of E, F, G, and H in the sub-feature matrix, respectively; that is, A, B, C, and D correspond to E, F, G, and H, respectively. The resulting products are then: A×E, B×F, C×G, D×H.
Step 405, storing the obtained product into a preset storage area.
In this embodiment, the execution body may store the obtained products in a preset storage area. The preset storage area may be a storage area with fast data access, such as a cache included in the CPU of the execution body (e.g., a level-1 cache, a level-2 cache, etc.), or a register included in the CPU of the execution body (different from the registers storing the feature data and weight data). Because the preset storage area offers fast data access, storing the obtained products there can further improve computation efficiency when the convolutional neural network performs subsequent calculations.
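A minimal NEON sketch of steps 404 and 405 under the 16-bit fixed-point assumption follows; the widening multiply vmull_s16 is our choice to keep the full 32-bit products, a detail the patent does not specify:

```c
#include <arm_neon.h>
#include <stdint.h>

/* Sketch of steps 404-405: multiply four weights by the four features at
 * the same positions and write the products to a preset storage area.
 * vmull_s16 widens to 32 bits so the products A*E, B*F, C*G, D*H are
 * kept without overflow; the widening choice is an assumption. */
void multiply_and_store(int16x4_t w_reg, int16x4_t f_reg, int32_t *out)
{
    int32x4_t products = vmull_s16(w_reg, f_reg);
    vst1q_s32(out, products);  /* preset storage area, e.g. a cached buffer */
}
```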
Step 406, determining whether the number of the weight data which are not extracted in the weight matrix and the number of the feature data which are not extracted in the sub-feature matrix are both greater than or equal to a preset number.
In this embodiment, step 406 is substantially the same as step 2023 in the corresponding embodiment of fig. 2, and is not described herein again.
Step 407, in response to determining that both are greater than zero and less than the preset number, storing the unextracted weight data in the weight matrix into the first target register, and storing the unextracted feature data in the sub-feature matrix into the second target register.
In this embodiment, the execution body may, in response to determining that the quantity of unextracted weight data in the weight matrix and the quantity of unextracted feature data in the sub-feature matrix are both greater than zero and less than the preset number, store the unextracted weight data in the weight matrix into the first target register and store the unextracted feature data in the sub-feature matrix into the second target register. In this way, all of the weight data in the weight matrix and all of the feature data in the sub-feature matrix can be stored in registers, so that the register's faster data access can be exploited to improve the operation efficiency of the convolutional neural network.
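As a hedged sketch of this remainder case (the patent does not mandate how partially filled registers are handled), the leftover data can be copied into zero-padded staging buffers that are then loaded exactly like a full group:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of step 407: when 0 < remaining < PRESET (here PRESET = 4),
 * copy the leftover weights/features into zero-padded staging buffers
 * that can then be loaded into the target registers with vld1_s16 as in
 * the main loop. Zero padding is an assumption; it makes the padded
 * lanes contribute nothing to later products. */
void store_tail(const int16_t *weights, size_t w_left,
                const int16_t *features, size_t f_left,
                int16_t w_buf[4], int16_t f_buf[4])
{
    memset(w_buf, 0, 4 * sizeof(int16_t));
    memset(f_buf, 0, 4 * sizeof(int16_t));
    memcpy(w_buf, weights, w_left * sizeof(int16_t));
    memcpy(f_buf, features, f_left * sizeof(int16_t));
}
```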
Step 408, in response to determining that both are greater than or equal to the preset number, continuing to execute the storing steps.
In this embodiment, step 408 is substantially the same as step 203 in the corresponding embodiment of fig. 2, and is not described herein again.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for storing data in this embodiment highlights the step of multiplying each weight datum in the first target register by the corresponding feature datum in the second target register and storing the products, as well as the step of extracting and storing the remaining weight data and feature data when the quantities of unextracted weight data in the weight matrix and unextracted feature data in the sub-feature matrix are both greater than zero and less than the preset number. The scheme described in this embodiment can therefore make fuller use of the register's faster data access to improve the operation efficiency of the convolutional neural network. Moreover, storing the obtained products in a preset storage area can further improve computation efficiency when the convolutional neural network performs subsequent calculations.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for storing data, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for storing data of the present embodiment includes: a first determining unit 501 configured to determine, from a target feature matrix in a preset convolutional neural network, a sub-feature matrix to be subjected to convolution operation with the weight matrix corresponding to the target feature matrix; a storage unit 502 configured to perform the following storing steps: extracting a preset number of weight data from the unextracted weight data in the weight matrix and storing them into a first target register; extracting a preset number of feature data from the unextracted feature data in the sub-feature matrix and storing them into a second target register; and determining whether the quantity of unextracted weight data in the weight matrix and the quantity of unextracted feature data in the sub-feature matrix are both greater than or equal to a preset number; and a second determining unit 503 configured to continue to perform the storing steps in response to determining that both are greater than or equal to the preset number.
In this embodiment, the first determining unit 501 may determine, from a target feature matrix in a preset convolutional neural network, a sub-feature matrix to be subjected to convolution operation with the weight matrix corresponding to the target feature matrix. The convolutional neural network may be preset in the apparatus 500. In general, a convolutional neural network may include convolutional layers, which in turn include a feature matrix (i.e., a feature map in matrix form) and a weight matrix (i.e., a convolution kernel, also known as a filter). The feature matrix includes feature data and the weight matrix includes weight data. The feature data included in the feature matrix may be extracted from raw data (e.g., the R (Red), G (Green), and B (Blue) values of pixels) or output from a layer in the convolutional neural network (e.g., a convolutional layer, a pooling layer, etc.). The weight matrix includes weight data determined by training the convolutional neural network. In general, new feature data can be obtained after a convolution operation is performed on the feature matrix and the weight matrix.
The target feature matrix may be selected in advance by a technician from feature matrices included in a convolutional neural network, or may be a feature matrix to be subjected to convolution operation selected by the first determining unit 501 (for example, selected according to the order of the channel numbers corresponding to the feature matrices).
In general, when a convolutional neural network performs a convolution operation, it is necessary to extract a sub-feature matrix having the same number of rows and columns as that of a corresponding weight matrix from a feature matrix, and multiply data at the same position in the sub-feature matrix and the weight matrix. Wherein, the corresponding relation between the characteristic matrix and the weight matrix is preset. It should be noted that, since the convolutional neural network is a well-known technology widely studied and applied at present, details about a method for determining the sub-feature matrix convolved with the weight matrix are not repeated herein.
In this embodiment, the storage unit 502 may perform the following storage steps:
step 5021, extracting a preset number of weight data from the weight data which are not extracted in the weight matrix and storing the weight data in a first target register.
Specifically, when step 5021 is performed for the first time, the unextracted weight data in the weight matrix is all the weight data in the weight matrix. In general, the storage unit 502 may extract the weight data in the order of the positions of the weight data in the weight matrix. For example, the weight data may have corresponding row and column numbers representing positions in the weight matrix, and the storage unit 502 may extract a preset number of weight data in order of column number from small to large, starting from the first row. The first target register may be a register among at least one register pre-allocated by the apparatus 500. The at least one register may be a register included in the CPU of the apparatus 500. The storage unit 502 may select a register as the first target register from the at least one register in various ways (for example, in the order of register numbers, or according to a pre-configured correspondence between extracted weight data and registers).
Step 5022, extracting a preset number of feature data from the feature data which are not extracted in the sub-feature matrix and storing the feature data in a second target register.
Specifically, when step 5022 is executed for the first time, the unextracted feature data in the sub-feature matrix is all the feature data in the sub-feature matrix. The second target register may be a register among at least one register pre-allocated by the apparatus 500. The at least one register may be a register included in the CPU of the apparatus 500. The storage unit 502 may select a register as the second target register from the at least one register in various ways (for example, in the order of register numbers, or according to a pre-configured correspondence between extracted feature data and registers). It should be noted that the method for extracting feature data in step 5022 is basically the same as the method for extracting weight data in step 5021, and is not described here again.
Step 5023, determining whether the quantity of the weight data which are not extracted in the weight matrix and the quantity of the feature data which are not extracted in the sub-feature matrix are both larger than or equal to a preset quantity.
In this embodiment, the second determining unit 503 may continue to perform the storing step in response to determining that the number of unextracted weight data in the weight matrix and the number of unextracted feature data in the sub-feature matrix are both greater than or equal to a preset number.
In some optional implementations of this embodiment, the storage unit 502 may be further configured to: in response to determining that the number of unextracted weight data in the weight matrix and the number of unextracted feature data in the sub-feature matrix are both greater than zero and less than a preset number, storing the unextracted weight data in the weight matrix into a first target register and storing the unextracted feature data in the sub-feature matrix into a second target register.
In some optional implementations of the embodiment, the feature data included in the feature matrix and the weight data included in the weight matrix in the convolutional neural network are fixed point numbers with preset number of bits.
In some optional implementations of this embodiment, the storage unit 502 may include: a calculation module (not shown in the figure) configured to multiply, for each weight datum stored in the first target register, the weight datum by the corresponding feature datum stored in the second target register to obtain a product; and a storage module (not shown in the figure) configured to store the obtained products into a preset storage area.
In some optional implementations of this embodiment, the feature data included in the target feature matrix and the weight data included in the weight matrix are stored in a preset cache in advance.
In some optional implementations of this embodiment, the preset number is the quotient of the number of bits of data extracted in a single Single Instruction, Multiple Data (SIMD) instruction and the number of bits of each feature datum included in a feature matrix of the convolutional neural network.
In some optional implementations of this embodiment, the storage unit 502 may be further configured to: and extracting a preset number of weight data from the weight data which are not extracted in the weight matrix based on the SIMD instruction, and storing the weight data in a first target register.
In some optional implementations of this embodiment, the storage unit 502 may be further configured to: and extracting a preset number of feature data from the feature data which are not extracted in the sub-feature matrix based on the SIMD instruction, and storing the feature data in a second target register.
According to the device provided by the embodiment of the application, the weight data included in the weight matrix and the feature data included in the sub-feature matrix in the convolutional neural network are stored in the register, so that the characteristic of high access speed of the register is utilized, and the operation efficiency of the convolutional neural network is improved.
Referring now to FIG. 6, a block diagram of a computer system 600 suitable for use in implementing an electronic device (e.g., the server or terminal device shown in FIG. 1) of an embodiment of the present application is shown. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU)601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is mounted in the storage section 608 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium described herein may be a computer readable signal medium, a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium, by contrast, may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a first determination unit, a storage unit, and a second determination unit. The names of the units do not form a limitation on the units themselves in some cases, for example, the first determination unit may also be described as a unit that determines a sub-feature matrix to be subjected to convolution operation on a weight matrix corresponding to a target feature matrix from the target feature matrix in a preset convolutional neural network.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: determine, from a target feature matrix in a preset convolutional neural network, a sub-feature matrix to be subjected to convolution operation with the weight matrix corresponding to the target feature matrix; perform the following storing steps: extracting a preset number of weight data from the unextracted weight data in the weight matrix and storing them into a first target register; extracting a preset number of feature data from the unextracted feature data in the sub-feature matrix and storing them into a second target register; determining whether the quantity of unextracted weight data in the weight matrix and the quantity of unextracted feature data in the sub-feature matrix are both greater than or equal to the preset number; and in response to determining that both are greater than or equal to the preset number, continuing to execute the storing steps.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (16)

1. A method for storing data, comprising:
determining, from a target feature matrix in a preset convolutional neural network, a sub-feature matrix on which a convolution operation is to be performed with a weight matrix corresponding to the target feature matrix;
performing the following storing step: extracting a preset number of weight data from the unextracted weight data in the weight matrix and storing them in a first target register; extracting a preset number of feature data from the unextracted feature data in the sub-feature matrix and storing them in a second target register; and determining whether the number of unextracted weight data in the weight matrix and the number of unextracted feature data in the sub-feature matrix are both greater than or equal to the preset number, wherein the feature data included in the target feature matrix and the weight data included in the weight matrix are stored in advance in a preset cache, the preset cache being a cache included in a CPU; and
in response to determining that both numbers are greater than or equal to the preset number, continuing to perform the storing step.
2. The method of claim 1, wherein the storing step further comprises:
in response to determining that the number of unextracted weight data in the weight matrix and the number of unextracted feature data in the sub-feature matrix are both greater than zero and less than the preset number, storing the unextracted weight data in the weight matrix into the first target register and storing the unextracted feature data in the sub-feature matrix into the second target register.
3. The method of claim 1, wherein the feature data included in the feature matrix in the convolutional neural network and the weight data included in the weight matrix are fixed-point numbers of a preset number of bits.
4. The method of claim 1, wherein the storing step further comprises:
for each weight datum stored in the first target register, multiplying the weight datum by the corresponding feature datum stored in the second target register to obtain a product;
and storing the obtained products in a preset storage area.
5. The method of any one of claims 1-4, wherein the preset number is the quotient of the number of bits of data extracted at a time by a single instruction multiple data (SIMD) instruction divided by the number of bits of each feature datum included in a feature matrix in the convolutional neural network.
6. The method of claim 5, wherein the extracting a preset number of weight data from the unextracted weight data in the weight matrix and storing them in a first target register comprises:
extracting, based on the SIMD instruction, a preset number of weight data from the unextracted weight data in the weight matrix and storing them in the first target register.
7. The method of claim 5, wherein the extracting a preset number of feature data from the unextracted feature data in the sub-feature matrix and storing them in a second target register comprises:
extracting, based on the SIMD instruction, a preset number of feature data from the unextracted feature data in the sub-feature matrix and storing them in the second target register.
8. An apparatus for storing data, comprising:
a first determination unit configured to determine, from a target feature matrix in a preset convolutional neural network, a sub-feature matrix on which a convolution operation is to be performed with a weight matrix corresponding to the target feature matrix;
a storage unit configured to perform the following storing step: extracting a preset number of weight data from the unextracted weight data in the weight matrix and storing them in a first target register; extracting a preset number of feature data from the unextracted feature data in the sub-feature matrix and storing them in a second target register; and determining whether the number of unextracted weight data in the weight matrix and the number of unextracted feature data in the sub-feature matrix are both greater than or equal to the preset number, wherein the feature data included in the target feature matrix and the weight data included in the weight matrix are stored in advance in a preset cache, the preset cache being a cache included in a CPU; and
a second determination unit configured to continue to perform the storing step in response to determining that both numbers are greater than or equal to the preset number.
9. The apparatus of claim 8, wherein the storage unit is further configured to:
in response to determining that the number of unextracted weight data in the weight matrix and the number of unextracted feature data in the sub-feature matrix are both greater than zero and less than the preset number, storing the unextracted weight data in the weight matrix into the first target register and storing the unextracted feature data in the sub-feature matrix into the second target register.
10. The apparatus of claim 8, wherein the feature data included in the feature matrix in the convolutional neural network and the weight data included in the weight matrix are fixed-point numbers of a preset number of bits.
11. The apparatus of claim 8, wherein the storage unit comprises:
a calculation module configured to multiply, for each weight datum stored in the first target register, the weight datum by the corresponding feature datum stored in the second target register to obtain a product;
and a storage module configured to store the obtained products in a preset storage area.
12. The apparatus of any one of claims 8-11, wherein the preset number is the quotient of the number of bits of data extracted at a time by a single instruction multiple data (SIMD) instruction divided by the number of bits of each feature datum included in a feature matrix in the convolutional neural network.
13. The apparatus of claim 12, wherein the storage unit is further configured to:
extracting, based on the SIMD instruction, a preset number of weight data from the unextracted weight data in the weight matrix and storing them in the first target register.
14. The apparatus of claim 12, wherein the storage unit is further configured to:
extracting, based on the SIMD instruction, a preset number of feature data from the unextracted feature data in the sub-feature matrix and storing them in the second target register.
15. An electronic device, comprising:
one or more processors, wherein the processors comprise registers;
a storage device having one or more programs stored thereon, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
16. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the method of any one of claims 1-7.
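As a worked illustration of the quotient defined in claims 5 and 12 (an editorial note, not part of the claims): if a single SIMD instruction extracts 128 bits at a time and each feature datum is an 8-bit fixed-point number, the preset number is 128 / 8 = 16 data per extraction. The short C fragment below computes exactly this; both bit widths are assumed values chosen for illustration.

#include <stdio.h>

int main(void)
{
    /* Assumed values: one SIMD instruction extracts 128 bits, and each
       feature datum is an 8-bit fixed-point number (cf. claims 3, 5, 12). */
    const unsigned simd_bits_per_extraction = 128;
    const unsigned bits_per_feature_datum = 8;

    /* The preset number is the quotient of the two bit widths. */
    unsigned preset_number = simd_bits_per_extraction / bits_per_feature_datum;
    printf("preset number = %u data per extraction\n", preset_number); /* 16 */
    return 0;
}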
CN201811149876.7A 2018-09-29 2018-09-29 Method and apparatus for storing data Active CN109308194B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811149876.7A CN109308194B (en) 2018-09-29 2018-09-29 Method and apparatus for storing data

Publications (2)

Publication Number Publication Date
CN109308194A CN109308194A (en) 2019-02-05
CN109308194B (en) 2021-08-10

Family

ID=65225384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811149876.7A Active CN109308194B (en) 2018-09-29 2018-09-29 Method and apparatus for storing data

Country Status (1)

Country Link
CN (1) CN109308194B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117908994A (en) * 2024-03-20 2024-04-19 腾讯科技(深圳)有限公司 Method, device and equipment for processing media information and readable storage medium

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US10380064B2 (en) * 2015-10-08 2019-08-13 Via Alliance Semiconductor Co., Ltd. Neural network unit employing user-supplied reciprocal for normalizing an accumulated value
US10565494B2 (en) * 2016-12-31 2020-02-18 Via Alliance Semiconductor Co., Ltd. Neural network unit with segmentable array width rotator
US11210584B2 (en) * 2017-01-31 2021-12-28 International Business Machines Corporation Memory efficient convolution operations in deep learning neural networks
CN107578055B (en) * 2017-06-20 2020-04-14 北京陌上花科技有限公司 Image prediction method and device
CN108288089A (en) * 2018-01-29 2018-07-17 百度在线网络技术(北京)有限公司 Method and apparatus for generating convolutional neural networks

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN105786448A (en) * 2014-12-26 2016-07-20 深圳市中兴微电子技术有限公司 Instruction scheduling method and device
US9977744B2 (en) * 2015-07-30 2018-05-22 SK Hynix Inc. Memory system and operating method thereof
CN108416434A (en) * 2018-02-07 2018-08-17 复旦大学 The circuit structure accelerated with full articulamentum for the convolutional layer of neural network

Non-Patent Citations (1)

Title
Energy-Efficient Reconfigurable Neural Network Array Architecture and System Scheduling for Media Applications; Zhang Dongming; China Masters' Theses Full-text Database (Electronic Journal); 2018-04-15 (No. 4); pp. I140-64 *

Similar Documents

Publication Publication Date Title
CN110582785B (en) Power efficient deep neural network module configured for executing layer descriptor lists
US11216726B2 (en) Batch processing in a neural network processor
CN110929865B (en) Network quantification method, service processing method and related product
US20200117981A1 (en) Data representation for dynamic precision in neural network cores
EP3564863B1 (en) Apparatus for executing lstm neural network operation, and operational method
US20220083857A1 (en) Convolutional neural network operation method and device
CN110825436B (en) Calculation method applied to artificial intelligence chip and artificial intelligence chip
CN108108190B (en) Calculation method and related product
CN115880132B (en) Graphics processor, matrix multiplication task processing method, device and storage medium
CN108595211B (en) Method and apparatus for outputting data
CN110826706B (en) Data processing method and device for neural network
CN107957977B (en) Calculation method and related product
EP4260174A1 (en) Data-type-aware clock-gating
CN116385328A (en) Image data enhancement method and device based on noise addition to image
CN109165723B (en) Method and apparatus for processing data
CN109308194B (en) Method and apparatus for storing data
CN108108189B (en) Calculation method and related product
CN109375952B (en) Method and apparatus for storing data
US11435941B1 (en) Matrix transpose hardware acceleration
CN112348182A (en) Neural network maxout layer computing device
US20230196086A1 (en) Increased precision neural processing element
CN111813721A (en) Neural network data processing method, device, equipment and storage medium
US11636569B1 (en) Matrix transpose hardware acceleration
CN111723917B (en) Operation method, device and related product
CN110825311B (en) Method and apparatus for storing data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder
Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.
Patentee after: Tiktok vision (Beijing) Co.,Ltd.
Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.
Patentee before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.
CP01 Change in the name or title of a patent holder
Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.
Patentee after: Douyin Vision Co.,Ltd.
Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.
Patentee before: Tiktok vision (Beijing) Co.,Ltd.