CN112099737A - Method, device and equipment for storing data and storage medium - Google Patents

Method, device and equipment for storing data and storage medium Download PDF

Info

Publication number
CN112099737A
Authority
CN
China
Prior art keywords
data
column
row
storing
input data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011069674.9A
Other languages
Chinese (zh)
Other versions
CN112099737B (en)
Inventor
朱琳
韩布和
陈振
王春杰
王天飞
王磊
张红光
刘倩
吴甜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011069674.9A priority Critical patent/CN112099737B/en
Publication of CN112099737A publication Critical patent/CN112099737A/en
Application granted granted Critical
Publication of CN112099737B publication Critical patent/CN112099737B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604Improving or facilitating administration, e.g. storage management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a method, a device, equipment and a storage medium for storing data, which can be applied to the artificial-intelligence technical fields of computer vision and natural language processing. The specific implementation scheme is as follows: obtaining first input data using a convolutional neural network, the first input data comprising a plurality of first data blocks arranged in a first M×N array, and storing the first data block of the mth row and nth column as a first subsegment of the kth field; storing the first data block of the (m+i)th row and (n+j)th column as a first subsegment of the [k+(i×N+j)]th field; obtaining second input data using the convolutional neural network, the second input data comprising a plurality of second data blocks arranged in a second M×N array, and storing the second data block of the mth row and nth column as a second subsegment of the kth field; and storing the second data block of the (m+i)th row and (n+j)th column as a second subsegment of the [k+(i×N+j)]th field, so as to obtain spliced storage data comprising the first field through the Kth field, where K = M×N and k = (m-1)×N + n.

Description

Method, device and equipment for storing data and storage medium
Technical Field
The present application relates to the field of computer technologies, in particular to the field of computer vision technologies and natural language processing technologies for artificial intelligence, and more particularly to a method, an apparatus, a device, and a storage medium for storing data.
Background
In deep learning, in order to improve the accuracy of an output result, the features of each processing layer need to be spliced and input to a subsequent fully connected layer to combine the features, and a classifier is then used to obtain a classification result.
In the related art, a Concat layer is set to realize the combined splicing of feature maps in the channel (Channel) dimension. Generally, when data in feature maps are combined and spliced, the data to be spliced need to be stored as whole blocks in a Double Data Rate (DDR) memory; a data relocation operation is then performed by a processor, an additional storage space is allocated in the DDR, and the data are rearranged according to the splicing order and stored into that additionally allocated storage space.
Disclosure of Invention
A method, apparatus, device and storage medium for storing data are provided that facilitate improving data splicing efficiency.
According to a first aspect, there is provided a method of storing data, comprising: obtaining first input data using a convolutional neural network, the first input data comprising a plurality of first data blocks arranged in a first M×N array; storing the first data block of the mth row and nth column as a first subsegment of the kth field; storing the first data block of the (m+i)th row and (n+j)th column as a first subsegment of the [k+(i×N+j)]th field; obtaining second input data using the convolutional neural network, the second input data comprising a plurality of second data blocks arranged in a second M×N array; storing the second data block of the mth row and nth column as a second subsegment of the kth field; and storing the second data block of the (m+i)th row and (n+j)th column as a second subsegment of the [k+(i×N+j)]th field, so as to obtain spliced storage data comprising the first field through the Kth field. Here K = M×N and k = (m-1)×N + n; M, N, m and n are positive integers; i and j are non-negative integers; m ≤ (m+i) ≤ M and n ≤ (n+j) ≤ N.
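As a non-authoritative sketch (the function name and 1-based indexing are assumptions; the patent gives no code), the field indexing of the first aspect can be illustrated in Python. Note that k = (m-1)×N + n is precisely the index for which the block at row (m+i), column (n+j) lands in field k + (i×N + j):

```python
def field_index(m, n, N):
    """Row-major field index k (1-based) for the data block at row m, column n."""
    return (m - 1) * N + n

# Offset rule of the first aspect: moving down i rows and right j columns
# advances the field index by exactly i*N + j.
M, N = 4, 3
k = field_index(2, 2, N)
i, j = 1, 1
assert field_index(2 + i, 2 + j, N) == k + (i * N + j)
```

Under this sketch the K = M×N fields simply enumerate the array positions in row-major order.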
According to a second aspect, there is provided an apparatus for storing data, comprising: a first data acquisition module for acquiring first input data using a convolutional neural network, the first input data comprising a plurality of first data blocks arranged in a first M×N array; a first data storage module configured to store the first data block in the mth row and nth column as a first subsegment of the kth field, and to store the first data block in the (m+i)th row and (n+j)th column as a first subsegment of the [k+(i×N+j)]th field; a second data acquisition module for acquiring second input data using the convolutional neural network, the second input data comprising a plurality of second data blocks arranged in a second M×N array; and a second data storage module configured to store the second data block in the mth row and nth column as a second subsegment of the kth field, and to store the second data block in the (m+i)th row and (n+j)th column as a second subsegment of the [k+(i×N+j)]th field, so as to obtain spliced storage data comprising the first field through the Kth field, where K = M×N and k = (m-1)×N + n; M, N, m and n are positive integers; i and j are non-negative integers; m ≤ (m+i) ≤ M and n ≤ (n+j) ≤ N.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method for storing data provided herein.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of storing data provided herein.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic diagram of an application scenario of a method, an apparatus, a device and a storage medium for storing data according to an embodiment of the present application;
FIG. 2 is a schematic flow chart diagram of a method of storing data according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a method of storing data according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a method of storing data according to another embodiment of the present application;
FIG. 5 is a schematic diagram of a method of storing data according to yet another embodiment of the present application;
FIG. 6 is a schematic diagram of a method of storing data according to yet another embodiment of the present application;
FIG. 7 is a schematic diagram of a method of performing a method of storing data using a convolutional neural network according to an embodiment of the present application;
FIG. 8 is a block diagram of an apparatus for storing data according to an embodiment of the present application; and
fig. 9 is a block diagram of an electronic device for implementing a method of storing data according to an embodiment of the present application.
Detailed Description
The following describes exemplary embodiments of the present application with reference to the accompanying drawings, including various details of the embodiments to aid understanding; these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
A method of storing data is provided for storing at least two input data acquired using a convolutional neural network. The at least two input data each comprise data blocks arranged in arrays of the same size. Two data blocks at adjacent array positions within each input data are stored at an interval, so that a storage space for the other input data is reserved between them. The data blocks within each input data all contain an equal amount of data, and the size of the reserved storage space is set according to the sum of the data amounts of the single data blocks of the at least two input data. With this storage method, the stored data are already arranged in the order required for splicing the at least two input data. Compared with the related art, the stored data do not need to be moved or rearranged, and the processor (e.g., a CPU) does not need to initiate additional memory operations, so the data processing efficiency of the convolutional neural network can be effectively improved.
An application scenario of the method and apparatus of the embodiment will be described below with reference to fig. 1.
Fig. 1 is a schematic view of an application scenario of a method, an apparatus, a device and a storage medium for storing data according to an embodiment of the present application.
As shown in fig. 1, the application scenario 100 of the embodiment may include, for example, an electronic device 110, and a processor 111 and a memory 112 are disposed in the electronic device 110. The processor 111 is communicatively connected to the memory 112, and the processor 111 can access the memory 112 to retrieve data from the memory 112 or write data to the memory 112.
According to an embodiment of the present application, the electronic device 110 may be, for example, a smart phone, a tablet computer, a laptop computer, a desktop computer, a server, or other electronic devices capable of integrating a processing chip. The server may be, for example, an application server, a server of a distributed system, or a server that incorporates a blockchain. Illustratively, the server may also be a virtual server or a cloud server, for example.
According to an embodiment of the present application, the electronic device 110 may further be provided with an artificial Neural Network chip, for example, and the processor is connected in communication with the artificial Neural Network chip, so that the processor can call a Convolutional Neural Network (CNN) provided by the artificial Neural Network chip to process data. For example, the electronic device 110 may perform feature extraction on the input image 120 or a locally stored picture by calling a convolutional neural network, and write the extracted feature data 130 into the memory 112 for subsequent calling. The electronic device 110 may also perform feature extraction on the input text 140 or locally stored text by invoking a convolutional neural network, and write the extracted feature data into the memory 112, for example.
According to an embodiment of the application, the processor may invoke a convolutional neural network to classify the image 120, or identify objects in the image 120, or the like. Alternatively, the processor may invoke a convolutional neural network to classify the text 140, or to identify words in the text 140, or the like. The feature data 130 may be data output by processing layers (e.g., convolutional layers, pooling layers, normalization layers, activation layers) in the convolutional neural network in invoking the convolutional neural network to process the image 120. This feature data 130 is used as input to a fully connected layer or output layer adjacent to the last processing layer in the convolutional neural network.
According to an embodiment of the present application, when the convolutional neural network includes a plurality of processing layers, a connection layer (Concat layer) is also generally disposed in the convolutional neural network, and in the process of the processor calling the convolutional neural network to process the image 120, the Concat layer is used to splice output data of the plurality of processing layers, and store the spliced data into the memory 112 via the processor. By setting the Concat layer, the context semantic information of the text (or the characteristics of different dimensionalities of the image) can be utilized, and the characteristic data obtained by the plurality of processing layers can be spliced, so that the purpose of characteristic combination is achieved. In this case, the data stored in the memory 112 by the processor 111 is the feature data obtained by stitching. The Concat layer may implement data splicing, and the spliced data is read from the memory 112 and input to an adjacent full connection layer or output layer.
According to the embodiment of the present application, in view of the huge amount of data generated during convolutional neural network processing, a System-on-Chip (SoC) implementation scheme is generally adopted in order to splice the data before writing it into the memory 112, with the data splicing completed through frequent data transfers between the processor 111 and the memory 112. The memory 112 is typically a Double Data Rate (DDR) memory. However, since the DDR has limited bandwidth, the speed at which the SoC chip processes the output data of the convolutional neural network is limited to a certain extent.
In the related art, an ARM processor (Advanced RISC Machines processor) embedded in a multiprocessor system-on-chip (ZYNQ MPSOC) chip is mainly used to complete data splicing. In the specific processing procedure, all the feature data to be spliced are first obtained and each written as a whole block to its corresponding position in the DDR to complete the preparation for splicing. The processor then schedules a one-time whole-block memory move to rearrange all the feature data to be spliced, copies the rearranged and combined data to a newly allocated memory space in the DDR, and completes the data splicing.
Illustratively, for N groups of tensor data of sizes H×W×C0, H×W×C1, ..., H×W×Cn, one data splicing operation is completed as follows: first, the N groups of data to be spliced are prepared at their respective positions in the DDR; then the ARM processor schedules N data-moving operations, with moving granularities C0, C1, ..., Cn respectively. Moving data at such small granularity makes the processor's memory operations relatively inefficient, so the processing delay of a higher-dimensional splicing operation is comparatively large and may even occupy a large proportion of the processing delay of the whole convolutional neural network. Moreover, the whole data splicing operation requires the processor to start multiple threads, which imposes a large scheduling load on an edge-side SoC chip with scarce processor resources and limited processing capability.
The embodiment of the application can adopt the EdgeBoard platform, an edge neural-network acceleration suite based on the ZYNQ MPSOC chip provided by Xilinx, combining the splicing operation with the computation operation of a processing layer. After the feature data output by a processing layer is obtained, two adjacent data blocks in that feature data are written into the DDR in a skip-writing manner according to a preset writing rule, thereby reserving between them the writing space for feature data output by the other processing layers. The arrangement order of the multiple feature data finally written into the DDR is then exactly the splicing order. In this way no data movement is required, so no unnecessary chip resources are consumed. The number of DDR memory accesses by the ZYNQ MPSOC chip can therefore be reduced, the cost of data interaction inside and outside the chip is lowered, and the processing performance of the convolutional neural network is improved.
It can be understood that, because the convolutional neural network can recognize images or text, the artificial neural network chip can be deployed in industrial and agricultural applications of image or text recognition and of image or text detection.
It should be understood that the method for storing data provided herein may be generally performed by the electronic device 110, and the apparatus for storing data provided herein may be disposed in the electronic device 110, for example, and may be specifically integrated in a processing chip disposed in the electronic device 110.
A method for storing data in a form of skip-writing into the storage data of the DDR provided by the present application will be described in detail with reference to fig. 2 to 7 in conjunction with fig. 1. It will be appreciated that storing data in the DDR is merely an example, and in practical applications, the data may be stored in any suitable memory.
Fig. 2 is a schematic flow chart of a method of storing data according to an embodiment of the present application.
As shown in fig. 2, the method 200 of storing data of this embodiment includes operations S210 to S260.
In operation S210, first input data is acquired using a convolutional neural network, the first input data including a plurality of first data blocks arranged in a first M×N array.
According to an embodiment of the present application, the first input data may be, for example, data output by any one of a plurality of processing layers in a convolutional neural network when processing an image or a text by using the convolutional neural network, the data is in a matrix form, and the number of rows M and the number of columns N of the data are both positive integers.
In operation S220, a first data block of the mth row and the nth column is stored as a first subsegment of the kth field.
In operation S230, a first data block of the (m+i)th row and the (n+j)th column is stored as a first subsegment of the [k+(i×N+j)]th field.
According to an embodiment of the present application, in order to splice the first input data directly with data output by other processing layers in the convolutional neural network, this embodiment may assume that the finally obtained spliced storage data has K fields, where K = M×N. After the plurality of input data are stored, each field comprises the data blocks located at the same position in the plurality of input data to be spliced. These data blocks can be ordered according to the sequence in which they are acquired and stored as a first subsegment, a second subsegment, and so on, so that each field comprises a number of subsegments equal to the number of input data to be spliced.
Accordingly, the field corresponding to the first data block of the mth row and nth column in the first M×N array may be determined as the kth field, where k = (m-1)×N + n. The field corresponding to the first data block of the (m+i)th row and (n+j)th column in the first M×N array is determined as the [k+(i×N+j)]th field. When the first input data is the earliest output among the plurality of input data output by the convolutional neural network, the first data block in the mth row and nth column is stored as the first subsegment of the kth field, and the first data block in the (m+i)th row and (n+j)th column is stored as the first subsegment of the [k+(i×N+j)]th field. Here m and n are positive integers, i and j are non-negative integers, m ≤ (m+i) ≤ M, and n ≤ (n+j) ≤ N. The plurality of input data are to be spliced and then used as the input of a fully connected layer or output layer after the last processing layer.
It can be understood that, when the first input data is the pth input data among the plurality of input data output by the convolutional neural network, the first data block in the mth row and nth column is stored as the pth subsegment of the kth field, and the first data block in the (m+i)th row and (n+j)th column is stored as the pth subsegment of the [k+(i×N+j)]th field.
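A minimal sketch of this rule, assuming fields are modeled as a list of subsegment lists (all names are hypothetical, not taken from the patent):

```python
def store_input(fields, data, M, N):
    """Append each block of one input as the next subsegment of its field.

    fields: list of K = M*N lists; data[m-1][n-1] is the block at row m, column n.
    The p-th call for a given fields list stores the p-th input, so each block
    becomes the p-th subsegment of field k = (m-1)*N + n (fields are 1-indexed).
    """
    for m in range(1, M + 1):
        for n in range(1, N + 1):
            k = (m - 1) * N + n
            fields[k - 1].append(data[m - 1][n - 1])

M, N = 2, 2
fields = [[] for _ in range(M * N)]
store_input(fields, [["A11", "A12"], ["A21", "A22"]], M, N)  # first input
store_input(fields, [["B11", "B12"], ["B21", "B22"]], M, N)  # second input
# Field 1 now holds the row-1, column-1 blocks of both inputs:
assert fields[0] == ["A11", "B11"]
```

Calling `store_input` once more with a third array would add a third subsegment to every field, matching the three-input case described later.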
In operation S240, second input data is acquired using a convolutional neural network. The second input data includes a plurality of second data blocks arranged in a second M x N array.
According to an embodiment of the present application, the second input data may be, for example, data output by a processing layer of the convolutional neural network other than the aforementioned one when processing an image or text using the convolutional neural network; the data is likewise in matrix form.
In operation S250, a second data block of the mth row and the nth column is stored as a second subsegment of the kth field.
In operation S260, a second data block of the (m+i)th row and the (n+j)th column is stored as a second subsegment of the [k+(i×N+j)]th field.
According to an embodiment of the present application, operations S250 and S260 are similar to the aforementioned operations S220 and S230. When the second input data is the second of the plurality of input data output by the convolutional neural network, the second data block of the mth row and nth column is stored as the second subsegment of the kth field, and the second data block of the (m+i)th row and (n+j)th column is stored as the second subsegment of the [k+(i×N+j)]th field.
According to the embodiment of the application, when the number of input data output by the convolutional neural network is two, all the first data blocks in the first input data and all the second data blocks in the second input data can be stored in the preset storage space in a manner similar to the foregoing operations S210 to S260. The first data block at the mth row and nth column of the first M×N array is stored adjacent to the second data block at the mth row and nth column of the second M×N array, and that second data block is in turn stored adjacent to the first data block at the next row-major position of the first M×N array. Spliced storage data comprising the sequentially stored first through Kth fields is thereby obtained. When the layers after this processing layer of the convolutional neural network execute their operations, the spliced storage data can be called as input. The predetermined storage space may be the DDR described above or any other memory.
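Under the same assumptions as before, the adjacency just described can be checked by flattening the fields into the final storage order (a sketch, not the patent's implementation; block labels are hypothetical):

```python
M, N = 2, 3
first = [[f"A{m}{n}" for n in range(1, N + 1)] for m in range(1, M + 1)]
second = [[f"B{m}{n}" for n in range(1, N + 1)] for m in range(1, M + 1)]

# Build the K = M*N fields in row-major order: subsegment 1 from the first
# input, subsegment 2 from the second, at the same (m, n) position.
spliced = []
for m in range(1, M + 1):
    for n in range(1, N + 1):
        spliced += [first[m - 1][n - 1], second[m - 1][n - 1]]

# Each first block is followed by the second block of the same position,
# which in turn is followed by the first block of the next position.
assert spliced[:4] == ["A11", "B11", "A12", "B12"]
```

The flattened sequence is exactly the splicing order, so no subsequent rearrangement is needed.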
According to an embodiment of the application, each of the plurality of input data to be spliced output by the convolutional neural network comprises data blocks that can be arranged in an M×N array, thereby enabling the splicing of the plurality of input data.
In summary, in the embodiment of the present application, a plurality of input data obtained using a convolutional neural network are stored according to the method for storing data, so that the stored data is already the data that Concat-layer splicing produces in the related art. Compared with the related-art scheme of using a Concat layer to rearrange and rewrite data, the stored data does not need to be moved or rearranged and the processor (e.g., a CPU) does not need to initiate additional memory operations, so the efficiency with which the convolutional neural network processes data can be effectively improved.
According to the embodiment of the application, the input data to be spliced acquired using the convolutional neural network may further comprise, in addition to the first input data and the second input data, third input data acquired after them. This embodiment may store the third input data into the preset storage space in a manner similar to the first and second input data.
Illustratively, the method of storing data of the present application further comprises the operation of obtaining third input data using the convolutional neural network, the third input data comprising a plurality of third data blocks arranged in a third M×N array. After the third input data is obtained, the method further includes storing the third data block of the mth row and nth column as a third subsegment of the kth field, and storing the third data block of the (m+i)th row and (n+j)th column as a third subsegment of the [k+(i×N+j)]th field. With this embodiment, the arrangement order of the three input data stored to the DDR is already the splicing order, so operations such as data rearrangement are not needed.
According to an embodiment of the application, when the input data to be spliced acquired by the convolutional neural network is image feature data, the input data comprises a plurality of data blocks arranged in an M×N array, and each data block can indicate the pixel information of one pixel in the image feature data. The pixel information may include at least one of: color information, brightness, transparency, saturation, etc. The color information may be the values of the R, G and B channels expressed by the RGB standard. Alternatively, the color information may be information expressed by the NTSC color gamut, or the like. It will be appreciated that, within the same input data, all data blocks comprise equal amounts of data.
According to embodiments of the present application, the data blocks in the input data may further comprise row positions and column positions in the arranged array. When the spliced storage data needs to be called, the spliced storage data can be arranged into an array form according to the row position and the column position of the data block.
Illustratively, a first data block in the first input data includes row position information and column position information in the first array, and the first data block further includes first pixel information. Similarly, a second block of data in the second input data includes row position information and column position information in the second array, and the second block of data also includes second pixel information. The types and numbers of the first pixel information and the second pixel information may be the same or different, for example, the first pixel information may include a value of an R channel, a value of a G channel, and a value of a B channel. The second pixel information may include a hue value, a saturation value, and a brightness value.
According to an embodiment of the application, in the case that each data block indicates the pixel information of one pixel in the image feature data and the pixel information includes a plurality of values, the subsegment in which the data block is stored includes a plurality of data, i.e., the data amount included in the data block. Accordingly, the length of a subsegment is determined according to the data amount of its data block. For example, the length of the first subsegment obtained by storing a first data block is determined according to the data amount of the first data block, and the length of the second subsegment obtained by storing a second data block is determined according to the data amount of the second data block. When the input data further includes third input data, the length of the third subsegment obtained by storing a third data block is determined according to the data amount of the third data block.
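As a sketch under assumed values (the block contents below are hypothetical, chosen only to show unequal data amounts), the relationship between data amounts, subsegment lengths, and field length is:

```python
# One first data block: the RGB values of a pixel; one second data block: a
# single grayscale intensity. The two inputs may carry different data amounts.
first_block = [255, 128, 0]   # R, G, B values of one pixel
second_block = [0.42]         # grayscale intensity of one pixel

C1, C2 = len(first_block), len(second_block)
field = first_block + second_block  # one field: first subsegment, then second

# Each subsegment's length follows its block's data amount, and a field's
# length is their sum.
assert C1 == 3 and C2 == 1 and len(field) == C1 + C2
```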
Fig. 3 is a schematic diagram illustrating a method of storing data according to an embodiment of the present application.
As in the embodiment 300 shown in fig. 3, it is assumed that the acquired first input data 301 comprises 12 first data blocks arranged in a 4 × 3 array. The acquired second input data 302 comprises 12 second data blocks arranged in a 4 × 3 array, i.e. M has a value of 4 and N has a value of 3 as described above. The first input data is a first one of the plurality of input data acquired using the convolutional neural network, and the second input data is a second one of the plurality of input data acquired using the convolutional neural network. Each first data block includes a data amount of C1, and each second data block includes a data amount of C2. The data amount per row in the first input data 301 is N × C1, and the data amount per row in the second input data 302 is N × C2.
For example, after the first input data 301 is acquired, the default start position of the DDR may be used as the start storage position of the first data block H01_B0 in the first row and the first column of the 12 first data blocks. If P_ori denotes the default start position of the DDR, the first data block H01_B1 in the first row and the second column may then be stored to the DDR with P_ori + Stride as its start storage position. When the first data block H01_B2 in the first row and the third column is stored to the DDR, P_ori + 2 × Stride may be used as its start storage position. By analogy, the 12 first data blocks may be stored in the DDR at intervals. The interval between the start storage positions of two sequentially stored first data blocks is Stride, and the interval between the start storage positions of two first data blocks located in the same column of adjacent rows is Stride × N. The Stride is determined according to the data amount C1 included in each first data block and the data amount C2 included in each second data block. For example, if each value in the first and second data blocks occupies two bytes when stored to the DDR, then Stride = (C1 + C2) × 2. In one embodiment, the first input data 301 is stored to the DDR in the sequence shown as 303 in fig. 3.
Similarly, after the second input data is acquired, (P_ori + C1 × 2) may be used as the start storage position of the second data block H02_B0 in the first row and the first column of the 12 second data blocks. Subsequently, when the second data block H02_B1 in the first row and the second column is stored to the DDR, (P_ori + C1 × 2) + Stride may be used as its start storage position. When the second data block H02_B2 in the first row and the third column is stored to the DDR, (P_ori + C1 × 2) + 2 × Stride may be used as its start storage position. By analogy, the 12 second data blocks may be stored at intervals in the DDR among the positions where the first data blocks are stored. In one embodiment, the second input data is stored to the DDR in the sequence shown as 304 in fig. 3. In the DDR, each second data block is stored in the storage space between two adjacent first data blocks.
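The start-position arithmetic of this example can be sketched as follows. This is a minimal illustration in Python; the function name `start_addr` and the concrete numbers (C1 = C2 = 3 values per block, two bytes per value) are our own assumptions, not the patent's implementation:

```python
def start_addr(p_ori, stride, row, col, n_cols, offset=0):
    """Start address of the block at (row, col); positions are 0-indexed,
    blocks are laid out row-major with one field of length `stride` per position."""
    return p_ori + offset + (row * n_cols + col) * stride

P_ori, C1, C2, N = 0, 3, 3, 3
stride = (C1 + C2) * 2            # Stride = (C1 + C2) x 2, two bytes per value

a = start_addr(P_ori, stride, 0, 0, N)                 # H01_B0: default start
b = start_addr(P_ori, stride, 0, 1, N)                 # H01_B1: P_ori + Stride
c = start_addr(P_ori, stride, 0, 0, N, offset=C1 * 2)  # H02_B0: P_ori + C1 x 2
row_gap = start_addr(P_ori, stride, 1, 0, N) - a       # adjacent rows: Stride x N
```

Note how the second input's offset of C1 × 2 bytes places each second data block exactly in the gap after the first data block of the same array position.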
According to the embodiment of the application, for convenience of processing, after the first data block in the first row and the first column is stored in the DDR, for example, the end position of the first data block in the first row and the first column may be recorded as a new default start position of the DDR, so that when the second input data is stored, the new default start position is used as a start storage position of the second data block in the first row and the first column in the second input data.
With this storage scheme, 12 fields are obtained. Each field comprises two sub-segments: the first sub-segment is the first data block located in the m-th row and n-th column of the first input data, and the second sub-segment is the second data block located in the m-th row and n-th column of the second input data.
According to the embodiment of the application, once the convolutional neural network has been designed, the number of pieces of data to be spliced in the convolutional neural network and the size of each piece of data to be spliced can be obtained. That is, M and N are known numbers, and the data amount of each data block included in each piece of output data is also known. Therefore, after the input data is acquired by using the convolutional neural network, the data blocks in the input data can be stored into the DDR row by row, with Stride as the step, according to the position of each data block in the input data and the default start position of the DDR. The interval between the start storage positions of two data blocks in adjacent columns of the same row is Stride, and the interval between the start storage positions of data blocks in two adjacent rows is Stride × N.
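Under the assumption that the DDR can be modeled as a flat Python list and each data block as a list of values (a simplification that ignores byte widths; all names here are illustrative, not from the patent), the row-by-row strided storage described above might look like:

```python
def store_input(ddr, data, stride, base):
    """Write block (m, n) of an M x N grid at base + (m * N + n) * stride."""
    n_cols = len(data[0])
    for m, row in enumerate(data):
        for n, block in enumerate(row):
            pos = base + (m * n_cols + n) * stride
            ddr[pos:pos + len(block)] = block

C1, C2, M, N = 2, 2, 2, 2
stride = C1 + C2              # one list slot per value (byte widths omitted)
ddr = [0] * (M * N * stride)
first = [[[1, 1], [2, 2]], [[3, 3], [4, 4]]]
second = [[[5, 5], [6, 6]], [[7, 7], [8, 8]]]
store_input(ddr, first, stride, base=0)    # first blocks start at 0, 4, 8, 12
store_input(ddr, second, stride, base=C1)  # second blocks interleave at 2, 6, 10, 14
# ddr == [1, 1, 5, 5, 2, 2, 6, 6, 3, 3, 7, 7, 4, 4, 8, 8]
```

After both calls, the buffer already holds the concatenated fields, with each second data block sitting between two adjacent first data blocks, so no separate rearrangement pass is needed.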
According to the embodiment of the application, when the processor processes the image by using the convolutional neural network, the acquired input data may be a feature map obtained after the convolutional neural network processes the image.
Illustratively, the operation of acquiring the first input data may include: the method comprises the steps of acquiring a first input image, extracting a first feature map from the first input image by utilizing any processing layer (such as a convolutional layer, a pooling layer, a normalization layer or an activation layer) in a convolutional neural network, and taking the first feature map as first input data. Accordingly, the first data block in the first input data is the feature data of the first feature map. Similarly, the aforementioned operation of acquiring the second input data may include: and acquiring a second input image, extracting a second feature map from the second input image by using other processing layers (such as a convolution layer or a pooling layer) except any processing layer in the convolutional neural network, and taking the second feature map as second input data. Accordingly, the second data block in the second input data is the feature data of the second feature map. It will be appreciated that any one of the input data acquired using the convolutional neural network may be obtained in a similar manner. The characteristic data may comprise, for example, at least one of the following characteristics: texture features, shape features, spatial relationship features, and the like.
According to the embodiment of the application, when the processor processes the text by using the convolutional neural network, the acquired input data may be a feature matrix obtained after the convolutional neural network processes the text. The feature matrix is used as input data and the first data block is an element in the feature matrix. This element may be used, for example, to describe at least one of the following features of the text: text features, semantic features, dictionary features, and the like.
According to the embodiment of the application, the data storage method only needs to store the input data to be spliced into the preset space, and the whole stored data does not need to be rearranged. Thus, the operation of storing data may be performed using the processing layer of a convolutional neural network. In order to improve the processing efficiency, in the process of processing the image or the text by the processing layer, the data obtained by real-time processing can be stored in a preset storage space. Wherein the size of the feature data for the image or text resulting from the processing layer is known. And the logic of the processing layer to output data after processing the image or text is also known, so that the position of the data to be output in real time in the feature data for the image or text can be determined according to the order of the data output by the processing layer. And therefore, according to the latest initial storage position of the DDR, the data output by the processing layer in real time is used as a part of the input data acquired by the convolutional neural network, and the part of the data is stored in the DDR.
FIG. 4 is a schematic diagram of a method of storing data according to another embodiment of the present application.
According to the embodiment of the application, when the processing layer obtains the feature data for the image or the text, the feature data can be output line by line, for example. The characteristic data output by the processing layer may be the aforementioned first input data or second input data. Therefore, this embodiment may store the feature data obtained by the processing layer as the concatenated storage data line by taking i as 0, j as a value equal to or greater than 1, and updating j in a loop as described above.
Illustratively, in the embodiment 400 shown in fig. 4, the feature data 401 obtained by the processing layer for an image or text is taken as the first input data, and the first input data includes a plurality of first data blocks arranged in an M × N array, where M may be 4 and N may be 5. The processing layer may output the first input data row by row. The total data amount of each row of the first input data is 5 × C1, where C1 is the data amount of each first data block.
Illustratively, after the processing layer outputs the first data Block0_0, Block0_1, Block0_2, Block0_3, and Block0_4 of the first row in the first input data, operations S401 to S405 may be sequentially performed to sequentially store the five first data blocks Block0_0, Block0_1, Block0_2, Block0_3, and Block0_4 into the DDR.
For example, the first data block in the 1st row and 1st column may be stored as the first sub-segment of the 1st field, and the first data block in the (1 + i)-th row and (1 + j)-th column may be stored as the first sub-segment of the [1 + (i × 5 + j)]-th field, where i takes the value 0 and j takes the values 1, 2, 3 and 4 in sequence. The length of the 1st field is Stride, which is determined according to the sum of the data amounts included in the data blocks of all the data to be spliced. For example, if the r pieces of data to be spliced have sizes of M × N × C0, M × N × C1, …, and M × N × Cr, respectively, and each value in a data block occupies two bytes after storage, then Stride = (C0 + C1 + … + Cr) × 2.
After the storage of the first data Block of the first row is completed and the processing layer outputs the first data blocks Block1_0, Block1_1, Block1_2, Block1_3, and Block1_4 of the second row, the 5 first data blocks of the second row may be sequentially stored in the DDR in a manner similar to the first data Block of the first row, through operations similar to operations S401 to S405, which are sequentially performed. Wherein the starting storage position of the 5 first data blocks is the sum of the starting storage position of the 5 first data blocks in the first row and Stride × 5. By analogy, the first data blocks of 4 rows output by the processing layer can be sequentially stored into the DDR. Wherein, the interval between the initial storage positions of the first data blocks of two adjacent rows is Stride × 5. Accordingly, the storage order of the 4 lines of first data blocks included in the first input data is shown as 402 in fig. 4, wherein the storage order of the first data blocks in the second line Lin2 is shown as 412 in fig. 4. The interval between the storage locations of two adjacent first data blocks in the same row is Stride, and the interval between the storage locations of the first row Lin1 and the second row Lin2 is Stride × 5.
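The 1-based field numbering used in this row-by-row scheme can be written out as a small helper (illustrative only; `field_index` is our own name, not from the patent):

```python
def field_index(i, j, n_cols=5):
    """1-based index of the field holding the block at row 1+i, column 1+j
    (i, j are 0-based offsets), for an array with n_cols columns."""
    return 1 + (i * n_cols + j)

# Row 1 (i = 0): Block0_0 .. Block0_4 land in fields 1..5.
row1 = [field_index(0, j) for j in range(5)]
# Row 2 (i = 1): fields 6..10, i.e. shifted by N = 5 fields (Stride x 5 bytes).
row2 = [field_index(1, j) for j in range(5)]
```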
According to this embodiment, after the second input data is obtained by using the convolutional neural network, each second data block included in the second input data can be stored in the storage space between two adjacent first data blocks in a similar manner.
FIG. 5 is a schematic diagram illustrating a method of storing data according to yet another embodiment of the present application.
According to the embodiment of the application, when the processing layer obtains the feature data for the image or the text, the feature data can be output column by column, for example. The characteristic data output by the processing layer may be the aforementioned first input data or second input data. Therefore, this embodiment may store the feature data obtained by the processing layer column by column as the concatenated storage data by taking i as a value equal to or greater than 1, taking j as 0, and cyclically updating the value of i as described above.
Illustratively, in the embodiment 500 shown in fig. 5, the feature data 501 obtained by the processing layer for an image or text is taken as the second input data, and the second input data includes a plurality of second data blocks arranged in an M × N array, where M may be 4 and N may be 5. The processing layer may output the second input data column by column. The total data amount of each column of the second input data is 4 × C2, where C2 is the data amount of each second data block.
Illustratively, after the processing layer outputs the second data blocks Block0_0, Block1_0, Block2_0 and Block3_0 of the first column of the second input data, operations S501 to S504 may be sequentially performed to sequentially store the four second data blocks into the DDR.
For example, the second data block in the 1st row and 1st column may be stored as the second sub-segment of the 1st field, and the second data block in the (1 + i)-th row and (1 + j)-th column may be stored as the second sub-segment of the [1 + (i × 5 + j)]-th field, where j takes the value 0 and i takes the values 1, 2 and 3 in sequence. The length of the 1st field is Stride, which is determined according to the sum of the data amounts included in the data blocks of all the data to be spliced. For example, if the r pieces of data to be spliced have sizes of M × N × C0, M × N × C1, …, and M × N × Cr, respectively, and each value in a data block occupies two bytes after storage, then Stride = (C0 + C1 + … + Cr) × 2.
After the storage of the second data blocks of the first column is completed and the processing layer outputs the second data blocks Block0_1, Block1_1, Block2_1 and Block3_1 of the second column, the 4 second data blocks of the second column may be sequentially stored in the DDR in a manner similar to that of the second data blocks of the first column. The start storage position of these 4 second data blocks is the start storage position of the 4 second data blocks of the first column plus Stride. By analogy, the 5 columns of second data blocks included in the second input data may be sequentially stored into the DDR. Illustratively, the storage order of the first column of second data blocks is shown as 502 in fig. 5, and the storage order of the second column of second data blocks is shown as 503 in fig. 5. The interval between the start storage positions of the second data blocks of two adjacent columns is Stride, and the interval between the storage positions of two adjacent second data blocks in the same column is Stride × 5. Each second data block is stored in the storage space between two adjacent first data blocks.
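The same row-major field-index formula, driven in column-major emission order, yields the spacing described above. This sketch and its names are our own illustration, not the patent's code:

```python
def field_index(i, j, n_cols=5):
    """1-based index of the field holding the block at row 1+i, column 1+j."""
    return 1 + (i * n_cols + j)

# Column 1 (j = 0), rows 1..4: same-column neighbors are 5 fields apart,
# i.e. Stride x 5 bytes, matching the column-by-column example.
col1 = [field_index(i, 0) for i in range(4)]
# Column 2 (j = 1): each field shifted by 1, i.e. by Stride bytes.
col2 = [field_index(i, 1) for i in range(4)]
```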
According to the embodiment of the present application, when there are a plurality of input data, the manner of obtaining the output data of the plurality of processing layers of the plurality of input data may be the same or different, and the embodiment does not limit this. For example, a first processing layer of the plurality of processing layers may output data in a row-by-row manner, and a second processing layer of the plurality of processing layers may output data in a column-by-column manner. Alternatively, the plurality of processing layers may output data in a row-by-row manner or in a column-by-column manner.
FIG. 6 is a schematic diagram illustrating a method of storing data according to yet another embodiment of the present application.
According to an embodiment of the present application, when feature data for an image or text is obtained, the processing layer may output the feature data, for example, sub-array by sub-array. The feature data output by the processing layer may be the aforementioned first input data or second input data. Therefore, this embodiment may store the feature data obtained by the processing layer sub-array by sub-array as the concatenated storage data, by taking i as a value equal to or greater than 1, taking j as a value equal to or greater than 1, and cyclically updating the values of i and j.
Exemplarily, in the embodiment 600 shown in fig. 6, the feature data 601 obtained by the processing layer for an image or text is taken as the first input data, and the first input data includes a plurality of first data blocks arranged in an M × N array, where M may be 4 and N may be 15. The first input data may be evenly divided into 12 sub-arrays in the column direction, each sub-array including 1 row and 5 columns of first data blocks. The processing layer may output the first input data sub-array by sub-array. The total data amount of each row of the first input data is BN × 3 × C1, where C1 is the data amount of each first data block, BN is the number of columns of first data blocks included in each sub-array, and 3 is the number of sub-arrays into which each row is divided.
Illustratively, when i is 1, that is, when each sub-array includes only first data blocks located in the same row, after the processing layer outputs the first data blocks Block0_0, Block0_1, Block0_2, Block0_3 and Block0_4 of the first sub-array H0_B0 in the first input data, the five first data blocks in the first sub-array H0_B0 may be sequentially stored in the DDR by operations similar to the aforementioned operations S401 to S405.
For example, the first data block Block0_0 may be stored as the first sub-segment of the 1st field, and the first data block in the (1 + i)-th row and (1 + j)-th column may be stored as the first sub-segment of the [1 + (i × BN × 3 + j)]-th field, where j takes the values 1 to BN − 1 in sequence. The length of the 1st field is Stride, which is determined according to the sum of the data amounts included in the data blocks of all the data to be spliced. For example, if the r pieces of data to be spliced have sizes of M × N × C0, M × N × C1, …, and M × N × Cr, respectively, and each value in a data block occupies two bytes after storage, then Stride = (C0 + C1 + … + Cr) × 2.
After the storage of the first data block in the first sub-array H0_ B0 is completed, and the processing layer outputs the first data block of the second sub-array H0_ B1, the BN first data blocks of the second sub-array may be sequentially stored in the DDR in a manner similar to the first data block of the aforementioned first sub-array through operations similar to operations S401 to S405 performed in sequence. Wherein, the initial storage position of the first data block in the second sub-array is the sum of the initial storage position of the first data block in the first sub-array and Stride × BN. By analogy, after the first data blocks of the three sub-arrays in the first row output by the processing layer are sequentially stored in the DDR, the first data blocks of the sub-arrays in the second row, the third row and the fourth row are sequentially stored in the DDR by using a similar method. Wherein, the interval between the initial storage positions of two sub-arrays positioned in the same column in two adjacent rows is Stride × BN × 3. Illustratively, as shown in fig. 6, the arrangement order of the sub-arrays into which the first input data 601 is divided is shown as 602 in fig. 6, wherein the arrangement order of the first data blocks included in the sub-array H0_ B1 of the first row and the second column is shown as 612 in fig. 6. Therefore, the storage location interval between two adjacent first data blocks in the same sub-array is Stride. The storage position interval between two adjacent sub arrays located in the same row is Stride × BN.
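The sub-array start positions described above can be checked with a short sketch. All names and numbers are assumptions chosen to mirror Fig. 6 (`per_row = 3` sub-arrays per row, `bn = 5` columns per sub-array), not the patent's implementation:

```python
def subarray_start(stride, i, b, bn=5, per_row=3, p_ori=0):
    """Start address of sub-array b (0-based) in row i (0-based); each row holds
    `per_row` sub-arrays of `bn` blocks, one field of length `stride` per block."""
    return p_ori + (i * per_row + b) * bn * stride

stride = 4  # illustrative field length in bytes
# Adjacent sub-arrays in the same row are Stride x BN apart.
gap_within_row = subarray_start(stride, 0, 1) - subarray_start(stride, 0, 0)
# Same-column sub-arrays in adjacent rows are Stride x BN x 3 apart.
gap_between_rows = subarray_start(stride, 1, 0) - subarray_start(stride, 0, 0)
```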
According to an embodiment of the present application, the processing layer of this embodiment may output the first sub-array in the first row before outputting the other sub-arrays in the first row. And after the other sub-arrays positioned in the first row output, outputting the first sub-array in the second row. And so on, for a total of 12 sub-arrays. In the output process, the values of i and j are correspondingly changed along with the output of the sub-array.
According to the embodiment of the application, when the convolutional neural network stores the acquired input data into the DDR, a preset buffer space is required to buffer the data to be stored. This embodiment stores the first input data and/or the second input data as the concatenated storage data sub-array by sub-array, following the output of the processing layer. This avoids the situation in which the size of the buffer space limits the number of columns or rows of the input data, and thus improves the generality of the data storage method.
According to the embodiment of the application, the number of columns BN of the data blocks included in each sub-array may be determined according to the capacity of the preset buffer space, so as to ensure that the preset buffer space can accommodate all the data blocks in one sub-array. This further avoids the limitation imposed by the storage resources on the SoC chip and improves the versatility of the SoC chip in terms of neural-network adaptation.
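One hedged way to pick BN from the buffer capacity (the heuristic, function name, and numbers are ours, not prescribed by the patent) is to take the largest sub-array width whose blocks all fit in the buffer at once:

```python
def choose_bn(cache_bytes, block_bytes, n_cols):
    """Largest number of columns per sub-array whose blocks fit in the cache,
    capped by the array width and floored at 1."""
    return max(1, min(n_cols, cache_bytes // block_bytes))

# Hypothetical numbers: a 64-byte buffer, 12-byte blocks, N = 15 columns.
bn = choose_bn(cache_bytes=64, block_bytes=12, n_cols=15)
```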
Fig. 7 is a schematic diagram of a method for performing data storage using a convolutional neural network according to an embodiment of the present application.
As shown in FIG. 7, in the embodiment 700, the convolutional neural network may include, for example, an input layer 710, three convolutional layers 721 to 723, three pooling layers 731 to 733 corresponding to the three convolutional layers one by one, a full connection layer 740, and an output layer 750.
Illustratively, the input layer 710 is used for taking an image to be recognized as an input and transferring the image to be recognized to three convolutional layers 721-723. The three convolutional layers 721-723 are used for extracting feature data of the image to be recognized from three dimensions of the contour, the texture and the spatial relation, and the feature data output respectively is used as the input of the corresponding pooling layer. The pooling layers 731 to 733 can be used, for example, to perform a dimension reduction process on input data to further extract feature data.
For example, each of the pooling layers 731 to 733 may be further configured to store the data obtained by dimensionality reduction as input data into the DDR71 according to the method for storing data described above, so that the data stored to the DDR71 of the three pooling layers 731 to 733 are sorted according to the splicing order of the feature data obtained by the three pooling layers, thereby obtaining spliced stored data. The spliced storage data may form data in an array form according to the row position information and the column position information of each data block to serve as an input of the full link layer 740. The full connection layer 740 is used to analyze the probability of including a target object in the image to be recognized, etc. according to the input data, and output the probability via the output layer 750.
According to an embodiment of the present application, the three convolutional layers in this embodiment may also store the respective output data into the DDR71 by using the method for storing data described above, for example, to splice the output data of the three convolutional layers. In an embodiment, the method for storing data described above may be performed by both the convolutional layer and the pooling layer according to actual requirements, so as to splice data output by the convolutional layer and the pooling layer.
It can be understood that, according to actual requirements, the convolutional neural network may not be provided with a pooling layer, and then the convolutional layer performs feature extraction on the image to be recognized to obtain feature data, which is input data to be spliced. The convolutional layer can implement concatenation between multiple feature data obtained by multiple convolutional layers by executing the method for storing data described above. In an embodiment, according to actual requirements, an activation layer (e.g., ReLU) and/or a normalization layer (Batchnorm) may be further disposed after the convolutional layer, and the activation layer and/or the normalization layer may be disposed as a separate layer structure after the convolutional layer, or the activation function and the normalization function may be integrated into the convolutional layer. When the active layer and/or the normalization layer are configured as independent layers, the active layer or the normalization layer may also be configured to store the feature data processed by the active function or the normalization function into the DDR by performing the method for storing data described above, so as to implement concatenation between data output by multiple active layers or multiple normalization layers.
According to the embodiment of the application, in order to improve the computational efficiency of a processing layer, a parallel computing mode with a plurality of Kernel sliding windows is adopted in the related art. This mode generates multiple paths of input data, where each path of input data is one row of data. Therefore, each row of data needs to be buffered in a preset buffer space, and the multiple rows of data need to be written back to the DDR memory in sequence. However, the extremely limited storage resources of the SoC chip limit the supported output size of the computed feature data, affecting the versatility of the chip scheme in terms of neural-network adaptation. Here, a Kernel is a piece of computational logic that operates on tensors. To implement the parallel computing mode, a large convolution generally needs to be split into a plurality of sub-convolutions; the processor schedules the ZYNQ MPSoC to compute the plurality of sub-convolutions in sequence, and the computed data is written back to the DDR storage.
By adopting the data storage method provided by the application, when a convolution is split into a plurality of sub-convolutions, the data computed by the sub-convolutions can be jump-written to the DDR, so that the data splicing operation is fused into the computation stage. In this way, the processing flow of jump-writing the data from the convolutional layer to the DDR is completed as soon as each sub-convolution finishes its computation, and the whole process requires no additional memory operation initiated by the CPU. Accordingly, the first input data and the second input data obtained as described above may be data computed by different sub-convolutions of the same convolutional layer. Therefore, this embodiment can store the first data blocks in the first input data and the second data blocks in the second input data to the DDR through the same convolutional layer of the convolutional neural network, stored as the sub-segments of the fields, so as to obtain the concatenated storage data.
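A generalized sketch of this fusion idea, for r outputs rather than two (everything here, including `fused_write_back`, is an illustrative assumption rather than the patent's DMA implementation): each sub-convolution's output is written straight into the shared interleaved layout, so the channels of every array position are already concatenated when the write-back finishes.

```python
def fused_write_back(ddr, outputs):
    """outputs: list of M x N grids; grid r contributes C_r values per position.
    Writes every grid into one interleaved buffer, one field per position."""
    widths = [len(grid[0][0]) for grid in outputs]
    stride = sum(widths)                  # field length = C0 + C1 + ... + Cr
    base = 0
    for grid, width in zip(outputs, widths):
        n_cols = len(grid[0])
        for m, row in enumerate(grid):
            for n, block in enumerate(row):
                pos = (m * n_cols + n) * stride + base
                ddr[pos:pos + width] = block
        base += width                     # next grid slots in after this one
    return ddr

out_a = [[[1, 2], [3, 4]]]                # 1 x 2 map, C = 2
out_b = [[[9], [8]]]                      # 1 x 2 map, C = 1
ddr = fused_write_back([0] * 6, [out_a, out_b])
# ddr == [1, 2, 9, 3, 4, 8]: channels already concatenated per position
```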
According to the embodiment of the application, the method for storing data can be realized by the Direct Memory Access (DMA) function of a ZYNQ MPSoC chip. The overall idea of the present application can be understood as scattering one feature map and writing it back to the DDR at the granularity of the smallest data block of the included feature data; from the perspective of the DDR storage space, the convolution calculation results are distributed discretely.
In summary, the method for storing data in the embodiment of the application adopts a data transmission mode of writing data at a small granularity in a jumping manner, so that the DDR bandwidth transmission efficiency is not affected. In addition, the time required by the rearrangement and migration operations of a separate data-splicing step can be saved. In particular, the higher the dimensionality of the data to be spliced, the greater the proportion of time saved in the total neural-network prediction time. Therefore, the prediction performance of the chip can be effectively improved; engineering practice shows that the processing efficiency can be improved by 20-30%.
Fig. 8 is a block diagram of an apparatus for storing data according to an embodiment of the present application.
As shown in fig. 8, the apparatus 800 for storing data of this embodiment may include a first data obtaining module 810, a first data storing module 820, a second data obtaining module 830, and a second data storing module 840.
The first data acquisition module 810 is for acquiring first input data using a convolutional neural network, the first input data including a plurality of first data blocks arranged in a first mxn array. In an embodiment, the first data obtaining module 810 may be configured to perform the operation S210 described in fig. 2, for example, and is not described herein again.
The first data storage module 820 is configured to store the first data block in the m-th row and n-th column as the first sub-segment of the k-th field, and store the first data block in the (m + i)-th row and (n + j)-th column as the first sub-segment of the [k + (i × N + j)]-th field, where k = m × n; M, N, m and n are positive integers; i and j are non-negative integers; m ≤ (m + i) ≤ M; and n ≤ (n + j) ≤ N. In an embodiment, the first data storage module 820 may be used to perform operations S220 to S230 described in fig. 2, for example, and will not be described herein again.
The second data acquisition module 830 is configured to acquire second input data using a convolutional neural network, the second input data including a plurality of second data blocks arranged in a second M × N array. In an embodiment, the second data obtaining module 830 may be configured to perform the operation S240 described in fig. 2, for example, and is not described herein again.
The second data storage module 840 is configured to store the second data block in the m-th row and n-th column as the second sub-segment of the k-th field, and store the second data block in the (m + i)-th row and (n + j)-th column as the second sub-segment of the [k + (i × N + j)]-th field, so as to obtain the concatenated storage data comprising the first field to the K-th field, where K = M × N. In an embodiment, the second data storage module 840 may be configured to perform operations S250 to S260 described in fig. 2, for example, and will not be described herein again.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 9 is a block diagram of an electronic device for implementing the method for storing data according to the embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 9, the electronic device 900 includes: one or more processors 901, a memory 902, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing some of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 9 illustrates an example with one processor 901.
The memory 902 is a non-transitory computer-readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the method of storing data provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the method of storing data provided herein.
The memory 902, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the method of storing data in the embodiments of the present application (e.g., the first data acquisition module 810, first data storage module 820, second data acquisition module 830, and second data storage module 840 shown in fig. 8). The processor 901 executes the non-transitory software programs, instructions, and modules stored in the memory 902, thereby performing various functional applications and data processing of the server, i.e., implementing the method of storing data in the above method embodiments.
The memory 902 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the electronic device implementing the method of storing data, and the like. Further, the memory 902 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some embodiments, the memory 902 may optionally include memories located remotely from the processor 901, and these remote memories may be connected via a network to the electronic device implementing the method of storing data. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for implementing the method of storing data may further include an input device 903 and an output device 904. The processor 901, the memory 902, the input device 903, and the output device 904 may be connected by a bus or in other manners; in fig. 9, connection by a bus is taken as an example.
The input device 903 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device implementing the method of storing data, and may be, for example, a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, or a joystick. The output device 904 may include a display device, an auxiliary lighting device (e.g., an LED), a tactile feedback device (e.g., a vibrating motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose and is coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiments of the present application, the plurality of input data obtained by the convolutional neural network are stored according to the data storage method, so that the stored data is already the data that a Concat layer in the related art would produce by splicing. Compared with the related-art solution of rearranging and rewriting data with a Concat layer, the stored data does not need to be relocated or rearranged, so the processor (e.g., a CPU) does not need to initiate additional memory operations. The efficiency with which the convolutional neural network processes data can therefore be effectively improved.
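The saving can be pictured with a short NumPy sketch of the general idea: each producing layer writes its output directly into a view of a preallocated concatenated buffer, so the spliced result exists without a separate copy pass. This is an illustration only, not the patented implementation; all shapes and names are hypothetical.

```python
import numpy as np

# Shapes of two feature maps to be "concatenated" along the channel axis.
C1, C2, H, W = 2, 3, 4, 4

# Preallocate the concatenated buffer once.
out = np.empty((C1 + C2, H, W), dtype=np.float32)

# Each producer writes directly into its slice (a view, no copy).
view1 = out[:C1]   # destination for the first producer's output
view2 = out[C1:]   # destination for the second producer's output

view1[...] = np.ones((C1, H, W), dtype=np.float32)       # stand-in for layer 1
view2[...] = np.full((C2, H, W), 2.0, dtype=np.float32)  # stand-in for layer 2

# `out` now equals the result of an explicit concatenation, with no
# extra rearrangement pass over the stored data.
reference = np.concatenate([np.ones((C1, H, W)), np.full((C2, H, W), 2.0)])
assert np.array_equal(out, reference)
assert view1.base is out and view2.base is out  # the views share storage
```

An explicit `np.concatenate` would instead allocate a new buffer and copy both inputs, which is exactly the extra memory traffic the described storage layout avoids.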
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order, and no limitation is imposed herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. A method of storing data, comprising:
obtaining first input data using a convolutional neural network, the first input data comprising a plurality of first data blocks arranged in a first M × N array;
storing a first data block of the m-th row and the n-th column as a first subsegment of the k-th field, and storing a first data block of the (m + i)-th row and the (n + j)-th column as a first subsegment of the [k + (i × N + j)]-th field;
obtaining second input data using the convolutional neural network, the second input data comprising a plurality of second data blocks arranged in a second M × N array; and
storing a second data block of the m-th row and the n-th column as a second subsegment of the k-th field, and storing a second data block of the (m + i)-th row and the (n + j)-th column as a second subsegment of the [k + (i × N + j)]-th field, so as to obtain concatenated storage data comprising the first field through the K-th field,
wherein k = m × n and K = M × N; m, n, M and N are positive integers; i and j are non-negative integers; m ≤ (m + i) ≤ M; and n ≤ (n + j) ≤ N.
2. The method of claim 1, wherein the length of the first subsegment is determined according to the data amount of the first data block, and the length of the second subsegment is determined according to the data amount of the second data block.
3. The method of claim 1, further comprising:
obtaining third input data using the convolutional neural network, the third input data comprising a plurality of third data blocks arranged in a third M × N array; and
storing a third data block of the m-th row and the n-th column as a third subsegment of the k-th field, and storing a third data block of the (m + i)-th row and the (n + j)-th column as a third subsegment of the [k + (i × N + j)]-th field,
wherein the length of the third subsegment is determined according to the data amount of the third data block.
4. The method of claim 1, wherein j ≥ 1, and at least one of the first input data and the second input data is stored as the concatenated storage data sub-array by sub-array, each sub-array being a 1 × BN sub-array obtained by evenly dividing the M × N array in the column dimension, where BN is a positive integer and N is a positive-integer multiple of BN.
5. The method of claim 4, wherein the size of BN is determined according to the buffer capacity of a preset buffer space, the preset buffer space being used for buffering the sub-arrays.
6. The method of claim 1, wherein i ≥ 0 and j ≥ 1, and at least one of the first input data and the second input data is stored row by row as the concatenated storage data.
7. The method of claim 1, wherein i ≥ 1 and j = 0, and at least one of the first input data and the second input data is stored column by column as the concatenated storage data.
8. The method of claim 1, wherein:
the first data block of the m-th row and the n-th column is stored as the first subsegment of the k-th field, and the first data block of the (m + i)-th row and the (n + j)-th column is stored as the first subsegment of the [k + (i × N + j)]-th field, by at least one of a convolutional layer and a pooling layer of the convolutional neural network.
9. The method of claim 1, wherein:
the first data block of the m-th row and the n-th column is stored as the first subsegment of the k-th field, and the second data block of the m-th row and the n-th column is stored as the second subsegment of the k-th field, by the same convolutional layer of the convolutional neural network.
10. The method of claim 1, wherein:
the acquiring of the first input data comprises: acquiring a first input image, and extracting a first feature map from the first input image as the first input data, the first data block being feature data of the first feature map; and
the acquiring of the second input data comprises: acquiring a second input image, and extracting a second feature map from the second input image as the second input data, the second data block being feature data of the second feature map.
11. The method of claim 1, wherein:
the first data block includes row position information and column position information of the first data block in the first array, and first pixel information; and
the second data block includes row position information and column position information of the second data block in the second array, and second pixel information.
12. An apparatus for storing data, comprising:
a first data acquisition module configured to acquire first input data using a convolutional neural network, the first input data comprising a plurality of first data blocks arranged in a first M × N array;
a first data storage module configured to store a first data block of the m-th row and the n-th column as a first subsegment of the k-th field, and to store a first data block of the (m + i)-th row and the (n + j)-th column as a first subsegment of the [k + (i × N + j)]-th field;
a second data acquisition module configured to acquire second input data using the convolutional neural network, the second input data comprising a plurality of second data blocks arranged in a second M × N array; and
a second data storage module configured to store a second data block of the m-th row and the n-th column as a second subsegment of the k-th field, and to store a second data block of the (m + i)-th row and the (n + j)-th column as a second subsegment of the [k + (i × N + j)]-th field, so as to obtain concatenated storage data comprising the first field through the K-th field,
wherein k = m × n and K = M × N; m, n, M and N are positive integers; i and j are non-negative integers; m ≤ (m + i) ≤ M; and n ≤ (n + j) ≤ N.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor, wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-11.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any of claims 1-11.
CN202011069674.9A 2020-09-29 2020-09-29 Method, device, equipment and storage medium for storing data Active CN112099737B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011069674.9A CN112099737B (en) 2020-09-29 2020-09-29 Method, device, equipment and storage medium for storing data

Publications (2)

Publication Number Publication Date
CN112099737A 2020-12-18
CN112099737B CN112099737B (en) 2023-09-01

Family

ID=73783998

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011069674.9A Active CN112099737B (en) 2020-09-29 2020-09-29 Method, device, equipment and storage medium for storing data

Country Status (1)

Country Link
CN (1) CN112099737B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108133270A (en) * 2018-01-12 2018-06-08 清华大学 Convolutional neural networks accelerating method and device
CN108765247A (en) * 2018-05-15 2018-11-06 腾讯科技(深圳)有限公司 Image processing method, device, storage medium and equipment
CN110309906A (en) * 2019-05-23 2019-10-08 北京百度网讯科技有限公司 Image processing method, device, machine readable storage medium and processor
US20200090030A1 (en) * 2018-09-19 2020-03-19 British Cayman Islands Intelligo Technology Inc. Integrated circuit for convolution calculation in deep neural network and method thereof
WO2020164271A1 (en) * 2019-02-13 2020-08-20 平安科技(深圳)有限公司 Pooling method and device for convolutional neural network, storage medium and computer device

Also Published As

Publication number Publication date
CN112099737B (en) 2023-09-01

Similar Documents

Publication Publication Date Title
KR102572705B1 (en) Scalable Neural Network Processing Engine
KR20200069353A (en) Machine learning runtime library for neural network acceleration
EP3557425A1 (en) Accelerator and system for accelerating operations
US11157764B2 (en) Semantic image segmentation using gated dense pyramid blocks
CN114365185A (en) Generating images using one or more neural networks
US9953044B2 (en) Radix sort acceleration using custom ASIC
US11295236B2 (en) Machine learning in heterogeneous processing systems
CN111935506B (en) Method and apparatus for determining repeating video frames
US11521007B2 (en) Accelerator resource utilization by neural networks
CN110569972A (en) search space construction method and device of hyper network and electronic equipment
US11704894B2 (en) Semantic image segmentation using gated dense pyramid blocks
US11557047B2 (en) Method and apparatus for image processing and computer storage medium
WO2022041850A1 (en) Methods and apparatuses for coalescing function calls for ray-tracing
US20200090046A1 (en) System and method for cascaded dynamic max pooling in neural networks
US11748100B2 (en) Processing in memory methods for convolutional operations
CN114817845B (en) Data processing method, device, electronic equipment and storage medium
CN112099737B (en) Method, device, equipment and storage medium for storing data
US20230073661A1 (en) Accelerating data load and computation in frontend convolutional layer
CN112560928B (en) Negative sample mining method and device, electronic equipment and storage medium
US11676068B1 (en) Method, product, and apparatus for a machine learning process leveraging input sparsity on a pixel by pixel basis
US11315035B2 (en) Machine learning in heterogeneous processing systems
CN113095493A (en) System and method for reducing memory requirements in a neural network
US11657324B2 (en) Method, electronic device, and computer program product for processing data
US20230259467A1 (en) Direct memory access (dma) engine processing data transfer tasks in parallel
CN111507267B (en) Document orientation detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant