WO2020134546A1 - Neural network processor, convolutional neural network data multiplexing method and related device - Google Patents

Neural network processor, convolutional neural network data multiplexing method and related device

Info

Publication number
WO2020134546A1
Authority
WO
WIPO (PCT)
Prior art keywords
convolution
data
neural network
initial input
input data
Prior art date
Application number
PCT/CN2019/114725
Other languages
French (fr)
Chinese (zh)
Inventor
李炜
曹庆新
Original Assignee
深圳云天励飞技术有限公司
Priority date
Filing date
Publication date
Application filed by 深圳云天励飞技术有限公司
Publication of WO2020134546A1 publication Critical patent/WO2020134546A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • Neural network processor, convolutional neural network data multiplexing method, and related equipment
  • One of the most commonly used models in neural network processors is the convolutional neural network model.
  • However, the convolutional neural network model suffers from a series of problems such as slow operation speed and high power consumption. How to increase the operation speed of the convolutional neural network model in a neural network processor while reducing power consumption has therefore become an urgent technical problem.
  • A first aspect of the present invention provides a neural network processor, which includes:
  • a storage circuit for storing the initial input data and weight values required for a convolution operation;
  • at least one calculation circuit for reading the initial input data and the weight value from the storage circuit and performing a convolution operation based on them, where the at least one calculation circuit includes:
  • a data buffer for caching the initial input data read by the calculation circuit;
  • a weight buffer for caching the weight value read by the calculation circuit;
  • a convolution operator for performing a convolution operation in the current layer of the convolutional neural network according to the initial input data and the weight value to obtain a plurality of first convolution results, and for accumulating the first convolution results that have a corresponding relationship to obtain a plurality of second convolution results; after all the first convolution results having a corresponding relationship have been accumulated, the plurality of first convolution results are deleted;
  • a result buffer for caching the plurality of second convolution results and, according to a preset storage rule, sending them to the data buffer as the initial input data of the next layer of the convolutional neural network, or sending them to the storage circuit for storage.
  • the preset storage rule includes:
  • when the current layer of the convolutional neural network is not the last layer, the result buffer determines the plurality of second convolution results as intermediate convolution results and sends the intermediate convolution results to the data buffer;
  • The convolution operator performing a convolution operation in the current layer of the convolutional neural network according to the initial input data and the weight value to obtain a plurality of first convolution results includes:
  • performing a convolution operation on the Q-th row of the initial input data and the L-th row of a preset convolution kernel, the resulting data being sub-data of the (Q-L+1)-th row of a third convolution result;
  • accumulating all sub-data located in the (Q-L+1)-th row to obtain the data of the (Q-L+1)-th row;
  • performing a convolution operation according to the third convolution result and the weight value to obtain the plurality of first convolution results;
  • where Q ranges from 1 to M, M being the total number of rows of the initial input data, and L ranges from 1 to N, N being the total number of rows of the preset convolution kernel.
  • Preferably, the Q-th row of the initial input data is convolved with every row of the preset convolution kernel, and after the Q-th row has been convolved with all rows of the preset convolution kernel, the Q-th row of the initial input data is deleted, until the entire initial input data has been deleted.
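  • The row-wise scheme above can be made concrete with a short sketch. This is a minimal NumPy illustration, not the patented hardware: it treats the convolution as the cross-correlation conventionally used in CNNs, and the function and variable names are our own.

```python
import numpy as np

def row_wise_conv(x, k):
    """Row-wise 'valid' convolution: input row Q convolved with kernel
    row L contributes to output row Q - L + 1 (1-based), so each input
    row can be freed once it has met every kernel row."""
    M, W = x.shape                              # M input rows
    N, Kw = k.shape                             # N kernel rows
    out = np.zeros((M - N + 1, W - Kw + 1))
    rows = {q: x[q].copy() for q in range(M)}   # buffered input rows
    for q in range(M):                          # 0-based; Q = q + 1
        for l in range(N):                      # 0-based; L = l + 1
            r = q - l                           # output row, i.e. (Q - L + 1) - 1
            if 0 <= r < out.shape[0]:
                for c in range(out.shape[1]):
                    # sub-data of output row r from input row q, kernel row l
                    out[r, c] += np.dot(rows[q][c:c + Kw], k[l])
        del rows[q]                             # row Q has met all N kernel rows
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
k = np.ones((3, 3))
print(row_wise_conv(x, k))  # matches scipy.signal.correlate2d(x, k, 'valid')
```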
  • When the current layer of the convolutional neural network is the last layer, the plurality of second convolution results are determined as the final convolution results, and the final convolution results are sent to the storage circuit.
  • Controlling the at least one calculation circuit to perform a convolution operation in the current layer of the convolutional neural network according to the initial input data and the weight value to obtain a plurality of first convolution results includes:
  • performing a convolution operation on the Q-th row of the initial input data and the L-th row of a preset convolution kernel, the resulting data being sub-data of the (Q-L+1)-th row of a third convolution result;
  • accumulating all sub-data located in the (Q-L+1)-th row to obtain the data of the (Q-L+1)-th row;
  • performing a convolution operation according to the third convolution result and the weight value to obtain the plurality of first convolution results;
  • where Q ranges from 1 to M, M being the total number of rows of the initial input data, and L ranges from 1 to N, N being the total number of rows of the preset convolution kernel.
  • a storage module for storing, through the storage circuit, the initial input data and weight values required for the convolution operation;
  • a convolution operation module configured to control the at least one calculation circuit to perform a convolution operation in the current layer of the convolutional neural network according to the initial input data and the weight value to obtain a plurality of first convolution results, and to accumulate the first convolution results having a corresponding relationship to obtain a plurality of second convolution results;
  • a deletion module configured to control the at least one calculation circuit to delete the plurality of first convolution results after all the first convolution results having a corresponding relationship have been accumulated;
  • a first determining module configured to determine the plurality of second convolution results as intermediate convolution results when the current layer of the convolutional neural network is not the last layer, and to send the intermediate convolution results to the at least one calculation circuit for caching as the initial input data of the next layer of the convolutional neural network;
  • A fourth aspect of the present invention provides an electronic device, which includes a processor configured to implement the convolutional neural network data multiplexing method when executing a program stored in a memory.
  • In the present invention, the at least one calculation circuit first reads the initial input data and the weight value from the storage circuit to perform the first convolution operation, realizing the first level of data multiplexing: the same initial input data and weight values are reused across different calculation circuits. By accumulating a first convolution result with multiple other first convolution results that have a corresponding relationship, the second level of data multiplexing is realized: the same first convolution result is reused within the same calculation circuit. By accumulating multiple corresponding first convolution results into a second convolution result that serves as the initial input data of the next layer of the convolutional neural network, the third level of data multiplexing is realized: data is reused between layers. Through these three levels of data multiplexing, data utilization is improved and the number of memory accesses is reduced, which increases the calculation speed of the calculation circuits and lowers the power consumption of the neural network processor.
  • In addition, the first convolution results are deleted after accumulation, saving the storage space of the storage circuit; and once a row of the initial input data has completed its convolution operations with the convolution kernel, that row is deleted, further saving the storage space of the storage circuit.
  • FIG. 1 is a schematic diagram of a neural network processor provided by a preferred embodiment of the present invention.
  • FIG. 2 is a schematic diagram of another neural network processor provided by an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of data multiplexing when performing a convolution operation according to a preferred embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of a data multiplexing method of a convolutional neural network according to a preferred embodiment of the present invention.
  • FIG. 5 is a structural diagram of a convolutional neural network data multiplexing device according to a preferred embodiment of the present invention.
  • FIG. 6 is a schematic diagram of an electronic device provided by a preferred embodiment of the present invention.
  • FIG. 1 and FIG. 2 are schematic diagrams of a neural network processor provided by an embodiment of the present invention.
  • The neural network processor 1 may include a storage circuit 10 and at least one calculation circuit 20, where each calculation circuit 20 is connected to the storage circuit 10.
  • The neural network processor 1 may be a programmable logic device, such as a Field Programmable Gate Array (FPGA), or an Application Specific Integrated Circuit (ASIC).
  • The number of calculation circuits 20 can be set according to the actual situation; the required number can be determined by weighing the total amount of computation against the amount of computation each calculation circuit can handle. For example, FIG. 1 shows two parallel calculation circuits 20.
  • The neural network processor 1 stores the user-configured initial input data and weight values required for the convolution operation in the storage circuit 10, and reads, through the at least one calculation circuit 20, the initial input data and the weight value from the storage circuit 10 to perform a convolution operation based on them.
  • Because the initial input data and weight values required for the convolution operation are stored collectively in the storage circuit 10, when there are multiple calculation circuits 20, they can synchronously read the initial input data and weight values from the storage circuit 10. In this way, the initial input data and the weight value are multiplexed, reducing the number of data accesses and the power consumption of the processor.
  • The plurality of calculation circuits 20 may form an operation array; the calculation circuits 20 simultaneously read the initial input data and weight values required for the convolution operation from the storage circuit 10 and perform their convolution operations in parallel.
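  • As a rough illustration of this first level of multiplexing, the sketch below simulates several processing elements (PEs) sharing one broadcast copy of the input. It is a software analogy under our own naming, not the hardware itself, and uses SciPy's correlate2d for the per-PE convolution.

```python
import numpy as np
from scipy.signal import correlate2d

def broadcast_to_pes(ci0, weights_per_pe):
    """First-level multiplexing: the initial input Ci0 is fetched once
    from (simulated) storage and shared by every PE, each of which
    convolves it with its own kernel, instead of each PE issuing its
    own read of the same data."""
    # one fetch, many uses: every PE reuses the same ci0 array
    return [correlate2d(ci0, w, mode="valid") for w in weights_per_pe]

ci0 = np.random.rand(8, 8)
kernels = [np.random.rand(3, 3) for _ in range(2)]   # two parallel PEs
outputs = broadcast_to_pes(ci0, kernels)
```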
  • In some embodiments, the convolutional neural network model uses a fully connected method for operation.
  • The calculation circuit 20 may also pre-store the number of channels required for the convolution operation.
  • the storage circuit 10 may include: a data memory 100 and a weight memory 102.
  • the data memory 100 is used to store initial input data required for performing convolution operations.
  • the initial input data may be an input feature map, which participates in the calculation as the initial data.
  • the data memory 100 can also be used to store the final convolution result calculated by the at least one calculation circuit 20.
  • the weight memory 102 is used to store weight values required for performing convolution operations.
  • The calculation circuit 20 may include a data buffer 200, a weight buffer 202, a convolution operator 204, and a result buffer 206, where the result buffer 206 is also connected to the data memory 100 in the storage circuit 10 and to the data buffer 200 in the calculation circuit 20.
  • the data buffer 200 is used to buffer the initial input data read by the calculation circuit 20 from the data memory.
  • The initial input data has two sources: one is read by the calculation circuit 20 from the data memory 100, and the other is the intermediate convolution result obtained by the calculation circuit 20, which is returned by the result buffer 206 to the data buffer 200 as the initial input data of the next layer of the convolutional neural network.
  • Each data buffer 200 can store multiple items of initial input data simultaneously. In other embodiments, each data buffer 200 may store only one item of initial input data, which can be deleted once all of its convolution results have been obtained; the accumulated convolution results are then cached in the data buffer 200 as new initial input data. The data buffer 200 therefore behaves like a First In First Out (FIFO) cache.
  • the weight buffer 202 is used to cache the weight value read by the calculation circuit 20 from the weight memory 102.
  • The convolution operator 204 is configured to perform the convolution operation of the current layer of the convolutional neural network according to the initial input data in the data buffer 200 and the weight value in the weight buffer 202, obtaining a plurality of first convolution results; the first convolution results having a corresponding relationship are accumulated to obtain a plurality of second convolution results, and once all the first convolution results having a corresponding relationship have been accumulated, the plurality of first convolution results are deleted.
  • The result buffer 206 is used to cache the plurality of second convolution results and to send them, according to the preset storage rule, to the data buffer 200 as the initial input data of the next layer of the convolutional neural network, or to the storage circuit 10 for storage.
  • Different calculation circuits 20 use the same initial input data to perform convolution operations and obtain different convolution results; the different convolution results are therefore stored in the result buffers 206 of the respective calculation circuits 20.
  • The result buffer 206 of each calculation circuit 20 may also store multiple convolution results at the same time.
  • The preset storage rule may include:
  • when the current layer of the convolutional neural network is not the last layer, the result buffer 206 determines the plurality of second convolution results as intermediate convolution results and sends the intermediate convolution results to the data buffer 200;
  • when the current layer of the convolutional neural network is the last layer, the result buffer 206 determines the plurality of second convolution results as the final convolution results and sends the final convolution results to the storage circuit 10.
  • In a multi-layer convolutional neural network, the output of the previous layer is usually used as the input of the next layer: the output of the first layer serves as the input of the second layer, the output of the second layer serves as the input of the third layer, and so on, until the last layer outputs the final convolution result.
  • If the current layer is not the last layer, the result buffer 206 directly caches the intermediate convolution result into the data buffer 200 of the corresponding calculation circuit 20 as the initial input data of the next layer of the convolutional neural network. If it is the last layer, the result buffer 206 sends the final convolution result to the data memory 100 in the storage circuit 10 for storage.
  • Suppose the initial input data required for the convolution operation and stored in the storage circuit 10 is denoted Ci0, and the weight value is denoted Weight; Ci0 is stored in the data memory 100, and Weight is stored in the weight memory 102.
  • The storage circuit 10 broadcasts to all calculation circuits 20 (denoted PE in the figures). After receiving the broadcast signal, each calculation circuit 20 synchronously reads the initial input data Ci0 from the data memory 100 and caches it in its data buffer 200; at the same time, each calculation circuit 20 synchronously reads the weight value Weight from the weight memory 102 and caches it in its weight buffer 202.
  • The convolution operator 204 (denoted MAC in the figures) of each calculation circuit 20 performs the convolution operation of the first layer of the convolutional neural network based on the initial input data Ci0 in the corresponding data buffer 200 (denoted IBUF in the figures) and the weight value Weight in the corresponding weight buffer 202, obtaining the first-layer convolution result Co0, and caches Co0 in the result buffer 206. Since the convolution result Co0 obtained in the first operation is not the result of the final layer, the result buffer 206 (denoted OBUF in the figures) returns Co0 to the data buffer 200 of the calculation circuit 20 for caching, as the initial input data Ci1 of the second layer of the convolutional neural network.
  • The calculation circuit 20 then synchronously reads the initial input data Ci1 from the data buffer 200; the convolution operator 204 performs the convolution operation of the second layer of the convolutional neural network according to Ci1 in the corresponding data buffer 200 and the weight value in the corresponding weight buffer 202, obtaining the second-layer convolution result Co1, and caches Co1 in the result buffer 206.
  • The result buffer 206 returns the second-layer convolution result Co1 to the data buffer 200 of the calculation circuit 20 for caching, as the initial input data Ci2 of the third layer of the convolutional neural network.
  • Finally, each calculation circuit 20 performs the convolution operation of the final layer of the convolutional neural network according to the convolution result obtained in the penultimate step and the weight value, obtaining the final convolution result, and sends the final convolution result to the data memory 100 in the storage circuit 10 for storage.
  • During the convolution operation, after each calculation circuit 20 performs the convolution operations of the current layer on the initial input data in its data buffer 200, multiple first convolution results are obtained; the first convolution results having a corresponding relationship (for example, belonging to the same neuron) are accumulated to obtain multiple second convolution results. After all the corresponding first convolution results have been accumulated, the initial input data may be deleted. This continues until the final layer completes its convolution operation and the final convolution result is obtained.
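  • The layer-to-layer feedback described above (Ci0 -> Co0 -> Ci1 -> Co1 -> ...) can be summarized in a short sketch. This is a behavioral software analogy under assumed names, with SciPy's correlate2d standing in for the MAC array:

```python
import numpy as np
from scipy.signal import correlate2d

def run_layers(ci0, layer_weights):
    """Third-level multiplexing: each layer's output Co_i is returned by
    the (simulated) result buffer straight into the data buffer as the
    next layer's input Ci_{i+1}; only the final result is written back
    to main storage."""
    data_buffer = ci0                                   # contents of IBUF
    for i, w in enumerate(layer_weights):
        co = correlate2d(data_buffer, w, mode="valid")  # MAC output into OBUF
        if i < len(layer_weights) - 1:
            data_buffer = co        # OBUF -> IBUF: Co_i becomes Ci_{i+1}
        else:
            return co               # final layer: write back to data memory

final = run_layers(np.random.rand(12, 12), [np.random.rand(3, 3)] * 3)
```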
  • the convolution result is also referred to as an output feature map.
  • The above embodiment illustrates the data processing of the neural network processor 1, which involves three levels of data multiplexing. Through these three levels of data multiplexing, the parallelism of the neural network processor can be greatly improved, and the power consumption of the entire processor can be effectively reduced.
  • First level of data multiplexing: each calculation circuit 20 reads the initial input data and the weight value from the storage circuit 10 to complete the convolution operation of the first layer of the convolutional neural network, so that the same initial input data and weight values are multiplexed across different calculation circuits 20.
  • Second level of data multiplexing: the result buffer 206 of each calculation circuit 20 can store multiple first convolution results at the same time, and a first convolution result is accumulated with multiple other first convolution results that have a corresponding relationship, realizing the multiplexing of the same first convolution result within the same calculation circuit 20.
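  • One way to read "corresponding relationship" is as per-channel partial sums that belong to the same output feature map (the source hints at this with "for example, the same neuron"). Under that assumption, a minimal sketch:

```python
import numpy as np
from scipy.signal import correlate2d

def accumulate_partials(input_channels, kernels):
    """Second-level multiplexing: per-channel partial results ('first
    convolution results') for the same output are accumulated into one
    'second convolution result', after which the partials are dropped
    from the (simulated) result buffer."""
    partials = [correlate2d(c, k, mode="valid")  # first convolution results
                for c, k in zip(input_channels, kernels)]
    second = np.sum(partials, axis=0)            # accumulate corresponding results
    del partials                                 # free the buffered partials
    return second
```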
  • Third level of data multiplexing: all the convolution results (both the intermediate and the final convolution results) can be cached in the result buffer 206. If a second convolution result is intermediate, the result buffer 206 directly returns it to the data buffer 200 for caching as the initial input data of the next layer of the convolutional neural network. That is, by accumulating multiple corresponding first convolution results into a second convolution result that serves as the initial input data of the next layer, data multiplexing between layers of the convolutional neural network is realized.
  • The above three levels of data multiplexing are illustrated in FIG. 1 and FIG. 2. The embodiment of the present invention further proposes a fourth level of data multiplexing, which further improves the parallelism of the operation as well as the computational efficiency and data utilization of the convolution operator. The fourth level of data multiplexing is detailed in the schematic diagram of FIG. 3.
  • FIG. 3 is a schematic diagram of the process in which the convolution operator uses a given piece of initial input data to calculate the corresponding convolution results.
  • In FIG. 3, the left side is the convolution kernel, the middle is the initial input data, and the right side is the corresponding convolution result.
  • The convolution operator performs the convolution operation in the current layer of the convolutional neural network according to the initial input data and the weight value to obtain a plurality of first convolution results, as follows:
  • The convolution operator 204 of each calculation circuit 20 uses a 3×3 convolution kernel.
  • The convolution kernel slides from the left to the right of the initial input data, and then from the top to the bottom; during the sliding, multiply-accumulate operations are performed to obtain the convolution result at each position.
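  • A direct rendering of this sliding multiply-accumulate, as a plain software sketch (the names are ours, and the loop order mirrors the left-to-right, top-to-bottom sliding described above):

```python
import numpy as np

def sliding_mac(x, k):
    """Slide the kernel left-to-right, then top-to-bottom, performing a
    multiply-accumulate at each position to produce the convolution
    result for that position."""
    kh, kw = k.shape                                  # e.g. a 3x3 kernel
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for r in range(oh):                               # top -> bottom
        for c in range(ow):                           # left -> right
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * k)   # MAC
    return out
```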
  • In summary, the at least one calculation circuit first reads the initial input data and the weight value from the storage circuit to perform the first convolution operation, realizing the first level of data multiplexing of the same initial input data and weight value across different calculation circuits. By accumulating a first convolution result with multiple other first convolution results that have a corresponding relationship, the second level of data multiplexing of the same first convolution result within the same calculation circuit is realized. By accumulating multiple corresponding first convolution results into a second convolution result that serves as the initial input data of the next layer, the third level of data multiplexing between layers of the convolutional neural network is realized. Through these three levels of data multiplexing, data utilization is increased and the number of data accesses is reduced, thereby increasing the calculation speed of the calculation circuits and reducing the power consumption of the neural network processor.
  • Moreover, the first convolution results are deleted after accumulation, saving the storage space of the storage circuit; and once a row of the initial input data has completed its convolution operation with the convolution kernel, deleting that row further saves the storage space of the storage circuit.
  • FIG. 4 is a flowchart of a convolutional neural network data multiplexing method according to Embodiment 2 of the present invention.
  • The convolutional neural network data multiplexing method can be applied to mobile or fixed electronic devices; the electronic devices include, but are not limited to, personal computers, smart phones, tablet computers, desktop computers, all-in-one machines, and the like.
  • The electronic device stores the initial input data and weight values required by the user for the convolution operation in the storage circuit 10, and controls at least one calculation circuit 20 to read the initial input data and the weight value from the storage circuit 10 and perform a convolution operation based on them. Since the initial input data and weight values required for the convolution operation are stored collectively in the storage circuit 10, when there are multiple calculation circuits 20, they can simultaneously read the initial input data and weight values from the storage circuit 10. In this way, the initial input data and weight values are multiplexed, reducing the number of data accesses and the power consumption of the processor.
  • The convolutional neural network data multiplexing function provided by the method of the present invention may be integrated directly into the electronic device, or provided as an interface in the form of a Software Development Kit (SDK), through which the electronic device realizes the convolutional neural network data multiplexing.
  • the method for convolutional neural network data multiplexing can also be applied to a hardware environment composed of a terminal and a server connected to the terminal through a network.
  • the network includes but is not limited to: wide area network, metropolitan area network or local area network.
  • The image feature extraction method of the embodiment of the present invention may be executed by the server, by the terminal, or jointly by the server and the terminal.
  • A terminal or server in this context refers to an intelligent terminal that can perform a predetermined processing procedure, such as a numerical operation and/or a logical operation, by running a predetermined program or instruction. It may include a processor and a memory, where the processor executes instructions pre-stored in the memory to carry out the predetermined processing procedure; alternatively, the predetermined processing procedure is executed by hardware such as an ASIC, FPGA, or DSP, or by a combination of the two.
  • Computer devices include but are not limited to servers, personal computers, laptops, tablets, smartphones, etc.
  • the convolutional neural network data multiplexing method specifically includes the following steps. According to different requirements, the order of the steps in the flowchart may be changed, and some steps may be omitted.
  • S41 storing, by the storage circuit, initial input data and weight values required for convolution operation.
  • the user may configure in advance the initial input data and weight values required for performing the convolution operation, and store them in the electronic device.
  • the electronic device obtains initial input data and weight values required for convolution operation and stores them in the storage circuit 10.
  • the initial input data may be stored in the data memory 100 of the storage circuit 10
  • the initial input data may be an input feature map, which participates in the calculation as the initial data.
  • the weight value may be stored in the weight memory 102 of the storage circuit 10.
  • S42 Control the at least one calculation circuit to perform a convolution operation in the current layer of the convolutional neural network according to the initial input data and the weight value to obtain a plurality of first convolution results; the first convolution results having a corresponding relationship are accumulated to obtain a plurality of second convolution results.
  • A plurality of calculation circuits 20 may be provided to form an operation array; the calculation circuits 20 simultaneously read the initial input data and weight values required for the convolution operation from the storage circuit 10 and perform convolution operations in parallel.
  • In some embodiments, the convolutional neural network model uses a fully connected method for operation.
  • The number of calculation circuits 20 can be set according to the actual situation; the required number can be determined by weighing the total amount of computation against the amount of computation each calculation circuit can handle. For example, FIG. 1 shows two parallel calculation circuits 20.
  • Each calculation circuit 20 is controlled to read the initial input data from the corresponding data memory 100 and cache it in the corresponding data buffer 200; at the same time, each calculation circuit 20 is controlled to read the weight value from the corresponding weight memory 102 and cache it in the corresponding weight buffer 202.
  • The initial input data has two sources: one is read by the calculation circuit 20 from the data memory 100, and the other is the intermediate convolution result obtained by the calculation circuit 20, which is returned by the result buffer 206 to the data buffer 200 as the initial input data of the next layer of the convolutional neural network.
  • Each data buffer 200 can store multiple items of initial input data simultaneously. In other embodiments, each data buffer 200 may store only one item of initial input data, which can be deleted once all of its convolution results have been obtained; the accumulated convolution results are then cached in the data buffer 200 as new initial input data. The data buffer 200 therefore behaves like a First In First Out (FIFO) cache.
  • The convolution operator 204 performs the convolution operation of the current layer of the convolutional neural network according to the initial input data in the data buffer 200 and the weight value in the weight buffer 202, obtaining a plurality of first convolution results; the first convolution results having a corresponding relationship are accumulated to obtain a plurality of second convolution results.
  • S43 After all the first convolution results having a corresponding relationship have been accumulated, the convolution operator 204 of the calculation circuit 20 deletes the plurality of first convolution results.
  • S44 Determine whether the current layer of the convolutional neural network is the last layer of the convolutional neural network.
  • S45 Determine the plurality of second convolution results as intermediate convolution results, and send the intermediate convolution results to the at least one calculation circuit for caching as the initial input data of the next layer of the convolutional neural network.
  • Specifically, when the current layer is not the last layer, the result buffer 206 determines the multiple second convolution results of the current layer as intermediate convolution results and sends them to the data buffer 200 in the calculation circuit 20 for caching, as the initial input data of the next layer of the convolutional neural network.
  • S46 Determine the plurality of second convolution results as the final convolution result, and send the final convolution result to the storage circuit.
  • Specifically, when the current layer is the last layer, the result buffer 206 determines the multiple second convolution results of the current layer as final convolution results and sends the final convolution results to the data memory 100 in the storage circuit 10 for storage.
  • In a multi-layer convolutional neural network, the output of the previous layer is usually used as the input of the next layer: the output of the first layer serves as the input of the second layer, the output of the second layer serves as the input of the third layer, and so on, until the last layer outputs the final convolution result.
  • If the current layer is not the last layer, the result buffer 206 directly caches the intermediate convolution result into the data buffer 200 of the corresponding calculation circuit 20 as the initial input data of the next layer of the convolutional neural network. If it is the last layer, the result buffer 206 sends the final convolution result to the data memory 100 in the storage circuit 10 for storage.
  • Suppose the initial input data required for performing the convolution operation and stored in the storage circuit 10 is denoted Ci0, and the weight value is denoted Weight; Ci0 is stored in the data memory 100, and Weight is stored in the weight memory 102.
  • The storage circuit 10 broadcasts to all calculation circuits 20 (denoted PE in the figures). After receiving the broadcast signal, each calculation circuit 20 synchronously reads the initial input data Ci0 from the data memory 100 and caches it in its data buffer 200; at the same time, each calculation circuit 20 synchronously reads the weight value Weight from the weight memory 102 and caches it in its weight buffer 202.
  • The convolution operator 204 (denoted MAC in the figures) of each calculation circuit 20 performs the convolution operation of the first layer of the convolutional neural network based on the initial input data Ci0 in the corresponding data buffer 200 (denoted IBUF in the figures) and the weight value Weight in the corresponding weight buffer 202, obtaining the first-layer convolution result Co0, and caches Co0 in the result buffer 206.
  • Since the convolution result Co0 obtained in the first operation is not the result of the final layer, the result buffer 206 (denoted OBUF in the figures) returns Co0 to the data buffer 200 of the calculation circuit 20 for caching, as the initial input data Ci1 of the second layer of the convolutional neural network.
  • The calculation circuit 20 then synchronously reads the initial input data Ci1 from the data buffer 200; the convolution operator 204 performs the convolution operation of the second layer of the convolutional neural network according to Ci1 in the corresponding data buffer 200 and the weight value in the corresponding weight buffer 202, obtaining the second-layer convolution result Co1, and caches Co1 in the result buffer 206.
  • The result buffer 206 returns the second-layer convolution result Co1 to the data buffer 200 of the calculation circuit 20 for caching, as the initial input data Ci2 of the third layer of the convolutional neural network.
  • Finally, each calculation circuit 20 performs the convolution operation of the final layer of the convolutional neural network according to the convolution result obtained in the penultimate step and the weight value, obtaining the final convolution result, and sends the final convolution result to the data memory 100 in the storage circuit 10 for storage.
  • The convolution result is also called an output feature map.
  • The above embodiment illustrates the data processing of the neural network processor 1, which involves three levels of data multiplexing. Through these three levels of data multiplexing, the parallelism of the neural network processor can be greatly improved, and the power consumption of the entire processor can be effectively reduced.
  • First level of data multiplexing: each calculation circuit 20 reads the initial input data and the weight value from the storage circuit 10 to complete the convolution operation of the first layer of the convolutional neural network, so that the same initial input data and weight values are multiplexed across different calculation circuits 20.
  • Second level of data multiplexing: the result buffer 206 of each calculation circuit 20 can store multiple first convolution results at the same time, and a first convolution result is accumulated with multiple other first convolution results that have a corresponding relationship, realizing the multiplexing of the same first convolution result within the same calculation circuit 20.
  • Third level of data multiplexing: all the convolution results (both the intermediate and the final convolution results) can be cached in the result buffer 206. If a second convolution result is intermediate, the result buffer 206 directly returns it to the data buffer 200 for caching as the initial input data of the next layer of the convolutional neural network. That is, by accumulating multiple corresponding first convolution results into a second convolution result that serves as the initial input data of the next layer, data multiplexing between layers of the convolutional neural network is realized.
  • The above three levels of data multiplexing are illustrated in FIG. 1 and FIG. 2. The embodiment of the present invention further proposes a fourth level of data multiplexing, which further improves the parallelism of the operation as well as the computational efficiency and data utilization of the convolution operator. The fourth level of data multiplexing is detailed in the schematic diagram of FIG. 3.
  • The convolution operator performing a convolution operation in the current layer of the convolutional neural network according to the initial input data and the weight value to obtain a plurality of first convolution results includes:
  • performing a convolution operation on the Q-th row of the initial input data and the L-th row of a preset convolution kernel, the resulting data being sub-data of the (Q-L+1)-th row of a third convolution result;
  • accumulating all sub-data located in the (Q-L+1)-th row to obtain the data of the (Q-L+1)-th row;
  • performing a convolution operation according to the third convolution result and the weight value to obtain the plurality of first convolution results;
  • where Q ranges from 1 to M, M being the total number of rows of the initial input data, and L ranges from 1 to N, N being the total number of rows of the preset convolution kernel.
  • The convolution operator 204 of each calculation circuit 20 uses a 3×3 convolution kernel.
  • The convolution kernel slides from the left to the right of the initial input data, and then from the top to the bottom; during the sliding, multiply-accumulate operations are performed to obtain the convolution result at each position.
  • Fourth level of data multiplexing: while the convolution operator 204 computes the convolution results, the same row of initial input data, for example the m-th row, can be multiplexed across N convolution results, N being the total number of rows of the convolution kernel. That is, by performing a convolution operation between one row of the initial input data and every row of the convolution kernel, the fourth level of multiplexing of that row of the initial input data is realized.
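  • Under our reading of this fourth level, each input row is reused once per kernel row, up to N times (fewer at the top and bottom edges). A tiny sketch with assumed names makes the reuse count explicit:

```python
def row_reuse_count(q, m, n):
    """Number of output rows that the q-th input row (1-based) feeds,
    for an m-row input and an n-row kernel: the output rows are
    r = q - l + 1 for kernel rows l = 1..n, clipped to 1..(m - n + 1)."""
    return sum(1 for l in range(1, n + 1) if 1 <= q - l + 1 <= m - n + 1)

# 5-row input, 3-row kernel: interior rows are reused 3 times.
print([row_reuse_count(q, 5, 3) for q in range(1, 6)])  # [1, 2, 3, 2, 1]
```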
  • In summary, the at least one calculation circuit first reads the initial input data and the weight value from the storage circuit to perform the first convolution operation, realizing the first level of data multiplexing of the same initial input data and weight value across different calculation circuits. By accumulating a first convolution result with multiple other first convolution results that have a corresponding relationship, the second level of data multiplexing of the same first convolution result within the same calculation circuit is realized. By accumulating multiple corresponding first convolution results into a second convolution result that serves as the initial input data of the next layer, the third level of data multiplexing between layers of the convolutional neural network is realized. Through these three levels of data multiplexing, data utilization is increased and the number of data accesses is reduced, thereby increasing the calculation speed of the calculation circuits and reducing the power consumption of the neural network processor.
  • Moreover, the first convolution results are deleted after accumulation, saving the storage space of the storage circuit; and once a row of the initial input data has completed its convolution operation with the convolution kernel, deleting that row further saves the storage space of the storage circuit.
  • FIG. 5 is a functional block diagram of a preferred embodiment of the convolutional neural network data multiplexing device of the present invention.
  • the convolutional neural network data multiplexing device 50 runs in an electronic device.
  • the convolutional neural network data multiplexing device 50 may include multiple function modules composed of program code segments.
  • The program code of each program segment in the convolutional neural network data multiplexing device 50 can be stored in the memory of the electronic device and executed by at least one processor to perform the convolutional neural network data multiplexing (see FIG. 4 for details).
  • the convolutional neural network data multiplexing device 50 may be divided into multiple functional modules according to the functions it performs.
  • the functional module may include: a storage module 501, a convolution operation module 502, a deletion module 503, a judgment module 504, a first determination module 505, and a second determination module 506.
  • A module in the present invention refers to a series of computer program segments that are stored in the memory, can be executed by at least one processor, and perform a fixed function. In this embodiment, the functions of each module will be described in detail in subsequent embodiments.
  • a storage module 501 configured to store, through the storage circuit, the initial input data and weight values required for the convolution operation;
  • the judgment module 504 is configured to judge whether the current layer convolutional neural network is the last layer convolutional neural network.
  • The first determination module 505 is configured to determine the plurality of second convolution results as intermediate convolution results when the judgment module 504 determines that the current layer of the convolutional neural network is not the last layer, and to send the intermediate convolution results to the at least one calculation circuit for caching as the initial input data of the next layer of the convolutional neural network;
  • The second determination module 506 is configured to determine the plurality of second convolution results as the final convolution results when the judgment module 504 determines that the current layer of the convolutional neural network is the last layer, and to send the final convolution results to the storage circuit.
  • The neural network processor provided by the embodiment of the present invention realizes the first level of data multiplexing of the same initial input data and weight values across different calculation circuits by having at least one calculation circuit read the initial input data and the weight value from the storage circuit to perform the first convolution operation; realizes the second level of data multiplexing of the same first convolution result within the same calculation circuit by accumulating a first convolution result with multiple other first convolution results that have a corresponding relationship; and obtains the second convolution results by accumulating multiple corresponding first convolution results.
  • After accumulation, the first convolution results are deleted to save the storage space of the calculation circuit; and after a row of the initial input data has completed its convolution operation with the convolution kernel, that row is deleted, further saving the storage space of the calculation circuit, thereby effectively reducing the power consumption of the entire neural network processor and improving the efficiency of the calculation circuit's convolution operations.
  • the electronic device 6 may further include user equipment.
  • The user equipment includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote control, a touchpad, or a voice control device, for example a personal computer, a tablet computer, a smart phone, a digital camera, and the like.
  • The electronic device 6 is only an example; other existing or future electronic products that can be adapted to the present invention should also fall within the protection scope of the present invention and are incorporated herein by reference.
  • The memory 61 is used to store program code and various data, such as the convolutional neural network data multiplexing device 50 installed in the electronic device 6, and realizes high-speed, automatic access to programs and data while the electronic device 6 is running.
  • The memory 61 includes Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-time Programmable Read-Only Memory (OTPROM), Electrically-Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disk storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.
  • The at least one communication bus 63 is configured to implement connection and communication among the memory 61, the at least one processor 62, the display screen 64, the at least one neural network processor 66, and the like.
  • the power supply may be logically connected to the at least one processor 62 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system.
  • the power supply may also include any component such as one or more DC or AC power supplies, recharging systems, power failure detection circuits, power converters or inverters, and power status indicators.
  • The electronic device 6 may also include various sensors, Bluetooth modules, communication modules, and the like, which are not described in detail here.
  • The at least one processor 62 may execute the operating system of the electronic device 6, various installed application programs (such as the convolutional neural network data multiplexing device 50 described above), program code, and so on.
  • the memory 61 stores program codes, and the at least one processor 62 may call the program codes stored in the memory 61 to perform related functions.
  • Each module described in FIG. 5 is program code stored in the memory 61 and executed by the at least one processor 62, so as to realize the functions of the modules and achieve the purpose of generating a neural network model according to user needs.
  • Modules described as separate components may or may not be physically separated, and components displayed as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Abstract

A neural network processor, a convolutional neural network data multiplexing method and apparatus, an electronic device and a storage medium. The neural network processor comprises: a storage circuit (10) for storing initial input data and a weight value required for performing a convolution operation; and at least one computing circuit (20), comprising: a data buffer (200) for buffering the initial input data; a weight buffer (202) for buffering the weight value; a convolution operation unit (204) for performing, according to the initial input data and the weight value, the convolution operation in the current layer of convolutional neural network to obtain a plurality of first convolution results, accumulating the first convolution results having a correlation therebetween to obtain a plurality of second convolution results, while also deleting the plurality of first convolution results; and a result buffer (206) for buffering the plurality of second convolution results as the initial input data of the next layer of convolutional neural network. According to the processor, by means of multiple layers of data multiplexing, the operation speed of the neural network processor is increased, and the power consumption is reduced.

Description

Neural network processor, convolutional neural network data multiplexing method and related equipment

Technical field

[0001] The present invention relates to the technical field of artificial intelligence, and in particular to a neural network processor, a convolutional neural network data multiplexing method, a convolutional neural network data multiplexing device, an electronic device, and a storage medium.

[0002] This application claims priority to the Chinese patent application filed with the Chinese Patent Office on December 27, 2018, with application number 201811614780.3 and invention title "Neural network processor, convolutional neural network data multiplexing method and related equipment", the entire contents of which are incorporated herein by reference.

Background art

[0003] One of the most commonly used models in neural network processors is the convolutional neural network model. However, the convolutional neural network model suffers from a series of problems such as slow operation speed and high power consumption. Therefore, how to increase the operation speed of the convolutional neural network model in a neural network processor while reducing power consumption has become an urgent technical problem.
Summary of the invention

Technical problem

Solution to the problem

Technical solution
[0004] In view of the above, it is necessary to provide a neural network processor, a convolutional neural network data multiplexing method, a convolutional neural network data multiplexing device, an electronic device, and a storage medium, which increase the operation speed of the neural network processor and reduce its power consumption by multiplexing data.

[0005] A first aspect of the present invention provides a neural network processor, which includes:

[0006] a storage circuit for storing the initial input data and weight values required for a convolution operation;

[0007] at least one calculation circuit for reading the initial input data and the weight value from the storage circuit and performing a convolution operation based on the initial input data and the weight value, where

[0008] the at least one calculation circuit includes:

[0009] a data buffer for caching the initial input data read by the calculation circuit;

[0010] a weight buffer for caching the weight value read by the calculation circuit;

[0011] a convolution operator for performing a convolution operation in the current layer of the convolutional neural network according to the initial input data and the weight value to obtain a plurality of first convolution results, and accumulating the first convolution results having a corresponding relationship to obtain a plurality of second convolution results; after all the first convolution results having a corresponding relationship have been accumulated, deleting the plurality of first convolution results;

[0012] a result buffer for caching the plurality of second convolution results and, according to a preset storage rule, sending the plurality of second convolution results to the data buffer as the initial input data of the next layer of the convolutional neural network, or sending them to the storage circuit for storage.

[0013] Preferably, the preset storage rule includes:

[0014] when the current layer of the convolutional neural network is not the last layer, the result buffer determines the plurality of second convolution results as intermediate convolution results and sends the intermediate convolution results to the data buffer;

[0015] when the current layer of the convolutional neural network is the last layer, the result buffer determines the plurality of second convolution results as final convolution results and sends the final convolution results to the storage circuit.
[0016] Preferably, the convolution operator performing a convolution operation in the current layer of the convolutional neural network according to the initial input data and the weight value to obtain a plurality of first convolution results includes:

[0017] performing a convolution operation on the Q-th row of the initial input data and the L-th row of a preset convolution kernel, the resulting data being sub-data of the (Q-L+1)-th row of a third convolution result;

[0018] accumulating all sub-data located in the (Q-L+1)-th row to obtain the data of the (Q-L+1)-th row;

[0019] performing a convolution operation according to the third convolution result and the weight value to obtain the plurality of first convolution results;

[0020] where Q ranges from 1 to M, M being the total number of rows of the initial input data, and L ranges from 1 to N, N being the total number of rows of the preset convolution kernel.

[0021] Preferably, the Q-th row of the initial input data is convolved with every row of the preset convolution kernel, and after the Q-th row has been convolved with all rows of the preset convolution kernel, the Q-th row of the initial input data is deleted, until the entire initial input data has been deleted.
[0022] A second aspect of the present invention provides a convolutional neural network data multiplexing method, applied to an electronic device, the electronic device including the neural network processor described above, the method including:

[0023] storing, by the storage circuit, the initial input data and weight values required for the convolution operation;

[0024] controlling the at least one calculation circuit to perform, in the current-layer convolutional neural network, a convolution operation according to the initial input data and the weight values to obtain a plurality of first convolution results, and accumulating the first convolution results having a corresponding relationship to obtain a plurality of second convolution results;

[0025] after all the first convolution results having a corresponding relationship have been accumulated, controlling the at least one calculation circuit to delete the plurality of first convolution results;

[0026] when the current-layer convolutional neural network is not the last-layer convolutional neural network, determining the plurality of second convolution results as intermediate convolution results and sending the intermediate convolution results to the at least one calculation circuit for caching as the initial input data of the next-layer convolutional neural network;

[0027] when the current-layer convolutional neural network is the last-layer convolutional neural network, determining the plurality of second convolution results as final convolution results and sending the final convolution results to the storage circuit.

[0028] Preferably, controlling the at least one calculation circuit to perform, in the current-layer convolutional neural network, a convolution operation according to the initial input data and the weight values to obtain a plurality of first convolution results includes:

[0029] performing a convolution operation on the Q-th row of the initial input data and the L-th row of a preset convolution kernel, the resulting data being sub-data of the (Q-L+1)-th row of a third convolution result;

[0030] accumulating all the sub-data located in the (Q-L+1)-th row to obtain the data of the (Q-L+1)-th row;

[0031] performing a convolution operation according to the third convolution result and the weight values to obtain the plurality of first convolution results;

[0032] where Q ranges from 1 to M, M being the total number of rows of the initial input data, and L ranges from 1 to N, N being the total number of rows of the preset convolution kernel.

[0033] Preferably, the Q-th row of the initial input data is convolved with each row of the preset convolution kernel, and after the Q-th row has been convolved with all rows of the preset convolution kernel, the Q-th row of the initial input data is deleted, until the entire initial input data has been deleted.

[0034] A third aspect of the present invention provides a convolutional neural network data multiplexing device, installed in an electronic device, the electronic device including the neural network processor described above, the device including:
[0035] a storage module, configured to store, through the storage circuit, the initial input data and weight values required for the convolution operation;

[0036] a convolution operation module, configured to control the at least one calculation circuit to perform, in the current-layer convolutional neural network, a convolution operation according to the initial input data and the weight values to obtain a plurality of first convolution results, and to accumulate the first convolution results having a corresponding relationship to obtain a plurality of second convolution results;

[0037] a deletion module, configured to control the at least one calculation circuit to delete the plurality of first convolution results after all the first convolution results having a corresponding relationship have been accumulated;

[0038] a first determination module, configured to, when the current-layer convolutional neural network is not the last-layer convolutional neural network, determine the plurality of second convolution results as intermediate convolution results and send the intermediate convolution results to the at least one calculation circuit for caching as the initial input data of the next-layer convolutional neural network;

[0039] a second determination module, configured to, when the current-layer convolutional neural network is the last-layer convolutional neural network, determine the plurality of second convolution results as final convolution results and send the final convolution results to the storage circuit.

[0040] A fourth aspect of the present invention provides an electronic device, the electronic device including a processor configured to implement the convolutional neural network data multiplexing method when executing a computer program stored in a memory.

[0041] A fifth aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the convolutional neural network data multiplexing method.
[0042] In the present invention, the at least one calculation circuit first reads the initial input data and weight values out of the storage circuit to perform the first convolution operation, achieving a first reuse of the same initial input data and weight values across different calculation circuits. By accumulating a first convolution result with a plurality of other first convolution results having a corresponding relationship, a second reuse of the same first convolution result within the same calculation circuit is achieved. By accumulating a plurality of first convolution results having a corresponding relationship to obtain second convolution results that serve as the initial input data of the next-layer convolutional neural network, a third, layer-to-layer reuse of data within the convolutional neural network is achieved. That is, through these three levels of data reuse, data utilization is improved and the number of data accesses is reduced, thereby increasing the operation speed of the calculation circuits and lowering the power consumption of the neural network processor.

[0043] Secondly, by convolving each row of the initial input data with the entire convolution kernel, a fourth reuse of each row of the initial input data is achieved, further improving data utilization and reducing the number of data accesses, and thus further increasing the operation speed of the calculation circuits and lowering the power consumption of the neural network processor.

[0044] Furthermore, after all the first convolution results having a corresponding relationship have been accumulated, the first convolution results are deleted, saving storage space of the storage circuit; after a given row of the initial input data has completed its convolution operations with the convolution kernel, that row is deleted, further saving storage space of the storage circuit.

[0045] In addition, when a plurality of calculation circuits run in parallel, the efficiency of parallel computing can be improved.
Beneficial effects of invention

Brief description of the drawings
[0046] In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.

[0047] FIG. 1 is a schematic diagram of a neural network processor provided by a preferred embodiment of the present invention.

[0048] FIG. 2 is a schematic diagram of another neural network processor provided by an embodiment of the present invention.

[0049] FIG. 3 is a schematic diagram of data multiplexing during a convolution operation provided by a preferred embodiment of the present invention.

[0050] FIG. 4 is a schematic flowchart of a convolutional neural network data multiplexing method provided by a preferred embodiment of the present invention.

[0051] FIG. 5 is a structural diagram of a convolutional neural network data multiplexing device provided by a preferred embodiment of the present invention.

[0052] FIG. 6 is a schematic diagram of an electronic device provided by a preferred embodiment of the present invention.
Embodiments of the invention
[0053] The embodiments of the present application are described in detail below.

[0054] Embodiment 1

[0055] Please refer to FIG. 1 and FIG. 2, which are schematic diagrams of the neural network processor provided by embodiments of the present invention.

[0056] In this embodiment, the neural network processor 1 may include a storage circuit 10 and at least one calculation circuit 20, the calculation circuit 20 being connected to the storage circuit 10. The neural network processor 1 may be a programmable logic device, such as a field-programmable gate array (FPGA), or a dedicated neural network processor implemented as an application-specific integrated circuit (ASIC).

[0057] The number of calculation circuits 20 can be set according to the actual situation; the required number can be determined by jointly considering the total computation load and the computation load that each calculation circuit can handle. For example, FIG. 1 shows two parallel calculation circuits 20.
[0058] In this embodiment, the neural network processor 1 is configured to store the user-configured initial input data and weight values required for the convolution operation in the storage circuit 10, and to read, through the at least one calculation circuit 20, the initial input data and the weight values from the storage circuit 10 and perform a convolution operation based on them.

[0059] Since the initial input data and weight values required for the convolution operation are stored centrally in the storage circuit 10, when there are a plurality of calculation circuits 20, they can read the initial input data and weight values from the storage circuit 10 synchronously. In this way, the initial input data and weight values can be reused, reducing the number of data accesses and the power consumption of the processor.

[0060] In this embodiment, the plurality of calculation circuits 20 may form an operation array; the plurality of calculation circuits 20 synchronously read the initial input data and weight values required for the convolution operation from the storage circuit 10 and perform the convolution operations in parallel. The convolutional neural network model performs its operations in a fully connected manner.

[0061] In this embodiment, the calculation circuit 20 may also pre-store parameters required for the convolution operation, such as the number of channels and the image size.
[0062] In this embodiment, the storage circuit 10 may include a data memory 100 and a weight memory 102.

[0063] The data memory 100 is configured to store the initial input data required for the convolution operation. The initial input data may be an input feature map, which participates in the operations as initial data. The data memory 100 may also be configured to store the final convolution results computed by the at least one calculation circuit 20.

[0064] The weight memory 102 is configured to store the weight values required for the convolution operation.

[0065] In this embodiment, the calculation circuit 20 may include a data buffer 200, a weight buffer 202, a convolution operator 204, and a result buffer 206, where the result buffer 206 is further connected to the data memory 100 in the storage circuit 10 and to the data buffer 200 in the calculation circuit 20.

[0066] The data buffer 200 is configured to cache the initial input data read by the calculation circuit 20 from the data memory. The initial input data comes from two sources: data read by the calculation circuit 20 from the data memory 100, and intermediate convolution results computed by the calculation circuit 20, which the result buffer 206 returns to the data buffer 200 as the initial input data of the next-layer convolutional neural network. Each data buffer 200 can store a plurality of initial input data items at the same time. In other embodiments, each data buffer 200 may store only one initial input data item; once that item has been convolved to produce all of its convolution results, it can be deleted. All the convolution results are accumulated and then cached in the data buffer 200 as new initial input data; the data buffer 200 is therefore a first-in-first-out (FIFO)-like buffer.
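To picture the FIFO-like behavior just described, here is a short Python sketch; the deque-based class and its method names are illustrative assumptions, not the actual hardware interface of the data buffer 200:

    from collections import deque

    class DataBufferSketch:
        """FIFO-like input buffer: items enter either from the data memory or
        as intermediate results written back by the result buffer, and each
        item is removed once it has produced all of its convolution results."""
        def __init__(self):
            self._queue = deque()

        def push(self, item):            # from data memory 100, or written
            self._queue.append(item)     # back by the result buffer 206

        def pop_for_convolution(self):   # consumed once, then gone (FIFO)
            return self._queue.popleft()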
[0067] The weight buffer 202 is configured to cache the weight values read by the calculation circuit 20 from the weight memory 102.

[0068] The convolution operator 204 is configured to perform, in the current-layer convolutional neural network, a convolution operation according to the initial input data in the data buffer 200 and the weight values in the weight buffer 202 to obtain a plurality of first convolution results, and to accumulate the first convolution results having a corresponding relationship to obtain a plurality of second convolution results; once all the first convolution results having a corresponding relationship have been accumulated, the plurality of first convolution results are deleted.

[0069] For the process of the convolution operation performed by the convolution operator 204, see FIG. 3 and its related description.

[0070] The result buffer 206 is configured to cache the plurality of second convolution results and, according to a preset storage rule, send the plurality of second convolution results to the data buffer 200 as the initial input data of the next-layer convolutional neural network, or send them to the storage circuit 10 for storage. Different calculation circuits 20 obtain different convolution results from the same initial input data; the result buffers 206 of different calculation circuits 20 therefore hold different convolution results. The result buffer 206 of each calculation circuit 20 can also store a plurality of convolution results at the same time.

[0071] In this embodiment, the preset storage rule is a storage rule set in advance, and may include:

[0072] when the current-layer convolutional neural network is not the last-layer convolutional neural network, the result buffer 206 determines the plurality of second convolution results as intermediate convolution results and sends the intermediate convolution results to the data buffer 200;

[0073] when the current-layer convolutional neural network is the last-layer convolutional neural network, the result buffer 206 determines the plurality of second convolution results as final convolution results and sends the final convolution results to the storage circuit 10.

[0074] In the process of the convolution operation, the output of the previous-layer convolutional neural network is generally used as the input of the next-layer convolutional neural network: the output of the first layer serves as the input of the second layer, the output of the second layer serves as the input of the third layer, and so on, until the last layer outputs its convolution results. If the current layer is not the last layer of the convolutional neural network, the result buffer 206 caches the intermediate convolution results directly into the data buffer 200 of the corresponding calculation circuit 20, where they serve as the initial input data of the next-layer convolutional neural network for further convolution. If it is the last layer, the result buffer 206 sends the final convolution results to the data memory 100 in the storage circuit 10 for storage.
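The preset storage rule of paragraphs [0072]-[0074] can be condensed into a few lines of Python; the function and parameter names below are assumptions made for illustration only:

    def route_second_convolution_results(results, is_last_layer,
                                         data_buffer, data_memory):
        """Preset storage rule: intermediate results become the next layer's
        initial input data, final results return to the data memory."""
        if is_last_layer:
            data_memory.extend(results)   # final results -> storage circuit
        else:
            data_buffer.extend(results)   # intermediate results -> data buffer 200

Here data_buffer and data_memory are assumed to be simple list-like stores standing in for the data buffer 200 and the data memory 100.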
[0075] The data processing procedure of the neural network processor 1 provided by the embodiment of the present invention is illustrated below with reference to the schematic diagram shown in FIG. 2.

[0076] Exemplarily, assume that the initial input data required for the convolution operation stored in the storage circuit 10 is denoted Ci0 and the weight values are denoted Weight, where the initial input data Ci0 is stored in the data memory 100 and the weight values Weight are stored in the weight memory 102.

[0077] In the first step, the storage circuit 10 broadcasts to all calculation circuits 20 (denoted PE in the figure). After receiving the broadcast signal, each calculation circuit 20 synchronously reads the initial input data Ci0 from the data memory 100 and caches it in its data buffer 200; at the same time, each calculation circuit 20 also synchronously reads the weight values Weight from the weight memory 102 and caches them in its weight buffer 202.

[0078] The convolution operator 204 (denoted MAC in the figure) of each calculation circuit 20 performs the convolution operation of the first-layer convolutional neural network according to the initial input data Ci0 in the corresponding data buffer 200 (denoted IBUF in the figure) and the weight values Weight in the corresponding weight buffer 202, obtains the first-layer convolution result Co0, and caches the first-layer convolution result Co0 in the result buffer 206. Since the convolution result Co0 obtained in the first step is not the last-layer convolution result, the result buffer 206 (denoted OBUF in the figure) returns the first-layer convolution result Co0 to the data buffer 200 of the calculation circuit 20 for caching as the initial input data Ci1 of the second-layer convolutional neural network.

[0079] In the second step, the calculation circuit 20 synchronously reads the initial input data Ci1 from the data buffer 200; the convolution operator 204 performs the convolution operation of the second-layer convolutional neural network according to the initial input data Ci1 in the corresponding data buffer 200 and the weight values in the corresponding weight buffer 202, obtains the second-layer convolution result Co1, and caches the second-layer convolution result Co1 in the result buffer 206. The result buffer 206 returns the second-layer convolution result Co1 to the data buffer 200 of the calculation circuit 20 for caching as the initial input data Ci2 of the third-layer convolutional neural network.

[0080] And so on for the subsequent layers.

[0081] In the last step, each calculation circuit 20 performs the convolution operation of the last-layer convolutional neural network according to the convolution results and weight values obtained in the penultimate step, obtains the final convolution results, and sends the final convolution results to the data memory 100 in the storage circuit 10 for storage.
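The layer-by-layer flow from the first step through the last step can be summarized in a short Python sketch; convolve_layer stands in for the per-layer work of the convolution operator 204 and, like the other names, is an assumption made for illustration:

    def run_network(ci0, weights_per_layer, convolve_layer):
        """Pipeline of one calculation circuit (PE): Ci0 -> Co0 (= Ci1) ->
        Co1 (= Ci2) -> ...; intermediate results stay inside the PE's buffers
        and only the final result is written back to the data memory 100."""
        ci = ci0                              # initial input data from IBUF
        for weight in weights_per_layer:      # Weight values from the weight buffer
            ci = convolve_layer(ci, weight)   # MAC output Co fed back as next Ci
        return ci                             # final convolution result -> storage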
[0082] It should be noted that, since the convolutional neural network model in this embodiment is fully connected, during the convolution operation of each calculation circuit 20, one initial input data item in the data buffer 200, after being convolved in the current-layer convolutional neural network, yields a plurality of first convolution results; the first convolution results having a corresponding relationship (for example, belonging to the same neuron) are accumulated to obtain a plurality of second convolution results. After all the first convolution results having a corresponding relationship have been accumulated, that initial input data item can be deleted. Once the last-layer convolutional neural network completes its convolution operation, the final convolution results are obtained.

[0083] In this embodiment, to correspond to the initial input data serving as the input feature map, the convolution results are also referred to as output feature maps. The above embodiment describes the data processing procedure of the neural network processor 1, which involves three levels of data reuse. These three levels of data reuse can greatly increase the operational parallelism of the neural network processor and effectively reduce the power consumption of the entire processor.

[0084] The three levels of data reuse are described below:

[0085] First-level data reuse: each calculation circuit 20 synchronously reads the initial input data and weight values from the storage circuit 10 for the first time and completes the convolution operation of the first-layer convolutional neural network, thereby achieving the first reuse of the same initial input data and weight values across different calculation circuits 20.

[0086] Second-level data reuse: the result buffer 206 of each calculation circuit 20 can store a plurality of first convolution results at the same time; accumulating a first convolution result with a plurality of other first convolution results having a corresponding relationship achieves the second reuse of the same first convolution result within the same calculation circuit 20.

[0087] Third-level data reuse: all the convolution results (including the intermediate convolution results and the final convolution results) can be cached in the result buffer 206; if a second convolution result is an intermediate convolution result, the result buffer 206 returns it directly to the data buffer 200 for caching as the initial input data of the next-layer convolutional neural network. That is, accumulating a plurality of first convolution results having a corresponding relationship to obtain second convolution results that serve as the initial input data of the next-layer convolutional neural network achieves the third, layer-to-layer reuse of data in the convolutional neural network.
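The second and third levels of reuse can be illustrated together with a small Python sketch, in which first convolution results are modeled as (correspondence_key, value) pairs; this pairing is an assumption made for illustration (the key might, for example, identify an output neuron):

    from collections import defaultdict

    def accumulate_first_results(first_results):
        """Accumulate first convolution results that share a corresponding
        relationship (same key) into second convolution results, then delete
        the first results so that their buffer space is reclaimed."""
        second = defaultdict(float)
        for key, value in first_results:
            second[key] += value      # second-level reuse: accumulate by key
        first_results.clear()         # delete the accumulated first results
        return dict(second)           # intermediate results may re-enter the buffer

A returned intermediate result would then be pushed back into the data buffer as the next layer's initial input data, which is exactly the third-level reuse described above.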
[0088] The above three levels of data reuse are reflected in FIG. 1 and FIG. 2. The embodiment of the present invention further proposes a fourth level of data reuse, which further optimizes the parallelism of the operations and improves the operational efficiency and data utilization of the convolution operator. The fourth-level data reuse process is detailed in the schematic diagram of FIG. 3 below.

[0089] FIG. 3 is a schematic diagram of the process by which the convolution operator computes the convolution result corresponding to a given initial input data item. The left side of FIG. 3 shows the convolution kernel, the middle shows the initial input data, and the right side shows the corresponding convolution result.

[0090] The convolution operator performing, in the current-layer convolutional neural network, a convolution operation according to the initial input data and the weight values to obtain a plurality of first convolution results includes:

[0091] performing a convolution operation on the Q-th row of the initial input data and the L-th row of the preset convolution kernel, the resulting data being sub-data of the (Q-L+1)-th row of a third convolution result;

[0092] accumulating all the sub-data located in the (Q-L+1)-th row to obtain the data of the (Q-L+1)-th row;

[0093] performing a convolution operation according to the third convolution result and the weight values to obtain the plurality of first convolution results;

[0094] where Q ranges from 1 to M, M being the total number of rows of the initial input data, and L ranges from 1 to N, N being the total number of rows of the preset convolution kernel.

[0095] The Q-th row of the initial input data is convolved with each row of the preset convolution kernel, and after the Q-th row has been convolved with all rows of the preset convolution kernel, the Q-th row of the initial input data is deleted, until the entire initial input data has been deleted.

[0096] Exemplarily, the convolution operator 204 of each calculation circuit 20 uses a 3×3 convolution kernel. The kernel slides across the initial input data from left to right and from top to bottom, performing multiply-accumulate operations during the sliding to obtain the convolution result at each corresponding position.

[0097] When the convolution kernel slides to position 1 shown in FIG. 3 (i.e., over rows m-2, m-1, and m of the initial input data), w6, w7, and w8 of the kernel are convolved with the data of row m, and the resulting data corresponds to row m-2 of the convolution result.

[0098] When the convolution kernel slides to position 2 shown in FIG. 3 (i.e., over rows m-1, m, and m+1 of the initial input data), w3, w4, and w5 of the kernel are convolved with the data of row m, and the resulting data corresponds to row m-1 of the convolution result.

[0099] When the convolution kernel slides to position 3 shown in FIG. 3 (i.e., over rows m, m+1, and m+2 of the initial input data), w0, w1, and w2 of the kernel are convolved with the data of row m, and the resulting data corresponds to row m of the convolution result.

[0100] From the above, the fourth level of data reuse can be seen: while the convolution operator 204 computes one convolution result, the same row of the initial input data, for example row m, can be reused L times the number of convolution results (L being the number of rows of the convolution kernel). That is, by convolving one row of the initial input data with the entire convolution kernel, the fourth reuse of that row of the initial input data is achieved.
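For the 3×3 example above, a tiny Python check (an illustration mirroring FIG. 3, not the hardware) lists the output rows to which a single input row m contributes, confirming that row m is reused once per kernel row across sliding positions 1 to 3:

    N = 3                                     # kernel rows: w0-w2 / w3-w5 / w6-w8
    m = 10                                    # any interior row of the input
    output_rows = [m - l + 1 for l in range(1, N + 1)]
    print(output_rows)                        # [10, 9, 8] -> output rows m, m-1, m-2
    # Row m meets w6-w8 at position 1, w3-w5 at position 2 and w0-w2 at
    # position 3, i.e. it is reused N times per column of convolution results.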
[0101] In summary, in the present invention the at least one calculation circuit first reads the initial input data and weight values out of the storage circuit to perform the first convolution operation, achieving a first reuse of the same initial input data and weight values across different calculation circuits. By accumulating a first convolution result with a plurality of other first convolution results having a corresponding relationship, a second reuse of the same first convolution result within the same calculation circuit is achieved. By accumulating a plurality of first convolution results having a corresponding relationship to obtain second convolution results that serve as the initial input data of the next-layer convolutional neural network, a third, layer-to-layer reuse of data within the convolutional neural network is achieved. That is, through these three levels of data reuse, data utilization is improved and the number of data accesses is reduced, thereby increasing the operation speed of the calculation circuits and lowering the power consumption of the neural network processor.

[0102] Secondly, by convolving each row of the initial input data with the entire convolution kernel, a fourth reuse of each row of the initial input data is achieved, further improving data utilization and reducing the number of data accesses, and thus further increasing the operation speed of the calculation circuits and lowering the power consumption of the neural network processor.

[0103] Furthermore, after all the first convolution results having a corresponding relationship have been accumulated, the first convolution results are deleted, saving storage space of the storage circuit; after a given row of the initial input data has completed its convolution operations with the convolution kernel, that row is deleted, further saving storage space of the storage circuit.

[0104] In addition, when a plurality of calculation circuits run in parallel, the efficiency of parallel computing can be improved.
[0105] Embodiment 2

[0106] FIG. 4 is a flowchart of the convolutional neural network data multiplexing method provided by Embodiment 2 of the present invention.

[0107] The convolutional neural network data multiplexing method can be applied to mobile or fixed electronic devices, including but not limited to personal computers, smartphones, tablet computers, camera-equipped desktop computers, and all-in-one machines. The electronic device stores the user-configured initial input data and weight values required for the convolution operation in the storage circuit 10 and, by controlling the at least one calculation circuit 20, reads the initial input data and weight values from the storage circuit 10 and performs a convolution operation based on them. Since the initial input data and weight values required for the convolution operation are stored centrally in the storage circuit 10, when there are a plurality of calculation circuits 20, they can read the data synchronously from the storage circuit 10. In this way, the initial input data and weight values can be reused, reducing the number of data accesses and the power consumption of the processor.

[0108] For an electronic device that requires convolutional neural network data multiplexing, the convolutional neural network data multiplexing function provided by the method of the present invention can be integrated directly on the electronic device, or an interface to the function can be provided in the form of a software development kit (SDK), through which the electronic device multiplexes the convolutional neural network data.

[0109] The convolutional neural network data multiplexing method can also be applied in a hardware environment composed of a terminal and a server connected to the terminal through a network, the network including, but not limited to, a wide area network, a metropolitan area network, or a local area network. The method of the embodiment of the present invention can be executed by the server, by the terminal, or jointly by the server and the terminal.

[0110] A terminal or server referred to in this context is an intelligent terminal that can execute predetermined processing, such as numerical and/or logical operations, by running predetermined programs or instructions. It may include a processor and a memory, with the processor executing instructions pre-stored in the memory to carry out the predetermined processing; alternatively, the predetermined processing may be carried out by hardware such as an ASIC, FPGA, or DSP, or by a combination of the two. Computing devices include, but are not limited to, servers, personal computers, laptop computers, tablet computers, smartphones, and the like.

[0111] The methods discussed below (some of which are illustrated by flowcharts) may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments for performing the necessary tasks may be stored in a machine-readable or computer-readable medium (such as a storage medium). One or more processors may perform the necessary tasks.

[0112] As shown in FIG. 4, the convolutional neural network data multiplexing method specifically includes the following steps. Depending on different requirements, the order of the steps in the flowchart may be changed, and certain steps may be omitted.
[0113] S41: store, by the storage circuit, the initial input data and weight values required for the convolution operation.

[0114] In this embodiment, the user can configure in advance the initial input data and weight values required for the convolution operation and store them in the electronic device.

[0115] The electronic device obtains the initial input data and weight values required for the convolution operation and stores them in the storage circuit 10. The initial input data may be stored in the data memory 100 of the storage circuit 10; the initial input data may be an input feature map, which participates in the operations as initial data. The weight values may be stored in the weight memory 102 of the storage circuit 10.

[0116] S42: control the at least one calculation circuit to perform, in the current-layer convolutional neural network, a convolution operation according to the initial input data and the weight values to obtain a plurality of first convolution results, and accumulate the first convolution results having a corresponding relationship to obtain a plurality of second convolution results.

[0117] In this embodiment, a plurality of calculation circuits 20 may be arranged to form an operation array; the plurality of calculation circuits 20 synchronously read the initial input data and weight values required for the convolution operation from the storage circuit 10 and perform the convolution operations in parallel. The convolutional neural network model performs its operations in a fully connected manner.

[0118] The number of calculation circuits 20 can be set according to the actual situation; the required number can be determined by jointly considering the total computation load and the computation load that each calculation circuit can handle. For example, FIG. 1 shows two parallel calculation circuits 20.

[0119] Specifically, each calculation circuit 20 is controlled to read the initial input data from the corresponding data memory 100 and cache it in the corresponding data buffer 200, and at the same time to read the weight values from the corresponding weight memory 102 and cache them in the corresponding weight buffer 202.

[0120] The initial input data comes from two sources: data read by the calculation circuit 20 from the data memory 100, and intermediate convolution results computed by the calculation circuit 20, which the result buffer 206 returns to the data buffer 200 as the initial input data of the next-layer convolutional neural network. Each data buffer 200 can store a plurality of initial input data items at the same time. In other embodiments, each data buffer 200 may store only one initial input data item; once that item has been convolved to produce all of its convolution results, it can be deleted. All the convolution results are accumulated and then cached in the data buffer 200 as new initial input data; the data buffer 200 is therefore a first-in-first-out (FIFO)-like buffer.

[0121] Specifically, the convolution operator 204 performs, in the current-layer convolutional neural network, a convolution operation according to the initial input data in the data buffer 200 and the weight values in the weight buffer 202 to obtain a plurality of first convolution results, and accumulates the first convolution results having a corresponding relationship to obtain a plurality of second convolution results.

[0122] Different calculation circuits 20 obtain different convolution results from their convolution operations. For the process of the convolution operation performed by the convolution operator 204, see FIG. 3 and its related description.
[0123] S43: after all the first convolution results having a corresponding relationship have been accumulated, control the at least one calculation circuit to delete the plurality of first convolution results.

[0124] The convolution operator 204 of the calculation circuit 20 deletes the plurality of first convolution results once all the first convolution results having a corresponding relationship have been accumulated.

[0125] S44: determine whether the current-layer convolutional neural network is the last-layer convolutional neural network.

[0126] After obtaining the convolution results, the result buffer 206 determines whether they are the final convolution results.

[0127] When it is determined that the current-layer convolutional neural network is not the last-layer convolutional neural network, S45 is executed; otherwise, when it is determined that the current-layer convolutional neural network is the last-layer convolutional neural network, S46 is executed.

[0128] S45: determine the plurality of second convolution results as intermediate convolution results and send the intermediate convolution results to the at least one calculation circuit for caching as the initial input data of the next-layer convolutional neural network.

[0129] When the current-layer convolutional neural network is not the last-layer convolutional neural network, the result buffer 206 determines the plurality of second convolution results of the current layer as intermediate convolution results and sends them to the data buffer 200 in the calculation circuit 20 for caching as the initial input data of the next-layer convolutional neural network.

[0130] S46: determine the plurality of second convolution results as final convolution results and send the final convolution results to the storage circuit.

[0131] When the current-layer convolutional neural network is the last-layer convolutional neural network, the result buffer 206 determines the plurality of second convolution results of the current layer as final convolution results and sends them to the data memory 100 in the storage circuit 10 for storage.

[0132] In the process of the convolution operation, the output of the previous-layer convolutional neural network is generally used as the input of the next-layer convolutional neural network: the output of the first layer serves as the input of the second layer, the output of the second layer serves as the input of the third layer, and so on, until the last layer outputs its convolution results. If the current layer is not the last layer of the convolutional neural network, the result buffer 206 caches the intermediate convolution results directly into the data buffer 200 of the corresponding calculation circuit 20, where they serve as the initial input data of the next-layer convolutional neural network for further convolution. If it is the last layer, the result buffer 206 sends the final convolution results to the data memory 100 in the storage circuit 10 for storage.
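Steps S41 to S46 can be strung together in a compact Python driver; per_layer_first_results stands in for the per-layer convolution of step S42 and, like every name here, is an assumption made for illustration:

    def data_multiplexing_method(initial_input, weights_per_layer,
                                 per_layer_first_results, accumulate, storage):
        """S41-S46 in miniature: convolve (S42), accumulate and delete (S43),
        then route by the last-layer test (S44) to the next layer (S45)
        or to the storage circuit (S46)."""
        ci = initial_input                              # S41: data from storage
        last = len(weights_per_layer) - 1
        for layer, weight in enumerate(weights_per_layer):
            first = per_layer_first_results(ci, weight)  # S42: first results
            second = accumulate(first)                   # S42/S43: accumulate, delete
            if layer == last:
                storage.append(second)                   # S46: final results stored
            else:
                ci = second                              # S45: next layer's input
        return storage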
[0133] 下面结合图 2所示的示意图举例说明本发明实施例提供的所述神经网络处理器 1 的数据处理过程。 [0133] The data processing process of the neural network processor 1 provided by an embodiment of the present invention is exemplified below with reference to the schematic diagram shown in FIG. 2.
[0134] 示例性的, 假设存储电路 10中存储的进行卷积运算所需的所述初始输入数据以 CiO表示, 权重值以 Weight表示, 其中, 所述初始输入数据 CiO存储于所述数据存 储器 100中, 所述权重值 Weight存储于所述权重存储器 102中。 [0134] Exemplarily, it is assumed that the initial input data stored in the storage circuit 10 required for performing the convolution operation is represented by CiO, and the weight value is represented by Weight, where the initial input data CiO is stored in the data memory In 100, the weight value Weight is stored in the weight memory 102.
[0135] 第一步, 所述存储电路 10向所有计算电路 20 (图中用 PE表示) 进行广播。 每一 个计算电路 20接收到广播信号后, 同步从所述数据存储器 100中读取所述初始输 入数据 CiO并缓存至所述数据缓存器 200; 同时, 每一个计算电路 20还同步从所述 权重存储器 102中读取所述权重值 Weight并缓存至所述权重缓存器 202中。 [0135] In the first step, the storage circuit 10 broadcasts to all calculation circuits 20 (denoted by PE in the figure). Every After receiving the broadcast signal, each computing circuit 20 synchronously reads the initial input data CiO from the data memory 100 and buffers it to the data buffer 200; meanwhile, each computing circuit 20 also synchronizes from the weight memory The weight value Weight is read in 102 and cached in the weight buffer 202.
[0136] 每一个计算电路 20的卷积运算器 204 (图中用 MAC表示) 根据对应的数据存储 器 200 (图中用 IBUF表示) 中的初始输入数据 CiO及对应的权重存储器 202中的权 重值 Weight进行第一层卷积神经网络的卷积运算, 得到第一层的卷积结果 CoO, 并将所述第一层的卷积结果 CoO缓存至所述结果缓存器 206中。 由于第一步运算 得到的卷积运算结果 CoO并不是最后层的卷积运算结果, 因而, 结果缓存器 206 (图中用 OBUF表示) 将所述第一层的卷积运算结果 CoO回传至所述计算电路 20 的数据缓存器 200中进行缓存, 作为第二层卷积神经网络的初始输入数据 Cil。 [0136] The convolution operator 204 (denoted by MAC in the figure) of each calculation circuit 20 is based on the initial input data CiO in the corresponding data memory 200 (denoted by IBUF in the figure) and the weight value in the corresponding weight memory 202 Weight performs the convolution operation of the first layer convolutional neural network to obtain the first layer convolution result CoO, and caches the first layer convolution result CoO into the result buffer 206. Since the convolution operation result CoO obtained in the first operation is not the final layer convolution operation result, the result buffer 206 (denoted by OBUF in the figure) returns the first layer convolution operation result CoO to The data buffer 200 of the calculation circuit 20 performs buffering as the initial input data Cil of the second layer convolutional neural network.
[0137] 第二步, 计算电路 20同步从数据缓存器 200中读取初始输入数据 Cil ; 卷积运算 器 204根据对应的数据存储器 200中的初始输入数据 Cil及对应的权重存储器 202中 的权重值进行第二层卷积神经网络的卷积运算, 得到第二层的卷积结果 Col, 并 将所述第一层的卷积结果 Col缓存至所述结果缓存器 206中。 结果缓存器 206将所 述第二层的卷积运算结果 Col回传至所述计算电路 20的数据缓存器 200中进行缓 存, 作为第三层卷积神经网络的初始输入数据 Ci2。 [0137] In the second step, the calculation circuit 20 synchronously reads the initial input data Cil from the data buffer 200; the convolution operator 204 according to the initial input data Cil in the corresponding data memory 200 and the corresponding weight in the weight memory 202 Value to perform the convolution operation of the second layer convolutional neural network to obtain the convolution result Col of the second layer, and cache the convolution result Col of the first layer into the result buffer 206. The result buffer 206 returns the convolution operation result Col of the second layer to the data buffer 200 of the calculation circuit 20 for buffering as the initial input data Ci2 of the third layer convolutional neural network.
[0138] 以此类推。 [0138] and so on.
[0139] 最后一步, 每一个计算电路 20根据倒数第二步运算得到的卷积结果和权重值进 行最后层卷积神经网络的卷积运算, 得到最终的卷积结果, 并将所述最终的卷 积结果发送至所述存储电路 10中的数据存储器 100中进行存储。 [0139] In the last step, each calculation circuit 20 performs the convolution operation of the final layer convolutional neural network according to the convolution result and the weight value obtained in the penultimate step operation to obtain the final convolution result, and the final The convolution result is sent to the data memory 100 in the storage circuit 10 for storage.
[0140] 需要说明的是, 由于本实施例中的卷积神经网络模型采用的是全连接的方式, 则每一个计算电路 20在进行卷积运算的过程中, 数据缓存器 200中的一个初始输 入数据, 在当前层卷积神经网络中进行卷积运算后, 会得到多个第一卷积结果 , 具有对应关系 (例如, 同一个神经元) 的所述第一卷积结果进行累加后得到 多个第二卷积结果。 当对所有具有对应关系的所述第一卷积结果进行累加后, 该初始输入数据可以被删掉。 直到最后一层卷积神经网络完成卷积运算之后, 就得到了最终的卷积结果。 [0140] It should be noted that, since the convolutional neural network model in this embodiment adopts a fully connected mode, each computing circuit 20 performs an initial operation in the data buffer 200 during the convolution operation. Input data, after performing convolution operations in the current layer of convolutional neural network, multiple first convolution results will be obtained, and the first convolution results with corresponding relationships (for example, the same neuron) are accumulated and obtained Multiple second convolution results. After accumulating all the first convolution results with a corresponding relationship, the initial input data may be deleted. Until the final layer of convolutional neural network completes the convolution operation, the final convolution result is obtained.
[0141] In this embodiment, to keep the terminology aligned with the initial input data, which serves as the input feature map, the convolution result is also called the output feature map. The above embodiment describes the data processing flow of the neural network processor 1, which involves three levels of data multiplexing. These three levels of data multiplexing can greatly increase the operational parallelism of the neural network processor and effectively reduce the power consumption of the entire processor.
[0142] The three levels of data multiplexing are described in detail below:
[0143] First-level data multiplexing: each calculation circuit 20 synchronously reads the initial input data and the weight values from the storage circuit 10 once, and completes the convolution operation of the first-layer convolutional neural network, thereby achieving the first multiplexing of the same initial input data and weight values across different calculation circuits 20.
[0144] Second-level data multiplexing: the result buffer 206 of each calculation circuit 20 can hold multiple first convolution results at the same time; a first convolution result is accumulated with the multiple other first convolution results having a corresponding relationship with it, thereby achieving the second multiplexing of the same first convolution result within the same calculation circuit 20.
[0145] Third-level data multiplexing: all the convolution results (including intermediate convolution results and the final convolution result) can be cached in the result buffer 206. If a second convolution result is an intermediate convolution result, the result buffer 206 returns it directly to the data buffer 200 for caching, where it serves as the initial input data of the next-layer convolutional neural network. That is, by accumulating multiple corresponding first convolution results into a second convolution result that becomes the initial input data of the next layer, the third level of data multiplexing, between one layer of the convolutional neural network and the next, is achieved.
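For illustration, the sketch below broadcasts one copy of the input and the weights to several simulated calculation circuits, mirroring the first level of reuse; the thread pool, the function names and the random data are assumptions made for the sketch, not features of the claimed circuit:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def conv2d_valid(x, k):
    """Same valid cross-correlation as in the pipeline sketch above."""
    H, W = x.shape
    Kh, Kw = k.shape
    out = np.zeros((H - Kh + 1, W - Kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + Kh, j:j + Kw] * k)
    return out

def circuit_job(circuit_id, shared_input, shared_weight):
    # Each simulated calculation circuit reads the *same* initial input
    # and weights once (first-level multiplexing) and convolves them.
    return circuit_id, conv2d_valid(shared_input, shared_weight)

shared_input = np.random.rand(8, 8)    # read once from the storage circuit
shared_weight = np.random.rand(3, 3)

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(circuit_job, c, shared_input, shared_weight)
               for c in range(4)]
    for f in futures:
        cid, result = f.result()
        print(f"circuit {cid}: output shape {result.shape}")
```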
[0146] The above three levels of data multiplexing are reflected in FIG. 1 and FIG. 2. An embodiment of the present invention further proposes a fourth level of data multiplexing, which further improves the parallelism of the operation and increases the operational efficiency and data utilization of the convolution operator. The fourth-level data multiplexing process is detailed in the schematic diagram of FIG. 3 described below.
[0147] FIG. 3 is a schematic diagram of the process by which the convolution operator computes the convolution result corresponding to a given initial input datum. The left part of FIG. 3 shows the convolution kernel, the middle part shows the initial input data, and the right part shows the corresponding convolution result.
[0148] The convolution operator performing, in the current-layer convolutional neural network, a convolution operation based on the initial input data and the weight values to obtain multiple first convolution results includes:
[0149] performing a convolution operation between the Q-th row of the initial input data and the L-th row of a preset convolution kernel, the resulting data being the sub-data of the (Q-L+1)-th row of a third convolution result;
[0150] accumulating all sub-data located in the (Q-L+1)-th row to obtain the data of the (Q-L+1)-th row;
[0151] performing a convolution operation based on the third convolution result and the weight values to obtain the multiple first convolution results;
[0152] where Q ranges from 1 to M, M being the total number of rows of the initial input data, and L ranges from 1 to N, N being the total number of rows of the preset convolution kernel.
[0153] The Q-th row of the initial input data is convolved with every row of the preset convolution kernel, and once the Q-th row has been convolved with all rows of the preset convolution kernel, the Q-th row of the initial input data is deleted, until the initial input data has been deleted in its entirety.
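A minimal Python sketch of this row-wise decomposition, under the assumption of a stride-1 "valid" convolution, is given below; it also models the per-row deletion of paragraph [0153], with None standing in for a freed buffer row. The function and variable names are illustrative:

```python
import numpy as np

def row_decomposed_conv(x, kernel):
    """Valid 2-D convolution built from row-by-row 1-D correlations.

    Row Q of the input (1-indexed) correlated with row L of the kernel
    contributes sub-data to row Q-L+1 of the output; the sub-data of
    each output row are accumulated, matching paragraphs [0149]-[0152].
    """
    M, W = x.shape            # M: total rows of the initial input data
    N, Kw = kernel.shape      # N: total rows of the preset kernel
    out = np.zeros((M - N + 1, W - Kw + 1))
    rows = [x[q].copy() for q in range(M)]   # modifiable row buffer

    for q in range(1, M + 1):                # 1-indexed input row Q
        for l in range(1, N + 1):            # 1-indexed kernel row L
            r = q - l + 1                    # target output row Q-L+1
            if 1 <= r <= out.shape[0]:
                # 1-D sliding correlation of input row Q with kernel row L
                row = rows[q - 1]
                for j in range(out.shape[1]):
                    out[r - 1, j] += np.dot(row[j:j + Kw], kernel[l - 1])
        rows[q - 1] = None   # row Q used with every kernel row: delete it

    return out

x = np.arange(30, dtype=float).reshape(6, 5)
k = np.ones((3, 3))
print(row_decomposed_conv(x, k))  # equals a direct 3x3 valid convolution
```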
[0154] Exemplarily, the convolution operator 204 of each calculation circuit 20 uses a 3×3 convolution kernel. The kernel slides across the initial input data from left to right and from top to bottom, performing multiply-accumulate operations as it slides to obtain the convolution result at each corresponding position.
[0155] When the convolution kernel slides to position 1 shown in FIG. 3 (that is, when the kernel covers rows m-2, m-1 and m of the initial input data), the weights w6, w7 and w8 of the kernel are convolved with the data of row m, and the resulting data correspond to row m-2 of the convolution result.
[0156] When the convolution kernel slides to position 2 shown in FIG. 3 (that is, when the kernel covers rows m-1, m and m+1 of the initial input data), the weights w3, w4 and w5 of the kernel are convolved with the data of row m, and the resulting data correspond to row m-1 of the convolution result.
[0157] When the convolution kernel slides to position 3 shown in FIG. 3 (that is, when the kernel covers rows m, m+1 and m+2 of the initial input data), the weights w0, w1 and w2 of the kernel are convolved with the data of row m, and the resulting data correspond to row m of the convolution result.
[0158] It can be seen from the above that, in the fourth level of data multiplexing, while the convolution operator 204 computes the convolution results, the same row of the initial input data, for example row m, can be reused L × (the number of convolution results) times, L being the number of rows of the convolution kernel. That is, by convolving one row of the initial input data with the entire convolution kernel, the fourth multiplexing of that row of the initial input data is achieved.
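The sketch below counts, under the same 3×3 assumption, how many output rows each input row contributes to, making the reuse factor of the fourth level explicit; it is an illustrative tally, not hardware code:

```python
def row_reuse_counts(m_rows, n_kernel_rows):
    """For each input row q (1-indexed), count the output rows q-l+1
    (l = 1..N) that it feeds: interior rows are reused N times."""
    out_rows = m_rows - n_kernel_rows + 1
    counts = {}
    for q in range(1, m_rows + 1):
        counts[q] = sum(
            1 for l in range(1, n_kernel_rows + 1)
            if 1 <= q - l + 1 <= out_rows
        )
    return counts

print(row_reuse_counts(m_rows=6, n_kernel_rows=3))
# {1: 1, 2: 2, 3: 3, 4: 3, 5: 2, 6: 1} -> each interior row is reused 3 times
```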
[0159] In summary, in the present invention, at least one calculation circuit reads the initial input data and weight values from the storage circuit for the first time to perform the first convolution operation, achieving the first multiplexing of the same initial input data and weight values across different calculation circuits; by accumulating a first convolution result with multiple other first convolution results having a corresponding relationship, the second multiplexing of the same first convolution result within the same calculation circuit is achieved; and by accumulating multiple corresponding first convolution results into a second convolution result that serves as the initial input data of the next-layer convolutional neural network, the third multiplexing, between layers of the convolutional neural network, is achieved. In other words, through three levels of data multiplexing, data utilization is increased and the number of data accesses is reduced, thereby increasing the operation speed of the calculation circuits and reducing the power consumption of the neural network processor.
[0160] Secondly, by convolving each row of the initial input data with the entire convolution kernel, the fourth multiplexing of each row of the initial input data is achieved, which further increases data utilization and reduces the number of data accesses, thereby further increasing the operation speed of the calculation circuits and reducing the power consumption of the neural network processor.
[0161] Thirdly, deleting the first convolution results after all those having a corresponding relationship have been accumulated saves storage space in the storage circuit; and deleting a row of the initial input data once it has completed its convolution operations with the convolution kernel saves further storage space in the storage circuit.
[0162] In addition, when multiple calculation circuits run in parallel, the efficiency of parallel computation can be improved.
[0163] FIG. 4 above details the convolutional neural network data multiplexing method of the present invention. The functional modules of the software system that implements the method and the hardware system architecture that implements the method are introduced below with reference to FIG. 5 and FIG. 6, respectively.
[0164] It should be understood that the embodiments are for illustration only, and the scope of the patent application is not limited by this structure.
[0165] Embodiment 3
[0166] Referring to FIG. 5, it is a functional module diagram of a preferred embodiment of the convolutional neural network data multiplexing device of the present invention.
[0167] In some embodiments, the convolutional neural network data multiplexing device 50 runs in an electronic device. The convolutional neural network data multiplexing device 50 may include multiple functional modules composed of program code segments. The program code of each program segment in the convolutional neural network data multiplexing device 50 may be stored in the memory of the electronic device and executed by at least one processor to perform data multiplexing for a convolutional neural network (detailed in the description of FIG. 4).
[0168] In this embodiment, the convolutional neural network data multiplexing device 50 may be divided into multiple functional modules according to the functions it performs. The functional modules may include: a storage module 501, a convolution operation module 502, a deletion module 503, a judgment module 504, a first determination module 505 and a second determination module 506. A module as referred to in the present invention is a series of computer program segments that can be executed by at least one processor and can perform a fixed function, and that are stored in the memory. In this embodiment, the functions of each module are detailed in the subsequent embodiments.
[0169] The storage module 501 is configured to store, through the storage circuit, the initial input data and weight values required for the convolution operation;
[0170] the convolution operation module 502 is configured to control the at least one calculation circuit to perform, in the current-layer convolutional neural network, a convolution operation based on the initial input data and the weight values to obtain multiple first convolution results, and to accumulate the first convolution results having a corresponding relationship to obtain multiple second convolution results;
[0171] the deletion module 503 is configured to control the at least one calculation circuit to delete the multiple first convolution results after all the first convolution results having a corresponding relationship have been accumulated;
[0172] the judgment module 504 is configured to judge whether the current-layer convolutional neural network is the last layer of the convolutional neural network;
[0173] the first determination module 505 is configured to, when the judgment module 504 determines that the current-layer convolutional neural network is not the last layer, determine the multiple second convolution results as intermediate convolution results and send the intermediate convolution results to the at least one calculation circuit for caching, as the initial input data of the next-layer convolutional neural network;
[0174] the second determination module 506 is configured to, when the judgment module 504 determines that the current-layer convolutional neural network is the last layer, determine the multiple second convolution results as the final convolution result and send the final convolution result to the storage circuit.
[0175] For a detailed description of the above modules (501-506), refer to the convolutional neural network data multiplexing method described in the embodiments, which is not elaborated again here.
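A compact sketch of how these modules could be organized in software is shown below; the class layout, the stub arithmetic and the method names are assumptions chosen to mirror modules 501-506, not the patented implementation:

```python
class CNNDataMultiplexingDevice:
    """Toy analogue of device 50; modules 501-506 appear as methods.
    The arithmetic is a stub: each "convolution" just scales the data."""

    def __init__(self, layer_weights):
        self.layer_weights = layer_weights   # one weight per layer
        self.storage = None                  # storage circuit stand-in

    def store(self, initial_input):          # storage module 501
        self.storage = initial_input

    def convolve(self, data, w):             # convolution operation module 502
        firsts = [data * w, data * w]        # "first convolution results"
        second = sum(firsts)                 # accumulate correspondences
        firsts.clear()                       # deletion module 503
        return second

    def run(self):
        data = self.storage
        for i, w in enumerate(self.layer_weights):
            second = self.convolve(data, w)
            if i < len(self.layer_weights) - 1:   # judgment module 504
                data = second                     # first determination module 505
            else:
                self.storage = second             # second determination module 506
        return self.storage

device = CNNDataMultiplexingDevice([0.5, 2.0, 1.0])
device.store(3.0)
print(device.run())  # 3.0 * (2*0.5) * (2*2.0) * (2*1.0) = 24.0
```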
[0176] In summary, in the neural network processor provided by the embodiments of the present invention, at least one calculation circuit reads the initial input data and weight values from the storage circuit for the first time to perform the first convolution operation, achieving the first multiplexing of the same initial input data and weight values across different calculation circuits; by accumulating a first convolution result with multiple other first convolution results having a corresponding relationship, the second multiplexing of the same first convolution result within the same calculation circuit is achieved; by accumulating multiple corresponding first convolution results into a second convolution result that serves as the initial input data of the next-layer convolutional neural network, the third multiplexing, between layers of the convolutional neural network, is achieved; and by convolving one row of the initial input data with the entire convolution kernel, the fourth multiplexing of that row of the initial input data is achieved. In other words, through four levels of data multiplexing, data utilization is increased, the number of data accesses is reduced, and the power consumption of the processor is effectively lowered, which in turn allows the operational parallelism of the calculation circuits to be increased.
[0177] Secondly, deleting the first convolution results after all those having a corresponding relationship have been accumulated saves storage space in the calculation circuits; and deleting a row of the initial input data once it has completed its convolution operations with every row of the convolution kernel saves further storage space in the calculation circuits, thereby effectively reducing the power consumption of the entire neural network processor and improving the operational efficiency of the convolution operations performed by the calculation circuits.
[0178] Embodiment 4
[0179] Referring to FIG. 6, in a preferred embodiment of the present invention, the electronic device 6 includes a memory 61, at least one processor 62, at least one communication bus 63, a display screen 64 and at least one neural network processor 66.
[0180] Those skilled in the art should understand that the structure of the electronic device shown in FIG. 6 does not limit the embodiments of the present invention; it may be a bus-type structure or a star structure, and the electronic device 6 may also include more or fewer hardware or software components than shown, or a different arrangement of components.
[0181] In some embodiments, the electronic device 6 is a device capable of automatically performing numerical computation and/or information processing according to preset or stored instructions. The hardware of the electronic device 6 includes, but is not limited to: a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like. The electronic device 6 may also include user equipment, which includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote control, a touchpad, a voice control device or the like, for example, a personal computer, a tablet computer, a smartphone or a digital camera.
[0182] It should be noted that the electronic device 6 is only an example; other existing or future electronic products that can be adapted to the present invention should also be included within the protection scope of the present invention and are incorporated herein by reference.
[0183] In some embodiments, the memory 61 is used to store program code and various data, for example the convolutional neural network data multiplexing device 50 installed in the electronic device 6, and to provide high-speed, automatic access to programs or data during the operation of the electronic device 6. The memory 61 includes read-only memory (ROM), random access memory (RAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.
[0184] In some embodiments, the at least one processor 62 may be composed of integrated circuits, for example a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The at least one processor 62 is the control unit of the electronic device 6; it connects the components of the entire electronic device 6 through various interfaces and lines, and executes the various functions of the electronic device 6 and processes data, such as the function of convolutional neural network data multiplexing, by running or executing the programs or modules stored in the memory 61 and calling the data stored in the memory 61.
[0185] In some embodiments, the at least one communication bus 63 is configured to enable connection and communication among the memory 61, the at least one processor 62, the display screen 64, the at least one neural network processor 66, and so on.
[0186] In some embodiments, the display screen 64 may be used to display information entered by the viewer or information provided to the viewer, as well as the various graphical viewer interfaces of the electronic device 6, which may be composed of graphics, text, icons, video and any combination thereof. The display screen 64 may include a display panel, which may optionally be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
[0187] The display screen 64 may also include a touch panel. If the display screen 64 includes a touch panel, it may be implemented as a touch screen to receive input signals from the viewer. The touch panel includes one or more touch sensors to sense touches, swipes and gestures on the touch panel. The touch sensors can not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. The display panel and the touch panel may be implemented as two independent components to provide input and output functions, but in some embodiments the display panel and the touch panel may be integrated to provide input and output functions.
[0188] Although not shown, the electronic device 6 may also include a power supply (such as a battery) that supplies power to the components. Preferably, the power supply may be logically connected to the at least one processor 62 through a power management system, so that functions such as charge management, discharge management and power consumption management are implemented through the power management system. The power supply may also include one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, or any other such components. The electronic device 6 may also include various sensors, a Bluetooth module, a communication module, and the like, which are not described in detail here.
[0189] It should be understood that the embodiments are for illustration only, and the scope of the patent application is not limited by this structure.
[0190] The integrated unit implemented in the form of a software functional module described above may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a client, a network device or the like) or a processor to execute parts of the methods described in the embodiments of the present invention.
[0191] In a further embodiment, with reference to FIG. 1, the at least one processor 62 may execute the operating system of the electronic device 6 as well as the installed application programs (such as the convolutional neural network data multiplexing device 50), program code, and so on.
[0192] The memory 61 stores program code, and the at least one processor 62 may call the program code stored in the memory 61 to perform related functions. For example, the modules described in FIG. 5 are program code stored in the memory 61 and executed by the at least one processor 62, so as to realize the functions of the modules and thereby achieve the purpose of generating a neural network model according to user requirements.
[0193] In an embodiment of the present invention, the memory 61 stores multiple instructions, and the multiple instructions are executed by the at least one processor 62 to implement the function of randomly generating a neural network model.
[0194] Specifically, for the specific implementation method of the above instructions by the at least one processor 62, reference may be made to the description of the relevant steps in the embodiment corresponding to FIG. 1, which is not repeated here.
[0195] In the several embodiments provided by the present invention, it should be understood that the disclosed system, device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; for instance, the division into modules is only a division by logical function, and there may be other division manners in actual implementation.
[0196] The modules described as separate components may or may not be physically separate, and the components displayed as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
[0197] In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.

Claims

[Claim 1] A neural network processor, wherein the neural network processor comprises: a storage circuit, configured to store initial input data and weight values required for a convolution operation; and at least one calculation circuit, configured to read the initial input data and the weight values from the storage circuit and perform a convolution operation based on the initial input data and the weight values, wherein
the at least one calculation circuit comprises:
a data buffer, configured to cache the initial input data read by the calculation circuit; and a weight buffer, configured to cache the weight values read by the calculation circuit;
a convolution operator, configured to perform, in a current-layer convolutional neural network, a convolution operation based on the initial input data and the weight values to obtain multiple first convolution results, and to accumulate the first convolution results having a corresponding relationship to obtain multiple second convolution results; and, after all the first convolution results having a corresponding relationship have been accumulated, to delete the multiple first convolution results; and
a result buffer, configured to cache the multiple second convolution results and, according to a preset storage rule, send the multiple second convolution results either to the data buffer as the initial input data of a next-layer convolutional neural network, or to the storage circuit for storage.
[Claim 2] The neural network processor according to claim 1, wherein the preset storage rule comprises:
when the current-layer convolutional neural network is not the last layer of the convolutional neural network, the result buffer determines the multiple second convolution results as intermediate convolution results and sends the intermediate convolution results to the data buffer;
when the current-layer convolutional neural network is the last layer of the convolutional neural network, the result buffer determines the multiple second convolution results as the final convolution result and sends the final convolution result to the storage circuit.
[Claim 3] The neural network processor according to claim 1 or 2, wherein the convolution operator performing, in the current-layer convolutional neural network, a convolution operation based on the initial input data and the weight values to obtain multiple first convolution results comprises:
performing a convolution operation between the Q-th row of the initial input data and the L-th row of a preset convolution kernel, the resulting data being the sub-data of the (Q-L+1)-th row of a third convolution result; accumulating all sub-data located in the (Q-L+1)-th row to obtain the data of the (Q-L+1)-th row; and performing a convolution operation based on the third convolution result and the weight values to obtain the multiple first convolution results;
wherein Q ranges from 1 to M, M being the total number of rows of the initial input data, and L ranges from 1 to N, N being the total number of rows of the preset convolution kernel.
[Claim 4] The neural network processor according to claim 3, wherein the Q-th row of the initial input data is convolved with every row of the preset convolution kernel, and after the Q-th row has been convolved with all rows of the preset convolution kernel, the Q-th row of the initial input data is deleted, until the initial input data has been deleted in its entirety.
[Claim 5] A convolutional neural network data multiplexing method, applied to an electronic device, wherein the electronic device comprises the neural network processor according to any one of claims 1 to 4, and the method comprises:
storing, through the storage circuit, the initial input data and weight values required for the convolution operation; controlling the at least one calculation circuit to perform, in a current-layer convolutional neural network, a convolution operation based on the initial input data and the weight values to obtain multiple first convolution results, and accumulating the first convolution results having a corresponding relationship to obtain multiple second convolution results; after all the first convolution results having a corresponding relationship have been accumulated, controlling the at least one calculation circuit to delete the multiple first convolution results; when the current-layer convolutional neural network is not the last layer of the convolutional neural network, determining the multiple second convolution results as intermediate convolution results and sending the intermediate convolution results to the at least one calculation circuit for caching, as the initial input data of the next-layer convolutional neural network; and
when the current-layer convolutional neural network is the last layer of the convolutional neural network, determining the multiple second convolution results as the final convolution result and sending the final convolution result to the storage circuit.
[Claim 6] The method according to claim 5, wherein controlling the at least one calculation circuit to perform, in the current-layer convolutional neural network, a convolution operation based on the initial input data and the weight values to obtain multiple first convolution results comprises:
performing a convolution operation between the Q-th row of the initial input data and the L-th row of a preset convolution kernel, the resulting data being the sub-data of the (Q-L+1)-th row of a third convolution result; accumulating all sub-data located in the (Q-L+1)-th row to obtain the data of the (Q-L+1)-th row; and performing a convolution operation based on the third convolution result and the weight values to obtain the multiple first convolution results;
wherein Q ranges from 1 to M, M being the total number of rows of the initial input data, and L ranges from 1 to N, N being the total number of rows of the preset convolution kernel.
[Claim 7] The method according to claim 6, wherein the Q-th row of the initial input data is convolved with every row of the preset convolution kernel, and after the Q-th row has been convolved with all rows of the preset convolution kernel, the Q-th row of the initial input data is deleted, until the initial input data has been deleted in its entirety.
[Claim 8] A convolutional neural network data multiplexing device, installed in an electronic device, wherein the electronic device comprises the neural network processor according to any one of claims 1 to 4, and the device comprises:
a storage module, configured to store, through the storage circuit, the initial input data and weight values required for the convolution operation;
a convolution operation module, configured to control the at least one calculation circuit to perform, in a current-layer convolutional neural network, a convolution operation based on the initial input data and the weight values to obtain multiple first convolution results, and to accumulate the first convolution results having a corresponding relationship to obtain multiple second convolution results;
a deletion module, configured to control the at least one calculation circuit to delete the multiple first convolution results after all the first convolution results having a corresponding relationship have been accumulated; a first determination module, configured to, when the current-layer convolutional neural network is not the last layer of the convolutional neural network, determine the multiple second convolution results as intermediate convolution results and send the intermediate convolution results to the at least one calculation circuit for caching, as the initial input data of the next-layer convolutional neural network; and
a second determination module, configured to, when the current-layer convolutional neural network is the last layer of the convolutional neural network, determine the multiple second convolution results as the final convolution result and send the final convolution result to the storage circuit.
[Claim 9] An electronic device, wherein the electronic device comprises a processor, and the processor is configured to implement the convolutional neural network data multiplexing method according to any one of claims 5 to 7 when executing a computer program stored in a memory.
[Claim 10] A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the convolutional neural network data multiplexing method according to any one of claims 5 to 7 is implemented.