WO2020134546A1 - Neural network processor, convolutional neural network data multiplexing method and related device - Google Patents

Neural network processor, convolutional neural network data multiplexing method and related device

Info

Publication number
WO2020134546A1
Authority
WO
WIPO (PCT)
Prior art keywords
convolution
data
neural network
initial input
input data
Prior art date
Application number
PCT/CN2019/114725
Other languages
French (fr)
Chinese (zh)
Inventor
李炜
曹庆新
Original Assignee
深圳云天励飞技术有限公司
Priority date
Filing date
Publication date
Application filed by 深圳云天励飞技术有限公司
Publication of WO2020134546A1 publication Critical patent/WO2020134546A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • Neural network processor, convolutional neural network data multiplexing method, and related equipment
  • One of the most commonly used models in neural network processors is the convolutional neural network model.
  • However, the convolutional neural network model suffers from a series of problems such as slow operation speed and high power consumption. How to increase the operation speed of the convolutional neural network model in a neural network processor while reducing power consumption has therefore become an urgent technical problem.
  • A first aspect of the present invention provides a neural network processor, which includes:
  • a storage circuit for storing the initial input data and weight values required for a convolution operation;
  • at least one calculation circuit for reading the initial input data and the weight value from the storage circuit and performing a convolution operation based on them, where the at least one calculation circuit includes:
  • a data buffer for caching the initial input data read by the calculation circuit;
  • a weight buffer for caching the weight value read by the calculation circuit;
  • a convolution operator for performing a convolution operation in the current layer of the convolutional neural network according to the initial input data and the weight value to obtain a plurality of first convolution results, and for accumulating the first convolution results that have a corresponding relationship to obtain a plurality of second convolution results; after all the first convolution results having a corresponding relationship have been accumulated, the plurality of first convolution results are deleted;
  • a result buffer for caching the plurality of second convolution results and, according to a preset storage rule, sending them to the data buffer as the initial input data of the next layer of the convolutional neural network, or sending them to the storage circuit for storage.
  • the preset storage rule includes:
  • when the current layer of the convolutional neural network is not the last layer, the result buffer determines the plurality of second convolution results as intermediate convolution results and sends the intermediate convolution results to the data buffer;
  • The convolution operator performing a convolution operation in the current layer of the convolutional neural network according to the initial input data and the weight value to obtain a plurality of first convolution results includes:
  • performing a convolution operation on the Q-th row of the initial input data and the L-th row of a preset convolution kernel, the resulting data being sub-data of the (Q-L+1)-th row of a third convolution result;
  • accumulating all sub-data located in the (Q-L+1)-th row to obtain the data of the (Q-L+1)-th row;
  • performing a convolution operation according to the third convolution result and the weight value to obtain the plurality of first convolution results;
  • where Q ranges from 1 to M, M being the total number of rows of the initial input data, and L ranges from 1 to N, N being the total number of rows of the preset convolution kernel.
  • Preferably, the Q-th row of the initial input data is convolved with every row of the preset convolution kernel, and after the Q-th row has been convolved with all rows of the preset convolution kernel, the Q-th row of the initial input data is deleted, until the entire initial input data has been deleted.
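  • The row-wise scheme above can be made concrete with a short sketch. This is a minimal NumPy illustration, not the patented hardware: it treats the convolution as the cross-correlation conventionally used in CNNs, and the function and variable names are our own.

```python
import numpy as np

def row_wise_conv(x, k):
    """Row-wise 'valid' convolution: input row Q convolved with kernel
    row L contributes to output row Q - L + 1 (1-based), so each input
    row can be freed once it has met every kernel row."""
    M, W = x.shape                              # M input rows
    N, Kw = k.shape                             # N kernel rows
    out = np.zeros((M - N + 1, W - Kw + 1))
    rows = {q: x[q].copy() for q in range(M)}   # buffered input rows
    for q in range(M):                          # 0-based; Q = q + 1
        for l in range(N):                      # 0-based; L = l + 1
            r = q - l                           # output row, i.e. (Q - L + 1) - 1
            if 0 <= r < out.shape[0]:
                for c in range(out.shape[1]):
                    # sub-data of output row r from input row q, kernel row l
                    out[r, c] += np.dot(rows[q][c:c + Kw], k[l])
        del rows[q]                             # row Q has met all N kernel rows
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
k = np.ones((3, 3))
print(row_wise_conv(x, k))  # matches scipy.signal.correlate2d(x, k, 'valid')
```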
  • When the current layer of the convolutional neural network is the last layer, the plurality of second convolution results are determined as the final convolution results, and the final convolution results are sent to the storage circuit.
  • Controlling the at least one calculation circuit to perform a convolution operation in the current layer of the convolutional neural network according to the initial input data and the weight value to obtain a plurality of first convolution results includes:
  • performing a convolution operation on the Q-th row of the initial input data and the L-th row of a preset convolution kernel, the resulting data being sub-data of the (Q-L+1)-th row of a third convolution result;
  • accumulating all sub-data located in the (Q-L+1)-th row to obtain the data of the (Q-L+1)-th row;
  • performing a convolution operation according to the third convolution result and the weight value to obtain the plurality of first convolution results;
  • where Q ranges from 1 to M, M being the total number of rows of the initial input data, and L ranges from 1 to N, N being the total number of rows of the preset convolution kernel.
  • a storage module for storing, through the storage circuit, the initial input data and weight values required for the convolution operation;
  • a convolution operation module configured to control the at least one calculation circuit to perform a convolution operation in the current layer of the convolutional neural network according to the initial input data and the weight value to obtain a plurality of first convolution results, and to accumulate the first convolution results having a corresponding relationship to obtain a plurality of second convolution results;
  • a deletion module configured to control the at least one calculation circuit to delete the plurality of first convolution results after all the first convolution results having a corresponding relationship have been accumulated;
  • a first determining module configured to determine the plurality of second convolution results as intermediate convolution results when the current layer of the convolutional neural network is not the last layer, and to send the intermediate convolution results to the at least one calculation circuit for caching as the initial input data of the next layer of the convolutional neural network;
  • A fourth aspect of the present invention provides an electronic device, which includes a processor configured to implement the convolutional neural network data multiplexing method when executing a program stored in a memory.
  • In the present invention, the at least one calculation circuit first reads the initial input data and the weight value from the storage circuit to perform the first convolution operation, realizing the first level of data multiplexing: the same initial input data and weight values are reused across different calculation circuits. By accumulating a first convolution result with multiple other first convolution results that have a corresponding relationship, the second level of data multiplexing is realized: the same first convolution result is reused within the same calculation circuit. By accumulating multiple corresponding first convolution results into a second convolution result that serves as the initial input data of the next layer of the convolutional neural network, the third level of data multiplexing is realized: data is reused between layers. Through these three levels of data multiplexing, data utilization is improved and the number of memory accesses is reduced, which increases the calculation speed of the calculation circuits and lowers the power consumption of the neural network processor.
  • In addition, the first convolution results are deleted after accumulation, saving the storage space of the storage circuit; and once a row of the initial input data has completed its convolution operations with the convolution kernel, that row is deleted, further saving the storage space of the storage circuit.
  • FIG. 1 is a schematic diagram of a neural network processor provided by a preferred embodiment of the present invention.
  • FIG. 2 is a schematic diagram of another neural network processor provided by an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of data multiplexing when performing a convolution operation according to a preferred embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of a data multiplexing method of a convolutional neural network according to a preferred embodiment of the present invention.
  • FIG. 5 is a structural diagram of a convolutional neural network data multiplexing device according to a preferred embodiment of the present invention.
  • FIG. 6 is a schematic diagram of an electronic device provided by a preferred embodiment of the present invention.
  • FIG. 1 and FIG. 2 are schematic diagrams of a neural network processor provided by an embodiment of the present invention.
  • The neural network processor 1 may include a storage circuit 10 and at least one calculation circuit 20, where each calculation circuit 20 is connected to the storage circuit 10.
  • The neural network processor 1 may be a programmable logic device, such as a Field Programmable Gate Array (FPGA), or an Application Specific Integrated Circuit (ASIC).
  • The number of calculation circuits 20 can be set according to the actual situation; the required number can be determined by weighing the total amount of computation against the amount of computation each calculation circuit can handle. For example, FIG. 1 shows two parallel calculation circuits 20.
  • The neural network processor 1 stores the user-configured initial input data and weight values required for the convolution operation in the storage circuit 10, and reads, through the at least one calculation circuit 20, the initial input data and the weight value from the storage circuit 10 to perform a convolution operation based on them.
  • Because the initial input data and weight values required for the convolution operation are stored collectively in the storage circuit 10, when there are multiple calculation circuits 20, they can synchronously read the initial input data and weight values from the storage circuit 10. In this way, the initial input data and the weight value are multiplexed, reducing the number of data accesses and the power consumption of the processor.
  • The plurality of calculation circuits 20 may form an operation array; the calculation circuits 20 simultaneously read the initial input data and weight values required for the convolution operation from the storage circuit 10 and perform their convolution operations in parallel.
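  • As a rough illustration of this first level of multiplexing, the sketch below simulates several processing elements (PEs) sharing one broadcast copy of the input. It is a software analogy under our own naming, not the hardware itself, and uses SciPy's correlate2d for the per-PE convolution.

```python
import numpy as np
from scipy.signal import correlate2d

def broadcast_to_pes(ci0, weights_per_pe):
    """First-level multiplexing: the initial input Ci0 is fetched once
    from (simulated) storage and shared by every PE, each of which
    convolves it with its own kernel, instead of each PE issuing its
    own read of the same data."""
    # one fetch, many uses: every PE reuses the same ci0 array
    return [correlate2d(ci0, w, mode="valid") for w in weights_per_pe]

ci0 = np.random.rand(8, 8)
kernels = [np.random.rand(3, 3) for _ in range(2)]   # two parallel PEs
outputs = broadcast_to_pes(ci0, kernels)
```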
  • In some embodiments, the convolutional neural network model uses a fully connected method for operation.
  • The calculation circuit 20 may also pre-store the number of channels required for the convolution operation.
  • the storage circuit 10 may include: a data memory 100 and a weight memory 102.
  • the data memory 100 is used to store initial input data required for performing convolution operations.
  • the initial input data may be an input feature map, which participates in the calculation as the initial data.
  • the data memory 100 can also be used to store the final convolution result calculated by the at least one calculation circuit 20.
  • the weight memory 102 is used to store weight values required for performing convolution operations.
  • The calculation circuit 20 may include a data buffer 200, a weight buffer 202, a convolution operator 204, and a result buffer 206, where the result buffer 206 is also connected to the data memory 100 in the storage circuit 10 and to the data buffer 200 in the calculation circuit 20.
  • the data buffer 200 is used to buffer the initial input data read by the calculation circuit 20 from the data memory.
  • The initial input data has two sources: one is read by the calculation circuit 20 from the data memory 100, and the other is the intermediate convolution result obtained by the calculation circuit 20, which is returned by the result buffer 206 to the data buffer 200 as the initial input data of the next layer of the convolutional neural network.
  • Each data buffer 200 can store multiple items of initial input data simultaneously. In other embodiments, each data buffer 200 may store only one item of initial input data, which can be deleted once all of its convolution results have been obtained; the accumulated convolution results are then cached in the data buffer 200 as new initial input data. The data buffer 200 therefore behaves like a First In First Out (FIFO) cache.
  • the weight buffer 202 is used to cache the weight value read by the calculation circuit 20 from the weight memory 102.
  • The convolution operator 204 is configured to perform the convolution operation of the current layer of the convolutional neural network according to the initial input data in the data buffer 200 and the weight value in the weight buffer 202, obtaining a plurality of first convolution results; the first convolution results having a corresponding relationship are accumulated to obtain a plurality of second convolution results, and once all the first convolution results having a corresponding relationship have been accumulated, the plurality of first convolution results are deleted.
  • The result buffer 206 is used to cache the plurality of second convolution results and to send them, according to the preset storage rule, to the data buffer 200 as the initial input data of the next layer of the convolutional neural network, or to the storage circuit 10 for storage.
  • Different calculation circuits 20 use the same initial input data to perform convolution operations and obtain different convolution results; the different convolution results are therefore stored in the result buffers 206 of the respective calculation circuits 20.
  • The result buffer 206 of each calculation circuit 20 may also store multiple convolution results at the same time.
  • The preset storage rule may include:
  • when the current layer of the convolutional neural network is not the last layer, the result buffer 206 determines the plurality of second convolution results as intermediate convolution results and sends the intermediate convolution results to the data buffer 200;
  • when the current layer of the convolutional neural network is the last layer, the result buffer 206 determines the plurality of second convolution results as the final convolution results and sends the final convolution results to the storage circuit 10.
  • In a multi-layer convolutional neural network, the output of the previous layer is usually used as the input of the next layer: the output of the first layer serves as the input of the second layer, the output of the second layer serves as the input of the third layer, and so on, until the last layer outputs the final convolution result.
  • If the current layer is not the last layer, the result buffer 206 directly caches the intermediate convolution result into the data buffer 200 of the corresponding calculation circuit 20 as the initial input data of the next layer of the convolutional neural network. If it is the last layer, the result buffer 206 sends the final convolution result to the data memory 100 in the storage circuit 10 for storage.
  • Suppose the initial input data required for the convolution operation and stored in the storage circuit 10 is denoted Ci0, and the weight value is denoted Weight; Ci0 is stored in the data memory 100, and Weight is stored in the weight memory 102.
  • The storage circuit 10 broadcasts to all calculation circuits 20 (denoted PE in the figures). After receiving the broadcast signal, each calculation circuit 20 synchronously reads the initial input data Ci0 from the data memory 100 and caches it in its data buffer 200; at the same time, each calculation circuit 20 synchronously reads the weight value Weight from the weight memory 102 and caches it in its weight buffer 202.
  • The convolution operator 204 (denoted MAC in the figures) of each calculation circuit 20 performs the convolution operation of the first layer of the convolutional neural network based on the initial input data Ci0 in the corresponding data buffer 200 (denoted IBUF in the figures) and the weight value Weight in the corresponding weight buffer 202, obtaining the first-layer convolution result Co0, and caches Co0 in the result buffer 206. Since the convolution result Co0 obtained in the first operation is not the result of the final layer, the result buffer 206 (denoted OBUF in the figures) returns Co0 to the data buffer 200 of the calculation circuit 20 for caching, as the initial input data Ci1 of the second layer of the convolutional neural network.
  • The calculation circuit 20 then synchronously reads the initial input data Ci1 from the data buffer 200; the convolution operator 204 performs the convolution operation of the second layer of the convolutional neural network according to Ci1 in the corresponding data buffer 200 and the weight value in the corresponding weight buffer 202, obtaining the second-layer convolution result Co1, and caches Co1 in the result buffer 206.
  • The result buffer 206 returns the second-layer convolution result Co1 to the data buffer 200 of the calculation circuit 20 for caching, as the initial input data Ci2 of the third layer of the convolutional neural network.
  • Finally, each calculation circuit 20 performs the convolution operation of the final layer of the convolutional neural network according to the convolution result obtained in the penultimate step and the weight value, obtaining the final convolution result, and sends the final convolution result to the data memory 100 in the storage circuit 10 for storage.
  • During the convolution operation, after each calculation circuit 20 performs the convolution operations of the current layer on the initial input data in its data buffer 200, multiple first convolution results are obtained; the first convolution results having a corresponding relationship (for example, belonging to the same neuron) are accumulated to obtain multiple second convolution results. After all the corresponding first convolution results have been accumulated, the initial input data may be deleted. This continues until the final layer completes its convolution operation and the final convolution result is obtained.
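  • The layer-to-layer feedback described above (Ci0 -> Co0 -> Ci1 -> Co1 -> ...) can be summarized in a short sketch. This is a behavioral software analogy under assumed names, with SciPy's correlate2d standing in for the MAC array:

```python
import numpy as np
from scipy.signal import correlate2d

def run_layers(ci0, layer_weights):
    """Third-level multiplexing: each layer's output Co_i is returned by
    the (simulated) result buffer straight into the data buffer as the
    next layer's input Ci_{i+1}; only the final result is written back
    to main storage."""
    data_buffer = ci0                                   # contents of IBUF
    for i, w in enumerate(layer_weights):
        co = correlate2d(data_buffer, w, mode="valid")  # MAC output into OBUF
        if i < len(layer_weights) - 1:
            data_buffer = co        # OBUF -> IBUF: Co_i becomes Ci_{i+1}
        else:
            return co               # final layer: write back to data memory

final = run_layers(np.random.rand(12, 12), [np.random.rand(3, 3)] * 3)
```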
  • the convolution result is also referred to as an output feature map.
  • The above embodiment illustrates the data processing of the neural network processor 1, which involves three levels of data multiplexing. Through these three levels of data multiplexing, the parallelism of the neural network processor can be greatly improved, and the power consumption of the entire processor can be effectively reduced.
  • First level of data multiplexing: each calculation circuit 20 reads the initial input data and the weight value from the storage circuit 10 to complete the convolution operation of the first layer of the convolutional neural network, so that the same initial input data and weight values are multiplexed across different calculation circuits 20.
  • Second level of data multiplexing: the result buffer 206 of each calculation circuit 20 can store multiple first convolution results at the same time, and a first convolution result is accumulated with multiple other first convolution results that have a corresponding relationship, realizing the multiplexing of the same first convolution result within the same calculation circuit 20.
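  • One way to read "corresponding relationship" is as per-channel partial sums that belong to the same output feature map (the source hints at this with "for example, the same neuron"). Under that assumption, a minimal sketch:

```python
import numpy as np
from scipy.signal import correlate2d

def accumulate_partials(input_channels, kernels):
    """Second-level multiplexing: per-channel partial results ('first
    convolution results') for the same output are accumulated into one
    'second convolution result', after which the partials are dropped
    from the (simulated) result buffer."""
    partials = [correlate2d(c, k, mode="valid")  # first convolution results
                for c, k in zip(input_channels, kernels)]
    second = np.sum(partials, axis=0)            # accumulate corresponding results
    del partials                                 # free the buffered partials
    return second
```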
  • Third level of data multiplexing: all the convolution results (both the intermediate and the final convolution results) can be cached in the result buffer 206. If a second convolution result is intermediate, the result buffer 206 directly returns it to the data buffer 200 for caching as the initial input data of the next layer of the convolutional neural network. That is, by accumulating multiple corresponding first convolution results into a second convolution result that serves as the initial input data of the next layer, data multiplexing between layers of the convolutional neural network is realized.
  • The above three levels of data multiplexing are illustrated in FIG. 1 and FIG. 2. The embodiment of the present invention further proposes a fourth level of data multiplexing, which further improves the parallelism of the operation as well as the computational efficiency and data utilization of the convolution operator. The fourth level of data multiplexing is detailed in the schematic diagram of FIG. 3.
  • FIG. 3 is a schematic diagram of the process in which the convolution operator uses a given piece of initial input data to calculate the corresponding convolution results.
  • In FIG. 3, the left side is the convolution kernel, the middle is the initial input data, and the right side is the corresponding convolution result.
  • The convolution operator performs the convolution operation in the current layer of the convolutional neural network according to the initial input data and the weight value to obtain a plurality of first convolution results, as follows:
  • The convolution operator 204 of each calculation circuit 20 uses a 3×3 convolution kernel.
  • The convolution kernel slides from the left to the right of the initial input data, and then from the top to the bottom; during the sliding, multiply-accumulate operations are performed to obtain the convolution result at each position.
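  • A direct rendering of this sliding multiply-accumulate, as a plain software sketch (the names are ours, and the loop order mirrors the left-to-right, top-to-bottom sliding described above):

```python
import numpy as np

def sliding_mac(x, k):
    """Slide the kernel left-to-right, then top-to-bottom, performing a
    multiply-accumulate at each position to produce the convolution
    result for that position."""
    kh, kw = k.shape                                  # e.g. a 3x3 kernel
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for r in range(oh):                               # top -> bottom
        for c in range(ow):                           # left -> right
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * k)   # MAC
    return out
```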
  • In summary, the at least one calculation circuit first reads the initial input data and the weight value from the storage circuit to perform the first convolution operation, realizing the first level of data multiplexing of the same initial input data and weight value across different calculation circuits. By accumulating a first convolution result with multiple other first convolution results that have a corresponding relationship, the second level of data multiplexing of the same first convolution result within the same calculation circuit is realized. By accumulating multiple corresponding first convolution results into a second convolution result that serves as the initial input data of the next layer, the third level of data multiplexing between layers of the convolutional neural network is realized. Through these three levels of data multiplexing, data utilization is increased and the number of data accesses is reduced, thereby increasing the calculation speed of the calculation circuits and reducing the power consumption of the neural network processor.
  • Moreover, the first convolution results are deleted after accumulation, saving the storage space of the storage circuit; and once a row of the initial input data has completed its convolution operation with the convolution kernel, deleting that row further saves the storage space of the storage circuit.
  • FIG. 4 is a flowchart of a convolutional neural network data multiplexing method according to Embodiment 2 of the present invention.
  • The convolutional neural network data multiplexing method can be applied to mobile or fixed electronic devices; the electronic devices include, but are not limited to, personal computers, smart phones, tablet computers, desktop computers, all-in-one machines, and the like.
  • The electronic device stores the initial input data and weight values required by the user for the convolution operation in the storage circuit 10, and controls at least one calculation circuit 20 to read the initial input data and the weight value from the storage circuit 10 and perform a convolution operation based on them. Since the initial input data and weight values required for the convolution operation are stored collectively in the storage circuit 10, when there are multiple calculation circuits 20, they can simultaneously read the initial input data and weight values from the storage circuit 10. In this way, the initial input data and weight values are multiplexed, reducing the number of data accesses and the power consumption of the processor.
  • The convolutional neural network data multiplexing function provided by the method of the present invention may be integrated directly into the electronic device, or provided as an interface in the form of a Software Development Kit (SDK), through which the electronic device realizes the convolutional neural network data multiplexing.
  • the method for convolutional neural network data multiplexing can also be applied to a hardware environment composed of a terminal and a server connected to the terminal through a network.
  • the network includes but is not limited to: wide area network, metropolitan area network or local area network.
  • The image feature extraction method of the embodiment of the present invention may be executed by the server, by the terminal, or jointly by the server and the terminal.
  • A terminal or server in this context refers to an intelligent terminal that can perform a predetermined processing procedure, such as a numerical operation and/or a logical operation, by running a predetermined program or instruction. It may include a processor and a memory, where the processor executes instructions pre-stored in the memory to carry out the predetermined processing procedure; alternatively, the predetermined processing procedure is executed by hardware such as an ASIC, FPGA, or DSP, or by a combination of the two.
  • Computer devices include but are not limited to servers, personal computers, laptops, tablets, smartphones, etc.
  • the convolutional neural network data multiplexing method specifically includes the following steps. According to different requirements, the order of the steps in the flowchart may be changed, and some steps may be omitted.
  • S41 storing, by the storage circuit, initial input data and weight values required for convolution operation.
  • the user may configure in advance the initial input data and weight values required for performing the convolution operation, and store them in the electronic device.
  • the electronic device obtains initial input data and weight values required for convolution operation and stores them in the storage circuit 10.
  • the initial input data may be stored in the data memory 100 of the storage circuit 10
  • the initial input data may be an input feature map, which participates in the calculation as the initial data.
  • the weight value may be stored in the weight memory 102 of the storage circuit 10.
  • S42 Control the at least one calculation circuit to perform a convolution operation in the current layer of the convolutional neural network according to the initial input data and the weight value to obtain a plurality of first convolution results; the first convolution results having a corresponding relationship are accumulated to obtain a plurality of second convolution results.
  • A plurality of calculation circuits 20 may be provided to form an operation array; the calculation circuits 20 simultaneously read the initial input data and weight values required for the convolution operation from the storage circuit 10 and perform convolution operations in parallel.
  • In some embodiments, the convolutional neural network model uses a fully connected method for operation.
  • The number of calculation circuits 20 can be set according to the actual situation; the required number can be determined by weighing the total amount of computation against the amount of computation each calculation circuit can handle. For example, FIG. 1 shows two parallel calculation circuits 20.
  • Each calculation circuit 20 is controlled to read the initial input data from the corresponding data memory 100 and cache it in the corresponding data buffer 200; at the same time, each calculation circuit 20 is controlled to read the weight value from the corresponding weight memory 102 and cache it in the corresponding weight buffer 202.
  • The initial input data has two sources: one is read by the calculation circuit 20 from the data memory 100, and the other is the intermediate convolution result obtained by the calculation circuit 20, which is returned by the result buffer 206 to the data buffer 200 as the initial input data of the next layer of the convolutional neural network.
  • Each data buffer 200 can store multiple items of initial input data simultaneously. In other embodiments, each data buffer 200 may store only one item of initial input data, which can be deleted once all of its convolution results have been obtained; the accumulated convolution results are then cached in the data buffer 200 as new initial input data. The data buffer 200 therefore behaves like a First In First Out (FIFO) cache.
  • The convolution operator 204 performs the convolution operation of the current layer of the convolutional neural network according to the initial input data in the data buffer 200 and the weight value in the weight buffer 202, obtaining a plurality of first convolution results; the first convolution results having a corresponding relationship are accumulated to obtain a plurality of second convolution results.
  • S43 After all the first convolution results having a corresponding relationship have been accumulated, the convolution operator 204 of the calculation circuit 20 deletes the plurality of first convolution results.
  • S44 Determine whether the current layer of the convolutional neural network is the last layer of the convolutional neural network.
  • S45 Determine the plurality of second convolution results as intermediate convolution results, and send the intermediate convolution results to the at least one calculation circuit for caching as the initial input data of the next layer of the convolutional neural network.
  • Specifically, when the current layer is not the last layer, the result buffer 206 determines the multiple second convolution results of the current layer as intermediate convolution results and sends them to the data buffer 200 in the calculation circuit 20 for caching, as the initial input data of the next layer of the convolutional neural network.
  • S46 Determine the plurality of second convolution results as the final convolution result, and send the final convolution result to the storage circuit.
  • Specifically, when the current layer is the last layer, the result buffer 206 determines the multiple second convolution results of the current layer as final convolution results and sends the final convolution results to the data memory 100 in the storage circuit 10 for storage.
  • In a multi-layer convolutional neural network, the output of the previous layer is usually used as the input of the next layer: the output of the first layer serves as the input of the second layer, the output of the second layer serves as the input of the third layer, and so on, until the last layer outputs the final convolution result.
  • If the current layer is not the last layer, the result buffer 206 directly caches the intermediate convolution result into the data buffer 200 of the corresponding calculation circuit 20 as the initial input data of the next layer of the convolutional neural network. If it is the last layer, the result buffer 206 sends the final convolution result to the data memory 100 in the storage circuit 10 for storage.
  • Suppose the initial input data required for performing the convolution operation and stored in the storage circuit 10 is denoted Ci0, and the weight value is denoted Weight; Ci0 is stored in the data memory 100, and Weight is stored in the weight memory 102.
  • The storage circuit 10 broadcasts to all calculation circuits 20 (denoted PE in the figures). After receiving the broadcast signal, each calculation circuit 20 synchronously reads the initial input data Ci0 from the data memory 100 and caches it in its data buffer 200; at the same time, each calculation circuit 20 synchronously reads the weight value Weight from the weight memory 102 and caches it in its weight buffer 202.
  • The convolution operator 204 (denoted MAC in the figures) of each calculation circuit 20 performs the convolution operation of the first layer of the convolutional neural network based on the initial input data Ci0 in the corresponding data buffer 200 (denoted IBUF in the figures) and the weight value Weight in the corresponding weight buffer 202, obtaining the first-layer convolution result Co0, and caches Co0 in the result buffer 206.
  • Since the convolution result Co0 obtained in the first operation is not the result of the final layer, the result buffer 206 (denoted OBUF in the figures) returns Co0 to the data buffer 200 of the calculation circuit 20 for caching, as the initial input data Ci1 of the second layer of the convolutional neural network.
  • The calculation circuit 20 then synchronously reads the initial input data Ci1 from the data buffer 200; the convolution operator 204 performs the convolution operation of the second layer of the convolutional neural network according to Ci1 in the corresponding data buffer 200 and the weight value in the corresponding weight buffer 202, obtaining the second-layer convolution result Co1, and caches Co1 in the result buffer 206.
  • The result buffer 206 returns the second-layer convolution result Co1 to the data buffer 200 of the calculation circuit 20 for caching, as the initial input data Ci2 of the third layer of the convolutional neural network.
  • Finally, each calculation circuit 20 performs the convolution operation of the final layer of the convolutional neural network according to the convolution result obtained in the penultimate step and the weight value, obtaining the final convolution result, and sends the final convolution result to the data memory 100 in the storage circuit 10 for storage.
  • The convolution result is also called an output feature map.
  • The above embodiment illustrates the data processing of the neural network processor 1, which involves three levels of data multiplexing. Through these three levels of data multiplexing, the parallelism of the neural network processor can be greatly improved, and the power consumption of the entire processor can be effectively reduced.
  • First level of data multiplexing: each calculation circuit 20 reads the initial input data and the weight value from the storage circuit 10 to complete the convolution operation of the first layer of the convolutional neural network, so that the same initial input data and weight values are multiplexed across different calculation circuits 20.
  • Second level of data multiplexing: the result buffer 206 of each calculation circuit 20 can store multiple first convolution results at the same time, and a first convolution result is accumulated with multiple other first convolution results that have a corresponding relationship, realizing the multiplexing of the same first convolution result within the same calculation circuit 20.
  • Third level of data multiplexing: all the convolution results (both the intermediate and the final convolution results) can be cached in the result buffer 206. If a second convolution result is intermediate, the result buffer 206 directly returns it to the data buffer 200 for caching as the initial input data of the next layer of the convolutional neural network. That is, by accumulating multiple corresponding first convolution results into a second convolution result that serves as the initial input data of the next layer, data multiplexing between layers of the convolutional neural network is realized.
  • The above three levels of data multiplexing are illustrated in FIG. 1 and FIG. 2. The embodiment of the present invention further proposes a fourth level of data multiplexing, which further improves the parallelism of the operation as well as the computational efficiency and data utilization of the convolution operator. The fourth level of data multiplexing is detailed in the schematic diagram of FIG. 3.
  • The convolution operator performing a convolution operation in the current layer of the convolutional neural network according to the initial input data and the weight value to obtain a plurality of first convolution results includes:
  • performing a convolution operation on the Q-th row of the initial input data and the L-th row of a preset convolution kernel, the resulting data being sub-data of the (Q-L+1)-th row of a third convolution result;
  • accumulating all sub-data located in the (Q-L+1)-th row to obtain the data of the (Q-L+1)-th row;
  • performing a convolution operation according to the third convolution result and the weight value to obtain the plurality of first convolution results;
  • where Q ranges from 1 to M, M being the total number of rows of the initial input data, and L ranges from 1 to N, N being the total number of rows of the preset convolution kernel.
  • The convolution operator 204 of each calculation circuit 20 uses a 3×3 convolution kernel.
  • The convolution kernel slides from the left to the right of the initial input data, and then from the top to the bottom; during the sliding, multiply-accumulate operations are performed to obtain the convolution result at each position.
  • Fourth level of data multiplexing: while the convolution operator 204 computes the convolution results, the same row of initial input data, for example the m-th row, can be multiplexed across N convolution results, N being the total number of rows of the convolution kernel. That is, by performing a convolution operation between one row of the initial input data and every row of the convolution kernel, the fourth level of multiplexing of that row of the initial input data is realized.
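  • Under our reading of this fourth level, each input row is reused once per kernel row, up to N times (fewer at the top and bottom edges). A tiny sketch with assumed names makes the reuse count explicit:

```python
def row_reuse_count(q, m, n):
    """Number of output rows that the q-th input row (1-based) feeds,
    for an m-row input and an n-row kernel: the output rows are
    r = q - l + 1 for kernel rows l = 1..n, clipped to 1..(m - n + 1)."""
    return sum(1 for l in range(1, n + 1) if 1 <= q - l + 1 <= m - n + 1)

# 5-row input, 3-row kernel: interior rows are reused 3 times.
print([row_reuse_count(q, 5, 3) for q in range(1, 6)])  # [1, 2, 3, 2, 1]
```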
  • In summary, the at least one calculation circuit first reads the initial input data and the weight value from the storage circuit to perform the first convolution operation, realizing the first level of data multiplexing of the same initial input data and weight value across different calculation circuits. By accumulating a first convolution result with multiple other first convolution results that have a corresponding relationship, the second level of data multiplexing of the same first convolution result within the same calculation circuit is realized. By accumulating multiple corresponding first convolution results into a second convolution result that serves as the initial input data of the next layer, the third level of data multiplexing between layers of the convolutional neural network is realized. Through these three levels of data multiplexing, data utilization is increased and the number of data accesses is reduced, thereby increasing the calculation speed of the calculation circuits and reducing the power consumption of the neural network processor.
  • Moreover, the first convolution results are deleted after accumulation, saving the storage space of the storage circuit; and once a row of the initial input data has completed its convolution operation with the convolution kernel, deleting that row further saves the storage space of the storage circuit.
  • FIG. 5 is a functional block diagram of a preferred embodiment of the convolutional neural network data multiplexing device of the present invention.
  • the convolutional neural network data multiplexing device 50 runs in an electronic device.
  • the convolutional neural network data multiplexing device 50 may include multiple function modules composed of program code segments.
  • The program code of each program segment in the convolutional neural network data multiplexing device 50 can be stored in the memory of the electronic device and executed by at least one processor to perform the convolutional neural network data multiplexing (see FIG. 4 for details).
  • the convolutional neural network data multiplexing device 50 may be divided into multiple functional modules according to the functions it performs.
  • the functional module may include: a storage module 501, a convolution operation module 502, a deletion module 503, a judgment module 504, a first determination module 505, and a second determination module 506.
  • A module in the present invention refers to a series of computer program segments that are stored in the memory, can be executed by at least one processor, and perform a fixed function. In this embodiment, the functions of each module will be described in detail in subsequent embodiments.
  • a storage module 501 configured to store, through the storage circuit, the initial input data and weight values required for the convolution operation;
  • the judgment module 504 is configured to judge whether the current layer convolutional neural network is the last layer convolutional neural network.
  • The first determination module 505 is configured to determine the plurality of second convolution results as intermediate convolution results when the judgment module 504 determines that the current layer of the convolutional neural network is not the last layer, and to send the intermediate convolution results to the at least one calculation circuit for caching as the initial input data of the next layer of the convolutional neural network;
  • The second determination module 506 is configured to determine the plurality of second convolution results as the final convolution results when the judgment module 504 determines that the current layer of the convolutional neural network is the last layer, and to send the final convolution results to the storage circuit.
  • The neural network processor provided by the embodiment of the present invention realizes the first level of data multiplexing of the same initial input data and weight values across different calculation circuits by having at least one calculation circuit read the initial input data and the weight value from the storage circuit to perform the first convolution operation; realizes the second level of data multiplexing of the same first convolution result within the same calculation circuit by accumulating a first convolution result with multiple other first convolution results that have a corresponding relationship; and obtains the second convolution results by accumulating multiple corresponding first convolution results.
  • After accumulation, the first convolution results are deleted to save the storage space of the calculation circuit; and after a row of the initial input data has completed its convolution operation with the convolution kernel, that row is deleted, further saving the storage space of the calculation circuit, thereby effectively reducing the power consumption of the entire neural network processor and improving the efficiency of the calculation circuit's convolution operations.
  • the electronic device 6 may further include user equipment.
  • The user equipment includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote control, a touchpad, or a voice control device, for example a personal computer, a tablet computer, a smart phone, a digital camera, and the like.
  • The electronic device 6 is only an example; other existing or future electronic products that can be adapted to the present invention should also fall within the protection scope of the present invention and are incorporated herein by reference.
  • The memory 61 is used to store program code and various data, such as the convolutional neural network data multiplexing device 50 installed in the electronic device 6, and realizes high-speed, automatic access to programs and data while the electronic device 6 is running.
  • The memory 61 includes Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-time Programmable Read-Only Memory (OTPROM), Electrically-Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disk storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.
  • The at least one communication bus 63 is configured to implement connection and communication among the memory 61, the at least one processor 62, the display screen 64, the at least one neural network processor 66, and the like.
  • the power supply may be logically connected to the at least one processor 62 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system.
  • the power supply may also include any component such as one or more DC or AC power supplies, recharging systems, power failure detection circuits, power converters or inverters, and power status indicators.
  • The electronic device 6 may also include various sensors, Bluetooth modules, communication modules, and the like, which are not described in detail here.
  • The at least one processor 62 may execute the operating system of the electronic device 6, various installed application programs (such as the convolutional neural network data multiplexing device 50 described above), program code, and so on.
  • the memory 61 stores program codes, and the at least one processor 62 may call the program codes stored in the memory 61 to perform related functions.
  • Each module described in FIG. 5 is program code stored in the memory 61 and executed by the at least one processor 62, so as to realize the functions of the modules and achieve the purpose of generating a neural network model according to user needs.
  • Modules described as separate components may or may not be physically separated, and components displayed as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Abstract

A neural network processor, a convolutional neural network data multiplexing method and apparatus, an electronic device and a storage medium. The neural network processor comprises: a storage circuit (10) for storing initial input data and a weight value required for performing a convolution operation; and at least one computing circuit (20), comprising: a data buffer (200) for buffering the initial input data; a weight buffer (202) for buffering the weight value; a convolution operation unit (204) for performing, according to the initial input data and the weight value, the convolution operation in the current layer of convolutional neural network to obtain a plurality of first convolution results, accumulating the first convolution results having a correlation therebetween to obtain a plurality of second convolution results, while also deleting the plurality of first convolution results; and a result buffer (206) for buffering the plurality of second convolution results as the initial input data of the next layer of convolutional neural network. According to the processor, by means of multiple layers of data multiplexing, the operation speed of the neural network processor is increased, and the power consumption is reduced.

Description

Neural network processor, convolutional neural network data multiplexing method and related equipment

Technical field

[0001] The present invention relates to the technical field of artificial intelligence, and in particular to a neural network processor, a convolutional neural network data multiplexing method, a convolutional neural network data multiplexing device, an electronic device, and a storage medium.

[0002] This application claims priority to the Chinese patent application filed with the Chinese Patent Office on December 27, 2018, with application number 201811614780.3 and invention title "Neural network processor, convolutional neural network data multiplexing method and related equipment", the entire contents of which are incorporated herein by reference.

Background art

[0003] One of the most commonly used models in neural network processors is the convolutional neural network model. However, the convolutional neural network model suffers from a series of problems such as slow operation speed and high power consumption. Therefore, how to increase the operation speed of the convolutional neural network model in a neural network processor while reducing power consumption has become an urgent technical problem.
Summary of the invention

Technical problem

Solution to the problem

Technical solution
[0004] In view of the above, it is necessary to provide a neural network processor, a convolutional neural network data multiplexing method, a convolutional neural network data multiplexing device, an electronic device, and a storage medium, which increase the operation speed of the neural network processor and reduce its power consumption by multiplexing data.

[0005] A first aspect of the present invention provides a neural network processor, which includes:

[0006] a storage circuit for storing the initial input data and weight values required for a convolution operation;

[0007] at least one calculation circuit for reading the initial input data and the weight value from the storage circuit and performing a convolution operation based on the initial input data and the weight value, where

[0008] the at least one calculation circuit includes:

[0009] a data buffer for caching the initial input data read by the calculation circuit;

[0010] a weight buffer for caching the weight value read by the calculation circuit;

[0011] a convolution operator for performing a convolution operation in the current layer of the convolutional neural network according to the initial input data and the weight value to obtain a plurality of first convolution results, and accumulating the first convolution results having a corresponding relationship to obtain a plurality of second convolution results; after all the first convolution results having a corresponding relationship have been accumulated, deleting the plurality of first convolution results;

[0012] a result buffer for caching the plurality of second convolution results and, according to a preset storage rule, sending the plurality of second convolution results to the data buffer as the initial input data of the next layer of the convolutional neural network, or sending them to the storage circuit for storage.

[0013] Preferably, the preset storage rule includes:

[0014] when the current layer of the convolutional neural network is not the last layer, the result buffer determines the plurality of second convolution results as intermediate convolution results and sends the intermediate convolution results to the data buffer;

[0015] when the current layer of the convolutional neural network is the last layer, the result buffer determines the plurality of second convolution results as final convolution results and sends the final convolution results to the storage circuit.
[0016] Preferably, the convolution operator performing a convolution operation in the current layer of the convolutional neural network according to the initial input data and the weight value to obtain a plurality of first convolution results includes:

[0017] performing a convolution operation on the Q-th row of the initial input data and the L-th row of a preset convolution kernel, the resulting data being sub-data of the (Q-L+1)-th row of a third convolution result;

[0018] accumulating all sub-data located in the (Q-L+1)-th row to obtain the data of the (Q-L+1)-th row;

[0019] performing a convolution operation according to the third convolution result and the weight value to obtain the plurality of first convolution results;

[0020] where Q ranges from 1 to M, M being the total number of rows of the initial input data, and L ranges from 1 to N, N being the total number of rows of the preset convolution kernel.

[0021] Preferably, the Q-th row of the initial input data is convolved with every row of the preset convolution kernel, and after the Q-th row has been convolved with all rows of the preset convolution kernel, the Q-th row of the initial input data is deleted, until the entire initial input data has been deleted.
[0022] A second aspect of the present invention provides a convolutional neural network data multiplexing method, applied to an electronic device, the electronic device including the neural network processor described above, the method including:

[0023] storing, by the storage circuit, the initial input data and weight values required for the convolution operation;

[0024] controlling the at least one calculation circuit to perform, in the current-layer convolutional neural network, a convolution operation according to the initial input data and the weight values to obtain a plurality of first convolution results, and accumulating the first convolution results having a corresponding relationship to obtain a plurality of second convolution results;

[0025] after all the first convolution results having a corresponding relationship have been accumulated, controlling the at least one calculation circuit to delete the plurality of first convolution results;

[0026] when the current-layer convolutional neural network is not the last-layer convolutional neural network, determining the plurality of second convolution results as intermediate convolution results and sending the intermediate convolution results to the at least one calculation circuit for caching as the initial input data of the next-layer convolutional neural network;

[0027] when the current-layer convolutional neural network is the last-layer convolutional neural network, determining the plurality of second convolution results as final convolution results and sending the final convolution results to the storage circuit.

[0028] Preferably, controlling the at least one calculation circuit to perform, in the current-layer convolutional neural network, a convolution operation according to the initial input data and the weight values to obtain a plurality of first convolution results includes:

[0029] performing a convolution operation on the Q-th row of the initial input data and the L-th row of a preset convolution kernel, the resulting data being sub-data of the (Q-L+1)-th row of a third convolution result;

[0030] accumulating all the sub-data located in the (Q-L+1)-th row to obtain the data of the (Q-L+1)-th row;

[0031] performing a convolution operation according to the third convolution result and the weight values to obtain the plurality of first convolution results;

[0032] where Q ranges from 1 to M, M being the total number of rows of the initial input data, and L ranges from 1 to N, N being the total number of rows of the preset convolution kernel.

[0033] Preferably, the Q-th row of the initial input data is convolved with each row of the preset convolution kernel, and after the Q-th row has been convolved with all rows of the preset convolution kernel, the Q-th row of the initial input data is deleted, until the entire initial input data has been deleted.

[0034] A third aspect of the present invention provides a convolutional neural network data multiplexing device, installed in an electronic device, the electronic device including the neural network processor described above, the device including:
[0035] a storage module, configured to store, through the storage circuit, the initial input data and weight values required for the convolution operation;

[0036] a convolution operation module, configured to control the at least one calculation circuit to perform, in the current-layer convolutional neural network, a convolution operation according to the initial input data and the weight values to obtain a plurality of first convolution results, and to accumulate the first convolution results having a corresponding relationship to obtain a plurality of second convolution results;

[0037] a deletion module, configured to control the at least one calculation circuit to delete the plurality of first convolution results after all the first convolution results having a corresponding relationship have been accumulated;

[0038] a first determination module, configured to, when the current-layer convolutional neural network is not the last-layer convolutional neural network, determine the plurality of second convolution results as intermediate convolution results and send the intermediate convolution results to the at least one calculation circuit for caching as the initial input data of the next-layer convolutional neural network;

[0039] a second determination module, configured to, when the current-layer convolutional neural network is the last-layer convolutional neural network, determine the plurality of second convolution results as final convolution results and send the final convolution results to the storage circuit.

[0040] A fourth aspect of the present invention provides an electronic device, the electronic device including a processor configured to implement the convolutional neural network data multiplexing method when executing a computer program stored in a memory.

[0041] A fifth aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the convolutional neural network data multiplexing method.
[0042] In the present invention, the at least one calculation circuit first reads the initial input data and weight values out of the storage circuit to perform the first convolution operation, achieving a first reuse of the same initial input data and weight values across different calculation circuits. By accumulating a first convolution result with a plurality of other first convolution results having a corresponding relationship, a second reuse of the same first convolution result within the same calculation circuit is achieved. By accumulating a plurality of first convolution results having a corresponding relationship to obtain second convolution results that serve as the initial input data of the next-layer convolutional neural network, a third, layer-to-layer reuse of data within the convolutional neural network is achieved. That is, through these three levels of data reuse, data utilization is improved and the number of data accesses is reduced, thereby increasing the operation speed of the calculation circuits and lowering the power consumption of the neural network processor.

[0043] Secondly, by convolving each row of the initial input data with the entire convolution kernel, a fourth reuse of each row of the initial input data is achieved, further improving data utilization and reducing the number of data accesses, and thus further increasing the operation speed of the calculation circuits and lowering the power consumption of the neural network processor.

[0044] Furthermore, after all the first convolution results having a corresponding relationship have been accumulated, the first convolution results are deleted, saving storage space of the storage circuit; after a given row of the initial input data has completed its convolution operations with the convolution kernel, that row is deleted, further saving storage space of the storage circuit.

[0045] In addition, when a plurality of calculation circuits run in parallel, the efficiency of parallel computing can be improved.
Beneficial effects of invention

Brief description of the drawings
[0046] In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.

[0047] FIG. 1 is a schematic diagram of a neural network processor provided by a preferred embodiment of the present invention.

[0048] FIG. 2 is a schematic diagram of another neural network processor provided by an embodiment of the present invention.

[0049] FIG. 3 is a schematic diagram of data multiplexing during a convolution operation provided by a preferred embodiment of the present invention.

[0050] FIG. 4 is a schematic flowchart of a convolutional neural network data multiplexing method provided by a preferred embodiment of the present invention.

[0051] FIG. 5 is a structural diagram of a convolutional neural network data multiplexing device provided by a preferred embodiment of the present invention.

[0052] FIG. 6 is a schematic diagram of an electronic device provided by a preferred embodiment of the present invention.
Embodiments of the invention
[0053] The embodiments of the present application are described in detail below.

[0054] Embodiment 1

[0055] Please refer to FIG. 1 and FIG. 2, which are schematic diagrams of the neural network processor provided by embodiments of the present invention.

[0056] In this embodiment, the neural network processor 1 may include a storage circuit 10 and at least one calculation circuit 20, the calculation circuit 20 being connected to the storage circuit 10. The neural network processor 1 may be a programmable logic device, such as a field-programmable gate array (FPGA), or a dedicated neural network processor implemented as an application-specific integrated circuit (ASIC).

[0057] The number of calculation circuits 20 can be set according to the actual situation; the required number can be determined by jointly considering the total computation load and the computation load that each calculation circuit can handle. For example, FIG. 1 shows two parallel calculation circuits 20.
[0058] In this embodiment, the neural network processor 1 is configured to store the user-configured initial input data and weight values required for the convolution operation in the storage circuit 10, and to read, through the at least one calculation circuit 20, the initial input data and the weight values from the storage circuit 10 and perform a convolution operation based on them.

[0059] Since the initial input data and weight values required for the convolution operation are stored centrally in the storage circuit 10, when there are a plurality of calculation circuits 20, they can read the initial input data and weight values from the storage circuit 10 synchronously. In this way, the initial input data and weight values can be reused, reducing the number of data accesses and the power consumption of the processor.

[0060] In this embodiment, the plurality of calculation circuits 20 may form an operation array; the plurality of calculation circuits 20 synchronously read the initial input data and weight values required for the convolution operation from the storage circuit 10 and perform the convolution operations in parallel. The convolutional neural network model performs its operations in a fully connected manner.

[0061] In this embodiment, the calculation circuit 20 may also pre-store parameters required for the convolution operation, such as the number of channels and the image size.
[0062] In this embodiment, the storage circuit 10 may include a data memory 100 and a weight memory 102.

[0063] The data memory 100 is configured to store the initial input data required for the convolution operation. The initial input data may be an input feature map, which participates in the operations as initial data. The data memory 100 may also be configured to store the final convolution results computed by the at least one calculation circuit 20.

[0064] The weight memory 102 is configured to store the weight values required for the convolution operation.

[0065] In this embodiment, the calculation circuit 20 may include a data buffer 200, a weight buffer 202, a convolution operator 204, and a result buffer 206, where the result buffer 206 is further connected to the data memory 100 in the storage circuit 10 and to the data buffer 200 in the calculation circuit 20.

[0066] The data buffer 200 is configured to cache the initial input data read by the calculation circuit 20 from the data memory. The initial input data comes from two sources: data read by the calculation circuit 20 from the data memory 100, and intermediate convolution results computed by the calculation circuit 20, which the result buffer 206 returns to the data buffer 200 as the initial input data of the next-layer convolutional neural network. Each data buffer 200 can store a plurality of initial input data items at the same time. In other embodiments, each data buffer 200 may store only one initial input data item; once that item has been convolved to produce all of its convolution results, it can be deleted. All the convolution results are accumulated and then cached in the data buffer 200 as new initial input data; the data buffer 200 is therefore a first-in-first-out (FIFO)-like buffer.
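To picture the FIFO-like behavior just described, here is a short Python sketch; the deque-based class and its method names are illustrative assumptions, not the actual hardware interface of the data buffer 200:

    from collections import deque

    class DataBufferSketch:
        """FIFO-like input buffer: items enter either from the data memory or
        as intermediate results written back by the result buffer, and each
        item is removed once it has produced all of its convolution results."""
        def __init__(self):
            self._queue = deque()

        def push(self, item):            # from data memory 100, or written
            self._queue.append(item)     # back by the result buffer 206

        def pop_for_convolution(self):   # consumed once, then gone (FIFO)
            return self._queue.popleft()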
[0067] The weight buffer 202 is configured to cache the weight values read by the calculation circuit 20 from the weight memory 102.

[0068] The convolution operator 204 is configured to perform, in the current-layer convolutional neural network, a convolution operation according to the initial input data in the data buffer 200 and the weight values in the weight buffer 202 to obtain a plurality of first convolution results, and to accumulate the first convolution results having a corresponding relationship to obtain a plurality of second convolution results; once all the first convolution results having a corresponding relationship have been accumulated, the plurality of first convolution results are deleted.

[0069] For the process of the convolution operation performed by the convolution operator 204, see FIG. 3 and its related description.

[0070] The result buffer 206 is configured to cache the plurality of second convolution results and, according to a preset storage rule, send the plurality of second convolution results to the data buffer 200 as the initial input data of the next-layer convolutional neural network, or send them to the storage circuit 10 for storage. Different calculation circuits 20 obtain different convolution results from the same initial input data; the result buffers 206 of different calculation circuits 20 therefore hold different convolution results. The result buffer 206 of each calculation circuit 20 can also store a plurality of convolution results at the same time.

[0071] In this embodiment, the preset storage rule is a storage rule set in advance, and may include:

[0072] when the current-layer convolutional neural network is not the last-layer convolutional neural network, the result buffer 206 determines the plurality of second convolution results as intermediate convolution results and sends the intermediate convolution results to the data buffer 200;

[0073] when the current-layer convolutional neural network is the last-layer convolutional neural network, the result buffer 206 determines the plurality of second convolution results as final convolution results and sends the final convolution results to the storage circuit 10.

[0074] In the process of the convolution operation, the output of the previous-layer convolutional neural network is generally used as the input of the next-layer convolutional neural network: the output of the first layer serves as the input of the second layer, the output of the second layer serves as the input of the third layer, and so on, until the last layer outputs its convolution results. If the current layer is not the last layer of the convolutional neural network, the result buffer 206 caches the intermediate convolution results directly into the data buffer 200 of the corresponding calculation circuit 20, where they serve as the initial input data of the next-layer convolutional neural network for further convolution. If it is the last layer, the result buffer 206 sends the final convolution results to the data memory 100 in the storage circuit 10 for storage.
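The preset storage rule of paragraphs [0072]-[0074] can be condensed into a few lines of Python; the function and parameter names below are assumptions made for illustration only:

    def route_second_convolution_results(results, is_last_layer,
                                         data_buffer, data_memory):
        """Preset storage rule: intermediate results become the next layer's
        initial input data, final results return to the data memory."""
        if is_last_layer:
            data_memory.extend(results)   # final results -> storage circuit
        else:
            data_buffer.extend(results)   # intermediate results -> data buffer 200

Here data_buffer and data_memory are assumed to be simple list-like stores standing in for the data buffer 200 and the data memory 100.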
[0075] The data processing procedure of the neural network processor 1 provided by the embodiment of the present invention is illustrated below with reference to the schematic diagram shown in FIG. 2.

[0076] Exemplarily, assume that the initial input data required for the convolution operation stored in the storage circuit 10 is denoted Ci0 and the weight values are denoted Weight, where the initial input data Ci0 is stored in the data memory 100 and the weight values Weight are stored in the weight memory 102.

[0077] In the first step, the storage circuit 10 broadcasts to all calculation circuits 20 (denoted PE in the figure). After receiving the broadcast signal, each calculation circuit 20 synchronously reads the initial input data Ci0 from the data memory 100 and caches it in its data buffer 200; at the same time, each calculation circuit 20 also synchronously reads the weight values Weight from the weight memory 102 and caches them in its weight buffer 202.

[0078] The convolution operator 204 (denoted MAC in the figure) of each calculation circuit 20 performs the convolution operation of the first-layer convolutional neural network according to the initial input data Ci0 in the corresponding data buffer 200 (denoted IBUF in the figure) and the weight values Weight in the corresponding weight buffer 202, obtains the first-layer convolution result Co0, and caches the first-layer convolution result Co0 in the result buffer 206. Since the convolution result Co0 obtained in the first step is not the last-layer convolution result, the result buffer 206 (denoted OBUF in the figure) returns the first-layer convolution result Co0 to the data buffer 200 of the calculation circuit 20 for caching as the initial input data Ci1 of the second-layer convolutional neural network.

[0079] In the second step, the calculation circuit 20 synchronously reads the initial input data Ci1 from the data buffer 200; the convolution operator 204 performs the convolution operation of the second-layer convolutional neural network according to the initial input data Ci1 in the corresponding data buffer 200 and the weight values in the corresponding weight buffer 202, obtains the second-layer convolution result Co1, and caches the second-layer convolution result Co1 in the result buffer 206. The result buffer 206 returns the second-layer convolution result Co1 to the data buffer 200 of the calculation circuit 20 for caching as the initial input data Ci2 of the third-layer convolutional neural network.

[0080] And so on for the subsequent layers.

[0081] In the last step, each calculation circuit 20 performs the convolution operation of the last-layer convolutional neural network according to the convolution results and weight values obtained in the penultimate step, obtains the final convolution results, and sends the final convolution results to the data memory 100 in the storage circuit 10 for storage.
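The layer-by-layer flow from the first step through the last step can be summarized in a short Python sketch; convolve_layer stands in for the per-layer work of the convolution operator 204 and, like the other names, is an assumption made for illustration:

    def run_network(ci0, weights_per_layer, convolve_layer):
        """Pipeline of one calculation circuit (PE): Ci0 -> Co0 (= Ci1) ->
        Co1 (= Ci2) -> ...; intermediate results stay inside the PE's buffers
        and only the final result is written back to the data memory 100."""
        ci = ci0                              # initial input data from IBUF
        for weight in weights_per_layer:      # Weight values from the weight buffer
            ci = convolve_layer(ci, weight)   # MAC output Co fed back as next Ci
        return ci                             # final convolution result -> storage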
[0082] It should be noted that, since the convolutional neural network model in this embodiment is fully connected, during the convolution operation of each calculation circuit 20, one initial input data item in the data buffer 200, after being convolved in the current-layer convolutional neural network, yields a plurality of first convolution results; the first convolution results having a corresponding relationship (for example, belonging to the same neuron) are accumulated to obtain a plurality of second convolution results. After all the first convolution results having a corresponding relationship have been accumulated, that initial input data item can be deleted. Once the last-layer convolutional neural network completes its convolution operation, the final convolution results are obtained.

[0083] In this embodiment, to correspond to the initial input data serving as the input feature map, the convolution results are also referred to as output feature maps. The above embodiment describes the data processing procedure of the neural network processor 1, which involves three levels of data reuse. These three levels of data reuse can greatly increase the operational parallelism of the neural network processor and effectively reduce the power consumption of the entire processor.

[0084] The three levels of data reuse are described below:

[0085] First-level data reuse: each calculation circuit 20 synchronously reads the initial input data and weight values from the storage circuit 10 for the first time and completes the convolution operation of the first-layer convolutional neural network, thereby achieving the first reuse of the same initial input data and weight values across different calculation circuits 20.

[0086] Second-level data reuse: the result buffer 206 of each calculation circuit 20 can store a plurality of first convolution results at the same time; accumulating a first convolution result with a plurality of other first convolution results having a corresponding relationship achieves the second reuse of the same first convolution result within the same calculation circuit 20.

[0087] Third-level data reuse: all the convolution results (including the intermediate convolution results and the final convolution results) can be cached in the result buffer 206; if a second convolution result is an intermediate convolution result, the result buffer 206 returns it directly to the data buffer 200 for caching as the initial input data of the next-layer convolutional neural network. That is, accumulating a plurality of first convolution results having a corresponding relationship to obtain second convolution results that serve as the initial input data of the next-layer convolutional neural network achieves the third, layer-to-layer reuse of data in the convolutional neural network.
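The second and third levels of reuse can be illustrated together with a small Python sketch, in which first convolution results are modeled as (correspondence_key, value) pairs; this pairing is an assumption made for illustration (the key might, for example, identify an output neuron):

    from collections import defaultdict

    def accumulate_first_results(first_results):
        """Accumulate first convolution results that share a corresponding
        relationship (same key) into second convolution results, then delete
        the first results so that their buffer space is reclaimed."""
        second = defaultdict(float)
        for key, value in first_results:
            second[key] += value      # second-level reuse: accumulate by key
        first_results.clear()         # delete the accumulated first results
        return dict(second)           # intermediate results may re-enter the buffer

A returned intermediate result would then be pushed back into the data buffer as the next layer's initial input data, which is exactly the third-level reuse described above.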
[0088] The above three levels of data reuse are reflected in FIG. 1 and FIG. 2. The embodiment of the present invention further proposes a fourth level of data reuse, which further optimizes the parallelism of the operations and improves the operational efficiency and data utilization of the convolution operator. The fourth-level data reuse process is detailed in the schematic diagram of FIG. 3 below.

[0089] FIG. 3 is a schematic diagram of the process by which the convolution operator computes the convolution result corresponding to a given initial input data item. The left side of FIG. 3 shows the convolution kernel, the middle shows the initial input data, and the right side shows the corresponding convolution result.

[0090] The convolution operator performing, in the current-layer convolutional neural network, a convolution operation according to the initial input data and the weight values to obtain a plurality of first convolution results includes:

[0091] performing a convolution operation on the Q-th row of the initial input data and the L-th row of the preset convolution kernel, the resulting data being sub-data of the (Q-L+1)-th row of a third convolution result;

[0092] accumulating all the sub-data located in the (Q-L+1)-th row to obtain the data of the (Q-L+1)-th row;

[0093] performing a convolution operation according to the third convolution result and the weight values to obtain the plurality of first convolution results;

[0094] where Q ranges from 1 to M, M being the total number of rows of the initial input data, and L ranges from 1 to N, N being the total number of rows of the preset convolution kernel.

[0095] The Q-th row of the initial input data is convolved with each row of the preset convolution kernel, and after the Q-th row has been convolved with all rows of the preset convolution kernel, the Q-th row of the initial input data is deleted, until the entire initial input data has been deleted.

[0096] Exemplarily, the convolution operator 204 of each calculation circuit 20 uses a 3×3 convolution kernel. The kernel slides across the initial input data from left to right and from top to bottom, performing multiply-accumulate operations during the sliding to obtain the convolution result at each corresponding position.

[0097] When the convolution kernel slides to position 1 shown in FIG. 3 (i.e., over rows m-2, m-1, and m of the initial input data), w6, w7, and w8 of the kernel are convolved with the data of row m, and the resulting data corresponds to row m-2 of the convolution result.

[0098] When the convolution kernel slides to position 2 shown in FIG. 3 (i.e., over rows m-1, m, and m+1 of the initial input data), w3, w4, and w5 of the kernel are convolved with the data of row m, and the resulting data corresponds to row m-1 of the convolution result.

[0099] When the convolution kernel slides to position 3 shown in FIG. 3 (i.e., over rows m, m+1, and m+2 of the initial input data), w0, w1, and w2 of the kernel are convolved with the data of row m, and the resulting data corresponds to row m of the convolution result.

[0100] From the above, the fourth level of data reuse can be seen: while the convolution operator 204 computes one convolution result, the same row of the initial input data, for example row m, can be reused L times the number of convolution results (L being the number of rows of the convolution kernel). That is, by convolving one row of the initial input data with the entire convolution kernel, the fourth reuse of that row of the initial input data is achieved.
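For the 3×3 example above, a tiny Python check (an illustration mirroring FIG. 3, not the hardware) lists the output rows to which a single input row m contributes, confirming that row m is reused once per kernel row across sliding positions 1 to 3:

    N = 3                                     # kernel rows: w0-w2 / w3-w5 / w6-w8
    m = 10                                    # any interior row of the input
    output_rows = [m - l + 1 for l in range(1, N + 1)]
    print(output_rows)                        # [10, 9, 8] -> output rows m, m-1, m-2
    # Row m meets w6-w8 at position 1, w3-w5 at position 2 and w0-w2 at
    # position 3, i.e. it is reused N times per column of convolution results.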
[0101] In summary, in the present invention the at least one calculation circuit first reads the initial input data and weight values out of the storage circuit to perform the first convolution operation, achieving a first reuse of the same initial input data and weight values across different calculation circuits. By accumulating a first convolution result with a plurality of other first convolution results having a corresponding relationship, a second reuse of the same first convolution result within the same calculation circuit is achieved. By accumulating a plurality of first convolution results having a corresponding relationship to obtain second convolution results that serve as the initial input data of the next-layer convolutional neural network, a third, layer-to-layer reuse of data within the convolutional neural network is achieved. That is, through these three levels of data reuse, data utilization is improved and the number of data accesses is reduced, thereby increasing the operation speed of the calculation circuits and lowering the power consumption of the neural network processor.

[0102] Secondly, by convolving each row of the initial input data with the entire convolution kernel, a fourth reuse of each row of the initial input data is achieved, further improving data utilization and reducing the number of data accesses, and thus further increasing the operation speed of the calculation circuits and lowering the power consumption of the neural network processor.

[0103] Furthermore, after all the first convolution results having a corresponding relationship have been accumulated, the first convolution results are deleted, saving storage space of the storage circuit; after a given row of the initial input data has completed its convolution operations with the convolution kernel, that row is deleted, further saving storage space of the storage circuit.

[0104] In addition, when a plurality of calculation circuits run in parallel, the efficiency of parallel computing can be improved.
[0105] Embodiment 2

[0106] FIG. 4 is a flowchart of the convolutional neural network data multiplexing method provided by Embodiment 2 of the present invention.

[0107] The convolutional neural network data multiplexing method can be applied to mobile or fixed electronic devices, including but not limited to personal computers, smartphones, tablet computers, camera-equipped desktop computers, and all-in-one machines. The electronic device stores the user-configured initial input data and weight values required for the convolution operation in the storage circuit 10 and, by controlling the at least one calculation circuit 20, reads the initial input data and weight values from the storage circuit 10 and performs a convolution operation based on them. Since the initial input data and weight values required for the convolution operation are stored centrally in the storage circuit 10, when there are a plurality of calculation circuits 20, they can read the data synchronously from the storage circuit 10. In this way, the initial input data and weight values can be reused, reducing the number of data accesses and the power consumption of the processor.

[0108] For an electronic device that requires convolutional neural network data multiplexing, the convolutional neural network data multiplexing function provided by the method of the present invention can be integrated directly on the electronic device, or an interface to the function can be provided in the form of a software development kit (SDK), through which the electronic device multiplexes the convolutional neural network data.

[0109] The convolutional neural network data multiplexing method can also be applied in a hardware environment composed of a terminal and a server connected to the terminal through a network, the network including, but not limited to, a wide area network, a metropolitan area network, or a local area network. The method of the embodiment of the present invention can be executed by the server, by the terminal, or jointly by the server and the terminal.

[0110] A terminal or server referred to in this context is an intelligent terminal that can execute predetermined processing, such as numerical and/or logical operations, by running predetermined programs or instructions. It may include a processor and a memory, with the processor executing instructions pre-stored in the memory to carry out the predetermined processing; alternatively, the predetermined processing may be carried out by hardware such as an ASIC, FPGA, or DSP, or by a combination of the two. Computing devices include, but are not limited to, servers, personal computers, laptop computers, tablet computers, smartphones, and the like.

[0111] The methods discussed below (some of which are illustrated by flowcharts) may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments for performing the necessary tasks may be stored in a machine-readable or computer-readable medium (such as a storage medium). One or more processors may perform the necessary tasks.

[0112] As shown in FIG. 4, the convolutional neural network data multiplexing method specifically includes the following steps. Depending on different requirements, the order of the steps in the flowchart may be changed, and certain steps may be omitted.
[0113] S41: store, by the storage circuit, the initial input data and weight values required for the convolution operation.

[0114] In this embodiment, the user can configure in advance the initial input data and weight values required for the convolution operation and store them in the electronic device.

[0115] The electronic device obtains the initial input data and weight values required for the convolution operation and stores them in the storage circuit 10. The initial input data may be stored in the data memory 100 of the storage circuit 10; the initial input data may be an input feature map, which participates in the operations as initial data. The weight values may be stored in the weight memory 102 of the storage circuit 10.

[0116] S42: control the at least one calculation circuit to perform, in the current-layer convolutional neural network, a convolution operation according to the initial input data and the weight values to obtain a plurality of first convolution results, and accumulate the first convolution results having a corresponding relationship to obtain a plurality of second convolution results.

[0117] In this embodiment, a plurality of calculation circuits 20 may be arranged to form an operation array; the plurality of calculation circuits 20 synchronously read the initial input data and weight values required for the convolution operation from the storage circuit 10 and perform the convolution operations in parallel. The convolutional neural network model performs its operations in a fully connected manner.

[0118] The number of calculation circuits 20 can be set according to the actual situation; the required number can be determined by jointly considering the total computation load and the computation load that each calculation circuit can handle. For example, FIG. 1 shows two parallel calculation circuits 20.

[0119] Specifically, each calculation circuit 20 is controlled to read the initial input data from the corresponding data memory 100 and cache it in the corresponding data buffer 200, and at the same time to read the weight values from the corresponding weight memory 102 and cache them in the corresponding weight buffer 202.

[0120] The initial input data comes from two sources: data read by the calculation circuit 20 from the data memory 100, and intermediate convolution results computed by the calculation circuit 20, which the result buffer 206 returns to the data buffer 200 as the initial input data of the next-layer convolutional neural network. Each data buffer 200 can store a plurality of initial input data items at the same time. In other embodiments, each data buffer 200 may store only one initial input data item; once that item has been convolved to produce all of its convolution results, it can be deleted. All the convolution results are accumulated and then cached in the data buffer 200 as new initial input data; the data buffer 200 is therefore a first-in-first-out (FIFO)-like buffer.

[0121] Specifically, the convolution operator 204 performs, in the current-layer convolutional neural network, a convolution operation according to the initial input data in the data buffer 200 and the weight values in the weight buffer 202 to obtain a plurality of first convolution results, and accumulates the first convolution results having a corresponding relationship to obtain a plurality of second convolution results.

[0122] Different calculation circuits 20 obtain different convolution results from their convolution operations. For the process of the convolution operation performed by the convolution operator 204, see FIG. 3 and its related description.
[0123] S43: after all the first convolution results having a corresponding relationship have been accumulated, control the at least one calculation circuit to delete the plurality of first convolution results.

[0124] The convolution operator 204 of the calculation circuit 20 deletes the plurality of first convolution results once all the first convolution results having a corresponding relationship have been accumulated.

[0125] S44: determine whether the current-layer convolutional neural network is the last-layer convolutional neural network.

[0126] After obtaining the convolution results, the result buffer 206 determines whether they are the final convolution results.

[0127] When it is determined that the current-layer convolutional neural network is not the last-layer convolutional neural network, S45 is executed; otherwise, when it is determined that the current-layer convolutional neural network is the last-layer convolutional neural network, S46 is executed.

[0128] S45: determine the plurality of second convolution results as intermediate convolution results and send the intermediate convolution results to the at least one calculation circuit for caching as the initial input data of the next-layer convolutional neural network.

[0129] When the current-layer convolutional neural network is not the last-layer convolutional neural network, the result buffer 206 determines the plurality of second convolution results of the current layer as intermediate convolution results and sends them to the data buffer 200 in the calculation circuit 20 for caching as the initial input data of the next-layer convolutional neural network.

[0130] S46: determine the plurality of second convolution results as final convolution results and send the final convolution results to the storage circuit.

[0131] When the current-layer convolutional neural network is the last-layer convolutional neural network, the result buffer 206 determines the plurality of second convolution results of the current layer as final convolution results and sends them to the data memory 100 in the storage circuit 10 for storage.

[0132] In the process of the convolution operation, the output of the previous-layer convolutional neural network is generally used as the input of the next-layer convolutional neural network: the output of the first layer serves as the input of the second layer, the output of the second layer serves as the input of the third layer, and so on, until the last layer outputs its convolution results. If the current layer is not the last layer of the convolutional neural network, the result buffer 206 caches the intermediate convolution results directly into the data buffer 200 of the corresponding calculation circuit 20, where they serve as the initial input data of the next-layer convolutional neural network for further convolution. If it is the last layer, the result buffer 206 sends the final convolution results to the data memory 100 in the storage circuit 10 for storage.
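Steps S41 to S46 can be strung together in a compact Python driver; per_layer_first_results stands in for the per-layer convolution of step S42 and, like every name here, is an assumption made for illustration:

    def data_multiplexing_method(initial_input, weights_per_layer,
                                 per_layer_first_results, accumulate, storage):
        """S41-S46 in miniature: convolve (S42), accumulate and delete (S43),
        then route by the last-layer test (S44) to the next layer (S45)
        or to the storage circuit (S46)."""
        ci = initial_input                              # S41: data from storage
        last = len(weights_per_layer) - 1
        for layer, weight in enumerate(weights_per_layer):
            first = per_layer_first_results(ci, weight)  # S42: first results
            second = accumulate(first)                   # S42/S43: accumulate, delete
            if layer == last:
                storage.append(second)                   # S46: final results stored
            else:
                ci = second                              # S45: next layer's input
        return storage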
[0133] 下面结合图 2所示的示意图举例说明本发明实施例提供的所述神经网络处理器 1 的数据处理过程。 [0133] The data processing process of the neural network processor 1 provided by an embodiment of the present invention is exemplified below with reference to the schematic diagram shown in FIG. 2.
[0134] 示例性的, 假设存储电路 10中存储的进行卷积运算所需的所述初始输入数据以 CiO表示, 权重值以 Weight表示, 其中, 所述初始输入数据 CiO存储于所述数据存 储器 100中, 所述权重值 Weight存储于所述权重存储器 102中。 [0134] Exemplarily, it is assumed that the initial input data stored in the storage circuit 10 required for performing the convolution operation is represented by CiO, and the weight value is represented by Weight, where the initial input data CiO is stored in the data memory In 100, the weight value Weight is stored in the weight memory 102.
[0135] 第一步, 所述存储电路 10向所有计算电路 20 (图中用 PE表示) 进行广播。 每一 个计算电路 20接收到广播信号后, 同步从所述数据存储器 100中读取所述初始输 入数据 CiO并缓存至所述数据缓存器 200; 同时, 每一个计算电路 20还同步从所述 权重存储器 102中读取所述权重值 Weight并缓存至所述权重缓存器 202中。 [0135] In the first step, the storage circuit 10 broadcasts to all calculation circuits 20 (denoted by PE in the figure). Every After receiving the broadcast signal, each computing circuit 20 synchronously reads the initial input data CiO from the data memory 100 and buffers it to the data buffer 200; meanwhile, each computing circuit 20 also synchronizes from the weight memory The weight value Weight is read in 102 and cached in the weight buffer 202.
[0136] 每一个计算电路 20的卷积运算器 204 (图中用 MAC表示) 根据对应的数据存储 器 200 (图中用 IBUF表示) 中的初始输入数据 CiO及对应的权重存储器 202中的权 重值 Weight进行第一层卷积神经网络的卷积运算, 得到第一层的卷积结果 CoO, 并将所述第一层的卷积结果 CoO缓存至所述结果缓存器 206中。 由于第一步运算 得到的卷积运算结果 CoO并不是最后层的卷积运算结果, 因而, 结果缓存器 206 (图中用 OBUF表示) 将所述第一层的卷积运算结果 CoO回传至所述计算电路 20 的数据缓存器 200中进行缓存, 作为第二层卷积神经网络的初始输入数据 Cil。 [0136] The convolution operator 204 (denoted by MAC in the figure) of each calculation circuit 20 is based on the initial input data CiO in the corresponding data memory 200 (denoted by IBUF in the figure) and the weight value in the corresponding weight memory 202 Weight performs the convolution operation of the first layer convolutional neural network to obtain the first layer convolution result CoO, and caches the first layer convolution result CoO into the result buffer 206. Since the convolution operation result CoO obtained in the first operation is not the final layer convolution operation result, the result buffer 206 (denoted by OBUF in the figure) returns the first layer convolution operation result CoO to The data buffer 200 of the calculation circuit 20 performs buffering as the initial input data Cil of the second layer convolutional neural network.
[0137] 第二步, 计算电路 20同步从数据缓存器 200中读取初始输入数据 Cil ; 卷积运算 器 204根据对应的数据存储器 200中的初始输入数据 Cil及对应的权重存储器 202中 的权重值进行第二层卷积神经网络的卷积运算, 得到第二层的卷积结果 Col, 并 将所述第一层的卷积结果 Col缓存至所述结果缓存器 206中。 结果缓存器 206将所 述第二层的卷积运算结果 Col回传至所述计算电路 20的数据缓存器 200中进行缓 存, 作为第三层卷积神经网络的初始输入数据 Ci2。 [0137] In the second step, the calculation circuit 20 synchronously reads the initial input data Cil from the data buffer 200; the convolution operator 204 according to the initial input data Cil in the corresponding data memory 200 and the corresponding weight in the weight memory 202 Value to perform the convolution operation of the second layer convolutional neural network to obtain the convolution result Col of the second layer, and cache the convolution result Col of the first layer into the result buffer 206. The result buffer 206 returns the convolution operation result Col of the second layer to the data buffer 200 of the calculation circuit 20 for buffering as the initial input data Ci2 of the third layer convolutional neural network.
[0138] 以此类推。 [0138] and so on.
[0139] 最后一步, 每一个计算电路 20根据倒数第二步运算得到的卷积结果和权重值进 行最后层卷积神经网络的卷积运算, 得到最终的卷积结果, 并将所述最终的卷 积结果发送至所述存储电路 10中的数据存储器 100中进行存储。 [0139] In the last step, each calculation circuit 20 performs the convolution operation of the final layer convolutional neural network according to the convolution result and the weight value obtained in the penultimate step operation to obtain the final convolution result, and the final The convolution result is sent to the data memory 100 in the storage circuit 10 for storage.
[0140] 需要说明的是, 由于本实施例中的卷积神经网络模型采用的是全连接的方式, 则每一个计算电路 20在进行卷积运算的过程中, 数据缓存器 200中的一个初始输 入数据, 在当前层卷积神经网络中进行卷积运算后, 会得到多个第一卷积结果 , 具有对应关系 (例如, 同一个神经元) 的所述第一卷积结果进行累加后得到 多个第二卷积结果。 当对所有具有对应关系的所述第一卷积结果进行累加后, 该初始输入数据可以被删掉。 直到最后一层卷积神经网络完成卷积运算之后, 就得到了最终的卷积结果。 [0140] It should be noted that, since the convolutional neural network model in this embodiment adopts a fully connected mode, each computing circuit 20 performs an initial operation in the data buffer 200 during the convolution operation. Input data, after performing convolution operations in the current layer of convolutional neural network, multiple first convolution results will be obtained, and the first convolution results with corresponding relationships (for example, the same neuron) are accumulated and obtained Multiple second convolution results. After accumulating all the first convolution results with a corresponding relationship, the initial input data may be deleted. Until the final layer of convolutional neural network completes the convolution operation, the final convolution result is obtained.
[0141] In this embodiment, to keep the terminology aligned with the initial input data, which serves as the input feature map, the convolution result is also called the output feature map. The above embodiment describes the data processing flow of the neural network processor 1, which involves three levels of data multiplexing. These three levels of data multiplexing can greatly increase the operational parallelism of the neural network processor and effectively reduce the power consumption of the entire processor.
[0142] The three levels of data multiplexing are described in detail below:
[0143] First-level data multiplexing: each calculation circuit 20 synchronously reads the initial input data and the weight values from the storage circuit 10 once, and completes the convolution operation of the first-layer convolutional neural network, thereby achieving the first multiplexing of the same initial input data and weight values across different calculation circuits 20.
[0144] Second-level data multiplexing: the result buffer 206 of each calculation circuit 20 can hold multiple first convolution results at the same time; a first convolution result is accumulated with the multiple other first convolution results having a corresponding relationship with it, thereby achieving the second multiplexing of the same first convolution result within the same calculation circuit 20.
[0145] Third-level data multiplexing: all the convolution results (including intermediate convolution results and the final convolution result) can be cached in the result buffer 206. If a second convolution result is an intermediate convolution result, the result buffer 206 returns it directly to the data buffer 200 for caching, where it serves as the initial input data of the next-layer convolutional neural network. That is, by accumulating multiple corresponding first convolution results into a second convolution result that becomes the initial input data of the next layer, the third level of data multiplexing, between one layer of the convolutional neural network and the next, is achieved.
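For illustration, the sketch below broadcasts one copy of the input and the weights to several simulated calculation circuits, mirroring the first level of reuse; the thread pool, the function names and the random data are assumptions made for the sketch, not features of the claimed circuit:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def conv2d_valid(x, k):
    """Same valid cross-correlation as in the pipeline sketch above."""
    H, W = x.shape
    Kh, Kw = k.shape
    out = np.zeros((H - Kh + 1, W - Kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + Kh, j:j + Kw] * k)
    return out

def circuit_job(circuit_id, shared_input, shared_weight):
    # Each simulated calculation circuit reads the *same* initial input
    # and weights once (first-level multiplexing) and convolves them.
    return circuit_id, conv2d_valid(shared_input, shared_weight)

shared_input = np.random.rand(8, 8)    # read once from the storage circuit
shared_weight = np.random.rand(3, 3)

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(circuit_job, c, shared_input, shared_weight)
               for c in range(4)]
    for f in futures:
        cid, result = f.result()
        print(f"circuit {cid}: output shape {result.shape}")
```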
[0146] The above three levels of data multiplexing are reflected in FIG. 1 and FIG. 2. An embodiment of the present invention further proposes a fourth level of data multiplexing, which further improves the parallelism of the operation and increases the operational efficiency and data utilization of the convolution operator. The fourth-level data multiplexing process is detailed in the schematic diagram of FIG. 3 described below.
[0147] FIG. 3 is a schematic diagram of the process by which the convolution operator computes the convolution result corresponding to a given initial input datum. The left part of FIG. 3 shows the convolution kernel, the middle part shows the initial input data, and the right part shows the corresponding convolution result.
[0148] The convolution operator performing, in the current-layer convolutional neural network, a convolution operation based on the initial input data and the weight values to obtain multiple first convolution results includes:
[0149] performing a convolution operation between the Q-th row of the initial input data and the L-th row of a preset convolution kernel, the resulting data being the sub-data of the (Q-L+1)-th row of a third convolution result;
[0150] accumulating all sub-data located in the (Q-L+1)-th row to obtain the data of the (Q-L+1)-th row;
[0151] performing a convolution operation based on the third convolution result and the weight values to obtain the multiple first convolution results;
[0152] where Q ranges from 1 to M, M being the total number of rows of the initial input data, and L ranges from 1 to N, N being the total number of rows of the preset convolution kernel.
[0153] The Q-th row of the initial input data is convolved with every row of the preset convolution kernel, and once the Q-th row has been convolved with all rows of the preset convolution kernel, the Q-th row of the initial input data is deleted, until the initial input data has been deleted in its entirety.
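A minimal Python sketch of this row-wise decomposition, under the assumption of a stride-1 "valid" convolution, is given below; it also models the per-row deletion of paragraph [0153], with None standing in for a freed buffer row. The function and variable names are illustrative:

```python
import numpy as np

def row_decomposed_conv(x, kernel):
    """Valid 2-D convolution built from row-by-row 1-D correlations.

    Row Q of the input (1-indexed) correlated with row L of the kernel
    contributes sub-data to row Q-L+1 of the output; the sub-data of
    each output row are accumulated, matching paragraphs [0149]-[0152].
    """
    M, W = x.shape            # M: total rows of the initial input data
    N, Kw = kernel.shape      # N: total rows of the preset kernel
    out = np.zeros((M - N + 1, W - Kw + 1))
    rows = [x[q].copy() for q in range(M)]   # modifiable row buffer

    for q in range(1, M + 1):                # 1-indexed input row Q
        for l in range(1, N + 1):            # 1-indexed kernel row L
            r = q - l + 1                    # target output row Q-L+1
            if 1 <= r <= out.shape[0]:
                # 1-D sliding correlation of input row Q with kernel row L
                row = rows[q - 1]
                for j in range(out.shape[1]):
                    out[r - 1, j] += np.dot(row[j:j + Kw], kernel[l - 1])
        rows[q - 1] = None   # row Q used with every kernel row: delete it

    return out

x = np.arange(30, dtype=float).reshape(6, 5)
k = np.ones((3, 3))
print(row_decomposed_conv(x, k))  # equals a direct 3x3 valid convolution
```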
[0154] Exemplarily, the convolution operator 204 of each calculation circuit 20 uses a 3×3 convolution kernel. The kernel slides across the initial input data from left to right and from top to bottom, performing multiply-accumulate operations as it slides to obtain the convolution result at each corresponding position.
[0155] When the convolution kernel slides to position 1 shown in FIG. 3 (that is, when the kernel covers rows m-2, m-1 and m of the initial input data), the weights w6, w7 and w8 of the kernel are convolved with the data of row m, and the resulting data correspond to row m-2 of the convolution result.
[0156] When the convolution kernel slides to position 2 shown in FIG. 3 (that is, when the kernel covers rows m-1, m and m+1 of the initial input data), the weights w3, w4 and w5 of the kernel are convolved with the data of row m, and the resulting data correspond to row m-1 of the convolution result.
[0157] When the convolution kernel slides to position 3 shown in FIG. 3 (that is, when the kernel covers rows m, m+1 and m+2 of the initial input data), the weights w0, w1 and w2 of the kernel are convolved with the data of row m, and the resulting data correspond to row m of the convolution result.
[0158] It can be seen from the above that, in the fourth level of data multiplexing, while the convolution operator 204 computes the convolution results, the same row of the initial input data, for example row m, can be reused L × (the number of convolution results) times, L being the number of rows of the convolution kernel. That is, by convolving one row of the initial input data with the entire convolution kernel, the fourth multiplexing of that row of the initial input data is achieved.
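The sketch below counts, under the same 3×3 assumption, how many output rows each input row contributes to, making the reuse factor of the fourth level explicit; it is an illustrative tally, not hardware code:

```python
def row_reuse_counts(m_rows, n_kernel_rows):
    """For each input row q (1-indexed), count the output rows q-l+1
    (l = 1..N) that it feeds: interior rows are reused N times."""
    out_rows = m_rows - n_kernel_rows + 1
    counts = {}
    for q in range(1, m_rows + 1):
        counts[q] = sum(
            1 for l in range(1, n_kernel_rows + 1)
            if 1 <= q - l + 1 <= out_rows
        )
    return counts

print(row_reuse_counts(m_rows=6, n_kernel_rows=3))
# {1: 1, 2: 2, 3: 3, 4: 3, 5: 2, 6: 1} -> each interior row is reused 3 times
```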
[0159] In summary, in the present invention, at least one calculation circuit reads the initial input data and weight values from the storage circuit for the first time to perform the first convolution operation, achieving the first multiplexing of the same initial input data and weight values across different calculation circuits; by accumulating a first convolution result with multiple other first convolution results having a corresponding relationship, the second multiplexing of the same first convolution result within the same calculation circuit is achieved; and by accumulating multiple corresponding first convolution results into a second convolution result that serves as the initial input data of the next-layer convolutional neural network, the third multiplexing, between layers of the convolutional neural network, is achieved. In other words, through three levels of data multiplexing, data utilization is increased and the number of data accesses is reduced, thereby increasing the operation speed of the calculation circuits and reducing the power consumption of the neural network processor.
[0160] Secondly, by convolving each row of the initial input data with the entire convolution kernel, the fourth multiplexing of each row of the initial input data is achieved, which further increases data utilization and reduces the number of data accesses, thereby further increasing the operation speed of the calculation circuits and reducing the power consumption of the neural network processor.
[0161] Thirdly, deleting the first convolution results after all those having a corresponding relationship have been accumulated saves storage space in the storage circuit; and deleting a row of the initial input data once it has completed its convolution operations with the convolution kernel saves further storage space in the storage circuit.
[0162] In addition, when multiple calculation circuits run in parallel, the efficiency of parallel computation can be improved.
[0163] FIG. 4 above details the convolutional neural network data multiplexing method of the present invention. The functional modules of the software system that implements the method and the hardware system architecture that implements the method are introduced below with reference to FIG. 5 and FIG. 6, respectively.
[0164] It should be understood that the embodiments are for illustration only, and the scope of the patent application is not limited by this structure.
[0165] Embodiment 3
[0166] Referring to FIG. 5, it is a functional module diagram of a preferred embodiment of the convolutional neural network data multiplexing device of the present invention.
[0167] In some embodiments, the convolutional neural network data multiplexing device 50 runs in an electronic device. The convolutional neural network data multiplexing device 50 may include multiple functional modules composed of program code segments. The program code of each program segment in the convolutional neural network data multiplexing device 50 may be stored in the memory of the electronic device and executed by at least one processor to perform data multiplexing for a convolutional neural network (detailed in the description of FIG. 4).
[0168] In this embodiment, the convolutional neural network data multiplexing device 50 may be divided into multiple functional modules according to the functions it performs. The functional modules may include: a storage module 501, a convolution operation module 502, a deletion module 503, a judgment module 504, a first determination module 505 and a second determination module 506. A module as referred to in the present invention is a series of computer program segments that can be executed by at least one processor and can perform a fixed function, and that are stored in the memory. In this embodiment, the functions of each module are detailed in the subsequent embodiments.
[0169] The storage module 501 is configured to store, through the storage circuit, the initial input data and weight values required for the convolution operation;
[0170] the convolution operation module 502 is configured to control the at least one calculation circuit to perform, in the current-layer convolutional neural network, a convolution operation based on the initial input data and the weight values to obtain multiple first convolution results, and to accumulate the first convolution results having a corresponding relationship to obtain multiple second convolution results;
[0171] the deletion module 503 is configured to control the at least one calculation circuit to delete the multiple first convolution results after all the first convolution results having a corresponding relationship have been accumulated;
[0172] the judgment module 504 is configured to judge whether the current-layer convolutional neural network is the last layer of the convolutional neural network;
[0173] the first determination module 505 is configured to, when the judgment module 504 determines that the current-layer convolutional neural network is not the last layer, determine the multiple second convolution results as intermediate convolution results and send the intermediate convolution results to the at least one calculation circuit for caching, as the initial input data of the next-layer convolutional neural network;
[0174] the second determination module 506 is configured to, when the judgment module 504 determines that the current-layer convolutional neural network is the last layer, determine the multiple second convolution results as the final convolution result and send the final convolution result to the storage circuit.
[0175] For a detailed description of the above modules (501-506), refer to the convolutional neural network data multiplexing method described in the embodiments, which is not elaborated again here.
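A compact sketch of how these modules could be organized in software is shown below; the class layout, the stub arithmetic and the method names are assumptions chosen to mirror modules 501-506, not the patented implementation:

```python
class CNNDataMultiplexingDevice:
    """Toy analogue of device 50; modules 501-506 appear as methods.
    The arithmetic is a stub: each "convolution" just scales the data."""

    def __init__(self, layer_weights):
        self.layer_weights = layer_weights   # one weight per layer
        self.storage = None                  # storage circuit stand-in

    def store(self, initial_input):          # storage module 501
        self.storage = initial_input

    def convolve(self, data, w):             # convolution operation module 502
        firsts = [data * w, data * w]        # "first convolution results"
        second = sum(firsts)                 # accumulate correspondences
        firsts.clear()                       # deletion module 503
        return second

    def run(self):
        data = self.storage
        for i, w in enumerate(self.layer_weights):
            second = self.convolve(data, w)
            if i < len(self.layer_weights) - 1:   # judgment module 504
                data = second                     # first determination module 505
            else:
                self.storage = second             # second determination module 506
        return self.storage

device = CNNDataMultiplexingDevice([0.5, 2.0, 1.0])
device.store(3.0)
print(device.run())  # 3.0 * (2*0.5) * (2*2.0) * (2*1.0) = 24.0
```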
[0176] In summary, in the neural network processor provided by the embodiments of the present invention, at least one calculation circuit reads the initial input data and weight values from the storage circuit for the first time to perform the first convolution operation, achieving the first multiplexing of the same initial input data and weight values across different calculation circuits; by accumulating a first convolution result with multiple other first convolution results having a corresponding relationship, the second multiplexing of the same first convolution result within the same calculation circuit is achieved; by accumulating multiple corresponding first convolution results into a second convolution result that serves as the initial input data of the next-layer convolutional neural network, the third multiplexing, between layers of the convolutional neural network, is achieved; and by convolving one row of the initial input data with the entire convolution kernel, the fourth multiplexing of that row of the initial input data is achieved. In other words, through four levels of data multiplexing, data utilization is increased, the number of data accesses is reduced, and the power consumption of the processor is effectively lowered, which in turn allows the operational parallelism of the calculation circuits to be increased.
[0177] Secondly, deleting the first convolution results after all those having a corresponding relationship have been accumulated saves storage space in the calculation circuits; and deleting a row of the initial input data once it has completed its convolution operations with every row of the convolution kernel saves further storage space in the calculation circuits, thereby effectively reducing the power consumption of the entire neural network processor and improving the operational efficiency of the convolution operations performed by the calculation circuits.
[0178] Embodiment 4
[0179] Referring to FIG. 6, in a preferred embodiment of the present invention, the electronic device 6 includes a memory 61, at least one processor 62, at least one communication bus 63, a display screen 64 and at least one neural network processor 66.
[0180] Those skilled in the art should understand that the structure of the electronic device shown in FIG. 6 does not limit the embodiments of the present invention; it may be a bus-type structure or a star structure, and the electronic device 6 may also include more or fewer hardware or software components than shown, or a different arrangement of components.
[0181] In some embodiments, the electronic device 6 is a device capable of automatically performing numerical computation and/or information processing according to preset or stored instructions. The hardware of the electronic device 6 includes, but is not limited to: a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like. The electronic device 6 may also include user equipment, which includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote control, a touchpad, a voice control device or the like, for example, a personal computer, a tablet computer, a smartphone or a digital camera.
[0182] It should be noted that the electronic device 6 is only an example; other existing or future electronic products that can be adapted to the present invention should also be included within the protection scope of the present invention and are incorporated herein by reference.
[0183] In some embodiments, the memory 61 is used to store program code and various data, for example the convolutional neural network data multiplexing device 50 installed in the electronic device 6, and to provide high-speed, automatic access to programs or data during the operation of the electronic device 6. The memory 61 includes read-only memory (ROM), random access memory (RAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.
[0184] In some embodiments, the at least one processor 62 may be composed of integrated circuits, for example a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The at least one processor 62 is the control unit of the electronic device 6; it connects the components of the entire electronic device 6 through various interfaces and lines, and executes the various functions of the electronic device 6 and processes data, such as the function of convolutional neural network data multiplexing, by running or executing the programs or modules stored in the memory 61 and calling the data stored in the memory 61.
[0185] In some embodiments, the at least one communication bus 63 is configured to enable connection and communication among the memory 61, the at least one processor 62, the display screen 64, the at least one neural network processor 66, and so on.
[0186] In some embodiments, the display screen 64 may be used to display information entered by the viewer or information provided to the viewer, as well as the various graphical viewer interfaces of the electronic device 6, which may be composed of graphics, text, icons, video and any combination thereof. The display screen 64 may include a display panel, which may optionally be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
[0187] The display screen 64 may also include a touch panel. If the display screen 64 includes a touch panel, it may be implemented as a touch screen to receive input signals from the viewer. The touch panel includes one or more touch sensors to sense touches, swipes and gestures on the touch panel. The touch sensors can not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. The display panel and the touch panel may be implemented as two independent components to provide input and output functions, but in some embodiments the display panel and the touch panel may be integrated to provide input and output functions.
[0188] Although not shown, the electronic device 6 may also include a power supply (such as a battery) that supplies power to the components. Preferably, the power supply may be logically connected to the at least one processor 62 through a power management system, so that functions such as charge management, discharge management and power consumption management are implemented through the power management system. The power supply may also include one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, or any other such components. The electronic device 6 may also include various sensors, a Bluetooth module, a communication module, and the like, which are not described in detail here.
[0189] It should be understood that the embodiments are for illustration only, and the scope of the patent application is not limited by this structure.
[0190] The integrated unit implemented in the form of a software functional module described above may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a client, a network device or the like) or a processor to execute parts of the methods described in the embodiments of the present invention.
[0191] In a further embodiment, with reference to FIG. 1, the at least one processor 62 may execute the operating system of the electronic device 6 as well as the installed application programs (such as the convolutional neural network data multiplexing device 50), program code, and so on.
[0192] The memory 61 stores program code, and the at least one processor 62 may call the program code stored in the memory 61 to perform related functions. For example, the modules described in FIG. 5 are program code stored in the memory 61 and executed by the at least one processor 62, so as to realize the functions of the modules and thereby achieve the purpose of generating a neural network model according to user requirements.
[0193] In an embodiment of the present invention, the memory 61 stores multiple instructions, and the multiple instructions are executed by the at least one processor 62 to implement the function of randomly generating a neural network model.
[0194] Specifically, for the specific implementation method of the above instructions by the at least one processor 62, reference may be made to the description of the relevant steps in the embodiment corresponding to FIG. 1, which is not repeated here.
[0195] In the several embodiments provided by the present invention, it should be understood that the disclosed system, device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; for instance, the division into modules is only a division by logical function, and there may be other division manners in actual implementation.
[0196] The modules described as separate components may or may not be physically separate, and the components displayed as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
[0197] In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.

Claims

[Claim 1] A neural network processor, wherein the neural network processor comprises: a storage circuit, configured to store initial input data and weight values required for a convolution operation; and at least one calculation circuit, configured to read the initial input data and the weight values from the storage circuit and perform a convolution operation based on the initial input data and the weight values, wherein
the at least one calculation circuit comprises:
a data buffer, configured to cache the initial input data read by the calculation circuit; and a weight buffer, configured to cache the weight values read by the calculation circuit;
a convolution operator, configured to perform, in a current-layer convolutional neural network, a convolution operation based on the initial input data and the weight values to obtain multiple first convolution results, and to accumulate the first convolution results having a corresponding relationship to obtain multiple second convolution results; and, after all the first convolution results having a corresponding relationship have been accumulated, to delete the multiple first convolution results; and
a result buffer, configured to cache the multiple second convolution results and, according to a preset storage rule, send the multiple second convolution results either to the data buffer as the initial input data of a next-layer convolutional neural network, or to the storage circuit for storage.
[Claim 2] The neural network processor according to claim 1, wherein the preset storage rule comprises:
when the current-layer convolutional neural network is not the last layer of the convolutional neural network, the result buffer determines the multiple second convolution results as intermediate convolution results and sends the intermediate convolution results to the data buffer;
when the current-layer convolutional neural network is the last layer of the convolutional neural network, the result buffer determines the multiple second convolution results as the final convolution result and sends the final convolution result to the storage circuit.
[Claim 3] The neural network processor according to claim 1 or 2, wherein the convolution operator performing, in the current-layer convolutional neural network, a convolution operation based on the initial input data and the weight values to obtain multiple first convolution results comprises:
performing a convolution operation between the Q-th row of the initial input data and the L-th row of a preset convolution kernel, the resulting data being the sub-data of the (Q-L+1)-th row of a third convolution result; accumulating all sub-data located in the (Q-L+1)-th row to obtain the data of the (Q-L+1)-th row; and performing a convolution operation based on the third convolution result and the weight values to obtain the multiple first convolution results;
wherein Q ranges from 1 to M, M being the total number of rows of the initial input data, and L ranges from 1 to N, N being the total number of rows of the preset convolution kernel.
[Claim 4] The neural network processor according to claim 3, wherein the Q-th row of the initial input data is convolved with every row of the preset convolution kernel, and after the Q-th row has been convolved with all rows of the preset convolution kernel, the Q-th row of the initial input data is deleted, until the initial input data has been deleted in its entirety.
[Claim 5] A convolutional neural network data multiplexing method, applied to an electronic device, wherein the electronic device comprises the neural network processor according to any one of claims 1 to 4, and the method comprises:
storing, through the storage circuit, the initial input data and weight values required for the convolution operation; controlling the at least one calculation circuit to perform, in a current-layer convolutional neural network, a convolution operation based on the initial input data and the weight values to obtain multiple first convolution results, and accumulating the first convolution results having a corresponding relationship to obtain multiple second convolution results; after all the first convolution results having a corresponding relationship have been accumulated, controlling the at least one calculation circuit to delete the multiple first convolution results; when the current-layer convolutional neural network is not the last layer of the convolutional neural network, determining the multiple second convolution results as intermediate convolution results and sending the intermediate convolution results to the at least one calculation circuit for caching, as the initial input data of the next-layer convolutional neural network; and
when the current-layer convolutional neural network is the last layer of the convolutional neural network, determining the multiple second convolution results as the final convolution result and sending the final convolution result to the storage circuit.
[Claim 6] The method according to claim 5, wherein controlling the at least one calculation circuit to perform, in the current-layer convolutional neural network, a convolution operation based on the initial input data and the weight values to obtain multiple first convolution results comprises:
performing a convolution operation between the Q-th row of the initial input data and the L-th row of a preset convolution kernel, the resulting data being the sub-data of the (Q-L+1)-th row of a third convolution result; accumulating all sub-data located in the (Q-L+1)-th row to obtain the data of the (Q-L+1)-th row; and performing a convolution operation based on the third convolution result and the weight values to obtain the multiple first convolution results;
wherein Q ranges from 1 to M, M being the total number of rows of the initial input data, and L ranges from 1 to N, N being the total number of rows of the preset convolution kernel.
[Claim 7] The method according to claim 6, wherein the Q-th row of the initial input data is convolved with every row of the preset convolution kernel, and after the Q-th row has been convolved with all rows of the preset convolution kernel, the Q-th row of the initial input data is deleted, until the initial input data has been deleted in its entirety.
[Claim 8] A convolutional neural network data multiplexing device, installed in an electronic device, wherein the electronic device comprises the neural network processor according to any one of claims 1 to 4, and the device comprises:
a storage module, configured to store, through the storage circuit, the initial input data and weight values required for the convolution operation;
a convolution operation module, configured to control the at least one calculation circuit to perform, in a current-layer convolutional neural network, a convolution operation based on the initial input data and the weight values to obtain multiple first convolution results, and to accumulate the first convolution results having a corresponding relationship to obtain multiple second convolution results;
a deletion module, configured to control the at least one calculation circuit to delete the multiple first convolution results after all the first convolution results having a corresponding relationship have been accumulated; a first determination module, configured to, when the current-layer convolutional neural network is not the last layer of the convolutional neural network, determine the multiple second convolution results as intermediate convolution results and send the intermediate convolution results to the at least one calculation circuit for caching, as the initial input data of the next-layer convolutional neural network; and
a second determination module, configured to, when the current-layer convolutional neural network is the last layer of the convolutional neural network, determine the multiple second convolution results as the final convolution result and send the final convolution result to the storage circuit.
[Claim 9] An electronic device, wherein the electronic device comprises a processor, and the processor is configured to implement the convolutional neural network data multiplexing method according to any one of claims 5 to 7 when executing a computer program stored in a memory.
[Claim 10] A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the convolutional neural network data multiplexing method according to any one of claims 5 to 7 is implemented.