WO2021114904A1 - Data processing method and apparatus, computer device and storage medium - Google Patents

Data processing method and apparatus, computer device and storage medium

Info

Publication number
WO2021114904A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
sub
convolution
preset
convolution result
Prior art date
Application number
PCT/CN2020/123836
Other languages
French (fr)
Chinese (zh)
Inventor
刘道福
黄迪
周诗怡
Original Assignee
中科寒武纪科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中科寒武纪科技股份有限公司
Publication of WO2021114904A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • the present disclosure relates to the field of data processing technology, and in particular to a data processing method, device, computer equipment, and storage medium.
  • the neural network algorithm has recently become a very popular machine learning algorithm, and it has achieved very good results in various fields, such as image recognition, speech recognition, and natural language processing.
  • the complexity of the algorithm is getting higher and higher.
  • the scale of the model is gradually increasing.
  • the embodiments of the present disclosure provide a data processing method, device, computer equipment, and storage medium that can improve hardware energy efficiency ratio, reduce computing time, and improve computing efficiency.
  • a data processing method applied to a processor includes:
  • the preset merge mode is the reverse process of the preset split mode.
  • a data processing device applied to a processor, and the device includes:
  • the splitting module is used to split the first data according to a preset splitting method to obtain multiple second data;
  • the convolution module is configured to perform the convolution operation of the second data and the weight respectively to obtain multiple first convolution results
  • a merging module configured to merge the multiple first convolution results according to a preset merging manner to obtain a hole convolution result of the first data and the weight
  • the preset merge mode is the reverse process of the preset split mode.
  • an artificial intelligence chip is provided, and the chip includes the data processing device as described in any one of the foregoing.
  • an electronic device including the aforementioned artificial intelligence chip.
  • a board card comprising: a storage device, an interface device, a control device, and the aforementioned artificial intelligence chip;
  • the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively;
  • the storage device is used to store data
  • the interface device is used to implement data transmission between the artificial intelligence chip and external equipment
  • the control device is used to monitor the state of the artificial intelligence chip.
  • an electronic device including:
  • a memory for storing processor executable instructions
  • the processor is configured to call instructions stored in the memory to execute the method described in any one of the foregoing.
  • a computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method described in any one of the foregoing.
  • the multiple second data obtained by splitting are each convolved with the weight, and the multiple first convolution results obtained are merged according to the preset merging method;
  • because the preset merging method is the inverse process of the preset splitting method, the hole convolution result of the first data and the weight can be obtained after merging.
  • the energy efficiency ratio of hardware can be improved, computing time can be reduced, and computing efficiency can be improved.
  • Fig. 1 shows a schematic diagram of an exemplary hole convolution according to the present disclosure
  • Fig. 2 shows a flowchart of a data processing method according to an embodiment of the present disclosure
  • Fig. 3 shows a schematic diagram of a data processing method according to an embodiment of the present disclosure
  • Fig. 4 shows a schematic diagram of a data processing method according to an embodiment of the present disclosure
  • Fig. 5 shows a block diagram of a data processing device according to an embodiment of the present disclosure
  • Fig. 6 shows a structural block diagram of a board according to an embodiment of the present disclosure
  • Fig. 7 shows a block diagram of an electronic device 800 according to an embodiment of the present disclosure
  • Fig. 8 shows a block diagram of an electronic device 1900 according to an embodiment of the present disclosure.
  • the term “if” can be interpreted as “when” or “once” or “in response to determination” or “in response to detection” depending on the context.
  • the phrase “if determined” or “if [described condition or event] is detected” can be interpreted as meaning “once determined”, “in response to determination”, “once [described condition or event] is detected”, or “in response to detection of [described condition or event]”, depending on the context.
  • Hole convolution (also known as dilated convolution) can increase the receptive field of a convolution, but it also reduces the hardware energy efficiency ratio and increases the computing time.
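  • As a rough illustration of the enlarged receptive field, the sketch below computes the input span covered by one output element using the standard relation d*(k-1)+1 for a kernel of size k with dilation d. The function name is hypothetical, and the dilation of 2 is an assumption taken from the "one element apart" splitting described in this disclosure.

```python
def effective_kernel_size(kernel_size, dilation):
    """Input span covered by one output element of a dilated (hole) convolution."""
    return dilation * (kernel_size - 1) + 1


# An ordinary 3*3 convolution covers a 3*3 input window.
print(effective_kernel_size(3, 1))  # 3
# A 3*3 hole convolution with one skipped element between taps covers 5*5.
print(effective_kernel_size(3, 2))  # 5
```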
  • the present disclosure provides a data processing method.
  • the data processing method of the embodiment of the present disclosure can be applied to a processor, and the processor can be a general-purpose processor, such as a CPU (Central Processing Unit), or an artificial intelligence processor (IPU) for performing artificial intelligence operations.
  • Artificial intelligence operations can include machine learning operations, brain-like operations, and so on. Among them, machine learning operations include neural network operations, k-means operations, support vector machine operations, and so on.
  • the artificial intelligence processor may include, for example, one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processor), and a field-programmable gate array (FPGA) chip.
  • the present disclosure does not limit the specific types of processors.
  • the processor mentioned in the present disclosure may include multiple processing units, and each processing unit can independently run various tasks assigned to it, such as convolution computing tasks, pooling tasks, or fully connected tasks.
  • the present disclosure does not limit the processing unit and the tasks run by the processing unit.
  • Fig. 2 shows a flowchart of a data processing method according to an embodiment of the present disclosure. The method may be applied to a processor. As shown in Fig. 2, the method may include:
  • in step S21, the first data is split according to a preset splitting manner to obtain a plurality of second data.
  • the foregoing preset splitting method may be a preset method for splitting the first data.
  • the preset splitting method may split the first data into four second data. For example, along both the rows and the columns of the data, the first data can be split by taking every other element (one element apart) to obtain four second data. After splitting, the elements in the first convolution results of all the second data with the weight are consistent with the elements in the hole convolution result of the first data and the weight.
  • the plurality of second data may include the first sub-data, the second sub-data, the third sub-data, and the fourth sub-data.
  • splitting the first data according to the preset splitting manner to obtain multiple second data may include:
  • the element corresponding to the even-numbered row and the even-numbered column in the first data is determined to form the fourth sub-data.
  • the odd-numbered rows in the first data can be determined, the elements corresponding to the odd-numbered columns in each odd-numbered row can be determined to form the first sub-data, and the elements corresponding to the even-numbered columns in each odd-numbered row can be determined to form the second sub-data;
  • the even-numbered rows in the first data can be determined, the elements corresponding to the odd-numbered columns in each even-numbered row can be determined to form the third sub-data, and the elements corresponding to the even-numbered columns in each even-numbered row can be determined to form the fourth sub-data.
  • the elements corresponding to the odd-numbered columns of the odd-numbered rows in the first data may be used to form the first sub-data;
  • the elements corresponding to the odd-numbered columns of the even-numbered rows in the first data (identified as "3" in the first data shown in Fig. 3) form the third sub-data; the elements corresponding to the even-numbered columns of the even-numbered rows in the first data (identified as "4" in the first data shown in Fig. 3) form the fourth sub-data.
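  • The parity-based split described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `split_by_parity` and the 7*7 example input are assumptions, and rows and columns are counted from 1 as in the text, so "odd" rows are Python indices 0, 2, 4, and so on.

```python
def split_by_parity(data):
    """Split a 2D list into four sub-arrays by row/column parity (1-based)."""
    odd_rows = data[0::2]    # rows 1, 3, 5, ... in the text's numbering
    even_rows = data[1::2]   # rows 2, 4, 6, ...
    first = [row[0::2] for row in odd_rows]     # odd rows, odd columns
    second = [row[1::2] for row in odd_rows]    # odd rows, even columns
    third = [row[0::2] for row in even_rows]    # even rows, odd columns
    fourth = [row[1::2] for row in even_rows]   # even rows, even columns
    return first, second, third, fourth


# Hypothetical 7*7 input whose entries encode their own (row, column) position.
first_data = [[10 * r + c for c in range(1, 8)] for r in range(1, 8)]
first, second, third, fourth = split_by_parity(first_data)
shapes = [(len(s), len(s[0])) for s in (first, second, third, fourth)]
print(shapes)  # [(4, 4), (4, 3), (3, 4), (3, 3)]
```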
  • in step S22, the convolution operation of each second data and the weight is performed respectively to obtain multiple first convolution results.
  • the plurality of second data may be subjected to a common convolution operation with the weights respectively to obtain a plurality of first convolution results.
  • the first sub-data and the weight can be convolved to obtain the first convolution result corresponding to the first sub-data
  • the second sub-data and the weight can be convolved.
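  • The ordinary convolution applied to each sub-data can be sketched as below. The sketch assumes a stride of 1, no padding, and the cross-correlation form commonly used in neural networks; none of these details are stated in the text, and the function name and example values are hypothetical.

```python
def conv2d_valid(data, weight):
    """Ordinary stride-1 convolution without padding (cross-correlation form)."""
    kh, kw = len(weight), len(weight[0])
    out_h = len(data) - kh + 1
    out_w = len(data[0]) - kw + 1
    return [
        [
            sum(data[i + a][j + b] * weight[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 4*4 sub-data and 3*3 weight.
sub_data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
weight = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
result = conv2d_valid(sub_data, weight)
print(result)  # [[18, 21], [30, 33]]
```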
  • in step S23, the multiple first convolution results are merged according to a preset merging manner to obtain a hole convolution result of the first data and the weight;
  • the preset merge mode is the reverse process of the preset split mode.
  • the foregoing preset merging method is the inverse process of the foregoing preset splitting method; that is, if the hole convolution result of the first data and the weight obtained after merging is split according to the preset splitting method, each first convolution result can be obtained.
  • merging the above-mentioned multiple first convolution results according to the preset merging manner to obtain the hole convolution result of the first data and the weight may include:
  • the elements in the first convolution result corresponding to the first sub-data are sequentially used as the elements corresponding to the odd-numbered columns of the odd-numbered rows of the hole convolution result;
  • the elements in the first convolution result corresponding to the fourth sub-data are sequentially used as the elements corresponding to the even-numbered rows and the even-numbered columns in the hole convolution result.
  • each row in the first convolution result corresponding to the first sub-data is used in turn as an odd-numbered row of the hole convolution result, and each element in each row is used in turn as an element in an odd-numbered column of that odd-numbered row;
  • each row in the first convolution result corresponding to the second sub-data is used in turn as an odd-numbered row of the hole convolution result, and each element in each row is used in turn as an element in an even-numbered column of that odd-numbered row.
  • the first convolution result corresponding to the first sub-data is 2*2
  • the first convolution result corresponding to the second sub-data is 1*2
  • the first convolution result corresponding to the fourth sub-data is 1*1.
  • the elements of each row of the first convolution result corresponding to the first sub-data are used as the odd-numbered columns of the odd-numbered rows of the hole convolution result (that is, the two elements in the first row of the first convolution result corresponding to the first sub-data are used as the elements in the first column and the third column of the first row of the hole convolution result, and the two elements in the second row are used as the elements in the first column and the third column of the third row of the hole convolution result, marked as "1" in Fig. 4).
  • the elements in the first convolution result corresponding to the second sub-data are sequentially used as the even-numbered columns of the odd rows of the hole convolution result (that is, an element in the first row of the first convolution result corresponding to the second sub-data).
  • the elements in the first convolution result corresponding to the third sub-data are sequentially used as the odd-numbered columns of the even rows of the hole convolution result (that is, an element in the first row of the first convolution result corresponding to the third sub-data).
  • the element in the first convolution result corresponding to the fourth sub-data is used as the even-numbered column of the even-numbered row of the hole convolution result (that is, the element in the first row of the first convolution result corresponding to the fourth sub-data is used as the element in the second row and the second column of the hole convolution result, identified as "4" in Fig. 4).
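  • The merge described above can be sketched as an interleaving of the four first convolution results. The function name is hypothetical, and the example shapes are an assumption chosen so that the merged result is 3*3: the results for the second and third sub-data are taken here as 2 rows by 1 column and 1 row by 2 columns respectively. Each element is labeled with the number of the sub-data it came from, in the manner of Fig. 4.

```python
def merge_by_parity(first, second, third, fourth):
    """Interleave four results so that splitting the output by parity recovers them."""
    rows = len(first) + len(third)
    cols = len(first[0]) + len(second[0])
    merged = [[None] * cols for _ in range(rows)]
    # (part, row offset, column offset); offsets are 0-based, so offset 0
    # addresses the odd-numbered rows/columns in the text's 1-based numbering.
    placements = ((first, 0, 0), (second, 0, 1), (third, 1, 0), (fourth, 1, 1))
    for part, r0, c0 in placements:
        for p, row in enumerate(part):
            for q, value in enumerate(row):
                merged[2 * p + r0][2 * q + c0] = value
    return merged


merged = merge_by_parity(
    [["1", "1"], ["1", "1"]],  # result for the first sub-data (2*2)
    [["2"], ["2"]],            # result for the second sub-data
    [["3", "3"]],              # result for the third sub-data
    [["4"]],                   # result for the fourth sub-data (1*1)
)
for row in merged:
    print(" ".join(row))
# 1 2 1
# 3 4 3
# 1 2 1
```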
  • the multiple second data obtained by splitting are each convolved with the weight, and the multiple first convolution results obtained are merged according to the preset merging method;
  • because the preset merging method is the inverse process of the preset splitting method, the hole convolution result of the first data and the weight can be obtained after merging.
  • the energy efficiency ratio of hardware can be improved, computing time can be reduced, and computing efficiency can be improved.
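  • The equivalence claimed above can be checked numerically with a small sketch. It assumes a dilation of 2 (one skipped element between taps), stride 1, no padding, and the cross-correlation form of convolution; the helper names and the 7*7 input values are hypothetical.

```python
def conv2d(data, weight, dilation=1):
    """Stride-1, unpadded convolution (cross-correlation form) with optional dilation."""
    kh, kw = len(weight), len(weight[0])
    out_h = len(data) - dilation * (kh - 1)
    out_w = len(data[0]) - dilation * (kw - 1)
    return [
        [
            sum(data[i + dilation * a][j + dilation * b] * weight[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def split_by_parity(data):
    """Return [first, second, third, fourth] sub-data (row parity, then column parity)."""
    return [[row[c::2] for row in data[r::2]] for r in (0, 1) for c in (0, 1)]


def merge_by_parity(parts, rows, cols):
    """Inverse of split_by_parity for an output of the given size."""
    merged = [[0] * cols for _ in range(rows)]
    for k, part in enumerate(parts):
        r0, c0 = divmod(k, 2)
        for p, row in enumerate(part):
            for q, value in enumerate(row):
                merged[2 * p + r0][2 * q + c0] = value
    return merged


first_data = [[(3 * r + 5 * c) % 11 for c in range(7)] for r in range(7)]
weight = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Reference: hole convolution computed directly on the first data.
direct = conv2d(first_data, weight, dilation=2)

# Split, run four ordinary convolutions, then merge.
first_results = [conv2d(sub, weight) for sub in split_by_parity(first_data)]
merged = merge_by_parity(first_results, len(direct), len(direct[0]))

print(merged == direct)  # True
```

  • In this sketch the direct path evaluates a sparse 5*5 window for every output element, while the split path only runs dense 3*3 convolutions, which is the kind of workload the text says the hardware handles efficiently.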
  • the convolution scale that the processor can handle is: 5*5 neurons and 3*3 weights
  • the processor can only process a convolution of a 5*5 kernel and a 3*3 weight at a time, and output one convolution result.
  • the processor needs to perform 9 operations to complete this hole convolution.
  • the processor can obtain 9 results through one operation, so that the hole convolution of the first data and the weight can be completed.
  • the data processing method provided by the present disclosure improves the energy efficiency ratio of hardware, reduces computing time, and improves computing efficiency.
  • the foregoing first data may include neurons and/or gradients.
  • the hole convolution can be performed through the first gradient and the weight of the current convolution layer to determine the second gradient of the next convolution layer.
  • the gradient of the current convolutional layer can be split according to the preset splitting method to obtain four first sub-gradients, and the four first sub-gradients can each be convolved with the weight to obtain four convolution results; if the four convolution results are merged according to the preset merging method, the second gradient of the next convolution layer can be obtained.
  • the foregoing first data may include a first neuron and a first gradient
  • the splitting of the first data according to a preset splitting manner to obtain a plurality of second data may include:
  • the first gradient is split according to the preset split mode to obtain a plurality of second gradients.
  • the first neuron can be split according to a preset splitting method to obtain multiple second neurons.
  • the second neuron may include a first sub-neuron, a second sub-neuron, a third sub-neuron, and a fourth sub-neuron. The elements corresponding to the odd-numbered columns of the odd-numbered rows in the first neuron can then be determined to form the first sub-neuron, the elements corresponding to the even-numbered columns of the odd-numbered rows in the first neuron can be determined to form the second sub-neuron, the elements corresponding to the odd-numbered columns of the even-numbered rows in the first neuron can be determined to form the third sub-neuron, and the elements corresponding to the even-numbered columns of the even-numbered rows in the first neuron can be determined to form the fourth sub-neuron.
  • the first gradient can be split according to a preset split mode to obtain multiple second gradients.
  • the second gradient may include a first sub-gradient, a second sub-gradient, a third sub-gradient, and a fourth sub-gradient. The elements corresponding to the odd-numbered columns of the odd-numbered rows in the first gradient may then be determined to form the first sub-gradient, the elements corresponding to the even-numbered columns of the odd-numbered rows in the first gradient may be determined to form the second sub-gradient, the elements corresponding to the odd-numbered columns of the even-numbered rows in the first gradient may be determined to form the third sub-gradient, and the elements corresponding to the even-numbered columns of the even-numbered rows in the first gradient may be determined to form the fourth sub-gradient.
  • the above method may further include:
  • the parity of the row and column at which an element of the second neuron is located in the first neuron is consistent with the parity of the row and column at which an element of the corresponding second gradient is located in the first gradient.
  • that is, the parity of the rows and columns at which the elements of the second gradient corresponding to a second neuron are located in the first gradient is consistent with the parity of the rows and columns at which the elements of that second neuron are located in the first neuron. For example: if all elements in the second neuron are in odd rows and odd columns of the first neuron, then all elements in the corresponding second gradient are in odd rows and odd columns of the first gradient; or, if all elements in the second neuron are in odd rows and even columns of the first neuron, then all elements in the corresponding second gradient are in odd rows and even columns of the first gradient; or, if all elements in the second neuron are in even rows and odd columns of the first neuron, then all elements in the corresponding second gradient are in even rows and odd columns of the first gradient; or, if all elements in the second neuron are in even rows and even columns of the first neuron, then all elements in the corresponding second gradient are in even rows and even columns of the first gradient.
  • the second neuron includes: a first sub-neuron, a second sub-neuron, a third sub-neuron, and a fourth sub-neuron
  • the second gradient includes: a first sub-gradient, a second sub-gradient, a third sub-gradient, and a fourth sub-gradient
  • the first sub-neuron corresponds to the first sub-gradient
  • the first sub-neuron performs convolution processing with the first sub-gradient to obtain the convolution result corresponding to the first sub-neuron
  • the second sub-neuron corresponds to the second sub-gradient
  • the second sub-neuron performs convolution processing with the second sub-gradient to obtain the convolution result corresponding to the second sub-neuron
  • the third sub-neuron corresponds to the third sub-gradient, and the third sub-neuron performs convolution processing with the third sub-gradient to obtain the convolution result corresponding to the third sub-neuron
  • after the third convolution result corresponding to each second neuron is obtained, the third convolution results are added together, and the resulting sum is determined as the residual of the weight.
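  • The weight-residual computation above can be sketched and checked as follows. The sketch assumes that the forward pass is a stride-1, unpadded hole convolution with dilation 2, so that a 7*7 first neuron and a 3*3 weight give a 3*3 first gradient; it also uses the cross-correlation form of convolution. The helper names and all numeric values are hypothetical.

```python
def correlate_valid(data, kernel):
    """Stride-1, unpadded convolution (cross-correlation form)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(data[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(len(data[0]) - kw + 1)
        ]
        for i in range(len(data) - kh + 1)
    ]


def split_by_parity(data):
    """Return the four parity sub-arrays: first, second, third, fourth."""
    return [[row[c::2] for row in data[r::2]] for r in (0, 1) for c in (0, 1)]


first_neuron = [[(2 * r + 3 * c) % 7 for c in range(7)] for r in range(7)]
first_gradient = [[1, -2, 3], [0, 4, -1], [2, 1, -3]]

# Reference: residual[a][b] = sum over (i, j) of gradient[i][j] * neuron[i + 2a][j + 2b].
reference = [
    [
        sum(first_gradient[i][j] * first_neuron[i + 2 * a][j + 2 * b]
            for i in range(3) for j in range(3))
        for b in range(3)
    ]
    for a in range(3)
]

# Pair each sub-neuron with the sub-gradient of the same row/column parity,
# convolve each pair, then add the four third convolution results.
pairs = zip(split_by_parity(first_neuron), split_by_parity(first_gradient))
third_results = [correlate_valid(neuron, gradient) for neuron, gradient in pairs]
residual = [
    [sum(result[a][b] for result in third_results) for b in range(3)]
    for a in range(3)
]

print(residual == reference)  # True
```

  • Unlike the forward pass, the four partial results here all have the shape of the weight, which is why they are summed rather than interleaved.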
  • the energy efficiency ratio of the hardware can be improved, the calculation time can be reduced, and the calculation efficiency can be improved.
  • the above method may further include:
  • the weight value is adjusted according to the residual error of the weight value.
  • the weight of the current convolutional layer can be adjusted according to the residual of the weight. For example, the sum of the residual of the weight and the weight can be determined as the new weight.
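  • The adjustment in the example above, where the new weight is the sum of the weight and its residual, can be sketched as an element-wise addition. The text does not mention any scaling of the residual (such as a learning rate), so none is applied here; the function name and values are hypothetical.

```python
def adjust_weight(weight, residual):
    """Return the element-wise sum of the weight and its residual."""
    return [
        [w + r for w, r in zip(weight_row, residual_row)]
        for weight_row, residual_row in zip(weight, residual)
    ]


weight = [[1.0, 2.0], [3.0, 4.0]]
residual = [[-0.5, 0.25], [0.0, -1.0]]
new_weight = adjust_weight(weight, residual)
print(new_weight)  # [[0.5, 2.25], [3.0, 3.0]]
```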
  • although the steps in the flowcharts of Figs. 1-4 are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless specifically stated herein, the execution of these steps is not strictly limited in order, and these steps can be executed in other orders. Moreover, at least some of the steps in Figs. 1-4 may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily executed at the same time, but can be executed at different times; their execution order is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.
  • Fig. 5 shows a block diagram of a data processing device according to an embodiment of the present disclosure.
  • the data processing device may include:
  • the splitting module 501 can be used to split the first data according to a preset splitting manner to obtain multiple second data;
  • the convolution module 502 may be used to perform the convolution operation of the second data and the weight respectively to obtain multiple first convolution results;
  • the merging module 503 may be used to merge the multiple first convolution results according to a preset merging manner to obtain a hole convolution result of the first data and the weight value,
  • the preset merge mode is the reverse process of the preset split mode.
  • the multiple second data obtained by splitting are each convolved with the weight, and the multiple first convolution results obtained are merged according to the preset merging method;
  • because the preset merging method is the inverse process of the preset splitting method, the hole convolution result of the first data and the weight can be obtained after merging.
  • the energy efficiency ratio of hardware can be improved, computing time can be reduced, and computing efficiency can be improved.
  • the plurality of second data includes first sub-data, second sub-data, third sub-data, and fourth sub-data.
  • the aforementioned splitting module may also be used for:
  • the element corresponding to the even-numbered row and the even-numbered column in the first data is determined to form the fourth sub-data.
  • the above-mentioned merging module can also be used for:
  • the elements in the first convolution result corresponding to the fourth sub-data are sequentially used as the elements corresponding to the even-numbered rows and even-numbered columns in the hole convolution result.
  • the first data includes neurons and/or gradients.
  • the first data may include a first neuron and a first gradient
  • the above-mentioned splitting module may also be used for:
  • the first gradient is split according to the preset split mode to obtain a plurality of second gradients.
  • the foregoing apparatus may further include:
  • a processing module configured to perform a convolution operation on any of the second neuron and the corresponding second gradient to obtain a third convolution result
  • a determining module configured to determine that the sum of the third convolution results corresponding to each of the second neurons is the residual of the weight
  • the parity of the row and column at which an element of the second neuron is located in the first neuron is consistent with the parity of the row and column at which an element of the corresponding second gradient is located in the first gradient.
  • the foregoing apparatus may further include:
  • the adjustment module is configured to adjust the weight value according to the residual error of the weight value.
  • the functions or modules contained in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments.
  • the foregoing device embodiments are only illustrative, and the device of the present disclosure may also be implemented in other ways.
  • the division of units/modules in the above-mentioned embodiments is only a logical function division, and there may be other division methods in actual implementation.
  • multiple units, modules or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the functional units/modules in the various embodiments of the present disclosure may be integrated into one unit/module, or each unit/module may exist alone physically, or two or more units/modules may be integrated together.
  • the above-mentioned integrated unit/module can be realized in the form of hardware or software program module.
  • the hardware may be a digital circuit, an analog circuit, and so on.
  • the physical realization of the hardware structure includes but is not limited to transistors, memristors and so on.
  • the artificial intelligence processor may be any appropriate hardware processor, such as CPU, GPU, FPGA, DSP, ASIC, and so on.
  • the storage unit may be any suitable magnetic storage medium or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), hybrid memory cube (HMC), etc.
  • if the integrated unit/module is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer-readable memory.
  • the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a memory and includes a number of instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned memory includes: U disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), mobile hard disk, magnetic disk or optical disk and other media that can store program codes.
  • an artificial intelligence chip is also disclosed, which includes the above-mentioned data processing device.
  • a board card is also disclosed, which includes a storage device, an interface device, a control device, and the above-mentioned artificial intelligence chip; wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively; the storage device is used to store data; the interface device is used to realize data transmission between the artificial intelligence chip and an external device; and the control device is used to monitor the state of the artificial intelligence chip.
  • Fig. 6 shows a structural block diagram of a board according to an embodiment of the present disclosure.
  • the board may include other supporting components in addition to the chip 389 described above.
  • the supporting components include, but are not limited to: a storage device 390, an interface device 391, and a control device 392.
  • the storage device 390 is connected to the artificial intelligence chip through a bus for storing data.
  • the storage device may include multiple groups of storage units 393. Each group of storage units is connected to the artificial intelligence chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate SDRAM, double-rate synchronous dynamic random access memory).
  • the storage device may include 4 groups of the storage units. Each group of storage units may include a plurality of DDR4 particles (chips).
  • the artificial intelligence chip may include four 72-bit DDR4 controllers. In each 72-bit DDR4 controller, 64 bits are used for data transmission and 8 bits are used for ECC verification. It can be understood that when DDR4-3200 particles are used in each group of storage units, the theoretical bandwidth of data transmission can reach 25600MB/s.
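  • The 25600MB/s figure follows from the transfer rate and the data width: DDR4-3200 performs 3200 million transfers per second, and each transfer carries the 64 data bits (the 8 ECC bits carry no payload). A short check, with a hypothetical function name:

```python
def ddr_bandwidth_mb_per_s(mega_transfers_per_s, data_bits):
    """Theoretical bandwidth in MB/s: transfers per second times bytes per transfer."""
    return mega_transfers_per_s * data_bits // 8


# DDR4-3200 with a 64-bit data path (ECC bits excluded).
print(ddr_bandwidth_mb_per_s(3200, 64))  # 25600
```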
  • each group of storage units includes a plurality of double-rate synchronous dynamic random access memories arranged in parallel.
  • DDR can transmit data twice in one clock cycle.
  • a controller for controlling the DDR is provided in the chip for controlling the data transmission and data storage of each storage unit.
  • the interface device is electrically connected with the artificial intelligence chip.
  • the interface device is used to implement data transmission between the artificial intelligence chip and an external device (such as a server or a computer).
  • the interface device may be a standard PCIE interface.
  • the data to be processed is transferred from the server to the chip through the standard PCIE interface.
  • the interface device may also be another interface. The present disclosure does not limit the specific form of such other interfaces, as long as the interface unit can implement the transfer function.
  • the calculation result of the artificial intelligence chip is likewise transmitted back to the external device (such as a server) by the interface device.
  • the control device is electrically connected with the artificial intelligence chip.
  • the control device is used to monitor the state of the artificial intelligence chip.
  • the artificial intelligence chip and the control device may be electrically connected through an SPI interface.
  • the control device may include a single-chip microcomputer (Micro Controller Unit, MCU).
  • the artificial intelligence chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads. Therefore, the artificial intelligence chip can be in different working states such as multi-load and light-load.
  • the control device can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the artificial intelligence chip.
  • an electronic device which includes the aforementioned artificial intelligence chip.
  • Electronic equipment includes data processing devices, robots, computers, printers, scanners, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, cameras, servers, cloud servers, video cameras, projectors, watches, headsets, mobile storage, wearable devices, vehicles, household appliances, and/or medical equipment.
  • the vehicles include airplanes, ships, and/or automobiles;
  • the household appliances include TVs, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods;
  • the medical equipment includes nuclear magnetic resonance instruments, B-mode ultrasound scanners, and/or electrocardiographs.
  • the embodiments of the present disclosure also provide a computer-readable storage medium on which computer program instructions are stored, and the computer program instructions implement the above-mentioned method when executed by a processor.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium.
  • An embodiment of the present disclosure also provides an electronic device, including: a processor; a memory for storing executable instructions of the processor; wherein the processor is configured to call the instructions stored in the memory to execute the above method.
  • the electronic device can be provided as a terminal, server or other form of device.
  • FIG. 7 shows a block diagram of an electronic device 800 according to an embodiment of the present disclosure.
  • the electronic device 800 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and other terminals.
  • the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
  • the processing component 802 generally controls the overall operations of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • the processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the foregoing method.
  • the processing component 802 may include one or more modules to facilitate the interaction between the processing component 802 and other components.
  • the processing component 802 may include a multimedia module to facilitate the interaction between the multimedia component 808 and the processing component 802.
  • the memory 804 is configured to store various types of data to support operations in the electronic device 800. Examples of these data include instructions for any application or method to operate on the electronic device 800, contact data, phone book data, messages, pictures, videos, etc.
  • the memory 804 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
  • the power supply component 806 provides power for various components of the electronic device 800.
  • the power supply component 806 may include a power management system, one or more power supplies, and other components associated with the generation, management, and distribution of power for the electronic device 800.
  • the multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touch, sliding, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure related to the touch or slide operation.
  • the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
  • the audio component 810 is configured to output and/or input audio signals.
  • the audio component 810 includes a microphone (MIC), and when the electronic device 800 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode, the microphone is configured to receive an external audio signal.
  • the received audio signal may be further stored in the memory 804 or transmitted via the communication component 816.
  • the audio component 810 further includes a speaker for outputting audio signals.
  • the I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module.
  • the above-mentioned peripheral interface module may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: home button, volume button, start button, and lock button.
  • the sensor component 814 includes one or more sensors for providing the electronic device 800 with various aspects of state evaluation.
  • the sensor component 814 can detect the on/off status of the electronic device 800 and the relative positioning of components, for example the display and the keypad of the electronic device 800.
  • the sensor component 814 can also detect a change in position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of contact between the user and the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a temperature change of the electronic device 800.
  • the sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects when there is no physical contact.
  • the sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • the communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices.
  • the electronic device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof.
  • the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components to perform the above methods.
  • a non-volatile computer-readable storage medium is also provided, such as the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to complete the foregoing method.
  • FIG. 8 shows a block diagram of an electronic device 1900 according to an embodiment of the present disclosure.
  • the electronic device 1900 may be provided as a server.
  • the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and a memory resource represented by the memory 1932, for storing instructions that can be executed by the processing component 1922, such as application programs.
  • the application program stored in the memory 1932 may include one or more modules each corresponding to a set of instructions.
  • the processing component 1922 is configured to execute instructions to perform the above-described methods.
  • the electronic device 1900 may also include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958.
  • the electronic device 1900 can operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
  • a non-volatile computer-readable storage medium is also provided, such as the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the foregoing method.
  • the preset merge mode is the reverse process of the preset split mode.
  • the plurality of second data includes first sub-data, second sub-data, third sub-data, and fourth sub-data, and splitting the first data according to the preset split mode to obtain the plurality of second data includes:
  • determining the elements in the even-numbered columns of the even-numbered rows of the first data to form the fourth sub-data.
  • merging the multiple first convolution results according to the preset merge mode to obtain the hole convolution result of the first data and the weight includes:
  • taking the elements of the first convolution result corresponding to the fourth sub-data, in order, as the elements in the even-numbered columns of the even-numbered rows of the hole convolution result.
  • the first data includes neurons and/or gradients.
  • the first data includes a first neuron and a first gradient, and splitting the first data according to the preset split mode to obtain the plurality of second data includes:
  • splitting the first gradient according to the preset split mode to obtain a plurality of second gradients.
  • the row and column parity of the positions, in the first neuron, of the elements of a second neuron is consistent with the row and column parity of the positions, in the first gradient, of the elements of the corresponding second gradient.
  • the weight is adjusted according to the residual of the weight.
  • the splitting module is used to split the first data according to a preset splitting method to obtain multiple second data;
  • the convolution module is configured to perform the convolution operation of the second data and the weight respectively to obtain multiple first convolution results
  • a merging module configured to merge the multiple first convolution results according to a preset merging manner to obtain a hole convolution result of the first data and the weight
  • the preset merge mode is the reverse process of the preset split mode.
  • the splitting module is further configured to:
  • determine the elements in the even-numbered columns of the even-numbered rows of the first data to form the fourth sub-data.
  • the merging module is further used for:
  • taking the elements of the first convolution result corresponding to the fourth sub-data, in order, as the elements in the even-numbered columns of the even-numbered rows of the hole convolution result.
  • in the device, the first data includes neurons and/or gradients.
  • the device according to any one of clauses A8 to A11, the first data includes a first neuron and a first gradient, and the splitting module is further used for:
  • the first gradient is split according to the preset split mode to obtain a plurality of second gradients.
  • a processing module configured to perform a convolution operation on any of the second neuron and the corresponding second gradient to obtain a third convolution result
  • a determining module configured to determine that the sum of the third convolution results corresponding to each of the second neurons is the residual of the weight
  • the row and column parity of the positions, in the first neuron, of the elements of a second neuron is consistent with the row and column parity of the positions, in the first gradient, of the elements of the corresponding second gradient.
  • the adjustment module is configured to adjust the weight according to the residual of the weight.
  • Clause A15 an artificial intelligence chip including the data processing device as described in Clause A8.
  • Clause A16 an electronic device including the artificial intelligence chip as described in Clause A15.
  • a board card includes: a storage device, an interface device, a control device, and the artificial intelligence chip as described in clause A15;
  • the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively;
  • the storage device is used to store data
  • the interface device is used to implement data transmission between the artificial intelligence chip and external equipment
  • the control device is used to monitor the state of the artificial intelligence chip.
  • the storage device includes: multiple groups of storage units, each group of storage units is connected to the artificial intelligence chip through a bus, and the storage units are: DDR SDRAM;
  • the chip includes: a DDR controller, which is used to control the data transmission and data storage of each storage unit;
  • the interface device is: a standard PCIE interface.
  • an electronic device including:
  • a memory for storing processor executable instructions
  • the processor is configured to call instructions stored in the memory to execute the method described in any one of clauses A1 to A7.
  • Clause A19 a computer-readable storage medium with computer program instructions stored thereon, characterized in that, when the computer program instructions are executed by a processor, the method described in any one of clauses A1 to A7 is implemented.
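
The clauses above on the residual of the weight (split the first neuron and the first gradient with the same parity rule, convolve each second neuron with its corresponding second gradient, and sum the third convolution results) can be illustrated with the following Python sketch. It is not taken from the disclosure. It assumes a 3*3 weight, a dilation rate of 2, stride 1, no padding, and 0-based indexing; the names `correlate`, `split`, `weight_residual` and `direct_residual`, and the sample values, are hypothetical.

```python
def correlate(x, k):
    """Valid cross-correlation of the 2-D list x with the 2-D list k."""
    kh, kw = len(k), len(k[0])
    return [[sum(k[a][b] * x[i + a][j + b] for a in range(kh) for b in range(kw))
             for j in range(len(x[0]) - kw + 1)]
            for i in range(len(x) - kh + 1)]


def split(x):
    """Split x into four parts by row/column parity, always in the same order."""
    return [[row[q::2] for row in x[p::2]] for p in (0, 1) for q in (0, 1)]


def weight_residual(neuron, gradient):
    """Sum of the per-parity convolutions (the 'third convolution results')."""
    total = None
    for sub_neuron, sub_gradient in zip(split(neuron), split(gradient)):
        third = correlate(sub_neuron, sub_gradient)
        if total is None:
            total = third
        else:
            total = [[s + t for s, t in zip(row_s, row_t)]
                     for row_s, row_t in zip(total, third)]
    return total


def direct_residual(neuron, gradient, rate=2, k=3):
    """Reference: weight residual of the rate-2 hole convolution, computed directly."""
    return [[sum(gradient[i][j] * neuron[i + rate * a][j + rate * b]
                 for i in range(len(gradient)) for j in range(len(gradient[0])))
             for b in range(k)]
            for a in range(k)]


# Hypothetical sample data: an 8x7 neuron and the matching 4x3 output gradient.
neuron = [[(3 * r * r + 5 * c + r * c) % 11 for c in range(7)] for r in range(8)]
gradient = [[(3 * i + 2 * j + 1) % 5 - 2 for j in range(3)] for i in range(4)]

residual = weight_residual(neuron, gradient)
assert residual == direct_residual(neuron, gradient)
```

Because each second gradient has the same row and column parity as its second neuron, every product that appears in the direct residual appears exactly once in one of the four smaller convolutions.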

Abstract

A data processing method and apparatus, a computer device and a storage medium. The computer device comprises a control module, and the control module comprises: an instruction caching unit, an instruction processing unit and a queue storage unit, wherein the instruction caching unit is used for storing a calculation instruction associated with the computation of an artificial neural network; the instruction processing unit is used for parsing the calculation instruction to obtain a plurality of computation instructions; and the queue storage unit is used for storing an instruction queue, and the instruction queue comprises: a plurality of computation instructions or calculation instructions to be executed in the sequential order of the queue. By means of the data processing method and apparatus, the computer device and the storage medium, the computation efficiency of a related product during the computation of a neural network model is improved.

Description

Data processing method, device, computer equipment and storage medium
This application claims priority to Chinese patent application No. 201911252885.3, titled "Data processing method, device, computer equipment and storage medium", filed with the Chinese Patent Office on December 9, 2019, the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates to the field of data processing technology, and in particular to a data processing method, device, computer equipment, and storage medium.
Background
In the field of artificial intelligence, neural network algorithms are a recently popular class of machine learning algorithms that have achieved very good results in many fields, such as image recognition, speech recognition, and natural language processing. As neural network algorithms have developed, their complexity has grown, and model sizes have gradually increased in order to improve recognition accuracy.
Summary of the invention
In view of this, the embodiments of the present disclosure provide a data processing method, device, computer equipment, and storage medium that can improve the hardware energy efficiency ratio, reduce computing time, and improve computing efficiency.
According to one aspect of the present disclosure, a data processing method applied to a processor is provided, the method including:
splitting first data according to a preset split mode to obtain a plurality of second data;
performing a convolution operation on each of the second data with a weight to obtain a plurality of first convolution results;
merging the plurality of first convolution results according to a preset merge mode to obtain a hole convolution result of the first data and the weight,
wherein the preset merge mode is the inverse process of the preset split mode.
According to another aspect of the present disclosure, a data processing device applied to a processor is provided, the device including:
a splitting module, configured to split first data according to a preset split mode to obtain a plurality of second data;
a convolution module, configured to perform a convolution operation on each of the second data with a weight to obtain a plurality of first convolution results;
a merging module, configured to merge the plurality of first convolution results according to a preset merge mode to obtain a hole convolution result of the first data and the weight,
wherein the preset merge mode is the inverse process of the preset split mode.
According to another aspect of the present disclosure, an artificial intelligence chip is provided, the chip including the data processing device described in any one of the foregoing.
According to another aspect of the present disclosure, an electronic device is provided, the electronic device including the aforementioned artificial intelligence chip.
According to another aspect of the present disclosure, a board card is provided, the board card including: a storage device, an interface device, a control device, and the aforementioned artificial intelligence chip;
wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively;
the storage device is used to store data;
the interface device is used to implement data transmission between the artificial intelligence chip and external equipment;
the control device is used to monitor the state of the artificial intelligence chip.
According to another aspect of the present disclosure, an electronic device is provided, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to call the instructions stored in the memory to execute the method described in any one of the foregoing.
According to another aspect of the present disclosure, a computer-readable storage medium is provided, having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method described in any one of the foregoing.
In this way, after the first data is split according to the preset split mode, each of the resulting second data is convolved with the weight, and the resulting first convolution results are merged according to the preset merge mode. Because the preset merge mode is the inverse process of the preset split mode, the merge yields the hole convolution result of the first data and the weight. The data processing method, device, computer equipment, and storage medium provided by the present disclosure can improve the hardware energy efficiency ratio, reduce computing time, and improve computing efficiency.
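
As an illustration only, the following Python sketch (not part of the disclosure) checks the split, convolve, merge sequence against a directly computed hole (dilated) convolution with a dilation rate of 2. The helper names `conv2d`, `split` and `merge`, the 0-based parity keys, the stride-1 unpadded cross-correlation, and the sample values are all assumptions made for the example.

```python
def conv2d(x, w, rate=1):
    """Valid cross-correlation of x with w; rate > 1 gives a hole (dilated) convolution."""
    kh, kw = len(w), len(w[0])
    out_h = len(x) - rate * (kh - 1)
    out_w = len(x[0]) - rate * (kw - 1)
    return [[sum(w[a][b] * x[i + rate * a][j + rate * b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def split(x):
    """Split by parity: key (p, q) holds rows p, p+2, ... and columns q, q+2, ..."""
    return {(p, q): [row[q::2] for row in x[p::2]] for p in (0, 1) for q in (0, 1)}


def merge(parts, out_h, out_w):
    """Inverse of split: interleave the four partial results back by parity."""
    out = [[0] * out_w for _ in range(out_h)]
    for (p, q), part in parts.items():
        for m, row in enumerate(part):
            for n, value in enumerate(row):
                out[2 * m + p][2 * n + q] = value
    return out


# Hypothetical 8x7 first data and 3x3 weight.
x = [[(3 * r * r + 5 * c + r * c) % 11 for c in range(7)] for r in range(8)]
w = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

direct = conv2d(x, w, rate=2)                               # hole convolution on the first data
parts = {key: conv2d(sub, w) for key, sub in split(x).items()}  # first convolution results
merged = merge(parts, len(direct), len(direct[0]))

assert merged == direct
```

In this sketch each of the four ordinary convolutions multiplies only by the nine real weights, so no multiplications are spent on the zeros that an expanded 5*5 kernel would contain.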
Other features and aspects of the present disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Description of the drawings
The drawings, which are included in and constitute a part of the specification, illustrate exemplary embodiments, features, and aspects of the present disclosure together with the specification, and serve to explain the principles of the present disclosure.
Fig. 1 shows a schematic diagram of an exemplary hole convolution according to the present disclosure;
Fig. 2 shows a flowchart of a data processing method according to an embodiment of the present disclosure;
Fig. 3 shows a schematic diagram of a data processing method according to an embodiment of the present disclosure;
Fig. 4 shows a schematic diagram of a data processing method according to an embodiment of the present disclosure;
Fig. 5 shows a block diagram of a data processing device according to an embodiment of the present disclosure;
Fig. 6 shows a structural block diagram of a board according to an embodiment of the present disclosure;
Fig. 7 shows a block diagram of an electronic device 800 according to an embodiment of the present disclosure;
Fig. 8 shows a block diagram of an electronic device 1900 according to an embodiment of the present disclosure.
Detailed description
The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present disclosure. Obviously, the described embodiments are some, rather than all, of the embodiments of the present disclosure. All other embodiments obtained by those skilled in the art based on the embodiments in the present disclosure without creative work fall within the protection scope of the present disclosure.
It should be understood that the terms "first", "second", "third", and "fourth" in the claims, specification, and drawings of the present disclosure are used to distinguish different objects, not to describe a specific order. The terms "including" and "comprising" used in the specification and claims of the present disclosure indicate the presence of the described features, wholes, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terms used in this specification are only for the purpose of describing specific embodiments and are not intended to limit the present disclosure. As used in the specification and claims of the present disclosure, the singular forms "a", "an", and "the" are intended to include the plural forms unless the context clearly indicates otherwise. It should be further understood that the term "and/or" used in the specification and claims of the present disclosure refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in this specification and the claims, the term "if" can be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" can be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
For a hole convolution (also known as dilated convolution) with a dilation rate of 2, as shown in Fig. 1, when convolving with a 3*3 weight, elements of the feature map can be sampled at intervals of one element, with the weight of each skipped element between two sampled elements set to 0, so that a 5*5 kernel is obtained and multiplied element-wise with the weight, thereby enlarging the receptive field of the convolution.
Hole convolution enlarges the receptive field of the convolution, but it also lowers the hardware energy efficiency ratio and increases the computing time.
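
To make the cost concrete, the short Python sketch below (an illustration, not part of the disclosure) expands a 3*3 weight into the 5*5 kernel used at a dilation rate of 2 and counts the inserted zeros. The function name `dilate_kernel`, the square-kernel assumption, and the sample weight values are hypothetical.

```python
def dilate_kernel(weight, rate=2):
    """Insert (rate - 1) zeros between neighbouring weights in both directions."""
    k = len(weight)                # assumes a square k x k weight
    size = rate * (k - 1) + 1      # a 3x3 weight at rate 2 gives a 5x5 kernel
    kernel = [[0] * size for _ in range(size)]
    for a in range(k):
        for b in range(k):
            kernel[a * rate][b * rate] = weight[a][b]
    return kernel


weight = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # hypothetical non-zero weights
kernel = dilate_kernel(weight)
zeros = sum(row.count(0) for row in kernel)
total = len(kernel) * len(kernel[0])
print(f"{zeros} of {total} kernel entries are inserted zeros")
```

If hardware evaluates the expanded kernel naively, 16 of the 25 multiply-accumulate operations per output element involve a zero weight, which is one way to read the loss of energy efficiency described above.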
The present disclosure provides a data processing method. The data processing method of the embodiments of the present disclosure can be applied to a processor, which may be a general-purpose processor, such as a CPU (Central Processing Unit), or an artificial intelligence processor (IPU) for performing artificial intelligence operations. Artificial intelligence operations may include machine learning operations, brain-like operations, and so on. Machine learning operations include neural network operations, k-means operations, support vector machine operations, and so on. The artificial intelligence processor may include, for example, one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processor), and a Field-Programmable Gate Array (FPGA) chip. The present disclosure does not limit the specific type of processor.
In a possible implementation, the processor mentioned in the present disclosure may include multiple processing units, and each processing unit can independently run the tasks assigned to it, such as convolution tasks, pooling tasks, or fully connected tasks. The present disclosure does not limit the processing units or the tasks they run.
FIG. 2 shows a flowchart of a data processing method according to an embodiment of the present disclosure. The method may be applied to a processor. As shown in FIG. 2, the method may include:
In step S21, the first data is split according to a preset splitting manner to obtain a plurality of second data.
For example, the preset splitting manner may be a manner set in advance for splitting the first data, and it may split the first data into four second data. For instance, along both the rows and the columns of the first data, elements may be taken at an interval of one element, yielding four second data. The elements in the first convolution results of all the second data with the weight are the same as the elements in the dilated convolution result of the first data with the weight.
In a possible implementation, the plurality of second data may include first sub-data, second sub-data, third sub-data, and fourth sub-data, and splitting the first data according to the preset splitting manner to obtain the plurality of second data may include:
traversing the elements in the first data, and determining the elements in the odd columns of the odd rows of the first data to form the first sub-data;
determining the elements in the even columns of the odd rows of the first data to form the second sub-data;
determining the elements in the odd columns of the even rows of the first data to form the third sub-data;
determining the elements in the even columns of the even rows of the first data to form the fourth sub-data.
For example, the odd rows of the first data may be determined; the elements in the odd columns of each odd row form the first sub-data, and the elements in the even columns of each odd row form the second sub-data. Likewise, the even rows of the first data may be determined; the elements in the odd columns of each even row form the third sub-data, and the elements in the even columns of each even row form the fourth sub-data.
Exemplarily, referring to FIG. 3, the elements in the odd columns of the odd rows of the first data (marked "1" in the first data shown in FIG. 3) form the first sub-data; the elements in the even columns of the odd rows (marked "2") form the second sub-data; the elements in the odd columns of the even rows (marked "3") form the third sub-data; and the elements in the even columns of the even rows (marked "4") form the fourth sub-data.
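The parity split described above can be sketched with Python list slicing. This is a minimal illustration rather than the disclosed implementation: the function name `split_by_parity` and the 7*7 input are assumptions made for the example. Because the text numbers rows and columns from 1, its odd rows correspond to Python indices 0, 2, 4, and so on.

```python
def split_by_parity(first_data):
    """Split a 2-D list into four sub-data by row/column parity.

    The text counts rows and columns from 1, so its "odd" rows are
    Python indices 0, 2, 4, ... and its "even" rows are 1, 3, 5, ...
    """
    odd_rows = first_data[0::2]
    even_rows = first_data[1::2]
    sub1 = [row[0::2] for row in odd_rows]   # odd rows, odd columns
    sub2 = [row[1::2] for row in odd_rows]   # odd rows, even columns
    sub3 = [row[0::2] for row in even_rows]  # even rows, odd columns
    sub4 = [row[1::2] for row in even_rows]  # even rows, even columns
    return sub1, sub2, sub3, sub4


# Hypothetical 7*7 input; each entry encodes its 1-based position as 10*row + column.
first_data = [[10 * r + c for c in range(1, 8)] for r in range(1, 8)]
sub1, sub2, sub3, sub4 = split_by_parity(first_data)
```

With this input, the four sub-data have 4, 4, 3, and 3 rows and 4, 3, 4, and 3 columns respectively.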
In step S22, the convolution operation of each second data with the weight is performed to obtain a plurality of first convolution results.
After the first data is split into the plurality of second data, each second data may undergo an ordinary convolution operation with the weight, yielding a plurality of first convolution results. Taking the example shown in FIG. 3, the first sub-data is convolved with the weight to obtain the first convolution result corresponding to the first sub-data; the second sub-data is convolved with the weight to obtain the first convolution result corresponding to the second sub-data; the third sub-data is convolved with the weight to obtain the first convolution result corresponding to the third sub-data; and the fourth sub-data is convolved with the weight to obtain the first convolution result corresponding to the fourth sub-data.
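Step S22 can be sketched as four ordinary convolutions, one per second data. The sketch below makes several assumptions that the text does not state: the convolution is the stride-1, no-padding cross-correlation form common in neural networks, the first data is 7*7, and the weight values are arbitrary. The helper name `conv2d_valid` is hypothetical.

```python
def conv2d_valid(data, weight):
    """Ordinary stride-1 convolution without padding (cross-correlation form)."""
    kh, kw = len(weight), len(weight[0])
    out_h = len(data) - kh + 1
    out_w = len(data[0]) - kw + 1
    return [
        [
            sum(weight[a][b] * data[i + a][j + b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Assumed 7*7 first data and an arbitrary 3*3 weight.
first_data = [[7 * r + c for c in range(7)] for r in range(7)]
weight = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

# Parity split, ordered as first, second, third, fourth sub-data.
second_data = [
    [row[c::2] for row in first_data[r::2]]
    for r in (0, 1) for c in (0, 1)
]

# One ordinary convolution per second data.
first_results = [conv2d_valid(sub, weight) for sub in second_data]
shapes = [(len(res), len(res[0])) for res in first_results]  # (rows, columns)
```

The four convolutions are independent of one another, so a processor with several processing units could in principle run them in parallel.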
In step S23, the plurality of first convolution results are merged according to a preset merging manner to obtain the dilated convolution result of the first data with the weight,
wherein the preset merging manner is the inverse process of the preset splitting manner.
For example, the preset merging manner is the inverse process of the preset splitting manner; that is, splitting the merged dilated convolution result of the first data with the weight according to the preset splitting manner yields the individual first convolution results.
In a possible implementation, merging the plurality of first convolution results according to the preset merging manner to obtain the dilated convolution result of the first data with the weight may include:
taking the elements in the first convolution result corresponding to the first sub-data, in turn, as the elements in the odd columns of the odd rows of the dilated convolution result;
taking the elements in the first convolution result corresponding to the second sub-data, in turn, as the elements in the even columns of the odd rows of the dilated convolution result;
taking the elements in the first convolution result corresponding to the third sub-data, in turn, as the elements in the odd columns of the even rows of the dilated convolution result;
taking the elements in the first convolution result corresponding to the fourth sub-data, in turn, as the elements in the even columns of the even rows of the dilated convolution result.
For example, each row of the first convolution result corresponding to the first sub-data serves in turn as an odd row of the dilated convolution result, and each element of that row serves in turn as an odd-column element of that odd row. Each row of the first convolution result corresponding to the second sub-data serves in turn as an odd row of the dilated convolution result, and each element of that row serves in turn as an even-column element of that odd row. Each row of the first convolution result corresponding to the third sub-data serves in turn as an even row of the dilated convolution result, and each element of that row serves in turn as an odd-column element of that even row. Each row of the first convolution result corresponding to the fourth sub-data serves in turn as an even row of the dilated convolution result, and each element of that row serves in turn as an even-column element of that even row.
Continuing the above example, as shown in FIG. 4, the first convolution result corresponding to the first sub-data is 2*2, that corresponding to the second sub-data is 1*2, that corresponding to the third sub-data is 2*1, and that corresponding to the fourth sub-data is 1*1 (sizes written as columns*rows).
The elements in each row of the first convolution result corresponding to the first sub-data serve in turn as the odd columns of the odd rows of the dilated convolution result (that is, the two elements of the first row of that first convolution result become the first-column and third-column elements of the first row of the dilated convolution result, and the two elements of its second row become the first-column and third-column elements of the third row, marked "1" in FIG. 4).
The elements in the first convolution result corresponding to the second sub-data serve in turn as the even columns of the odd rows of the dilated convolution result (that is, the single element of the first row of that first convolution result becomes the second-column element of the first row of the dilated convolution result, and the single element of its second row becomes the second-column element of the third row, marked "2" in FIG. 4).
The elements in the first convolution result corresponding to the third sub-data serve in turn as the odd columns of the even rows of the dilated convolution result (that is, the two elements of the single row of that first convolution result become the first-column and third-column elements of the second row of the dilated convolution result, marked "3" in FIG. 4).
The element in the first convolution result corresponding to the fourth sub-data serves as the even column of the even row of the dilated convolution result (that is, the single element of that first convolution result becomes the second-column element of the second row of the dilated convolution result, marked "4" in FIG. 4).
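The merge described above, and the claim that it reproduces the dilated convolution, can be checked with a small sketch. Everything specific here is an assumption for illustration: a dilation rate of 2, a 7*7 first data with arbitrary values, a stride-1 no-padding convolution, and the hypothetical helper names `conv2d_valid` and `merge_by_parity`.

```python
def conv2d_valid(data, weight, dilation=1):
    """Stride-1 convolution without padding; `dilation` spaces the weight taps."""
    kh, kw = len(weight), len(weight[0])
    out_h = len(data) - dilation * (kh - 1)
    out_w = len(data[0]) - dilation * (kw - 1)
    return [
        [
            sum(weight[a][b] * data[i + dilation * a][j + dilation * b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def merge_by_parity(results, out_h, out_w):
    """Interleave the four first convolution results (inverse of the split).

    `results` maps a 0-based (row offset, column offset) to one result, so
    (0, 0) is "odd rows, odd columns" in the text's 1-based counting.
    """
    merged = [[None] * out_w for _ in range(out_h)]
    for (r, c), result in results.items():
        for m, row in enumerate(result):
            for n, value in enumerate(row):
                merged[2 * m + r][2 * n + c] = value
    return merged


# Assumed inputs: arbitrary 7*7 first data and 3*3 weight.
first_data = [[(3 * r * r + 5 * c + r * c) % 17 for c in range(7)] for r in range(7)]
weight = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Split by parity, convolve each piece, then merge.
results = {
    (r, c): conv2d_valid([row[c::2] for row in first_data[r::2]], weight)
    for r in (0, 1) for c in (0, 1)
}
merged = merge_by_parity(results, out_h=3, out_w=3)

# Reference: dilated convolution computed directly on the first data.
direct = conv2d_valid(first_data, weight, dilation=2)
```

Under these assumptions `merged` and `direct` hold the same 3*3 values.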
In this way, after the first data is split according to the preset splitting manner, each of the resulting second data is convolved with the weight, and the resulting first convolution results are merged according to the preset merging manner. Because the preset merging manner is the inverse process of the preset splitting manner, the merge yields the dilated convolution result of the first data with the weight. The data processing method provided by the present disclosure can improve the hardware energy-efficiency ratio, reduce computation time, and improve computational efficiency.
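Why the merge reproduces the dilated convolution can be seen from a short derivation. The notation is an assumption made for illustration: rows and columns are indexed from 0, the dilation rate is taken to be 2 (matching the one-element interval of the split), $x$ is the first data, $w$ the weight, $y$ the dilated convolution result, and $x^{(p,q)}$ the second data with row parity $p$ and column parity $q$.

```latex
\begin{aligned}
x^{(p,q)}[m,n] &= x[2m+p,\;2n+q], \qquad p,q \in \{0,1\},\\
y[i,j] &= \sum_{a}\sum_{b} w[a,b]\; x[i+2a,\;j+2b],\\
y[2m+p,\;2n+q] &= \sum_{a}\sum_{b} w[a,b]\; x[2(m+a)+p,\;2(n+b)+q]
  = \sum_{a}\sum_{b} w[a,b]\; x^{(p,q)}[m+a,\;n+b].
\end{aligned}
```

The right-hand side of the last line is the ordinary convolution of $x^{(p,q)}$ with $w$, so each first convolution result holds exactly the entries of $y$ whose row and column parities are $(p,q)$, which is where the preset merging manner places them.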
To help those skilled in the art better understand the beneficial effects of the present disclosure, they are illustrated below with a specific example.
Assume that the convolution scale the processor can handle is 5*5 neurons with a 3*3 weight. If the first data shown in FIG. 3 is dilated-convolved directly with the 3*3 weight, the processor can process only one 5*5 window with the 3*3 weight at a time and output one convolution result, so it needs 9 operations to complete the dilated convolution.
After the first data is split into four second data, however, the sizes of the second data are 4*4, 3*4, 4*3, and 3*3 in turn, and the processor can obtain all 9 results in a single operation, completing the dilated convolution of the first data with the weight.
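The counts in this example can be reproduced with a little arithmetic. The sketch below assumes a 7*7 first data and a dilation rate of 2, which are inferred from the sizes quoted above rather than stated outright, and it only checks sizes; how the hardware schedules the four pieces within one operation is not modeled. Sizes in the code are (rows, columns).

```python
def parity_sizes(height, width):
    """(rows, columns) of the four sub-data produced by the parity split."""
    return [((height - r + 1) // 2, (width - c + 1) // 2)
            for r in (0, 1) for c in (0, 1)]


data_h = data_w = 7   # assumed size of the first data in FIG. 3
k = 3                 # 3*3 weight
dilation = 2          # assumed dilation rate
limit = 5             # processor handles at most 5*5 neurons per operation

# Direct dilated convolution: each output needs one footprint-sized window.
footprint = dilation * (k - 1) + 1
direct_outputs = (data_h - footprint + 1) * (data_w - footprint + 1)

# After the split: every sub-data must fit within the processor limit.
sub_sizes = parity_sizes(data_h, data_w)
fits = all(h <= limit and w <= limit for h, w in sub_sizes)
split_outputs = sum((h - k + 1) * (w - k + 1) for h, w in sub_sizes)
```

Here the footprint is 5, so the direct approach visits 9 separate 5*5 windows, while the four sub-data all fit within the 5*5 limit and together produce the same 9 outputs.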
It can be seen that the data processing method provided by the present disclosure improves the hardware energy-efficiency ratio, reduces computation time, and improves computational efficiency.
In a possible implementation, the first data may include neurons and/or gradients.
In the backpropagation of a dilated convolution, the first gradient of the current convolutional layer may be dilated-convolved with the weight to determine the second gradient of the next convolutional layer. In this process, the gradient of the current convolutional layer may be split according to the preset splitting manner into four sub-gradients; each of the four sub-gradients is convolved with the weight to obtain four convolution results; and the four convolution results are merged according to the preset merging manner to obtain the second gradient of the next convolutional layer.
In a possible implementation, the first data may include a first neuron and a first gradient, and splitting the first data according to the preset splitting manner to obtain the plurality of second data may include:
splitting the first neuron according to the preset splitting manner to obtain a plurality of second neurons;
splitting the first gradient according to the preset splitting manner to obtain a plurality of second gradients.
For example, the first neuron may be split according to the preset splitting manner to obtain a plurality of second neurons. Exemplarily, the second neurons may include a first sub-neuron, a second sub-neuron, a third sub-neuron, and a fourth sub-neuron. The elements in the odd columns of the odd rows of the first neuron form the first sub-neuron; the elements in the even columns of the odd rows form the second sub-neuron; the elements in the odd columns of the even rows form the third sub-neuron; and the elements in the even columns of the even rows form the fourth sub-neuron.
Correspondingly, the first gradient may be split according to the preset splitting manner to obtain a plurality of second gradients. Exemplarily, the second gradients may include a first sub-gradient, a second sub-gradient, a third sub-gradient, and a fourth sub-gradient. The elements in the odd columns of the odd rows of the first gradient form the first sub-gradient; the elements in the even columns of the odd rows form the second sub-gradient; the elements in the odd columns of the even rows form the third sub-gradient; and the elements in the even columns of the even rows form the fourth sub-gradient.
In a possible implementation, the method may further include:
for any one of the second neurons, performing a convolution operation on that second neuron and the corresponding second gradient to obtain a third convolution result;
determining the sum of the third convolution results corresponding to the second neurons as the residual of the weight;
wherein the row and column parity of the positions in the first neuron occupied by the elements of a second neuron is the same as the row and column parity of the positions in the first gradient occupied by the elements of the corresponding second gradient.
For example, the second gradient corresponding to a second neuron is the one whose elements occupy positions in the first gradient with the same row and column parity as the positions that the elements of the second neuron occupy in the first neuron. If all elements of the second neuron lie in odd rows and odd columns of the first neuron, all elements of the corresponding second gradient lie in odd rows and odd columns of the first gradient. If all elements of the second neuron lie in odd rows and even columns of the first neuron, all elements of the corresponding second gradient lie in odd rows and even columns of the first gradient. If all elements of the second neuron lie in even rows and odd columns of the first neuron, all elements of the corresponding second gradient lie in even rows and odd columns of the first gradient. If all elements of the second neuron lie in even rows and even columns of the first neuron, all elements of the corresponding second gradient lie in even rows and even columns of the first gradient.
Exemplarily, continuing the above example, the second neurons include the first, second, third, and fourth sub-neurons, and the second gradients include the first, second, third, and fourth sub-gradients. The first sub-neuron corresponds to the first sub-gradient, and the two are convolved to obtain the third convolution result corresponding to the first sub-neuron. The second sub-neuron corresponds to the second sub-gradient, and the two are convolved to obtain the third convolution result corresponding to the second sub-neuron. The third sub-neuron corresponds to the third sub-gradient, and the two are convolved to obtain the third convolution result corresponding to the third sub-neuron. The fourth sub-neuron corresponds to the fourth sub-gradient, and the two are convolved to obtain the third convolution result corresponding to the fourth sub-neuron.
After the third convolution result corresponding to each second neuron is obtained, the third convolution results are added together, and the resulting sum is determined as the residual of the weight.
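The weight-residual computation can be sketched as follows. Several points are assumptions not stated in the text: the first gradient is taken to be the gradient with respect to the dilated convolution output, the dilation rate is 2, the convolution is the stride-1 no-padding cross-correlation form, and all values and helper names (`correlate_valid`, `parity_part`) are hypothetical. The sketch compares the split-and-sum result against the weight gradient computed directly from its definition.

```python
def correlate_valid(data, kernel):
    """Stride-1 convolution without padding (cross-correlation form)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(data) - kh + 1
    out_w = len(data[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * data[i + a][j + b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def parity_part(data, r, c):
    """Elements whose 0-based row/column offsets within each 2*2 block are (r, c)."""
    return [row[c::2] for row in data[r::2]]


# Assumed 7*7 first neuron and 3*3 first gradient (gradient of the dilated output).
neuron = [[(2 * r * r + 3 * c + r * c) % 13 for c in range(7)] for r in range(7)]
gradient = [[1, -2, 3], [0, 4, -1], [2, 5, -3]]

# Convolve each second neuron with the second gradient of matching parity.
third_results = [
    correlate_valid(parity_part(neuron, r, c), parity_part(gradient, r, c))
    for r in (0, 1) for c in (0, 1)
]

# The residual of the weight is the element-wise sum of the third convolution results.
weight_residual = [
    [sum(res[a][b] for res in third_results) for b in range(3)]
    for a in range(3)
]

# Reference: weight gradient of a dilation-2 convolution, from its definition.
direct = [
    [
        sum(gradient[i][j] * neuron[i + 2 * a][j + 2 * b]
            for i in range(3) for j in range(3))
        for b in range(3)
    ]
    for a in range(3)
]
```

Under these assumptions each third convolution result is 3*3, the same size as the weight, and their sum equals the directly computed weight gradient.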
In this way, the hardware energy-efficiency ratio can be improved, computation time reduced, and computational efficiency improved.
In a possible implementation, the method may further include:
adjusting the weight according to the residual of the weight.
For example, after the residual of the weight is determined, the weight of the current convolutional layer may be adjusted according to it, for instance by taking the sum of the residual of the weight and the weight as the new weight.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of action combinations. Those skilled in the art should understand, however, that the present disclosure is not limited by the described order of actions, because according to the present disclosure some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the present disclosure.
It should further be noted that although the steps in the flowcharts of FIGS. 1-4 are shown in sequence as indicated by the arrows, they are not necessarily executed in that sequence. Unless expressly stated herein, the execution of these steps is not strictly order-limited, and they may be executed in other orders. Moreover, at least some of the steps in FIGS. 1-4 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments; nor are they necessarily executed sequentially, but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
FIG. 5 shows a block diagram of a data processing device according to an embodiment of the present disclosure. As shown in FIG. 5, the data processing device may include:
a splitting module 501, which may be configured to split the first data according to a preset splitting manner to obtain a plurality of second data;
a convolution module 502, which may be configured to perform the convolution operation of each second data with the weight to obtain a plurality of first convolution results;
a merging module 503, which may be configured to merge the plurality of first convolution results according to a preset merging manner to obtain the dilated convolution result of the first data with the weight,
wherein the preset merging manner is the inverse process of the preset splitting manner.
In this way, after the first data is split according to the preset splitting manner, each of the resulting second data is convolved with the weight, and the resulting first convolution results are merged according to the preset merging manner. Because the preset merging manner is the inverse process of the preset splitting manner, the merge yields the dilated convolution result of the first data with the weight. The data processing device provided by the present disclosure can improve the hardware energy-efficiency ratio, reduce computation time, and improve computational efficiency.
In a possible implementation, the plurality of second data includes first sub-data, second sub-data, third sub-data, and fourth sub-data, and the splitting module may be further configured to:
traverse the elements in the first data, and determine the elements in the odd columns of the odd rows of the first data to form the first sub-data;
determine the elements in the even columns of the odd rows of the first data to form the second sub-data;
determine the elements in the odd columns of the even rows of the first data to form the third sub-data;
determine the elements in the even columns of the even rows of the first data to form the fourth sub-data.
In a possible implementation, the merging module may be further configured to:
take the elements in the first convolution result corresponding to the first sub-data, in turn, as the elements in the odd columns of the odd rows of the dilated convolution result;
take the elements in the first convolution result corresponding to the second sub-data, in turn, as the elements in the even columns of the odd rows of the dilated convolution result;
take the elements in the first convolution result corresponding to the third sub-data, in turn, as the elements in the odd columns of the even rows of the dilated convolution result;
take the elements in the first convolution result corresponding to the fourth sub-data, in turn, as the elements in the even columns of the even rows of the dilated convolution result.
In a possible implementation, the first data includes neurons and/or gradients.
In a possible implementation, the first data may include a first neuron and a first gradient, and the splitting module may be further configured to:
split the first neuron according to the preset splitting manner to obtain a plurality of second neurons;
split the first gradient according to the preset splitting manner to obtain a plurality of second gradients.
In a possible implementation, the device may further include:
a processing module, configured to perform, for any one of the second neurons, a convolution operation on that second neuron and the corresponding second gradient to obtain a third convolution result;
a determining module, configured to determine the sum of the third convolution results corresponding to the second neurons as the residual of the weight;
wherein the row and column parity of the positions in the first neuron occupied by the elements of a second neuron is the same as the row and column parity of the positions in the first gradient occupied by the elements of the corresponding second gradient.
In a possible implementation, the device may further include:
an adjustment module, configured to adjust the weight according to the residual of the weight.
In some embodiments of the present disclosure, the functions or modules of the device provided by the embodiments of the present disclosure may be used to execute the methods described in the method embodiments above. For their specific implementation and technical effects, refer to the description of those method embodiments; for brevity, they are not repeated here.
It should be understood that the device embodiments above are merely illustrative, and the device of the present disclosure may also be implemented in other ways. For example, the division into units/modules in the above embodiments is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units, modules, or components may be combined or integrated into another system, or some features may be omitted or not executed.
In addition, unless otherwise specified, the functional units/modules in the embodiments of the present disclosure may be integrated into one unit/module, may each exist physically on their own, or may be integrated in groups of two or more. The integrated unit/module may be implemented in the form of hardware or in the form of a software program module.
If the integrated unit/module is implemented in hardware, the hardware may be a digital circuit, an analog circuit, and so on. Physical implementations of the hardware structure include, but are not limited to, transistors, memristors, and so on. Unless otherwise specified, the artificial intelligence processor may be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, or ASIC. Unless otherwise specified, the storage unit may be any suitable magnetic or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), or hybrid memory cube (HMC).
If the integrated unit/module is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present disclosure, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present disclosure. The memory includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
In a possible implementation, an artificial intelligence chip is also disclosed, which includes the above data processing device.
In a possible implementation, a board card is also disclosed, which includes a storage device, an interface device, a control device, and the above artificial intelligence chip. The artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively; the storage device is used to store data; the interface device is used to implement data transmission between the artificial intelligence chip and an external device; and the control device is used to monitor the state of the artificial intelligence chip.
FIG. 6 shows a structural block diagram of a board card according to an embodiment of the present disclosure. Referring to FIG. 6, in addition to the chip 389, the board card may include other supporting components, including but not limited to a storage device 390, an interface device 391, and a control device 392.
The storage device 390 is connected to the artificial intelligence chip through a bus and is used to store data. The storage device may include multiple groups of storage units 393. Each group of storage units is connected to the artificial intelligence chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (double data rate synchronous dynamic random access memory).
DDR doubles the speed of SDRAM without raising the clock frequency, because it allows data to be read on both the rising edge and the falling edge of the clock pulse. DDR is therefore twice as fast as standard SDRAM. In one embodiment, the storage device may include four groups of storage units. Each group of storage units may include multiple DDR4 chips. In one embodiment, the artificial intelligence chip may contain four 72-bit DDR4 controllers; of the 72 bits, 64 are used for data transmission and 8 are used for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical data transmission bandwidth can reach 25600 MB/s.
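The 25600 MB/s figure can be reproduced from the transfer rate and the data width. The following is a minimal sketch of that arithmetic, assuming that DDR4-3200 means 3200 million transfers per second and that the figure refers to the 64 data bits of a single controller; the function name is illustrative and does not come from the disclosure.

```python
def ddr_peak_bandwidth_mb_s(mega_transfers_per_s, data_bits):
    """Peak bandwidth in MB/s: transfers per second times bytes per transfer."""
    return mega_transfers_per_s * data_bits / 8


# 64 of the 72 controller bits carry data; the other 8 carry ECC (assumed
# not to count toward the quoted bandwidth).
print(ddr_peak_bandwidth_mb_s(3200, 64))  # 25600.0
```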
In one embodiment, each group of storage units includes multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A DDR controller is provided in the chip to control the data transmission and data storage of each storage unit.
The interface device is electrically connected to the artificial intelligence chip. The interface device is used to implement data transmission between the artificial intelligence chip and an external device (for example, a server or a computer). For example, in one embodiment, the interface device may be a standard PCIE interface, in which case the data to be processed is transferred from the server to the chip through the standard PCIE interface. Preferably, when a PCIE 3.0 x16 interface is used, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may be another kind of interface; the present disclosure does not limit its specific form, as long as the interface unit can perform the transfer function. In addition, the calculation results of the artificial intelligence chip are sent back to the external device (for example, a server) by the same interface device.
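The 16000 MB/s figure corresponds to the raw signalling rate of a PCIE 3.0 x16 link. The sketch below shows the arithmetic, assuming 8 GT/s per lane and, for the usable rate, the 128b/130b line encoding used by PCIE 3.0; the function name and its parameters are illustrative only.

```python
def pcie_bandwidth_mb_s(giga_transfers_per_s, lanes, payload_bits=128, encoded_bits=130):
    """Return (raw, usable) one-direction bandwidth in MB/s for a PCIE link."""
    raw = giga_transfers_per_s * 1000 * lanes / 8  # one bit per transfer per lane
    usable = raw * payload_bits / encoded_bits     # line-encoding overhead removed
    return raw, usable


raw, usable = pcie_bandwidth_mb_s(8, 16)
print(raw)            # 16000.0
print(round(usable))  # 15754
```

The quoted theoretical bandwidth matches the raw value; protocol overhead above the line encoding is not modelled here.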
The control device is electrically connected to the artificial intelligence chip and is used to monitor the state of the artificial intelligence chip. Specifically, the artificial intelligence chip and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). The artificial intelligence chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads. The artificial intelligence chip can therefore be in different working states, such as multi-load and light-load. Through the control device, the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the artificial intelligence chip can be regulated.
In a possible implementation, an electronic device is disclosed, which includes the above artificial intelligence chip. The electronic device includes a data processing device, robot, computer, printer, scanner, tablet computer, smart terminal, mobile phone, dashboard camera, navigator, sensor, webcam, server, cloud server, camera, video camera, projector, watch, headset, mobile storage, wearable device, vehicle, household appliance, and/or medical device. The vehicle includes an airplane, a ship, and/or a car; the household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove, and range hood; the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound scanner, and/or an electrocardiograph.
An embodiment of the present disclosure also provides a computer-readable storage medium storing computer program instructions that, when executed by a processor, implement the above method. The computer-readable storage medium may be a non-volatile computer-readable storage medium.
An embodiment of the present disclosure also provides an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to call the instructions stored in the memory to execute the above method.
The electronic device may be provided as a terminal, a server, or a device in another form.
FIG. 7 shows a block diagram of an electronic device 800 according to an embodiment of the present disclosure. For example, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
Referring to FIG. 7, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 that execute instructions to complete all or part of the steps of the above method. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation of the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phone book data, messages, pictures, videos, and so on. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power supply component 806 provides power to the various components of the electronic device 800. The power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC), which is configured to receive external audio signals when the electronic device 800 is in an operating mode such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and so on. The buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 814 includes one or more sensors that provide state assessments of various aspects of the electronic device 800. For example, the sensor component 814 can detect the open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor component 814 can also detect a change in position of the electronic device 800 or of one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and changes in its temperature. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above method.
In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to complete the above method.
FIG. 8 shows a block diagram of an electronic device 1900 according to an embodiment of the present disclosure. For example, the electronic device 1900 may be provided as a server. Referring to FIG. 8, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute the instructions to perform the above method.
The electronic device 1900 may also include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 can operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the above method.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of other embodiments. The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination involves no contradiction, it should be considered within the scope of this specification.
The foregoing can be better understood in light of the following clauses:
Clause A1. A data processing method, applied to a processor, the method comprising:
splitting first data according to a preset splitting manner to obtain multiple pieces of second data;
performing a convolution operation on each piece of second data and a weight to obtain multiple first convolution results;
merging the multiple first convolution results according to a preset merging manner to obtain a dilated convolution result of the first data and the weight,
wherein the preset merging manner is the inverse of the preset splitting manner.
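The reason that splitting, convolving, and merging reproduces a dilated (hole) convolution can be written out as a short derivation. This sketch assumes a dilation rate of 2, stride 1, no padding, and zero-based indices; the symbols X (first data), W (weight), Y (dilated convolution result), and X_{p,q} (second data) are notation introduced here, not taken from the disclosure.

```latex
% Dilated convolution with dilation rate 2:
\[
  Y[i,j] = \sum_{a}\sum_{b} W[a,b]\, X[i+2a,\; j+2b]
\]
% Parity split of the first data, with p, q \in \{0, 1\}:
\[
  X_{p,q}[m,n] = X[2m+p,\; 2n+q]
\]
% Ordinary convolution of one piece of second data with the weight:
\[
  (X_{p,q} * W)[m,n]
    = \sum_{a}\sum_{b} W[a,b]\, X_{p,q}[m+a,\; n+b]
    = \sum_{a}\sum_{b} W[a,b]\, X[2m+p+2a,\; 2n+q+2b]
    = Y[2m+p,\; 2n+q]
\]
```

Each first convolution result therefore holds exactly the entries of Y whose row and column parities are (p, q), so interleaving the four results by parity, the inverse of the split, recovers Y.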
Clause A2. The method according to clause A1, wherein the multiple pieces of second data include first sub-data, second sub-data, third sub-data, and fourth sub-data, and splitting the first data according to the preset splitting manner to obtain the multiple pieces of second data comprises:
traversing the elements of the first data and determining the elements in the odd columns of the odd rows of the first data to form the first sub-data;
determining the elements in the even columns of the odd rows of the first data to form the second sub-data;
determining the elements in the odd columns of the even rows of the first data to form the third sub-data;
determining the elements in the even columns of the even rows of the first data to form the fourth sub-data.
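The four-way split can be sketched with list slicing. This is an illustration only, assuming the first data is a two-dimensional list and that "odd" rows and columns are counted from 1 (so they correspond to zero-based indices 0, 2, 4, ...); the function name `split_by_parity` is hypothetical.

```python
def split_by_parity(data):
    """Split a 2-D list into four sub-arrays keyed by (row parity, column parity).

    Parity is zero-based here: key (0, 0) holds the 1st, 3rd, ... rows and
    columns, which the text calls the odd rows and odd columns.
    """
    return {
        (p, q): [row[q::2] for row in data[p::2]]
        for p in (0, 1)
        for q in (0, 1)
    }


first_data = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
parts = split_by_parity(first_data)
print(parts[(0, 0)])  # first sub-data:  [[1, 3], [9, 11]]
print(parts[(0, 1)])  # second sub-data: [[2, 4], [10, 12]]
print(parts[(1, 0)])  # third sub-data:  [[5, 7], [13, 15]]
print(parts[(1, 1)])  # fourth sub-data: [[6, 8], [14, 16]]
```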
Clause A3. The method according to clause A2, wherein merging the multiple first convolution results according to the preset merging manner to obtain the dilated convolution result of the first data and the weight comprises:
taking the elements of the first convolution result corresponding to the first sub-data, in order, as the elements in the odd columns of the odd rows of the dilated convolution result;
taking the elements of the first convolution result corresponding to the second sub-data, in order, as the elements in the even columns of the odd rows of the dilated convolution result;
taking the elements of the first convolution result corresponding to the third sub-data, in order, as the elements in the odd columns of the even rows of the dilated convolution result;
taking the elements of the first convolution result corresponding to the fourth sub-data, in order, as the elements in the even columns of the even rows of the dilated convolution result.
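The split, convolve, and merge steps can be checked end to end against a directly computed dilated convolution. The sketch below is a plain-Python illustration under stated assumptions: dilation rate 2, stride 1, no padding, a single channel, and an input large enough that every sub-array is at least as large as the kernel. All function names are hypothetical, and nothing here reflects how the disclosed processor actually performs the operations.

```python
def conv2d_valid(x, k):
    """Stride-1 convolution without padding (cross-correlation) of 2-D lists."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(k[a][b] * x[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def dilated_conv2d(x, k, dilation=2):
    """Reference dilated convolution computed directly, used only for comparison."""
    kh, kw = len(k), len(k[0])
    out_h = len(x) - dilation * (kh - 1)
    out_w = len(x[0]) - dilation * (kw - 1)
    return [
        [
            sum(
                k[a][b] * x[i + dilation * a][j + dilation * b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def split_by_parity(data):
    """Four sub-arrays keyed by zero-based (row parity, column parity)."""
    return {(p, q): [row[q::2] for row in data[p::2]] for p in (0, 1) for q in (0, 1)}


def merge_by_parity(parts, out_h, out_w):
    """Inverse of split_by_parity: interleave the four parts into one array."""
    merged = [[None] * out_w for _ in range(out_h)]
    for (p, q), part in parts.items():
        for m, row in enumerate(part):
            for n, value in enumerate(row):
                merged[2 * m + p][2 * n + q] = value
    return merged


def dilated_conv2d_by_split(x, k):
    """Dilation-2 convolution built from four ordinary convolutions."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - 2 * (kh - 1), len(x[0]) - 2 * (kw - 1)
    first_results = {key: conv2d_valid(sub, k) for key, sub in split_by_parity(x).items()}
    return merge_by_parity(first_results, out_h, out_w)


x = [[(r * r + 3 * c * c + r * c) % 11 for c in range(7)] for r in range(6)]
k = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
assert dilated_conv2d_by_split(x, k) == dilated_conv2d(x, k)
```

The four ordinary convolutions read contiguous sub-arrays rather than strided positions of the original input, which is presumably the motivation for the scheme.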
Clause A4. The method according to any one of clauses A1 to A3, wherein the first data includes neurons and/or gradients.
Clause A5. The method according to any one of clauses A1 to A4, wherein the first data includes a first neuron and a first gradient, and splitting the first data according to the preset splitting manner to obtain the multiple pieces of second data comprises:
splitting the first neuron according to the preset splitting manner to obtain multiple second neurons;
splitting the first gradient according to the preset splitting manner to obtain multiple second gradients.
Clause A6. The method according to clause A5, the method further comprising:
for any second neuron, performing a convolution operation on that second neuron and the corresponding second gradient to obtain a third convolution result;
determining the sum of the third convolution results corresponding to all the second neurons as the residual of the weight;
wherein the row and column parity of the positions that the elements of a second neuron occupy in the first neuron is the same as the row and column parity of the positions that the elements of the corresponding second gradient occupy in the first gradient.
Clause A7. The method according to clause A6, the method further comprising:
adjusting the weight according to the residual of the weight.
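The weight residual of clauses A5 to A7 can be sketched in the same style. The code assumes dilation rate 2, stride 1, no padding, a single channel, and a first gradient with the same shape as the dilated convolution output; it treats the first neuron and first gradient as two-dimensional lists. The plain gradient-descent step in `adjust_weight`, its learning rate, and every function name are assumptions for illustration, since the clauses do not specify how the weight is adjusted.

```python
def conv2d_valid(x, k):
    """Stride-1 convolution without padding (cross-correlation) of 2-D lists."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(k[a][b] * x[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def split_by_parity(data):
    """Four sub-arrays keyed by zero-based (row parity, column parity)."""
    return {(p, q): [row[q::2] for row in data[p::2]] for p in (0, 1) for q in (0, 1)}


def weight_residual_by_split(neuron, gradient):
    """Sum of the convolutions of parity-matched second neurons and second gradients."""
    neuron_parts = split_by_parity(neuron)
    gradient_parts = split_by_parity(gradient)
    partials = [conv2d_valid(neuron_parts[key], gradient_parts[key]) for key in neuron_parts]
    # Element-wise sum of the four partial (third) convolution results.
    return [[sum(values) for values in zip(*rows)] for rows in zip(*partials)]


def weight_residual_reference(neuron, gradient, dilation=2):
    """Weight residual of a dilated convolution computed directly, for comparison."""
    gh, gw = len(gradient), len(gradient[0])
    kh = (len(neuron) - gh) // dilation + 1
    kw = (len(neuron[0]) - gw) // dilation + 1
    return [
        [
            sum(
                gradient[i][j] * neuron[i + dilation * a][j + dilation * b]
                for i in range(gh)
                for j in range(gw)
            )
            for b in range(kw)
        ]
        for a in range(kh)
    ]


def adjust_weight(weight, residual, learning_rate=0.01):
    """Hypothetical update rule: one plain gradient-descent step."""
    return [
        [w - learning_rate * r for w, r in zip(w_row, r_row)]
        for w_row, r_row in zip(weight, residual)
    ]


neuron = [[(r * r + 3 * c * c + r * c) % 11 for c in range(7)] for r in range(6)]
gradient = [[1, -2, 3], [0, 4, -1]]
residual = weight_residual_by_split(neuron, gradient)
assert residual == weight_residual_reference(neuron, gradient)
```

Pairing sub-arrays of equal parity is what makes the four partial results add up to the full residual; mismatched parities would combine input positions that never meet in the dilated convolution.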
Clause A8. A data processing device, applied to a processor, the device comprising:
a splitting module, configured to split first data according to a preset splitting manner to obtain multiple pieces of second data;
a convolution module, configured to perform a convolution operation on each piece of second data and a weight to obtain multiple first convolution results;
a merging module, configured to merge the multiple first convolution results according to a preset merging manner to obtain a dilated convolution result of the first data and the weight,
wherein the preset merging manner is the inverse of the preset splitting manner.
Clause A9. The device according to clause A8, wherein the multiple pieces of second data include first sub-data, second sub-data, third sub-data, and fourth sub-data, and the splitting module is further configured to:
traverse the elements of the first data and determine the elements in the odd columns of the odd rows of the first data to form the first sub-data;
determine the elements in the even columns of the odd rows of the first data to form the second sub-data;
determine the elements in the odd columns of the even rows of the first data to form the third sub-data;
determine the elements in the even columns of the even rows of the first data to form the fourth sub-data.
Clause A10. The device according to clause A9, wherein the merging module is further configured to:
take the elements of the first convolution result corresponding to the first sub-data, in order, as the elements in the odd columns of the odd rows of the dilated convolution result;
take the elements of the first convolution result corresponding to the second sub-data, in order, as the elements in the even columns of the odd rows of the dilated convolution result;
take the elements of the first convolution result corresponding to the third sub-data, in order, as the elements in the odd columns of the even rows of the dilated convolution result;
take the elements of the first convolution result corresponding to the fourth sub-data, in order, as the elements in the even columns of the even rows of the dilated convolution result.
Clause A11. The device according to any one of clauses A8 to A10, wherein the first data includes neurons and/or gradients.
Clause A12. The device according to any one of clauses A8 to A11, wherein the first data includes a first neuron and a first gradient, and the splitting module is further configured to:
split the first neuron according to the preset splitting manner to obtain multiple second neurons;
split the first gradient according to the preset splitting manner to obtain multiple second gradients.
Clause A13. The device according to clause A12, the device further comprising:
a processing module, configured to perform, for any second neuron, a convolution operation on that second neuron and the corresponding second gradient to obtain a third convolution result;
a determining module, configured to determine the sum of the third convolution results corresponding to all the second neurons as the residual of the weight;
wherein the row and column parity of the positions that the elements of a second neuron occupy in the first neuron is the same as the row and column parity of the positions that the elements of the corresponding second gradient occupy in the first gradient.
Clause A14. The device according to clause A13, the device further comprising:
an adjustment module, configured to adjust the weight according to the residual of the weight.
Clause A15. An artificial intelligence chip, the chip including the data processing device according to clause A8.
Clause A16. An electronic device, the electronic device including the artificial intelligence chip according to clause A15.
Clause A17. A board card, the board card including: a storage device, an interface device, a control device, and the artificial intelligence chip according to clause A15;
wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively;
the storage device is used to store data;
the interface device is used to implement data transmission between the artificial intelligence chip and an external device;
the control device is used to monitor the state of the artificial intelligence chip.
Clause A18. The board card according to clause A17, wherein the storage device includes multiple groups of storage units, each group of storage units is connected to the artificial intelligence chip through a bus, and the storage units are DDR SDRAM;
the chip includes a DDR controller for controlling the data transmission and data storage of each storage unit;
the interface device is a standard PCIE interface.
Clause A19. An electronic device, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to call the instructions stored in the memory to execute the method according to any one of clauses A1 to A7.
Clause A20. A computer-readable storage medium storing computer program instructions, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of clauses A1 to A7.
The embodiments of the present disclosure have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present disclosure, and the description of the above embodiments is intended only to help in understanding the method of the present disclosure and its core idea. Changes or variations that those skilled in the art make to the specific implementations and the scope of application on the basis of the idea of the present disclosure all fall within the scope of protection of the present disclosure. In summary, the content of this specification should not be construed as limiting the present disclosure.
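The weight-residual computation of clauses A13 and A14 can be illustrated with a minimal sketch. It is not the implementation of the disclosure; it assumes a dilation rate of 2, single-channel 2-D arrays, "valid" cross-correlation with no stride or padding, and gradient sub-blocks that are all non-empty or skipped. The function names `correlate2d_valid`, `weight_residual_direct`, and `weight_residual_by_split` are hypothetical. Each second neuron is convolved with the second gradient of the same row and column parity, and the four results are summed.

```python
import numpy as np


def correlate2d_valid(x, k):
    """Ordinary 'valid' 2-D cross-correlation of x with kernel k."""
    kh, kw = k.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for u in range(kh):
        for v in range(kw):
            out += k[u, v] * x[u:u + out_h, v:v + out_w]
    return out


def kernel_shape(x, dy):
    """Kernel size implied by input x and output gradient dy at dilation 2."""
    return ((x.shape[0] - dy.shape[0]) // 2 + 1,
            (x.shape[1] - dy.shape[1]) // 2 + 1)


def weight_residual_direct(x, dy):
    """Reference: weight residual of a dilation-2 convolution, computed directly."""
    oh, ow = dy.shape
    kh, kw = kernel_shape(x, dy)
    dw = np.zeros((kh, kw))
    for u in range(kh):
        for v in range(kw):
            dw[u, v] = np.sum(dy * x[2 * u:2 * u + oh, 2 * v:2 * v + ow])
    return dw


def weight_residual_by_split(x, dy):
    """Sum of four ordinary convolutions over parity-matched sub-blocks."""
    dw = np.zeros(kernel_shape(x, dy))
    for a in (0, 1):        # a = 0: odd rows (1-based), a = 1: even rows
        for b in (0, 1):    # b = 0: odd columns, b = 1: even columns
            sub_x = x[a::2, b::2]      # second neuron
            sub_dy = dy[a::2, b::2]    # second gradient with the same parity
            if sub_dy.size == 0:       # nothing to contribute for this parity
                continue
            dw += correlate2d_valid(sub_x, sub_dy)  # third convolution result
    return dw


rng = np.random.default_rng(0)
x = rng.standard_normal((7, 8))     # first neuron
dy = rng.standard_normal((3, 4))    # first gradient for a 3x3 kernel
dw_split = weight_residual_by_split(x, dy)
dw_direct = weight_residual_direct(x, dy)
print(np.allclose(dw_split, dw_direct))
```

Adjusting the weight according to its residual could then be, for example, a plain gradient-descent step such as `w = w - lr * dw_split`, where the learning rate `lr` and the update rule itself are assumptions not taken from the disclosure.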

Claims (14)

  1. A data processing method, characterized in that the method is applied to a processor and comprises:
    splitting first data according to a preset splitting manner to obtain a plurality of second data;
    performing a convolution operation on each of the second data with a weight to obtain a plurality of first convolution results;
    merging the plurality of first convolution results according to a preset merging manner to obtain a dilated convolution result of the first data and the weight,
    wherein the preset merging manner is the inverse process of the preset splitting manner.
  2. The method according to claim 1, characterized in that the plurality of second data comprises first sub-data, second sub-data, third sub-data, and fourth sub-data, and splitting the first data according to the preset splitting manner to obtain the plurality of second data comprises:
    traversing the elements of the first data and determining the elements in the odd columns of the odd rows of the first data to form the first sub-data;
    determining the elements in the even columns of the odd rows of the first data to form the second sub-data;
    determining the elements in the odd columns of the even rows of the first data to form the third sub-data;
    determining the elements in the even columns of the even rows of the first data to form the fourth sub-data.
  3. The method according to claim 2, characterized in that merging the plurality of first convolution results according to the preset merging manner to obtain the dilated convolution result of the first data and the weight comprises:
    taking the elements of the first convolution result corresponding to the first sub-data, in order, as the elements in the odd columns of the odd rows of the dilated convolution result;
    taking the elements of the first convolution result corresponding to the second sub-data, in order, as the elements in the even columns of the odd rows of the dilated convolution result;
    taking the elements of the first convolution result corresponding to the third sub-data, in order, as the elements in the odd columns of the even rows of the dilated convolution result;
    taking the elements of the first convolution result corresponding to the fourth sub-data, in order, as the elements in the even columns of the even rows of the dilated convolution result.
  4. The method according to any one of claims 1 to 3, characterized in that the first data comprises neurons and/or gradients.
  5. The method according to any one of claims 1 to 4, characterized in that the first data comprises a first neuron and a first gradient, and splitting the first data according to the preset splitting manner to obtain the plurality of second data comprises:
    splitting the first neuron according to the preset splitting manner to obtain a plurality of second neurons;
    splitting the first gradient according to the preset splitting manner to obtain a plurality of second gradients.
  6. The method according to claim 5, characterized in that the method further comprises:
    for any one of the second neurons, performing a convolution operation on that second neuron and the corresponding second gradient to obtain a third convolution result;
    determining the sum of the third convolution results corresponding to the respective second neurons as the residual of the weight;
    wherein the row and column parity of the positions that the elements of a second neuron occupy in the first neuron matches the row and column parity of the positions that the elements of the corresponding second gradient occupy in the first gradient.
  7. The method according to claim 6, characterized in that the method further comprises:
    adjusting the weight according to the residual of the weight.
  8. A data processing device, characterized in that the device is applied to a processor and comprises:
    a splitting module configured to split first data according to a preset splitting manner to obtain a plurality of second data;
    a convolution module configured to perform a convolution operation on each of the second data with a weight to obtain a plurality of first convolution results;
    a merging module configured to merge the plurality of first convolution results according to a preset merging manner to obtain a dilated convolution result of the first data and the weight,
    wherein the preset merging manner is the inverse process of the preset splitting manner.
  9. An artificial intelligence chip, characterized in that the chip comprises the data processing device according to claim 8.
  10. An electronic device, characterized in that the electronic device comprises the artificial intelligence chip according to claim 9.
  11. A board card, characterized in that the board card comprises: a storage device, an interface device, a control device, and the artificial intelligence chip according to claim 9;
    wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device, respectively;
    the storage device is configured to store data;
    the interface device is configured to implement data transmission between the artificial intelligence chip and an external device;
    the control device is configured to monitor the state of the artificial intelligence chip.
  12. The board card according to claim 11, characterized in that:
    the storage device comprises multiple groups of storage units, each group of storage units is connected to the artificial intelligence chip through a bus, and the storage units are DDR SDRAM;
    the chip comprises a DDR controller configured to control data transmission to and data storage in each storage unit;
    the interface device is a standard PCIE interface.
  13. An electronic device, characterized in that it comprises:
    a processor;
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to call the instructions stored in the memory to execute the method according to any one of claims 1 to 7.
  14. A computer-readable storage medium having computer program instructions stored thereon, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 7.
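
The split, convolve, and merge steps of claims 1 to 3 can be illustrated with a minimal sketch. It is not the claimed implementation; it assumes a dilation rate of 2, single-channel 2-D arrays, and "valid" cross-correlation with no stride or padding. The function names `correlate2d_valid` and `dilated_conv_by_split` are hypothetical. Because the claims count rows and columns from 1, the "odd" rows and columns correspond to the 0-based indices 0, 2, 4, and so on.

```python
import numpy as np


def correlate2d_valid(x, w, dilation=1):
    """'Valid' 2-D cross-correlation of x with kernel w at the given dilation."""
    kh, kw = w.shape
    out_h = x.shape[0] - dilation * (kh - 1)
    out_w = x.shape[1] - dilation * (kw - 1)
    out = np.zeros((out_h, out_w))
    for u in range(kh):
        for v in range(kw):
            r, c = u * dilation, v * dilation
            out += w[u, v] * x[r:r + out_h, c:c + out_w]
    return out


def dilated_conv_by_split(x, w):
    """Dilation-2 convolution assembled from four ordinary convolutions."""
    kh, kw = w.shape
    out = np.zeros((x.shape[0] - 2 * (kh - 1), x.shape[1] - 2 * (kw - 1)))
    for a in (0, 1):        # a = 0: odd rows (1-based), a = 1: even rows
        for b in (0, 1):    # b = 0: odd columns, b = 1: even columns
            sub = x[a::2, b::2]                # split: one of the four sub-data
            part = correlate2d_valid(sub, w)   # ordinary (undilated) convolution
            out[a::2, b::2] = part             # merge: inverse of the split
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((7, 8))    # first data
w = rng.standard_normal((3, 3))    # weight
reference = correlate2d_valid(x, w, dilation=2)   # direct dilated convolution
result = dilated_conv_by_split(x, w)
print(np.allclose(result, reference))
```

The merge works because, at dilation 2, every input element that contributes to a given output element has the same row and column parity as that output element, so each parity class can be processed independently with an ordinary convolution.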
PCT/CN2020/123836 2019-12-09 2020-10-27 Data processing method and apparatus, computer device and storage medium WO2021114904A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911252885.3A CN113033761B (en) 2019-12-09 2019-12-09 Data processing method, device, computer equipment and storage medium
CN201911252885.3 2019-12-09

Publications (1)

Publication Number Publication Date
WO2021114904A1 (en) 2021-06-17

Family

ID=76329534

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/123836 WO2021114904A1 (en) 2019-12-09 2020-10-27 Data processing method and apparatus, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN113033761B (en)
WO (1) WO2021114904A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106934765A (en) * 2017-03-14 2017-07-07 长沙全度影像科技有限公司 Panoramic picture fusion method based on depth convolutional neural networks Yu depth information
CN107958235A (en) * 2017-12-28 2018-04-24 泰康保险集团股份有限公司 A kind of facial image detection method, device, medium and electronic equipment
US20180157969A1 (en) * 2016-12-05 2018-06-07 Beijing Deephi Technology Co., Ltd. Apparatus and Method for Achieving Accelerator of Sparse Convolutional Neural Network
CN110070489A (en) * 2019-04-30 2019-07-30 中国人民解放军国防科技大学 Binocular image super-resolution method based on parallax attention mechanism
CN110135556A (en) * 2019-04-04 2019-08-16 平安科技(深圳)有限公司 Neural network accelerated method, device, computer equipment and storage medium based on systolic arrays
CN110428428A (en) * 2019-07-26 2019-11-08 长沙理工大学 A kind of image, semantic dividing method, electronic equipment and readable storage medium storing program for executing

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10061537B2 (en) * 2015-08-13 2018-08-28 Microsoft Technology Licensing, Llc Data reordering using buffers and memory
US10699160B2 (en) * 2017-08-23 2020-06-30 Samsung Electronics Co., Ltd. Neural network method and apparatus
CN110309837B (en) * 2019-07-05 2021-07-06 北京迈格威科技有限公司 Data processing method and image processing method based on convolutional neural network characteristic diagram

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114154624A (en) * 2021-12-07 2022-03-08 广州小鹏自动驾驶科技有限公司 Data processing method, device and equipment based on convolutional neural network
CN115348432A (en) * 2022-08-15 2022-11-15 上海壁仞智能科技有限公司 Data processing method and device, image processing method, electronic device and medium
CN115348432B (en) * 2022-08-15 2024-05-07 上海壁仞科技股份有限公司 Data processing method and device, image processing method, electronic equipment and medium

Also Published As

Publication number Publication date
CN113033761A (en) 2021-06-25
CN113033761B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
CN110889503B (en) Data processing method, data processing device, computer equipment and storage medium
WO2021036893A1 (en) Data processing method and apparatus, computer device, and storage medium
WO2019157888A1 (en) Control device, method and equipment for processor
WO2021114904A1 (en) Data processing method and apparatus, computer device and storage medium
WO2021114903A1 (en) Data processing method and apparatus, computer device, and storage medium
US20210098001A1 (en) Information processing method and terminal device
CN109670581B (en) Computing device and board card
CN106095544B (en) Central processing unit control method and device
WO2021185262A1 (en) Computing apparatus and method, board card, and computer readable storage medium
WO2021082725A1 (en) Winograd convolution operation method and related product
CN113297128B (en) Data processing method, device, computer equipment and storage medium
WO2021223642A1 (en) Data processing method and apparatus, and related product
WO2021083097A1 (en) Data processing method and apparatus, and computer device and storage medium
CN113298223B (en) Data processing method, device, computer equipment and storage medium
WO2021083100A1 (en) Data processing method and device, computer equipment and storage medium
WO2021082654A1 (en) Data processing method and apparatus, and computer device and storage medium
WO2021037082A1 (en) Method and apparatus for processing data, and related product
CN113762488B (en) Processor, data processing method, computer device, and storage medium
WO2021017546A1 (en) Neural network quantization method and apparatus, chip, electronic device and board card
WO2021082653A1 (en) Data processing method and apparatus, computer device and storage medium
CN113835990B (en) Detection method, detection device, computer equipment and storage medium
WO2021169914A1 (en) Data quantification processing method and apparatus, electronic device and storage medium
CN113762518A (en) Data processing method, data processing device, computer equipment and storage medium
US11983535B2 (en) Artificial intelligence computing device and related product
CN111339060B (en) Operation method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 20900568; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 20900568; Country of ref document: EP; Kind code of ref document: A1