WO2019057097A1 - Convolution operation method, apparatus, computer device, and computer readable storage medium - Google Patents

Convolution operation method, apparatus, computer device, and computer readable storage medium

Info

Publication number
WO2019057097A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
data points
input data
convolution
rearranged
Prior art date
Application number
PCT/CN2018/106600
Other languages
English (en)
French (fr)
Inventor
张渊 (ZHANG Yuan)
Original Assignee
杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.)
Priority to EP18859234.9A (published as EP3686760A4)
Priority to US16/649,306 (published as US11645357B2)
Publication of WO2019057097A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/15Correlation function computation including computation of convolution operations
    • G06F17/153Multidimensional correlation or convolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/15Correlation function computation including computation of convolution operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/2163Partitioning the feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/22Arrangements for sorting or merging computer data on continuous record carriers, e.g. tape, drum, disc
    • G06F7/36Combined merging and sorting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/10Machine learning using kernel methods, e.g. support vector machines [SVM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the present application relates to the field of deep learning technologies, and in particular, to a convolution operation method, apparatus, computer device, and computer readable storage medium.
  • In a CNN (Convolutional Neural Network), the input data of different network layers differ in size, so the convolution kernel that each network layer uses for its convolution operation is accordingly set to a different size.
  • The size of the convolution kernel directly affects the design of the hardware platform for the CNN: if convolution kernels of multiple sizes exist in the CNN, a complex hardware platform must be designed to support the operation of the CNN, resulting in a large overhead of hardware resources.
  • In the related art, for a network layer that uses a larger convolution kernel, two smaller convolution kernels are used in place of the larger one to convolve the input data; for example, two 3 × 3 convolution kernels are used instead of one 5 × 5 convolution kernel.
  • Whereas the original convolution operation is completed by a single convolution kernel, this method needs to complete it with two convolution kernels, which increases the amount of computation and lowers the computational efficiency of the convolution operation.
  • the purpose of embodiments of the present application is to provide a convolution operation method, apparatus, computer device, and computer readable storage medium to improve the computational efficiency of a convolutional neural network.
  • the specific technical solutions are as follows:
  • an embodiment of the present application provides a convolution operation method, where the method includes:
  • acquiring input data of a network layer in a convolutional neural network; extracting a plurality of data points from the input data each time according to a preset step size; mapping the plurality of data points extracted each time to the same position at different depths in three-dimensional data to obtain rearranged data; and convolving the rearranged data with a convolution kernel of a preset size to obtain a convolution result.
  • an embodiment of the present application provides a convolution operation device, where the device includes:
  • An acquisition module configured to acquire input data of a network layer in a convolutional neural network
  • An extracting module configured to extract a plurality of data points from the input data each time according to a preset step size
  • A mapping module configured to map the plurality of data points extracted each time to the same position at different depths in the three-dimensional data to obtain the rearranged data;
  • an operation module configured to perform convolution operation on the rearranged data by using a convolution kernel of a preset size to obtain a convolution result.
  • the embodiment of the present application provides a computer readable storage medium for storing executable code which, when executed, performs the convolution operation method provided by the first aspect of the embodiments of the present application.
  • an embodiment of the present application provides an application program for performing a convolution operation method provided by the first aspect of the embodiment of the present application.
  • an embodiment of the present application provides a computer device, including a processor and a computer readable storage medium, where
  • the computer readable storage medium for storing executable code
  • the processor is configured to implement the steps of the convolution operation method provided by the first aspect of the embodiments of the present application when the executable code stored on the computer readable storage medium is executed.
  • In the solutions provided by the embodiments of the present application, a plurality of data points are extracted each time, according to a preset step size, from the acquired input data of a network layer in a convolutional neural network; the data points extracted each time are mapped to the same position at different depths in three-dimensional data to obtain rearranged data; and the rearranged data are convolved with a convolution kernel of a preset size to obtain a convolution result. Because of these extraction and mapping operations on the input data of the network layer, the input data are expanded in the depth direction while the size of each depth is reduced. Since the size of the input data becomes smaller, a smaller convolution kernel can be used to convolve them; and when the input data of every network layer are processed in this way, the resulting rearranged data can all be convolved with a convolution kernel of the same preset size. This reduces the overhead of hardware resources, and performing the convolution operation for every network layer with the same smaller kernel improves the computational efficiency of the convolutional neural network.
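The rearrangement summarized above is, in effect, a space-to-depth transform. A minimal NumPy sketch follows; the H × W × C layout and the function name `rearrange` are assumptions, since the text does not fix a memory layout:

```python
import numpy as np

def rearrange(x, s=2):
    """Map each s x s group of data points to the same spatial position
    at s*s consecutive depths (a space-to-depth transform)."""
    h, w, c = x.shape
    assert h % s == 0 and w % s == 0, "spatial size must be divisible by the step"
    # Split height and width into s-sized blocks, then fold the
    # intra-block offsets into the depth (channel) axis.
    x = x.reshape(h // s, s, w // s, s, c)
    x = x.transpose(0, 2, 1, 3, 4)                # (H/s, W/s, s, s, C)
    return x.reshape(h // s, w // s, s * s * c)   # depth expands by s*s

x = np.random.rand(26, 26, 10)
y = rearrange(x)   # 26 x 26 x 10 -> 13 x 13 x 40, as in the description
```

Any of the (s·s)! orderings of the folded depths would satisfy the description; this sketch fixes one of them.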
  • FIG. 1 is a schematic flow chart of a convolution operation method according to an embodiment of the present application.
  • FIG. 2 is a schematic flow chart of a convolution operation method according to another embodiment of the present application.
  • FIG. 3 is a schematic diagram of rearrangement of input data according to an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a convolution operation device according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a convolution operation device according to another embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.
  • the embodiment of the present application provides a convolution operation method, apparatus, computer device, and computer readable storage medium.
  • a convolution operation method provided by the embodiment of the present application is first introduced.
  • An execution body of a convolution operation method provided by an embodiment of the present application may be a computer device that performs convolution operations, for example an image processor, or a camera having an image processing function, and the execution body at least has data processing capability.
  • a manner of implementing a convolution operation method provided by an embodiment of the present application may be at least one of software, hardware circuits, and logic circuits disposed in an execution body.
  • a convolution operation method provided by an embodiment of the present application may include the following steps:
  • the input data of each network layer in the convolutional neural network is three-dimensional, and its size can be expressed as W × H × I, where I is the depth of the input data and W × H is the data size at each depth, that is, the width and height of the data at each depth. Because the input data of the various network layers differ in size, and in particular the data size at each depth differs, increasing the computation rate would normally mean selecting a large convolution kernel for large input data and a small convolution kernel for small input data. However, this requires a complex hardware platform that supports multiple convolution kernels operating on the input data of different network layers, which affects the computational efficiency of the convolutional neural network.
  • After the input data have been rearranged, a smaller convolution kernel can be selected to convolve them; thus, for different network layers, the same smaller convolution kernel can be used, which both preserves the operation rate of the convolution and improves the computational efficiency of the convolutional neural network. Therefore, in the embodiments of the present application, the purpose of improving the computational efficiency of the convolutional neural network is achieved by processing the input data.
  • S102 can be specifically:
  • the preset step size may be a preset rule for extracting a plurality of data points. For example, if the preset step size is 2 × 2, four data points at each depth are extracted each time according to the 2 × 2 rule.
  • A plurality of data points satisfying the preset step size across all depths may be extracted at one time; for example, if the depth of the input data is 256 and the preset step size is 2 × 2, then 2 × 2 × 256 data points are extracted at one time. It is also possible to extract, at one time, a plurality of data points satisfying the preset step size within a single depth (for a preset step size of 2 × 2, 2 × 2 data points at a time), or within several depths (for a preset step size of 2 × 2 over 10 depths, 2 × 2 × 10 data points at a time).
  • The plurality of data points may then be mapped, that is, arranged, to the same position at different depths in the three-dimensional data. For example, if four data points a, b, c and d are extracted by the above steps, the four data points may be arranged at the same position of four consecutive depths in any order; there are 4! = 24 such orders, from [a → b → c → d] through [d → c → b → a], and any one of them may be chosen.
  • S103 can be specifically:
  • a plurality of data to be combined are arranged to obtain rearranged data.
  • the extracted plurality of data points may be directly arranged to the same position at different depths in the newly created three-dimensional data according to the above mapping manner.
  • S103 can also be specifically:
  • the plurality of data points extracted each time are stored in the same position at different depths in the three-dimensional data in the order of arrangement, and the rearranged data is obtained.
  • the extracted data points of each depth may first be arranged, and then the plurality of data points are arranged, in order of depth, to the same position at different depths in the new three-dimensional data. For example, if the size of the input data is 26 × 26 × 10 and the data are extracted with a preset step size of 2 × 2, the size of the rearranged data is 13 × 13 × 40; if instead the 2 × 2 extraction is performed at every row and every column (overlapping windows), the size of the rearranged data is 25 × 25 × 40.
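Both sizes quoted in the example can be checked with a sliding-window sketch; this is an illustrative reconstruction (the helper name is hypothetical), where a stride equal to the window reproduces the non-overlapping 13 × 13 × 40 case and a stride of 1 the overlapping 25 × 25 × 40 case:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rearrange_windows(x, k=2, stride=2):
    """Extract k x k windows of data points every `stride` rows/columns
    and fold each window into the depth axis.  stride == k gives the
    non-overlapping case; stride == 1 the overlapping one."""
    win = sliding_window_view(x, (k, k), axis=(0, 1))  # (H-k+1, W-k+1, C, k, k)
    win = win[::stride, ::stride]
    h, w = win.shape[:2]
    return win.transpose(0, 1, 3, 4, 2).reshape(h, w, k * k * x.shape[2])

x = np.random.rand(26, 26, 10)
print(rearrange_windows(x, 2, 2).shape)   # (13, 13, 40)
print(rearrange_windows(x, 2, 1).shape)   # (25, 25, 40)
```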
  • Alternatively, a plurality of data points satisfying the preset step size within one depth of the input data are extracted multiple times; the data points extracted from each depth are then mapped to the same position at different depths in newly created three-dimensional data, giving a plurality of data to be combined; finally, the data to be combined are arranged in the depth direction to obtain the rearranged data. For example, if the size of the input data is 26 × 26 × 10, extracting data for each depth with a preset step size of 2 × 2 gives 13 × 13 × 4 data to be combined for each depth, and combining the data to be combined of the 10 depths gives 13 × 13 × 40 rearranged data.
  • The specific combination mode may arrange the data to be combined corresponding to each depth in any order; for 10 depths there are 10! = 3,628,800 possible arrangements, and any one of them can be selected as the combination order of the data to be combined corresponding to each depth.
  • S104 Perform convolution operation on the rearranged data by using a convolution kernel of a preset size to obtain a convolution result.
  • the rearranged data can then be convolved with a convolution kernel of a preset size, and the convolution kernel of the preset size can be a smaller-size convolution kernel, such as a 3 × 3 convolution kernel or an even smaller one.
  • For different network layers, the convolution operation can then be performed with a convolution kernel of the same size; thus, for the whole convolutional neural network, a small convolution kernel can be used to convolve the input data of every network layer, so that the convolution operation can be realized on a simple hardware platform, thereby improving the computational efficiency of the convolutional neural network.
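As an illustration of step S104, a naive valid convolution of the rearranged data with a single preset-size kernel can be sketched as follows; this is a reference implementation only, not the patented hardware design, and the H × W × C data layout and 3 × 3 × C × O kernel layout are assumptions:

```python
import numpy as np

def conv_preset(x, k):
    """Valid convolution of H x W x C data with a kh x kw x C x O kernel."""
    kh, kw, _, o = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow, o))
    for i in range(oh):
        for j in range(ow):
            # Contract the kh x kw x C window against every output filter.
            out[i, j] = np.tensordot(x[i:i+kh, j:j+kw, :], k,
                                     axes=([0, 1, 2], [0, 1, 2]))
    return out

rearranged = np.random.rand(13, 13, 40)   # e.g. from the 26 x 26 x 10 example
kernel = np.random.rand(3, 3, 40, 8)      # one preset-size 3x3 kernel, 8 outputs
result = conv_preset(rearranged, kernel)  # shape (11, 11, 8)
```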
  • In this way, the input data of every network layer are processed by the method, and the resulting rearranged data can all be convolved with a convolution kernel of the same preset size, thereby reducing the overhead of hardware resources; performing the convolution operation for every network layer with the same smaller kernel improves the operation efficiency of the convolutional neural network.
  • the embodiment of the present application further provides a convolution operation method.
  • the convolution operation method includes the following steps:
  • the input data can be divided in the depth direction to obtain a plurality of slices.
  • each depth may be divided into one slice, or multiple depths may be divided into one slice. After the division, the extraction of the data points in the slices may be performed in parallel, thereby improving the speed of the operation.
  • S204 Map a plurality of data points extracted from each slice to the same position of different depths in the three-dimensional data, and obtain data to be combined corresponding to each slice.
  • After the data points of each slice are extracted, the plurality of data points can be mapped, that is, arranged, to the same position at different depths in the three-dimensional data.
  • the mapping process is the same as in the embodiment shown in FIG. 1 and is not repeated here.
  • if a slice contains a single depth, the extracted plurality of data points may be arranged directly to the same position at different depths; if a slice contains multiple depths, the extracted data points of each depth are first arranged, and then the plurality of data points are arranged, in order of depth, to the same position at different depths.
  • S205 Arranging a plurality of data to be combined in the depth direction to obtain rearranged data.
  • For example, if the size of the input data is 26 × 26 × 10, the input data are divided in the depth direction to obtain three slices of 26 × 26 × 1, 26 × 26 × 3 and 26 × 26 × 6. Extracting data from each slice with a preset step size of 2 × 2 gives data to be combined of 13 × 13 × 4, 13 × 13 × 12 and 13 × 13 × 24, and combining the three data to be combined gives 13 × 13 × 40 rearranged data.
  • the merge mode may be any arrangement of the data to be merged corresponding to each slice, and the rearranged data obtained by the arrangement.
  • there can be 3! = 6 arrangements, and any one of them can be selected as the combination order of the data to be combined corresponding to each slice.
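Steps S201 to S205 can be sketched as follows, using the 26 × 26 × 1 / 26 × 26 × 3 / 26 × 26 × 6 split from the example above; the helper name, layout, and fixed slice order are assumptions (any of the 3! = 6 slice orders would do):

```python
import numpy as np

def rearrange_by_slices(x, s=2, splits=(1, 3, 6)):
    """Divide the depth axis into slices, rearrange each slice
    independently (in parallel on real hardware), then concatenate
    the per-slice results along the depth direction."""
    assert sum(splits) == x.shape[2]
    parts, start = [], 0
    for d in splits:
        sl = x[:, :, start:start + d]                       # one slice
        h, w, c = sl.shape
        sl = sl.reshape(h // s, s, w // s, s, c).transpose(0, 2, 1, 3, 4)
        parts.append(sl.reshape(h // s, w // s, s * s * c))
        start += d
    # Any permutation of `parts` is an acceptable combination order.
    return np.concatenate(parts, axis=2)

x = np.random.rand(26, 26, 10)
y = rearrange_by_slices(x)   # 13 x 13 x (4 + 12 + 24) = 13 x 13 x 40
```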
  • Convolution operations can then be performed with convolution kernels of the same preset size, which reduces the overhead of hardware resources; performing the convolution operation for every network layer with the same smaller kernel improves the computational efficiency of the convolutional neural network.
  • the input data of the network layer is A, where A is a three-dimensional data of size W ⁇ H ⁇ I.
  • A is divided into I slices in the depth direction, and each slice is denoted A i, as shown in FIG. 3, where i ∈ [1, I].
  • In the next step, each slice A i is traversed according to a 2 × 2 step size, and four data points a j are extracted each time, as shown by the dashed box in A i in FIG. 3, where j ∈ [1, 4]; the four data points extracted each time are mapped to the same position at four consecutive depths, giving a rearranged slice A i *.
  • the A i * are combined in the depth direction to obtain the rearranged data A*, where the size of A* is W* × H* × 4I. The combination of the A i * may arrange and merge each A i * in any of the I! possible sorting orders.
  • Finally, a convolution operation can be performed on A* using a convolution kernel of a preset size Kr × Kr × Ir × O, where Ir matches the 4I depth of A*, to obtain the convolution result.
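The worked example can be put together end to end as a sketch; Kr = 3 and O = 8 are assumed values (the text fixes neither), and the kernel's input depth is taken as 4I so that it matches A*:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

W, H, I, Kr, O = 26, 26, 10, 3, 8
A = np.random.rand(W, H, I)                      # input data of the layer

# Rearrange: every 2x2 group of data points goes to the same position
# at four consecutive depths, so A* has size W/2 x H/2 x 4I.
A_star = (A.reshape(W // 2, 2, H // 2, 2, I)
            .transpose(0, 2, 1, 3, 4)
            .reshape(W // 2, H // 2, 4 * I))

# Convolve A* with one preset-size Kr x Kr x 4I x O kernel.
K = np.random.rand(Kr, Kr, 4 * I, O)
windows = sliding_window_view(A_star, (Kr, Kr), axis=(0, 1))  # (11, 11, 4I, Kr, Kr)
result = np.einsum('ijckl,klco->ijo', windows, K)             # (11, 11, O)
```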
  • In this example, each depth is divided into one slice; after the division, the extraction of data points from the slices can be performed in parallel, thereby increasing the rate of the operation. Moreover, owing to the data-point extraction and mapping operations performed on the slices, each slice is expanded in the depth direction while the size of each depth is reduced, so the input data can be convolved with a smaller convolution kernel.
  • The input data of every network layer are processed in this way, and the resulting rearranged data can be convolved with a convolution kernel of the same preset size, thereby reducing the overhead of hardware resources; performing the convolution operation for every network layer with the same smaller kernel improves the computational efficiency of the convolutional neural network.
  • the embodiment of the present application further provides a convolution operation device, and the convolution operation device may include:
  • the obtaining module 410 is configured to obtain input data of a network layer in the convolutional neural network
  • the extracting module 420 is configured to extract a plurality of data points from the input data each time according to a preset step size
  • the mapping module 430 is configured to map the plurality of data points extracted each time to the same position of different depths in the three-dimensional data to obtain the rearranged data;
  • the operation module 440 is configured to perform convolution operation on the rearranged data by using a convolution kernel of a preset size to obtain a convolution result.
  • With this device, the input data of every network layer are processed as above, and the resulting rearranged data can all be convolved with a convolution kernel of the same preset size, reducing the overhead of hardware resources and improving the operation efficiency of the convolutional neural network.
  • the extraction module 420 is specifically configured to:
  • the mapping module 430 can be specifically configured to:
  • a plurality of data to be combined are arranged to obtain rearranged data.
  • the extracting module 420 is specifically configured to:
  • the plurality of data points extracted each time are stored in the same position at different depths in the three-dimensional data in the order of arrangement, and the rearranged data is obtained.
  • the convolution operation device provided in this embodiment is a device to which the convolution operation method of the embodiment shown in FIG. 1 is applied. Therefore, all the embodiments of the convolution operation method are applicable to this convolution operation device and yield the same or similar beneficial effects, which are not detailed again here.
  • the embodiment of the present application further provides another convolution operation device.
  • the convolution operation device may include:
  • An obtaining module 510 configured to acquire input data of a network layer in a convolutional neural network
  • a dividing module 520 configured to divide the input data in a depth direction to obtain a plurality of slices
  • the extracting module 530 is configured to extract, for each slice, data points of each depth in the slice according to a preset step size, to obtain a plurality of data points;
  • the mapping module 540 is configured to map the plurality of data points extracted from each slice to the same position of different depths in the three-dimensional data, respectively obtain the data to be merged corresponding to each slice; and to combine the data to be combined in the depth direction Arrange and obtain the rearranged data;
  • the operation module 550 is configured to perform convolution operation on the rearranged data by using a convolution kernel of a preset size to obtain a convolution result.
  • Convolution operations can be performed using convolution kernels of the same preset size, thereby reducing the overhead of hardware resources, and for each network layer, using the same smaller size convolution kernel for convolution operations, Improve the computational efficiency of convolutional neural networks.
  • the embodiment of the present application provides a computer readable storage medium for storing executable code, which is used to execute at runtime:
  • the convolution operation method provided by the embodiment; specifically, the convolution operation method may include:
  • acquiring input data of a network layer in a convolutional neural network; extracting a plurality of data points from the input data each time according to a preset step size; mapping the plurality of data points extracted each time to the same position at different depths in three-dimensional data to obtain rearranged data; and convolving the rearranged data with a convolution kernel of a preset size to obtain a convolution result.
  • the computer readable storage medium stores executable code that, when executed, performs the convolution operation method provided by the embodiments of the present application, and can thus realize the following: a plurality of data points are extracted each time, according to a preset step size, from the acquired input data of a network layer in a convolutional neural network; the data points extracted each time are mapped to the same position at different depths in three-dimensional data to obtain rearranged data; and finally the rearranged data are convolved with a convolution kernel of a preset size to obtain a convolution result. Because of these extraction and mapping operations, the input data are expanded in the depth direction while the size of each depth is reduced; since the size of the input data becomes smaller, a smaller convolution kernel can be used to convolve them. Processing the input data of every network layer in this way lets all the rearranged data be convolved with a convolution kernel of the same preset size, thereby reducing the overhead of hardware resources and improving the operation efficiency of the convolutional neural network.
  • the embodiment of the present application provides an application program for performing the convolution operation method provided by the embodiment of the present application;
  • the convolution operation method provided in the embodiment may include:
  • acquiring input data of a network layer in a convolutional neural network; extracting a plurality of data points from the input data each time according to a preset step size; mapping the plurality of data points extracted each time to the same position at different depths in three-dimensional data to obtain rearranged data; and convolving the rearranged data with a convolution kernel of a preset size to obtain a convolution result.
  • the application program, at runtime, performs the convolution operation method provided by the embodiments of the present application, and can thus realize the following: a plurality of data points are extracted each time, according to a preset step size, from the acquired input data of a network layer in a convolutional neural network; the data points extracted each time are mapped to the same position at different depths in three-dimensional data to obtain rearranged data; and finally the rearranged data are convolved with a convolution kernel of a preset size to obtain a convolution result. Because of these extraction and mapping operations, the input data are expanded in the depth direction while the size of each depth is reduced; since the size of the input data becomes smaller, a smaller convolution kernel can be used to convolve them. Processing the input data of every network layer in this way lets all the rearranged data be convolved with a convolution kernel of the same preset size, thereby reducing the overhead of hardware resources, and performing the convolution operation for every network layer with the same smaller kernel improves the operation efficiency of the convolutional neural network.
  • the embodiment of the present application provides a computer device, as shown in FIG. 6, including a processor 601 and a computer readable storage medium 602, where
  • Computer readable storage medium 602 for storing executable code
  • the processor 601 is configured to perform the following steps when executing the executable code stored on the computer readable storage medium 602:
  • acquiring input data of a network layer in a convolutional neural network; extracting a plurality of data points from the input data each time according to a preset step size; mapping the plurality of data points extracted each time to the same position at different depths in three-dimensional data to obtain rearranged data; and convolving the rearranged data with a convolution kernel of a preset size to obtain a convolution result.
  • the processor 601 is further configured to:
  • the processor 601 may specifically implement:
  • for each slice, each time according to a preset step size, the data points of each depth in the slice are respectively extracted to obtain a plurality of data points;
  • the processor 601, in implementing the step of mapping the plurality of data points extracted each time to the same position at different depths in the three-dimensional data to obtain the rearranged data, may specifically implement:
  • a plurality of data to be combined are arranged to obtain rearranged data.
  • the processor 601, in implementing the step of extracting a plurality of data points from the input data each time according to a preset step size, may specifically implement:
  • the processor 601, in implementing the step of mapping the plurality of data points extracted each time to the same position at different depths in the three-dimensional data to obtain the rearranged data, may specifically implement:
  • the plurality of data points extracted each time are stored in the same position at different depths in the three-dimensional data in the order of arrangement, and the rearranged data is obtained.
  • the computer readable storage medium 602 and the processor 601 can perform data transmission by means of a wired connection or a wireless connection, and the computer device can communicate with other devices through a wired communication interface or a wireless communication interface.
  • the above computer readable storage medium may include a RAM (Random Access Memory), and may also include an NVM (Non-volatile Memory), such as at least one disk storage.
  • the computer readable storage medium may also be at least one storage device located remotely from the aforementioned processor.
  • the processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), or the like; or a DSP (Digital Signal Processor), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
  • the processor of the computer device runs a program corresponding to the executable code by reading the executable code stored in the computer readable storage medium, and the program, at runtime, performs the convolution operation method provided by the embodiments of the present application. It can thereby realize the following: a plurality of data points are extracted each time, according to a preset step size, from the acquired input data of a network layer in a convolutional neural network; the data points extracted each time are mapped to the same position at different depths in three-dimensional data to obtain rearranged data; and finally the rearranged data are convolved with a convolution kernel of a preset size to obtain a convolution result. Because of these extraction and mapping operations on the input data of the network layer, the input data are expanded in the depth direction while the size of each depth is reduced; since the size of the input data becomes smaller, a smaller convolution kernel can be used to convolve them. Processing the input data of every network layer in this way lets all the rearranged data be convolved with a convolution kernel of the same preset size, thereby reducing the overhead of hardware resources, and performing the convolution operation for every network layer with the same smaller kernel improves the operation efficiency of the convolutional neural network.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Algebra (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Neurology (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide a convolution operation method and apparatus, a computer device, and a computer-readable storage medium. The convolution operation method includes: obtaining input data of a network layer in a convolutional neural network; extracting a plurality of data points from the input data each time according to a preset step size; mapping the plurality of data points extracted each time to the same position at different depths in three-dimensional data, to obtain rearranged data; and performing a convolution operation on the rearranged data by using a convolution kernel of a preset size, to obtain a convolution result. This solution can improve the operation efficiency of a convolutional neural network.

Description

Convolution operation method and apparatus, computer device, and computer-readable storage medium
This application claims priority to Chinese patent application No. 201710866060.5, filed with the China Patent Office on September 22, 2017 and entitled "Convolution operation method and apparatus, computer device and computer-readable storage medium", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the technical field of deep learning, and in particular to a convolution operation method and apparatus, a computer device, and a computer-readable storage medium.
Background
In a CNN (Convolutional Neural Network), the size of the input data usually differs from one network layer to another, so the convolution kernel used by each network layer is set to a different size accordingly. However, the kernel sizes directly affect the design of the hardware platform corresponding to the CNN: if the CNN contains convolution kernels of multiple sizes, a complex hardware platform must be designed to support its operation, resulting in a large hardware resource overhead.
To address this problem, in a corresponding convolution operation method, for a network layer that uses a larger convolution kernel, two small convolution kernels are used in place of the larger kernel to convolve the input data; for example, two 3×3 kernels replace one 5×5 kernel. However, a convolution that a network layer originally completed with one kernel now requires two kernels, which increases the amount of computation and harms the efficiency of the convolution operation.
Summary
The objective of the embodiments of the present application is to provide a convolution operation method and apparatus, a computer device, and a computer-readable storage medium, so as to improve the operation efficiency of a convolutional neural network. The specific technical solutions are as follows.
In a first aspect, an embodiment of the present application provides a convolution operation method, the method including:
obtaining input data of a network layer in a convolutional neural network;
extracting a plurality of data points from the input data each time according to a preset step size;
mapping the plurality of data points extracted each time to the same position at different depths in three-dimensional data, to obtain rearranged data;
performing a convolution operation on the rearranged data by using a convolution kernel of a preset size, to obtain a convolution result.
In a second aspect, an embodiment of the present application provides a convolution operation apparatus, the apparatus including:
an obtaining module, configured to obtain input data of a network layer in a convolutional neural network;
an extraction module, configured to extract a plurality of data points from the input data each time according to a preset step size;
a mapping module, configured to map the plurality of data points extracted each time to the same position at different depths in three-dimensional data, to obtain rearranged data;
an operation module, configured to perform a convolution operation on the rearranged data by using a convolution kernel of a preset size, to obtain a convolution result.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium for storing executable code, the executable code performing, when executed, the convolution operation method provided in the first aspect of the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides an application program that performs, when executed, the convolution operation method provided in the first aspect of the embodiments of the present application.
In a fifth aspect, an embodiment of the present application provides a computer device, including a processor and a computer-readable storage medium, wherein
the computer-readable storage medium is configured to store executable code; and
the processor is configured to implement, when executing the executable code stored on the computer-readable storage medium, the steps of the convolution operation method provided in the first aspect of the embodiments of the present application.
In summary, in the solutions provided by the embodiments of the present application, a plurality of data points are extracted each time, according to a preset step size, from the obtained input data of a network layer in a convolutional neural network; the data points extracted each time are mapped to the same position at different depths in three-dimensional data to obtain rearranged data; and finally a convolution operation is performed on the rearranged data with a convolution kernel of a preset size to obtain a convolution result. The extraction and mapping of the data points expand the input data in the depth direction and reduce the size of each depth. Because the input data becomes smaller, a smaller convolution kernel can be used to convolve it. When the input data of every network layer is processed by this method, all of the resulting rearranged data can be convolved with convolution kernels of the same preset size, which reduces hardware resource overhead; moreover, performing the convolution of every network layer with the same smaller kernel size improves the operation efficiency of the convolutional neural network.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present application and of the prior art more clearly, the drawings needed for the embodiments and the prior art are briefly introduced below. Obviously, the drawings described below are merely some embodiments of the present application; a person of ordinary skill in the art may obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic flowchart of a convolution operation method according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a convolution operation method according to another embodiment of the present application;
FIG. 3 is a schematic diagram of the rearrangement of input data according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a convolution operation apparatus according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a convolution operation apparatus according to another embodiment of the present application;
FIG. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the drawings and embodiments. Obviously, the described embodiments are merely some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
The present application is described in detail below through specific embodiments.
To improve the operation efficiency of convolutional neural networks, the embodiments of the present application provide a convolution operation method and apparatus, a computer device, and a computer-readable storage medium.
The convolution operation method provided by the embodiments of the present application is introduced first.
The convolution operation method provided by the embodiments of the present application may be executed by a computer device that performs convolution operations, for example, an image processor, a camera with an image processing function, or the like; the executing device includes at least a core processing chip with data processing capability. The method may be implemented by at least one of software, a hardware circuit, and a logic circuit provided in the executing device.
As shown in FIG. 1, the convolution operation method provided by an embodiment of the present application may include the following steps.
S101: obtaining input data of a network layer in a convolutional neural network.
The input data of each network layer in a convolutional neural network is a piece of three-dimensional data whose size can be expressed as W×H×I, where I is the depth of the input data and W×H is the data size of each depth, i.e., the width and height of the data at each depth. Because the input data of the network layers differ in size, and in particular the data size of each depth differs, a large convolution kernel may be chosen for large input data and a small kernel for small input data in order to increase the operation rate. However, this requires a complex hardware platform that supports multiple kernels convolving the input data of different network layers separately, which harms the operation efficiency of the convolutional neural network.
Given this influence of the input data on kernel selection, the data size of each depth of the input data may be reduced, so that a smaller convolution kernel can be chosen to convolve the input data. In this way, the same smaller kernel can be used for different network layers, which both guarantees the operation rate of the convolution and improves the operation efficiency of the convolutional neural network. Therefore, in the embodiments of the present application, the input data is processed so as to improve the operation efficiency of the convolutional neural network.
S102: extracting a plurality of data points from the input data each time according to a preset step size.
To reduce the data size of each depth of the network layer's input data without affecting the original amount of data, the depth of the input data may be increased, i.e., multiple data points of the input data are mapped to the same position at different depths. This reduces the data size of each depth while leaving the original amount of data unchanged. Before the mapping, the data points to be mapped to the same position at different depths need to be determined; to avoid affecting the result of the convolution, adjacent data may be mapped. Optionally, S102 may specifically be:
for each depth of the input data, extracting a plurality of data points each time according to the preset step size.
The preset step size may be a preset rule for extracting multiple data points; for example, with a preset step size of 2×2, four data points of each depth are extracted at a time according to the 2×2 rule. In the extraction process, all the data points satisfying the preset step size across all depths may be extracted at once: for example, with an input depth of 256 and a preset step size of 2×2, 2×2×256 data points are extracted at a time. Alternatively, the data points satisfying the preset step size within a single depth may be extracted at once: for example, with a preset step size of 2×2, 2×2 data points are extracted at a time. It is also possible to extract the data points satisfying the preset step size across several depths at once: for example, with a preset step size of 2×2 and 10 depths extracted at a time, 2×2×10 data points are extracted at a time.
S103: mapping the plurality of data points extracted each time to the same position at different depths in three-dimensional data, to obtain rearranged data.
After multiple data points are extracted, they may be mapped, i.e., arranged to the same position at different depths in the three-dimensional data. For example, if four data points a, b, c, and d are extracted in the above step, the four points may be arranged at the same position of four consecutive depths in any order, i.e., any one of [a→b→c→d], [a→b→d→c], [a→c→b→d], [a→c→d→b], [a→d→b→c], [a→d→c→b], [b→a→c→d], [b→a→d→c], [b→d→a→c], [b→d→c→a], [b→c→a→d], [b→c→d→a], [c→a→b→d], [c→a→d→b], [c→b→a→d], [c→b→d→a], [c→d→a→b], [c→d→b→a], [d→a→b→c], [d→a→c→b], [d→b→a→c], [d→b→c→a], [d→c→a→b], [d→c→b→a], where the arrows indicate the order of arrangement.
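The 4! = 24 arrangement orders enumerated above can be generated mechanically. The following Python sketch is purely illustrative (the names `points` and `orders` are ours, not the patent's):

```python
from itertools import permutations

# Four data points extracted in one step with a 2x2 preset step size.
points = ('a', 'b', 'c', 'd')

# Each permutation is one valid order in which the four points may be
# arranged at the same position of four consecutive depths.
orders = list(permutations(points))

print(len(orders))   # 24 arrangement orders, i.e. 4! = 24
print(orders[0])     # ('a', 'b', 'c', 'd')
```

Any one of these orders may be fixed as the mapping rule, provided the same order is used consistently for every extraction.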
Optionally, S103 may specifically be:
mapping the plurality of data points extracted each time from each depth of the input data to the same position at different depths in three-dimensional data, to obtain a plurality of pieces of data to be merged; and
arranging the plurality of pieces of data to be merged along the depth direction, to obtain the rearranged data.
If the data points satisfying the preset step size within one depth of the input data are extracted at a time, the extracted data points can be arranged directly, according to the mapping manner above, to the same position at different depths in newly created three-dimensional data.
Optionally, S103 may also specifically be:
arranging the plurality of data points extracted each time; and
storing, in the order of arrangement, the plurality of data points extracted each time to the same position at different depths in three-dimensional data, to obtain the rearranged data.
If the data points satisfying the preset step size across several depths or all depths of the input data are extracted at a time, the data points of each depth may first be arranged, and the data points may then be arranged, in depth order, to the same position at different depths in newly created three-dimensional data. For example, if the size of the input data is 26×26×10 and the data is extracted with a preset step size of 2×2, the size of the resulting rearranged data is 13×13×40; if the extraction is instead performed at every other row/column, the size of the resulting rearranged data is 25×25×40.
Alternatively, the data points satisfying the preset step size within one depth of the input data may be extracted over multiple passes; the data points extracted each time from each depth are then mapped to the same position at different depths in newly created three-dimensional data, to obtain multiple pieces of data to be merged, and finally the pieces of data to be merged are arranged along the depth direction, to obtain the rearranged data. For example, if the size of the input data is 26×26×10 and each depth is extracted with a preset step size of 2×2, a 13×13×4 piece of data to be merged is obtained for each depth; merging the pieces for the 10 depths yields 13×13×40 rearranged data. Specifically, the merging may arrange the per-depth pieces of data to be merged in any order; for the above input data of depth 10 there are 10! = 3628800 possible orders, any one of which may be chosen as the merging order.
S104: performing a convolution operation on the rearranged data by using a convolution kernel of a preset size, to obtain a convolution result.
Because the above processing reduces the data size of each depth of the input data, the rearranged data can be convolved with a convolution kernel of a preset size, which may be a smaller kernel, e.g., a 3×3 kernel or an even smaller one. Moreover, by processing the input data of every network layer with the above steps, kernels of the same size can be used for the respective convolutions. For the convolutional neural network as a whole, small kernels can then be used to convolve the input data of all the network layers, so the convolution can be implemented with a simple hardware platform, thereby improving the operation efficiency of the convolutional neural network.
By applying this embodiment, a plurality of data points are extracted each time, according to a preset step size, from the obtained input data of a network layer in a convolutional neural network; the data points extracted each time are mapped to the same position at different depths in three-dimensional data to obtain rearranged data; and finally a convolution operation is performed on the rearranged data with a convolution kernel of a preset size to obtain a convolution result. The extraction and mapping of the data points expand the input data in the depth direction and reduce the size of each depth. Because the input data becomes smaller, a smaller convolution kernel can be used to convolve it. When the input data of every network layer is processed by this method, all of the resulting rearranged data can be convolved with convolution kernels of the same preset size, which reduces hardware resource overhead; moreover, performing the convolution of every network layer with the same smaller kernel size improves the operation efficiency of the convolutional neural network.
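The extraction and mapping of S102-S103 correspond to what is often called a space-to-depth rearrangement. The following NumPy sketch is an illustrative implementation under two assumptions of ours: a 2×2 preset step size, and one particular arrangement order among the many the text permits; the function and variable names are not the patent's:

```python
import numpy as np

def rearrange(input_data, step=2):
    """Map each step x step group of data points at every depth to the
    same position of different depths (a space-to-depth rearrangement)."""
    w, h, depth = input_data.shape
    assert w % step == 0 and h % step == 0, "spatial sizes must divide the step"
    # Split each spatial axis into (block index, offset inside the block).
    x = input_data.reshape(w // step, step, h // step, step, depth)
    # Group the step*step offsets with the original depth; this fixes one
    # order out of the many permitted arrangement orders.
    x = x.transpose(0, 2, 4, 1, 3)            # (w/2, h/2, depth, 2, 2)
    return x.reshape(w // step, h // step, depth * step * step)

input_data = np.arange(26 * 26 * 10, dtype=np.int64).reshape(26, 26, 10)
rearranged = rearrange(input_data)
print(rearranged.shape)   # (13, 13, 40): 26x26x10 becomes 13x13x40
```

Only the placement of the data changes; the rearranged data contains exactly the same data points as the input, so a small fixed-size kernel can convolve it without loss of information.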
Based on the embodiment shown in FIG. 1, an embodiment of the present application further provides a convolution operation method. As shown in FIG. 2, the method includes the following steps.
S201: obtaining input data of a network layer in a convolutional neural network.
S202: dividing the input data along the depth direction, to obtain a plurality of slices.
S203: for each slice, extracting the data points of each depth in the slice each time according to a preset step size, to obtain a plurality of data points.
In the extraction process, if all the data points satisfying the preset step size across all depths are extracted at once, the amount of computation in a single extraction and mapping is too large, the operations on the individual depths cannot run in parallel, and the operation rate suffers. Therefore, the input data may be divided along the depth direction into multiple slices. In the division, each depth may be made one slice, or several depths may be made one slice; after the division, the extraction of the data points in the slices can be performed in parallel, which increases the operation rate.
S204: mapping the plurality of data points extracted each time from each slice to the same position at different depths in three-dimensional data, to obtain the data to be merged corresponding to each slice.
For each slice, after multiple data points are extracted each time, they may be mapped, i.e., arranged to the same position at different depths in the three-dimensional data. The mapping process is the same as in the embodiment shown in FIG. 1 and is not repeated here.
If the data points satisfying the preset step size within one depth are extracted at a time, the extracted data points can be arranged directly to the same position at different depths; if the data points satisfying the preset step size across several depths are extracted at a time, the data points of each depth may first be arranged, and then arranged, in depth order, to the same position at different depths.
S205: arranging the plurality of pieces of data to be merged along the depth direction, to obtain the rearranged data.
The data points extracted each time from each slice are mapped to the same position at different depths in three-dimensional data to obtain multiple pieces of data to be merged, which are then arranged along the depth direction to obtain the rearranged data. For example, if the size of the input data is 26×26×10, the input data may be divided along the depth direction into three slices of 26×26×1, 26×26×3, and 26×26×6; extracting from each slice with a 2×2 preset step size yields pieces of data to be merged of sizes 13×13×4, 13×13×12, and 13×13×24, and merging these three pieces yields 13×13×40 rearranged data. The merging may arrange the per-slice pieces of data to be merged in any order; for the above input data there are 3! = 6 possible orders, any one of which may be chosen as the merging order.
S206: performing a convolution operation on the rearranged data by using a convolution kernel of a preset size, to obtain a convolution result.
By applying this embodiment, the input data is divided along the depth direction to obtain multiple slices; multiple data points are then extracted from each slice each time according to a preset step size, the data points extracted each time are mapped to the same position at different depths in three-dimensional data, the rearranged data is obtained by merging, and finally a convolution operation is performed on the rearranged data with a kernel of a preset size to obtain a convolution result. After the input data is divided, the extraction of data points in the slices can be performed in parallel, which increases the operation rate. Moreover, because the extraction and mapping expand each slice in the depth direction and reduce the size of each depth, a smaller convolution kernel can be used to convolve the input data. When the input data of every network layer is processed by this method, all of the resulting rearranged data can be convolved with convolution kernels of the same preset size, which reduces hardware resource overhead; and performing the convolution of every network layer with the same smaller kernel size improves the operation efficiency of the convolutional neural network.
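The slice-based procedure of S201-S205 can be sketched as follows: the input is divided along the depth direction, each slice is rearranged independently (which is what permits parallel execution), and the per-slice results are concatenated along the depth direction. This is an illustrative sketch with assumed names, a fixed 2×2 step, and one fixed merge order out of the permitted permutations:

```python
import numpy as np

def rearrange_slice(slice_data, step=2):
    """Rearrange one slice: (w, h, d) -> (w/step, h/step, d*step*step)."""
    w, h, d = slice_data.shape
    x = slice_data.reshape(w // step, step, h // step, step, d)
    x = x.transpose(0, 2, 4, 1, 3)
    return x.reshape(w // step, h // step, d * step * step)

input_data = np.arange(26 * 26 * 10, dtype=np.int64).reshape(26, 26, 10)

# Divide the input data along the depth direction into three slices of
# depths 1, 3 and 6, matching the 26x26x1, 26x26x3, 26x26x6 example above.
slices = np.split(input_data, [1, 4], axis=2)

# Each slice can be rearranged independently, and hence in parallel.
to_merge = [rearrange_slice(s) for s in slices]
print([t.shape for t in to_merge])  # [(13, 13, 4), (13, 13, 12), (13, 13, 24)]

# Arrange the pieces of data to be merged along the depth direction.
rearranged = np.concatenate(to_merge, axis=2)
print(rearranged.shape)             # (13, 13, 40)
```

In a hardware implementation the per-slice loop would be replaced by parallel processing elements; the sketch only shows that the slice-wise results merge into the same 13×13×40 rearranged data as the single-pass description.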
For ease of understanding, the convolution operation method provided by the embodiments of the present application is introduced below with a specific application example.
Step 1: for a network layer whose convolution kernel needs to be replaced with a smaller one, denote the input data of the network layer as A, where A is a piece of three-dimensional data of size W×H×I.
Step 2: divide A into I slices along the depth direction, and denote each slice as A_i, as shown in FIG. 3, where i∈[1, I].
Step 3: within A_i, extract 4 data points a_j at a time with a 2×2 step size, as indicated by the dashed box in A_i of FIG. 3, where j∈[1, 4].
Step 4: map the extracted data points to the corresponding same position of three-dimensional data A_i*, where the size of A_i* is W*×H*×4, and the arrangement order of the data points may be any one of the 4! = 24 possible orders.
Step 5: merge the A_i* along the depth direction to obtain the rearranged data A*, where the size of A* is W*×H*×4I; the merging may arrange the A_i* in any one of the I! possible orders.
Step 6: based on the rearranged data A*, a convolution operation can be performed with a convolution kernel of preset size K_r×K_r×I_r×O, to obtain a convolution result.
In this solution, the input data is divided along the depth direction to obtain multiple slices; multiple data points are then extracted from each slice each time with a 2×2 step size, the data points extracted each time are mapped to the same position at different depths in three-dimensional data, the rearranged data is obtained by merging, and finally a convolution operation is performed on the rearranged data with a kernel of a preset size to obtain a convolution result. In the division of the input data, each depth is made one slice; after the division, the extraction of data points in the slices can be performed in parallel, which increases the operation rate. Moreover, because the extraction and mapping expand each slice in the depth direction and reduce the size of each depth, a smaller convolution kernel can be used to convolve the input data. When the input data of every network layer is processed by this method, all of the resulting rearranged data can be convolved with convolution kernels of the same preset size, which reduces hardware resource overhead; and performing the convolution of every network layer with the same smaller kernel size improves the operation efficiency of the convolutional neural network.
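The sizes in the worked example above can be checked arithmetically: with a 2×2 step, each slice A_i of size W×H×1 yields A_i* of size W*×H*×4 with W* = W/2 and H* = H/2, and merging the I slices yields A* of size W*×H*×4I. A small sketch (illustrative names; it assumes W and H are even, as in the 26×26×10 example):

```python
import math

W, H, I = 26, 26, 10   # size of the input data A in the example

# Step 4: each slice A_i (W x H x 1) becomes A_i* of size W* x H* x 4.
W_star, H_star = W // 2, H // 2

# Step 5: merging the I slices gives A* of size W* x H* x 4I.
depth_star = 4 * I
print((W_star, H_star, depth_star))   # (13, 13, 40)

# The rearrangement only moves data, so the element count is preserved.
assert W * H * I == W_star * H_star * depth_star

# Number of permitted arrangement orders per slice, and per merge.
print(math.factorial(4))   # 24 orders for the 4 points of a slice
print(math.factorial(I))   # 3628800 orders for merging the slices
```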
Corresponding to the above convolution operation method embodiments, as shown in FIG. 4, an embodiment of the present application further provides a convolution operation apparatus, which may include:
an obtaining module 410, configured to obtain input data of a network layer in a convolutional neural network;
an extraction module 420, configured to extract a plurality of data points from the input data each time according to a preset step size;
a mapping module 430, configured to map the plurality of data points extracted each time to the same position at different depths in three-dimensional data, to obtain rearranged data;
an operation module 440, configured to perform a convolution operation on the rearranged data by using a convolution kernel of a preset size, to obtain a convolution result.
By applying this embodiment, a plurality of data points are extracted each time, according to a preset step size, from the obtained input data of a network layer in a convolutional neural network; the data points extracted each time are mapped to the same position at different depths in three-dimensional data to obtain rearranged data; and finally a convolution operation is performed on the rearranged data with a convolution kernel of a preset size to obtain a convolution result. The extraction and mapping of the data points expand the input data in the depth direction and reduce the size of each depth. Because the input data becomes smaller, a smaller convolution kernel can be used to convolve it. When the input data of every network layer is processed by this method, all of the resulting rearranged data can be convolved with convolution kernels of the same preset size, which reduces hardware resource overhead; moreover, performing the convolution of every network layer with the same smaller kernel size improves the operation efficiency of the convolutional neural network.
Optionally, the extraction module 420 may be specifically configured to:
for each depth of the input data, extract a plurality of data points each time according to the preset step size;
the mapping module 430 may be specifically configured to:
map the plurality of data points extracted each time from each depth of the input data to the same position at different depths in three-dimensional data, to obtain a plurality of pieces of data to be merged; and
arrange the plurality of pieces of data to be merged along the depth direction, to obtain the rearranged data.
Optionally, the extraction module 420 may also be specifically configured to:
arrange the plurality of data points extracted each time; and
store, in the order of arrangement, the plurality of data points extracted each time to the same position at different depths in three-dimensional data, to obtain the rearranged data.
The convolution operation apparatus provided by this embodiment applies the convolution operation method of the embodiment shown in FIG. 1; therefore, all the embodiments of the above convolution operation method are applicable to this apparatus and have the same or similar beneficial effects, which are not repeated here.
Based on the embodiment shown in FIG. 4, an embodiment of the present application further provides another convolution operation apparatus. As shown in FIG. 5, the apparatus may include:
an obtaining module 510, configured to obtain input data of a network layer in a convolutional neural network;
a division module 520, configured to divide the input data along the depth direction, to obtain a plurality of slices;
an extraction module 530, configured to, for each slice, extract the data points of each depth in the slice each time according to a preset step size, to obtain a plurality of data points;
a mapping module 540, configured to map the plurality of data points extracted each time from each slice to the same position at different depths in three-dimensional data, to obtain the data to be merged corresponding to each slice, and to arrange the plurality of pieces of data to be merged along the depth direction, to obtain rearranged data;
an operation module 550, configured to perform a convolution operation on the rearranged data by using a convolution kernel of a preset size, to obtain a convolution result.
By applying this embodiment, the input data is divided along the depth direction to obtain multiple slices; multiple data points are then extracted from each slice each time according to a preset step size, the data points extracted each time are mapped to the same position at different depths in three-dimensional data, the rearranged data is obtained by merging, and finally a convolution operation is performed on the rearranged data with a kernel of a preset size to obtain a convolution result. After the input data is divided, the extraction of data points in the slices can be performed in parallel, which increases the operation rate. Moreover, because the extraction and mapping expand each slice in the depth direction and reduce the size of each depth, a smaller convolution kernel can be used to convolve the input data. When the input data of every network layer is processed by this method, all of the resulting rearranged data can be convolved with convolution kernels of the same preset size, which reduces hardware resource overhead; and performing the convolution of every network layer with the same smaller kernel size improves the operation efficiency of the convolutional neural network.
In addition, corresponding to the convolution operation method provided by the above embodiments, an embodiment of the present application provides a computer-readable storage medium for storing executable code, the executable code performing, when executed, the convolution operation method provided by the embodiments of the present application; specifically, the convolution operation method may include:
obtaining input data of a network layer in a convolutional neural network;
extracting a plurality of data points from the input data each time according to a preset step size;
mapping the plurality of data points extracted each time to the same position at different depths in three-dimensional data, to obtain rearranged data;
performing a convolution operation on the rearranged data by using a convolution kernel of a preset size, to obtain a convolution result.
In this embodiment, the computer-readable storage medium stores executable code that performs, when executed, the convolution operation method provided by the embodiments of the present application, and can therefore achieve the following: a plurality of data points are extracted each time, according to a preset step size, from the obtained input data of a network layer in a convolutional neural network; the data points extracted each time are mapped to the same position at different depths in three-dimensional data to obtain rearranged data; and finally a convolution operation is performed on the rearranged data with a convolution kernel of a preset size to obtain a convolution result. The extraction and mapping of the data points expand the input data in the depth direction and reduce the size of each depth. Because the input data becomes smaller, a smaller convolution kernel can be used to convolve it. When the input data of every network layer is processed by this method, all of the resulting rearranged data can be convolved with convolution kernels of the same preset size, which reduces hardware resource overhead; moreover, performing the convolution of every network layer with the same smaller kernel size improves the operation efficiency of the convolutional neural network.
In addition, corresponding to the convolution operation method provided by the above embodiments, an embodiment of the present application provides an application program that performs, when executed, the convolution operation method provided by the embodiments of the present application; specifically, the convolution operation method may include:
obtaining input data of a network layer in a convolutional neural network;
extracting a plurality of data points from the input data each time according to a preset step size;
mapping the plurality of data points extracted each time to the same position at different depths in three-dimensional data, to obtain rearranged data;
performing a convolution operation on the rearranged data by using a convolution kernel of a preset size, to obtain a convolution result.
In this embodiment, the application program performs, when executed, the convolution operation method provided by the embodiments of the present application, and can therefore achieve the same effects: the extraction and mapping of the data points expand the input data in the depth direction and reduce the size of each depth, so a smaller convolution kernel can be used to convolve the input data; when the input data of every network layer is processed by this method, all of the resulting rearranged data can be convolved with convolution kernels of the same preset size, which reduces hardware resource overhead, and performing the convolution of every network layer with the same smaller kernel size improves the operation efficiency of the convolutional neural network.
In addition, corresponding to the convolution operation method provided by the above embodiments, an embodiment of the present application provides a computer device. As shown in FIG. 6, the computer device includes a processor 601 and a computer-readable storage medium 602, wherein
the computer-readable storage medium 602 is configured to store executable code; and
the processor 601 is configured to implement, when executing the executable code stored on the computer-readable storage medium 602, the following steps:
obtaining input data of a network layer in a convolutional neural network;
extracting a plurality of data points from the input data each time according to a preset step size;
mapping the plurality of data points extracted each time to the same position at different depths in three-dimensional data, to obtain rearranged data;
performing a convolution operation on the rearranged data by using a convolution kernel of a preset size, to obtain a convolution result.
Optionally, the processor 601 may further implement:
dividing the input data along the depth direction, to obtain a plurality of slices;
in the step of extracting a plurality of data points from the input data each time according to a preset step size, the processor 601 may specifically implement:
for each slice, extracting the data points of each depth in the slice each time according to the preset step size, to obtain a plurality of data points;
in the step of mapping the plurality of data points extracted each time to the same position at different depths in three-dimensional data to obtain rearranged data, the processor 601 may specifically implement:
mapping the plurality of data points extracted each time from each slice to the same position at different depths in three-dimensional data, to obtain the data to be merged corresponding to each slice; and
arranging the plurality of pieces of data to be merged along the depth direction, to obtain the rearranged data.
Optionally, in the step of extracting a plurality of data points from the input data each time according to a preset step size, the processor 601 may specifically implement:
for each depth of the input data, extracting a plurality of data points each time according to the preset step size;
in the step of mapping the plurality of data points extracted each time to the same position at different depths in three-dimensional data to obtain rearranged data, the processor 601 may specifically implement:
mapping the plurality of data points extracted each time from each depth of the input data to the same position at different depths in three-dimensional data, to obtain a plurality of pieces of data to be merged; and
arranging the plurality of pieces of data to be merged along the depth direction, to obtain the rearranged data.
Optionally, in the step of mapping the plurality of data points extracted each time to the same position at different depths in three-dimensional data to obtain rearranged data, the processor 601 may specifically implement:
arranging the plurality of data points extracted each time; and
storing, in the order of arrangement, the plurality of data points extracted each time to the same position at different depths in three-dimensional data, to obtain the rearranged data.
Data may be transmitted between the computer-readable storage medium 602 and the processor 601 through a wired or wireless connection, and the computer device may communicate with other devices through a wired or wireless communication interface.
The above computer-readable storage medium may include a RAM (Random Access Memory) and may also include an NVM (Non-Volatile Memory), such as at least one disk memory. Optionally, the computer-readable storage medium may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), or the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In this embodiment, the processor of the computer device runs the program corresponding to the executable code by reading the executable code stored in the computer-readable storage medium, and the program performs, when executed, the convolution operation method provided by the embodiments of the present application. It can therefore achieve the following: a plurality of data points are extracted each time, according to a preset step size, from the obtained input data of a network layer in a convolutional neural network; the data points extracted each time are mapped to the same position at different depths in three-dimensional data to obtain rearranged data; and finally a convolution operation is performed on the rearranged data with a convolution kernel of a preset size to obtain a convolution result. The extraction and mapping of the data points expand the input data in the depth direction and reduce the size of each depth. Because the input data becomes smaller, a smaller convolution kernel can be used to convolve it. When the input data of every network layer is processed by this method, all of the resulting rearranged data can be convolved with convolution kernels of the same preset size, which reduces hardware resource overhead; moreover, performing the convolution of every network layer with the same smaller kernel size improves the operation efficiency of the convolutional neural network.
As for the computer device, application program, and computer-readable storage medium embodiments, since the methods involved are substantially similar to the foregoing method embodiments, their description is relatively simple; for relevant parts, refer to the description of the method embodiments.
It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or also includes elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes that element.
The embodiments in this specification are described in a related manner; for identical or similar parts between the embodiments, reference may be made to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively simple; for relevant parts, refer to the description of the method embodiments.
The above are merely preferred embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (11)

  1. A convolution operation method, characterized in that the method comprises:
    obtaining input data of a network layer in a convolutional neural network;
    extracting a plurality of data points from the input data each time according to a preset step size;
    mapping the plurality of data points extracted each time to the same position at different depths in three-dimensional data, to obtain rearranged data;
    performing a convolution operation on the rearranged data by using a convolution kernel of a preset size, to obtain a convolution result.
  2. The method according to claim 1, characterized in that, before the extracting a plurality of data points from the input data each time according to a preset step size, the method further comprises:
    dividing the input data along the depth direction, to obtain a plurality of slices;
    the extracting a plurality of data points from the input data each time according to a preset step size comprises:
    for each slice, extracting the data points of each depth in the slice each time according to the preset step size, to obtain a plurality of data points;
    the mapping the plurality of data points extracted each time to the same position at different depths in three-dimensional data, to obtain rearranged data, comprises:
    mapping the plurality of data points extracted each time from each slice to the same position at different depths in three-dimensional data, to obtain the data to be merged corresponding to each slice;
    arranging the plurality of pieces of data to be merged along the depth direction, to obtain the rearranged data.
  3. The method according to claim 1, characterized in that the extracting a plurality of data points from the input data each time according to a preset step size comprises:
    for each depth of the input data, extracting a plurality of data points each time according to the preset step size;
    the mapping the plurality of data points extracted each time to the same position at different depths in three-dimensional data, to obtain rearranged data, comprises:
    mapping the plurality of data points extracted each time from each depth of the input data to the same position at different depths in three-dimensional data, to obtain a plurality of pieces of data to be merged;
    arranging the plurality of pieces of data to be merged along the depth direction, to obtain the rearranged data.
  4. The method according to claim 1, characterized in that the mapping the plurality of data points extracted each time to the same position at different depths in three-dimensional data, to obtain rearranged data, comprises:
    arranging the plurality of data points extracted each time;
    storing, in the order of arrangement, the plurality of data points extracted each time to the same position at different depths in three-dimensional data, to obtain the rearranged data.
  5. A convolution operation apparatus, characterized in that the apparatus comprises:
    an obtaining module, configured to obtain input data of a network layer in a convolutional neural network;
    an extraction module, configured to extract a plurality of data points from the input data each time according to a preset step size;
    a mapping module, configured to map the plurality of data points extracted each time to the same position at different depths in three-dimensional data, to obtain rearranged data;
    an operation module, configured to perform a convolution operation on the rearranged data by using a convolution kernel of a preset size, to obtain a convolution result.
  6. The apparatus according to claim 5, characterized in that the apparatus further comprises:
    a division module, configured to divide the input data along the depth direction, to obtain a plurality of slices;
    the extraction module is specifically configured to:
    for each slice, extract the data points of each depth in the slice each time according to the preset step size, to obtain a plurality of data points;
    the mapping module is specifically configured to:
    map the plurality of data points extracted each time from each slice to the same position at different depths in three-dimensional data, to obtain the data to be merged corresponding to each slice;
    arrange the plurality of pieces of data to be merged along the depth direction, to obtain the rearranged data.
  7. The apparatus according to claim 5, characterized in that the extraction module is specifically configured to:
    for each depth of the input data, extract a plurality of data points each time according to the preset step size;
    the mapping module is specifically configured to:
    map the plurality of data points extracted each time from each depth of the input data to the same position at different depths in three-dimensional data, to obtain a plurality of pieces of data to be merged;
    arrange the plurality of pieces of data to be merged along the depth direction, to obtain the rearranged data.
  8. The apparatus according to claim 5, characterized in that the extraction module is specifically configured to:
    arrange the plurality of data points extracted each time;
    store, in the order of arrangement, the plurality of data points extracted each time to the same position at different depths in three-dimensional data, to obtain the rearranged data.
  9. A computer-readable storage medium, characterized in that it is used to store executable code, the executable code performing, when executed, the convolution operation method according to any one of claims 1-4.
  10. An application program, characterized in that it performs, when executed, the convolution operation method according to any one of claims 1-4.
  11. A computer device, characterized by comprising a processor and a computer-readable storage medium, wherein
    the computer-readable storage medium is configured to store executable code;
    the processor is configured to implement, when executing the executable code stored on the computer-readable storage medium, the method steps according to any one of claims 1-4.
PCT/CN2018/106600 2017-09-22 2018-09-20 卷积运算方法、装置、计算机设备及计算机可读存储介质 WO2019057097A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP18859234.9A EP3686760A4 (en) 2017-09-22 2018-09-20 CONVOLUTIONAL OPERATIONAL METHOD AND DEVICE, COMPUTER DEVICE, AND COMPUTER READABLE STORAGE MEDIUM
US16/649,306 US11645357B2 (en) 2017-09-22 2018-09-20 Convolution operation method and apparatus, computer device, and computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710866060.5 2017-09-22
CN201710866060.5A CN109543139B (zh) 2017-09-22 2017-09-22 卷积运算方法、装置、计算机设备及计算机可读存储介质

Publications (1)

Publication Number Publication Date
WO2019057097A1 true WO2019057097A1 (zh) 2019-03-28

Family

ID=65811068

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/106600 WO2019057097A1 (zh) 2017-09-22 2018-09-20 卷积运算方法、装置、计算机设备及计算机可读存储介质

Country Status (4)

Country Link
US (1) US11645357B2 (zh)
EP (1) EP3686760A4 (zh)
CN (1) CN109543139B (zh)
WO (1) WO2019057097A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113128688A (zh) * 2021-04-14 2021-07-16 北京航空航天大学 通用型ai并行推理加速结构以及推理设备

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543139B (zh) * 2017-09-22 2021-09-17 杭州海康威视数字技术股份有限公司 卷积运算方法、装置、计算机设备及计算机可读存储介质
CN111008040B (zh) * 2019-11-27 2022-06-14 星宸科技股份有限公司 缓存装置及缓存方法、计算装置及计算方法
CN112836803B (zh) * 2021-02-04 2024-07-23 珠海亿智电子科技有限公司 一种提高卷积运算效率的数据摆放方法
CN113297570B (zh) * 2021-05-21 2022-06-17 浙江工业大学 一种基于卷积神经网络的应用程序在线攻击方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110222752A1 (en) * 2008-04-08 2011-09-15 Three Palm Software Microcalcification enhancement from digital mammograms
CN106845635A (zh) * 2017-01-24 2017-06-13 东南大学 基于级联形式的cnn卷积核硬件设计方法
CN106898011A (zh) * 2017-01-06 2017-06-27 广东工业大学 一种基于边缘检测来确定卷积神经网络卷积核数量的方法
CN106980896A (zh) * 2017-03-16 2017-07-25 武汉理工大学 遥感分类卷积神经网络的关键卷积层超参数确定方法

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8457429B2 (en) * 2007-07-31 2013-06-04 Hewlett-Packard Development Company, L.P. Method and system for enhancing image signals and other signals to increase perception of depth
US8465200B2 (en) * 2010-06-04 2013-06-18 Uchicago Argonne, Llc Method for implementing depth deconvolution algorithm for enhanced thermal tomography 3D imaging
CN104809426B (zh) * 2014-01-27 2019-04-05 日本电气株式会社 卷积神经网络的训练方法、目标识别方法及装置
US10387773B2 (en) * 2014-10-27 2019-08-20 Ebay Inc. Hierarchical deep convolutional neural network for image classification
US10438117B1 (en) * 2015-05-21 2019-10-08 Google Llc Computing convolutions using a neural network processor
CN105260773B (zh) * 2015-09-18 2018-01-12 华为技术有限公司 一种图像处理装置以及图像处理方法
CN105320965B (zh) * 2015-10-23 2018-11-30 西北工业大学 基于深度卷积神经网络的空谱联合的高光谱图像分类方法
CN105787488B (zh) * 2016-03-02 2019-04-30 浙江宇视科技有限公司 由全局向局部传递的图像特征提取方法及装置
US10255529B2 (en) * 2016-03-11 2019-04-09 Magic Leap, Inc. Structure learning in convolutional neural networks
US9589374B1 (en) * 2016-08-01 2017-03-07 12 Sigma Technologies Computer-aided diagnosis system for medical images using deep convolutional neural networks
KR101740464B1 (ko) * 2016-10-20 2017-06-08 (주)제이엘케이인스펙션 뇌졸중 진단 및 예후 예측 방법 및 시스템
KR20180073118A (ko) * 2016-12-22 2018-07-02 삼성전자주식회사 컨볼루션 신경망 처리 방법 및 장치
CN106874955A (zh) * 2017-02-24 2017-06-20 深圳市唯特视科技有限公司 一种基于深度卷积神经网络的三维形状分类方法
CN107103277B (zh) * 2017-02-28 2020-11-06 中科唯实科技(北京)有限公司 一种基于深度相机和3d卷积神经网络的步态识别方法
CN106991372B (zh) * 2017-03-02 2020-08-28 北京工业大学 一种基于混合深度学习模型的动态手势识别方法
CN107145939B (zh) * 2017-06-21 2020-11-24 北京图森智途科技有限公司 一种低计算能力处理设备的计算机视觉处理方法及装置
CN109543139B (zh) * 2017-09-22 2021-09-17 杭州海康威视数字技术股份有限公司 卷积运算方法、装置、计算机设备及计算机可读存储介质
US11709911B2 (en) * 2018-10-03 2023-07-25 Maxim Integrated Products, Inc. Energy-efficient memory systems and methods

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110222752A1 (en) * 2008-04-08 2011-09-15 Three Palm Software Microcalcification enhancement from digital mammograms
CN106898011A (zh) * 2017-01-06 2017-06-27 广东工业大学 一种基于边缘检测来确定卷积神经网络卷积核数量的方法
CN106845635A (zh) * 2017-01-24 2017-06-13 东南大学 基于级联形式的cnn卷积核硬件设计方法
CN106980896A (zh) * 2017-03-16 2017-07-25 武汉理工大学 遥感分类卷积神经网络的关键卷积层超参数确定方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3686760A4

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113128688A (zh) * 2021-04-14 2021-07-16 北京航空航天大学 通用型ai并行推理加速结构以及推理设备
CN113128688B (zh) * 2021-04-14 2022-10-21 北京航空航天大学 通用型ai并行推理加速结构以及推理设备

Also Published As

Publication number Publication date
US20200265306A1 (en) 2020-08-20
EP3686760A1 (en) 2020-07-29
EP3686760A4 (en) 2020-11-18
CN109543139A (zh) 2019-03-29
US11645357B2 (en) 2023-05-09
CN109543139B (zh) 2021-09-17

Similar Documents

Publication Publication Date Title
WO2019057097A1 (zh) 卷积运算方法、装置、计算机设备及计算机可读存储介质
US20220383067A1 (en) Buffer Addressing for a Convolutional Neural Network
CN108876792B (zh) 语义分割方法、装置和系统及存储介质
US10896367B2 (en) Depth concatenation using a matrix computation unit
US20190164045A1 (en) Method and apparatus for performing operation of convolutional layer in convolutional neural network
US9691019B1 (en) Depth concatenation using a matrix computation unit
FI3555814T3 (fi) Keskimääräisen poolingin suorittaminen laitteistossa
US20150109290A1 (en) Device and method for removing noise points in point clouds
US20200167637A1 (en) Neural network processor using dyadic weight matrix and operation method thereof
CN111091572B (zh) 一种图像处理方法、装置、电子设备及存储介质
EP3079077A1 (en) Graph data query method and device
CN108564645B (zh) 房屋模型的渲染方法、终端设备及介质
EP3093757A2 (en) Multi-dimensional sliding window operation for a vector processor
US10200191B2 (en) Electronic calculating device for performing obfuscated arithmetic
JP2023541350A (ja) 表畳み込みおよびアクセラレーション
CN114254584A (zh) 芯片产品的对比方法、建模方法、装置及存储介质
US11106968B1 (en) Circuit arrangements and methods for traversing input feature maps
CN107392316B (zh) 网络训练方法、装置、计算设备及计算机存储介质
US20130343655A1 (en) Apparatus and method extracting feature information of a source image
CN111178513B (zh) 神经网络的卷积实现方法、卷积实现装置及终端设备
EP4381415A1 (en) Multiply-instantiated block modeling for circuit component placement in integrated circuit
CN107977923B (zh) 图像处理方法、装置、电子设备及计算机可读存储介质
US9361719B1 (en) Label placement on a digital map
JP6055758B2 (ja) アイコン表示プログラム、アイコン表示装置
US20150161438A1 (en) Feature generalization using topological model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18859234

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018859234

Country of ref document: EP

Effective date: 20200422