WO2018113239A1 - Data scheduling method, system and computer device for a convolutional neural network - Google Patents

Data scheduling method, system and computer device for a convolutional neural network

Info

Publication number
WO2018113239A1
WO2018113239A1 (application PCT/CN2017/090792)
Authority
WO
WIPO (PCT)
Prior art keywords
image data
target image
data
buffer module
data buffer
Prior art date
Application number
PCT/CN2017/090792
Other languages
English (en)
French (fr)
Inventor
蒋文
Original Assignee
深圳云天励飞技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳云天励飞技术有限公司
Publication of WO2018113239A1 publication Critical patent/WO2018113239A1/zh

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 - Addressing or allocation; Relocation
    • G06F 12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0866 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, for peripheral storage systems, e.g. disk cache
    • G06F 12/0871 - Allocation or management of cache space
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons

Definitions

  • the present invention relates to the field of convolutional neural network technologies, and in particular, to a data scheduling method, system and computer device for a convolutional neural network.
  • CNN Convolutional Neural Network
  • The platforms on which CNNs run have also expanded beyond the Central Processing Unit (CPU).
  • CPU Central Processing Unit
  • GPU Graphic Processing Unit
  • FPGA Field-Programmable Gate Array
  • ASIC Application Specific Integrated Circuit
  • Embodiments of the present invention provide a data scheduling method, system, and computer device for a convolutional neural network, which are used to reduce storage space required for processing image data and loading and uploading time of image data.
  • the embodiment of the present invention provides a data scheduling method for a convolutional neural network, including:
  • dividing the image data into N target image data, N being an integer greater than 1, wherein the N target image data include first target image data and second target image data, and the first target image data and the second target image data are adjacent target image data;
  • loading the first target image data into the first data buffer module, so that the calculation unit reads the first target image data stored in the first data buffer module and performs a convolution calculation; and, while the calculation unit performs the convolution calculation on the first target image data stored in the first data buffer module, loading the second target image data into the second data buffer module; and
  • after the calculation unit obtains the calculation result, uploading the calculation result to the external storage.
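The claimed scheduling can be sketched as a small pipeline. This is a hypothetical Python illustration, not the patented implementation; the callables `load`, `convolve`, and `upload` stand in for the data buffer modules, the calculation unit, and the external-storage interface.

```python
# Illustrative sketch of the claimed double-buffered scheduling loop.
# `load`, `convolve`, and `upload` are assumed names, not from the patent.

def schedule(image_data, n, load, convolve, upload):
    """Divide image_data into n equal adjacent chunks and pipeline
    load -> compute -> upload using two ping-pong buffers."""
    chunk = len(image_data) // n
    targets = [image_data[i * chunk:(i + 1) * chunk] for i in range(n)]
    buffers = [None, None]           # first / second data buffer module
    buffers[0] = load(targets[0])    # prefetch the first target image data
    results = []
    for i in range(n):
        cur, nxt = i % 2, (i + 1) % 2
        if i + 1 < n:                # load the next chunk while computing
            buffers[nxt] = load(targets[i + 1])
        results.append(convolve(buffers[cur]))
    upload(results)                  # upload calculation results
    return results
```

With an identity `load`, `sum` as a stand-in convolution, and a no-op `upload`, `schedule(list(range(8)), 4, ...)` processes the four chunks in order.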
  • the dividing the image data into N target image data includes:
  • the loading the first target image data into the first data buffer module comprises:
  • uploading the calculation result to the external storage includes:
  • the calculating unit reads the first target image data stored by the first data buffer module, performs convolution calculation, and obtains the calculation result, and then uploads the calculation result to the external storage;
  • the calculating unit stores the calculated calculation result in the cache, and uploads to the external storage if the data amount of the calculation result stored in the cache reaches a preset condition.
  • wherein the calculation unit reading the first target image data stored in the first data buffer module and performing the convolution calculation includes:
  • the calculating unit reads the weight parameter of the image data from the cache and reads the first target image data stored by the first data buffer module, performs convolution calculation, and obtains a calculation result, and the calculation is performed The result is stored in the cache.
  • the method before the dividing the image data into N pieces of the target image data, the method further includes:
  • the dividing the image data into N target image data includes:
  • determining, if the image data were divided into N target image data of equal data amount, whether the storage space required by the first target image data would be less than or equal to the maximum storage space that can be allocated to the first data buffer module; and if so,
  • dividing the image data into the N target image data.
  • the second embodiment of the present invention further provides a data scheduling system for a convolutional neural network, including:
  • a dividing module configured to divide the image data into N target image data, wherein N is an integer greater than 1, the N target image data include first target image data and second target image data, and the first target image data and the second target image data are adjacent target image data;
  • a loading module configured to load the first target image data into the first data buffer module, and to load the second target image data into the second data buffer module while the calculation unit reads the first target image data stored in the first data buffer module and performs the convolution calculation;
  • a calculation unit configured to read the first target image data stored by the first data buffer module, and perform convolution calculation
  • a first data buffer module configured to store the first target image data
  • a second data buffering module configured to store the second target image data
  • the uploading module is configured to upload the calculation result obtained by the computing unit to the external storage.
  • the dividing module is specifically configured to divide the image data into N target image data of equal data amount;
  • the loading module is specifically configured to allocate, for each of the first data buffer module and the second data buffer module, a storage space equal to the storage space required by one target image data, and to load the first target image data into the first data buffer module.
  • the uploading module is specifically configured to upload the calculation result to the external storage after the calculation unit reads the first target image data stored in the first data buffer module, performs the convolution calculation, and obtains the calculation result; or
  • the uploading module is specifically configured to upload the calculation results to the external storage when the data amount of the calculation results stored in the cache reaches a preset condition.
  • the calculating unit is specifically configured to: after reading the weight parameter of the image data from the cache and reading the first target image data stored by the first data buffer module, A convolution calculation is performed to obtain a calculation result, and the calculation result is stored in the cache.
  • system further includes:
  • a determining module configured to determine, according to the storage space currently available in the cache and the number of calculation units, the maximum storage space that can be allocated to the first data buffer module and the second data buffer module, the maximum storage spaces that can be allocated to the two modules being the same; and to determine, if the image data were divided into N target image data of equal data amount, whether the storage space required by the first target image data would be less than or equal to the maximum storage space that can be allocated to the first data buffer module;
  • the dividing module is further configured to divide the image data into the N target image data after the determining module determines that the storage space required by the first target image data is less than or equal to the maximum storage space that can be allocated to the first data buffer module.
  • the third embodiment of the present invention further provides a computer device, including:
  • a memory that stores executable instructions and image data
  • One or more processors in communication with the memory to execute executable instructions to do the following:
  • dividing the image data into N target image data, N being an integer greater than 1, wherein the N target image data include first target image data and second target image data, and the first target image data and the second target image data are adjacent target image data;
  • loading the first target image data into the first data buffer module, so that the calculation unit reads the first target image data stored in the first data buffer module and performs a convolution calculation; and, while the calculation unit performs the convolution calculation on the first target image data stored in the first data buffer module, loading the second target image data into the second data buffer module; and
  • after the calculation unit obtains the calculation result, uploading the calculation result to the external storage.
  • the embodiments of the present invention have the following advantages: the image data is split into a plurality of target image data, each requiring only a small storage space, so that processing the target image data requires less storage space, which overcomes the storage-space limitation in the hardware design of convolutional neural networks and can improve computing capability; and dynamically loading and uploading data during the convolution calculation can effectively reduce the loading and uploading time of the data.
  • FIG. 1 is a schematic flow chart of a data scheduling method for a convolutional neural network according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of image data storage according to an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of loading target image data according to an embodiment of the present invention.
  • FIG. 4 is a schematic flow chart of another data scheduling method for a convolutional neural network according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a data scheduling system of a convolutional neural network according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a data scheduling system of a convolutional neural network according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
  • the embodiment of the invention provides a data scheduling method for a convolutional neural network, as shown in FIG. 1 , which includes:
  • the N is an integer greater than 1, and the N target image data includes first target image data and second target image data, and the first target image data and the second target image data are adjacent target image data.
  • In FIG. 2, the left half may represent the storage of the entire image data; the storage addresses of the image data may or may not be contiguous, and the image data is stored sequentially in adjacent order. The right half may represent the case where the image data is divided into four target image data, the four target image data being pairwise adjacent; that is, the first target image data is adjacent to the second, the second is adjacent to the third, and the third is adjacent to the fourth.
  • the image data is obtained by pre-processing the collected original image data, and each target image data may represent a part of continuous data after the pre-processing of the original image data.
  • the storage space required by each target image data is the same; after the image data is divided into the N target image data, the storage space required by each target image data is equal.
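A minimal sketch of the equal-amount division, assuming the image data is a contiguous one-dimensional buffer (`split_equal` is an illustrative name, not from the patent):

```python
# Illustrative split of a contiguous image buffer into N equal, adjacent
# target image data, as in the four-way division of Fig. 2.

def split_equal(data, n):
    size, rem = divmod(len(data), n)
    # The patent assumes the N target image data require equal storage.
    assert rem == 0, "image data must divide evenly into N chunks"
    return [data[i * size:(i + 1) * size] for i in range(n)]
```

Adjacent chunks cover the buffer without gaps, matching the "adjacent target image data" wording.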
  • the first target image data is loaded into the first data buffer module, and the calculation unit reads the first target image data stored in the first data buffer module and performs the convolution calculation; while the calculation unit performs the convolution calculation on the first target image data stored in the first data buffer module, the second target image data is loaded into the second data buffer module;
  • the first data buffer module and the second data buffer module are both part of the cache, and occupy the same size of the cache space.
  • for example, the first data buffer module and the second data buffer module may each be a 100 MB storage area in the cache. Since the cache space is limited, the image data may be stored in an external storage device, and when the image data needs to be processed, it is loaded from the external storage device into the first data buffer module and the second data buffer module.
  • the first data buffer module and the second data buffer module are ping-pong buffers and can switch between input and output. FIG. 3 shows the operations performed by the two data buffer modules and the calculation unit over three time periods. In the first time period, the first target image data is loaded into the first data buffer module, while the second data buffer module neither loads nor stores any target image data. In the second time period, the second data buffer module loads the second target image data while, in parallel, the calculation unit reads the first target image data from the first data buffer module, performs the convolution calculation, and uploads the result to the cache. In the third time period, the first data buffer module loads the third target image data while, in parallel, the calculation unit reads the second target image data from the second data buffer module, performs the convolution calculation, and uploads the result to the cache. The lengths of the three time periods may differ.
  • while the calculation unit is calculating, the first or second data buffer module can load the target image data required for the next calculation, which saves time when loading target image data.
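The three time periods of FIG. 3 generalize to N + 1 periods for N target image data. The following is an illustrative model only; `buf1` and `buf2` are assumed names for the first and second data buffer modules.

```python
# Illustrative model of the ping-pong schedule in Fig. 3, extended from
# three time periods to N + 1. Buffer names are assumptions.

def pingpong_schedule(num_targets):
    """For each time period, report which buffer loads which target and
    which target the calculation unit reads, mirroring Fig. 3."""
    periods = []
    for t in range(num_targets + 1):
        load_buf = "buf1" if t % 2 == 0 else "buf2"
        other = "buf2" if load_buf == "buf1" else "buf1"
        # Load the next target unless every target has been loaded.
        load = f"{load_buf} loads target {t + 1}" if t < num_targets else None
        # From the second period on, compute on the previously loaded target.
        compute = f"unit reads target {t} from {other}" if t > 0 else None
        periods.append((load, compute))
    return periods
```

For three targets this reproduces the figure: period 1 only loads, periods 2 and 3 overlap loading with computing, and a final period drains the last buffer.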
  • the calculation unit may first upload each calculation result to the cache and, when the calculation results stored in the cache reach the preset condition, upload them to the external storage; or it may upload each result directly to the external storage. For example, when the number of calculation results in the cache reaches 100, the 100 calculation results are uploaded to the external storage.
  • the image data is split into a plurality of target image data, each requiring only a small storage space, so that processing the target image data requires little storage space; this overcomes the storage-space limitation in the hardware design of convolutional neural networks and can improve computing capability, and dynamically loading and uploading data during the convolution calculation can effectively reduce the time spent loading and uploading data.
  • the dividing the image data into N target image data includes:
  • the loading the first target image data into the first data buffer module includes:
  • the storage spaces required by the target image data are equal, and the data within each target image data is contiguous.
  • the first data buffer module and the second data buffer module perform load operations alternately; that is, while the calculation unit reads the target image data in one data buffer module, the other data buffer module loads the next target image data that the calculation unit will read, which can effectively save data-loading time.
  • the computing task can be completed with only the first data buffer module and the second data buffer module, so the required storage space is small. How many target image data the image data is divided into, that is, the value of N, can be determined based on the data amount of the image data.
  • in this way, the image data is first divided into N target image data requiring the same storage space, and then storage space equal to that required by one target image data is allocated to each of the first data buffer module and the second data buffer module, which effectively reduces the amount of storage space occupied.
  • the embodiment of the present invention provides a method for uploading the calculation result to the external storage on the basis of the previous embodiment.
  • the specific method is as follows: after the calculation unit obtains the calculation result, uploading the calculation result to the external storage includes:
  • the calculating unit reads the first target image data stored by the first data buffer module, performs convolution calculation, and obtains the calculation result, and then uploads the calculation result to the external storage;
  • the calculation unit stores the calculated calculation result in the cache, and uploads to the external storage if the data amount of the calculation result stored in the cache reaches a preset condition.
  • the calculation unit can perform an upload each time it obtains a calculation result, which can save cache space.
  • alternatively, the calculation result may be temporarily stored in the cache, and when the data amount of the calculation results stored in the cache reaches the preset condition, the results are uploaded to the external storage. Since uploading after every single calculation result would require many upload operations, the calculation results may first accumulate in the cache and, once the preset condition is reached, be uploaded to the external storage in one batch.
  • the preset condition may be determined according to the speed at which the cache uploads calculation results to the external storage and the speed at which the calculation unit uploads calculation results to the cache. For example, if the calculation unit uploads 5 calculation results to the cache per second and the cache can upload 500 calculation results to the external storage per second, the cache may perform one upload after the stored calculation results reach 500.
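The batched-upload policy can be sketched as a small accumulator. `ResultCache` and its members are illustrative names, a plain list stands in for the external storage, and the threshold mirrors the 500-result example above (scaled down here for readability).

```python
# Toy sketch of the batched-upload policy: results accumulate in the
# cache and are flushed to external storage once a preset count is
# reached. All names are assumptions, not from the patent.

class ResultCache:
    def __init__(self, external_storage, threshold=500):
        self.external = external_storage   # list standing in for DDR/flash
        self.threshold = threshold         # the "preset condition"
        self.cache = []

    def push(self, result):
        self.cache.append(result)
        if len(self.cache) >= self.threshold:  # preset condition met
            self.external.extend(self.cache)   # one bulk upload
            self.cache.clear()
```

With a threshold of 3, seven pushed results produce two bulk uploads and leave one result pending in the cache.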
  • wherein performing the convolution calculation includes:
  • the calculation unit reads the weight parameter of the image data from the cache, reads the first target image data stored in the first data buffer module, performs the convolution calculation, obtains a calculation result, and stores the calculation result in the cache.
  • the weight parameters are stored in the cache; the image data corresponds to only one weight parameter, and the weight parameter has a small data amount.
  • the weight parameter and the target image data are matrices with the same numbers of rows and columns. After the calculation unit reads the target image data and the weight parameter, it performs a matrix point (elementwise) multiplication. If the target image data has multiple input layers, an intermediate result must be calculated for each input layer, and the intermediate results are then summed to obtain the final result of one point in one output layer. Each intermediate result is obtained by performing the matrix point multiplication on one input layer and its weight parameter.
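The point-multiply-and-sum over input layers might be sketched as follows, using nested lists in place of hardware matrices; `conv_point` is an assumed name for illustration only.

```python
# Sketch of the described computation: elementwise (point) multiply of a
# target-image window with a same-shaped weight matrix per input layer,
# then sum the per-layer intermediate results to get one output point.

def conv_point(window, weights):
    """window, weights: lists of input layers, each a 2-D list of the
    same shape. Returns the final result of one output-layer point."""
    total = 0
    for layer, w in zip(window, weights):
        # One intermediate result per input layer: elementwise product sum.
        total += sum(x * k
                     for row, wrow in zip(layer, w)
                     for x, k in zip(row, wrow))
    return total
```

With one input layer this is a plain elementwise product sum; with several layers the intermediate results are accumulated, as the text describes.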
  • the occupied storage space can be effectively reduced.
  • another method for dividing the image data into N target image data is proposed as follows: before the image data is divided into the N target image data, the method further includes:
  • the above dividing the image data into N target image data includes:
  • the image data is divided into the N pieces of the target image data.
  • the embodiment of the present invention may first determine the storage space available in the current cache, that is, the maximum storage space that can be allocated to the first data buffer module and the second data buffer module; then determine how many target image data the image data is divided into; and finally allocate buffer space for the first data buffer module and the second data buffer module.
  • the available storage space of the current cache is 100 MB
  • the number of computing units is 10
  • the storage space required for image data is 200 MB
  • the maximum storage space that each computing unit can allocate is 100 MB/10, that is, 10 MB
  • the maximum storage space of a data buffer module and the second data buffer module is 10 MB/2, that is, 5 MB
  • the image data may be divided into 40, 45, 50, 100, 200, or a similar number of target image data. If the image data is divided into 50 target image data, the storage space required by each target image data is 4 MB, so a 4 MB buffer must be allocated for each of the first data buffer module and the second data buffer module. When dividing the image data, it is only necessary to ensure that the storage space required by each target image data is less than or equal to the maximum storage space that can be allocated; the number of target image data follows from that.
  • the number of target image data into which the image data is divided is determined according to the currently available storage space of the cache and the number of calculation units, so the cache space can be fully utilized and the computing capability improved.
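The sizing arithmetic from the example (100 MB cache, 10 calculation units, 200 MB image) can be written out directly. The helper names below are illustrative, not from the patent.

```python
# Worked version of the sizing example: 100 MB cache shared by 10
# calculation units gives 100/10 = 10 MB per unit; each unit's two
# ping-pong buffers get 10/2 = 5 MB each. Any N whose chunk size fits
# in 5 MB is valid (e.g. N = 50 gives 200/50 = 4 MB chunks).

def max_buffer_mb(cache_mb, num_units):
    # Two data buffer modules per calculation unit.
    return cache_mb / num_units / 2

def chunk_fits(image_mb, n, cache_mb, num_units):
    # Does dividing the image into n chunks fit one buffer module?
    return image_mb / n <= max_buffer_mb(cache_mb, num_units)
```

So N = 40 (5 MB chunks) and N = 50 (4 MB chunks) both fit, while N = 30 (about 6.7 MB chunks) would not.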
  • a data scheduling method for a convolutional neural network is proposed. As shown in FIG. 4, the following steps may be included:
  • the N target image data include first target image data and second target image data, and the first target image data and the second target image data are adjacent target image data.
  • the calculation unit reads the first target image data stored in the first data buffer module and performs the convolution calculation; while the calculation unit performs the convolution calculation after reading the first target image data stored in the first data buffer module, the second target image data is loaded into the second data buffer module;
  • N can also be increased; for example, if N is originally 8, it can be adjusted to 9, 10, or another integer.
  • the image data is split into a plurality of target image data, each requiring only a small storage space, so that processing the target image data requires little storage space; this overcomes the storage-space limitation in the hardware design of convolutional neural networks and can improve computing capability, and dynamically loading and uploading data during the convolution calculation can effectively reduce the time spent loading and uploading data.
  • a data scheduling system for a convolutional neural network is proposed, as shown in FIG. 5, including:
  • the dividing module 501 is configured to divide the image data into N target image data, where N is an integer greater than 1, the N target image data include first target image data and second target image data, and the first target image data and the second target image data are adjacent target image data;
  • the loading module 502 is configured to load the first target image data into the first data buffer module, and to load the second target image data into the second data buffer module while the calculation unit performs the convolution calculation after reading the first target image data stored in the first data buffer module;
  • the calculating unit 503 is configured to perform the convolution calculation after reading the first target image data stored by the first data buffer module;
  • a first data buffering module 504 configured to store the first target image data
  • a second data buffering module 505, configured to store the second target image data
  • the uploading module 506 is configured to upload the calculation result obtained by the calculating unit to the external storage.
  • the implementation method is the same as that in FIG. 1, and will not be described in detail here.
  • a method for dividing image data into a plurality of target image data is provided.
  • the method is specifically as follows.
  • the dividing module 501 is specifically configured to divide the image data into N target image data of equal data amount.
  • the loading module 502 is configured to allocate a storage space equal to a storage space required by the target image data for the first data buffer module and the second data buffer module, and load the first target image data into the first data. Buffer module.
  • the storage spaces required by the target image data are equal, and the data within each target image data is contiguous.
  • the first data buffer module and the second data buffer module perform load operations alternately; that is, while the calculation unit reads the target image data in one data buffer module, the other data buffer module loads the next target image data that the calculation unit will read, which can effectively save data-loading time.
  • the computing task can be completed with only the first data buffer module and the second data buffer module, so the required storage space is small. How many target image data the image data is divided into, that is, the value of N, can be determined based on the data amount of the image data.
  • the image data is first divided into N target image data requiring the same storage space, and the storage space required by one target image data is allocated to each of the first data buffer module and the second data buffer module, which can effectively reduce the occupied storage space.
  • the embodiment of the present invention provides a method for uploading the calculation result to the external storage on the basis of the previous embodiment.
  • the method is as follows: the uploading module 506 is specifically configured to upload the calculation result to the external storage after the calculation unit reads the first target image data stored in the first data buffer module, performs the convolution calculation, and obtains the calculation result;
  • alternatively, the uploading module 506 is specifically configured to upload the calculation results to the external storage after the data amount of the calculation results stored in the cache by the calculation unit reaches a preset condition.
  • the calculation unit can perform an upload each time it obtains a calculation result, which can save cache space.
  • alternatively, the calculation result may be temporarily stored in the cache, and when the data amount of the calculation results stored in the cache reaches the preset condition, the results are uploaded to the external storage. Since uploading after every single calculation result would require many upload operations, the calculation results may first accumulate in the cache and, once the preset condition is reached, be uploaded to the external storage in one batch.
  • the preset condition may be determined according to the speed at which the cache uploads calculation results to the external storage and the speed at which the calculation unit uploads calculation results to the cache. For example, if the calculation unit uploads 5 calculation results to the cache per second and the cache can upload 500 calculation results to the external storage per second, the cache may perform one upload after the stored calculation results reach 500.
  • the calculating unit 503 is configured to: after reading the weight parameter of the image data from the cache and reading the first target image data stored by the first data buffer module, perform convolution calculation to obtain a calculation result, and perform the calculation The result is stored in the above cache.
  • the weight parameters are stored in the cache; the image data corresponds to only one weight parameter, and the weight parameter has a small data amount.
  • the weight parameter and the target image data are matrices with the same numbers of rows and columns. After the calculation unit reads the target image data and the weight parameter, it performs a matrix point (elementwise) multiplication. If the target image data has multiple input layers, an intermediate result must be calculated for each input layer, and the intermediate results are then summed to obtain the final result of one point in one output layer. Each intermediate result is obtained by performing the matrix point multiplication on one input layer and its weight parameter.
  • the occupied storage space can be effectively reduced.
  • An embodiment of the present invention provides another method for dividing image data into N target image data, as follows. Further, as shown in FIG. 6, the system further includes:
  • a determining module 601 configured to determine, according to the storage space currently available in the cache and the number of calculation units, the maximum storage space that can be allocated to the first data buffer module and the second data buffer module, the maximum storage spaces that can be allocated to the two modules being the same; and to determine, if the image data were divided into N target image data of equal data amount, whether the storage space required by the first target image data would be less than or equal to the maximum storage space that can be allocated to the first data buffer module;
  • the dividing module 501 is further configured to divide the image data into the N target image data after the determining module 601 determines that the storage space required by the first target image data is less than or equal to the maximum storage space that can be allocated to the first data buffer module.
  • the embodiment of the present invention may first determine the storage space available in the current cache, that is, the maximum storage space that can be allocated to the first data buffer module and the second data buffer module; then determine how many target image data the image data is divided into; and finally allocate buffer space for the first data buffer module and the second data buffer module.
  • the available storage space of the current cache is 100 MB
  • the number of computing units is 10
  • the storage space required for image data is 200 MB
  • the maximum storage space that each computing unit can allocate is 100 MB/10, that is, 10 MB
  • the maximum storage space of each of the first data buffer module and the second data buffer module is 10 MB/2, that is, 5 MB
  • the image data can be divided into 40, 50, 100, 200, or a similar number of target image data. If the image data is divided into 50 target image data, the storage space required by each target image data is 4 MB, so a 4 MB buffer must be allocated for each of the first data buffer module and the second data buffer module. When dividing the image data, it is only necessary to ensure that the storage space required by each target image data is less than or equal to the maximum storage space that can be allocated; the number of target image data follows from that.
  • the number of image data divided into the target image data is determined according to the currently available storage space of the cache and the number of calculation units, and the cache space can be fully utilized to improve the computing capability.
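The sizing arithmetic in the example above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the patent; the function names (`buffer_sizes`, `min_tiles`) and the MB-based units are assumptions made for the example.

```python
import math

def buffer_sizes(cache_free_mb, num_units):
    # Each computing unit receives an equal share of the free cache,
    # split evenly between its two data buffer modules.
    per_unit = cache_free_mb / num_units
    per_buffer = per_unit / 2
    return per_unit, per_buffer

def min_tiles(image_mb, per_buffer_mb):
    # Smallest N such that one target image data fits one buffer module.
    return math.ceil(image_mb / per_buffer_mb)
```

With the numbers above, `buffer_sizes(100, 10)` gives 10 MB per unit and 5 MB per buffer module, and `min_tiles(200, 5)` gives 40, the smallest admissible division in the example.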
  • An embodiment of the present invention provides a computer device, as shown in FIG. 7, comprising:
  • a memory 701, storing executable instructions and image data;
  • a processor 702, in communication with the memory 701 to execute the executable instructions so as to perform the following operations:
  • dividing the image data into N target image data, where N is an integer greater than 1, the N target image data include first target image data and second target image data, and the first target image data and the second target image data are adjacent target image data;
  • loading the first target image data into the first data buffer module, the computing unit reading the first target image data stored in the first data buffer module and performing convolution calculation; and, while the computing unit reads the first target image data stored in the first data buffer module and performs the convolution calculation, loading the second target image data into the second data buffer module;
  • after the computing unit obtains a calculation result, uploading the calculation result to the external storage.
  • An embodiment of the present invention provides a method for dividing the image data into the plurality of target image data, as follows. The processor 702 is specifically configured to divide the image data into the N target image data of equal data amount; to allocate, to the first data buffer module and the second data buffer module, storage space equal to the storage space required by one target image data; and to load the first target image data into the first data buffer module.
  • The storage space required by each target image data is equal, and the data within each target image data is contiguous.
  • The first data buffer module and the second data buffer module perform load operations alternately; that is, while the computing unit reads the target image data in the first data buffer module, the second data buffer module loads the next target image data to be read by the computing unit, which effectively saves data loading time.
  • In addition, the computing task can be completed with only the first data buffer module and the second data buffer module, so the required storage space is small. Into how many target image data the image data is divided, i.e. the value of N, can be determined according to the data amount of the image data.
  • In this embodiment, the image data is first divided into N target image data requiring the same storage space, and buffer space equal to the storage space required by one target image data is then allocated to the first data buffer module and the second data buffer module, which effectively reduces the occupied storage space.
  • An embodiment of the present invention provides a method for uploading the calculation result to the external storage, as follows. The processor 702 is specifically configured to upload the calculation result to the external storage after the computing unit reads the first target image data stored in the first data buffer module, performs the convolution calculation, and obtains the calculation result; or, the processor 702 is specifically configured to upload the calculation results to the external storage after the data amount of the calculation results stored in the cache by the computing unit reaches a preset condition.
  • The computing unit may perform one upload each time a calculation result is obtained, which saves cache space.
  • Alternatively, the calculation results may be temporarily stored in the cache and uploaded to the external storage once their data amount reaches a preset condition. Since uploading each result as soon as it is obtained requires many uploads, the calculation results may first be accumulated in the cache and uploaded to the external storage in one batch when the preset condition is reached.
  • The preset condition may be determined according to the speed at which calculation results are uploaded from the cache to the external storage and the speed at which the computing unit uploads calculation results to the cache. For example, if the computing unit uploads 5 calculation results to the cache every second and the cache can upload 500 calculation results to the external storage every second, the cache may perform one upload after the number of stored calculation results reaches 500.
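The threshold-batching scheme described here can be illustrated with a small Python sketch. `ResultUploader` and its method names are hypothetical, and a plain list stands in for the external storage.

```python
class ResultUploader:
    """Accumulate calculation results in the cache and flush them to
    external storage in one batch once a preset count is reached."""

    def __init__(self, threshold, external_store):
        self.threshold = threshold        # preset condition, e.g. 500
        self.cached = []                  # results currently held in the cache
        self.external_store = external_store

    def push(self, result):
        self.cached.append(result)
        if len(self.cached) >= self.threshold:
            # one bulk upload instead of one upload per result
            self.external_store.extend(self.cached)
            self.cached.clear()
```

With a threshold of 500, results 1 through 499 stay in the cache and the 500th triggers a single bulk transfer, trading a little cache space for far fewer upload operations.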
  • The processor 702 is specifically configured to read the weight parameters of the image data from the cache, read the first target image data stored in the first data buffer module, perform the convolution calculation to obtain a calculation result, and store the calculation result in the cache.
  • The weight parameters are stored in the cache; the image data corresponds to only one set of weight parameters, whose data amount is small.
  • The weight parameters and the target image data are matrices with the same numbers of rows and columns. After reading the target image data and the weight parameters, the computing unit performs matrix point-multiplication. If the target image data has multiple input layers, an intermediate result is calculated for each input layer, and a summation over these intermediate results yields the final result of one point in an output layer. Each intermediate result is obtained by matrix point-multiplication of one input layer with the weight parameters.
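The per-layer point-multiplication and summation can be written out as a short Python sketch; the function name and the nested-list representation of the matrices are assumptions made for illustration.

```python
def output_point(layers, weights):
    """Final result of one point in an output layer: for each input layer,
    point-multiply (elementwise) the layer with its weight matrix and sum
    the entries (the intermediate result), then sum over all layers."""
    total = 0.0
    for layer, w in zip(layers, weights):
        intermediate = sum(
            a * b
            for row_a, row_b in zip(layer, w)
            for a, b in zip(row_a, row_b)
        )
        total += intermediate
    return total
```

For two 2x2 input layers of all ones and all twos with all-ones weight matrices, the per-layer intermediate results are 4 and 8, and the output point is their sum, 12.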
  • In this embodiment, calculating one target image data at a time effectively reduces the occupied storage space.
  • An embodiment of the present invention provides another method for dividing image data into N target image data, as follows:
  • The processor 702 is further configured to determine, according to the storage space currently available in the cache and the number of computing units, the maximum storage space that can be allocated to the first data buffer module and the second data buffer module, the maximum storage spaces allocatable to the first data buffer module and the second data buffer module being the same; to determine whether, after the image data is divided into the N target image data of equal data amount, the storage space required by the first target image data is less than or equal to the maximum storage space allocatable to the first data buffer module; and, after it is determined that the storage space required by the first target image data is less than or equal to the maximum storage space allocatable to the first data buffer module, to divide the image data into the N target image data.
  • In this embodiment of the present invention, the storage space available in the current cache, that is, the maximum storage space that can be allocated to the first data buffer module and the second data buffer module, may first be determined; then how many target image data the image data is divided into is determined; and finally buffer space is allocated to the first data buffer module and the second data buffer module.
  • For example, suppose the available storage space of the current cache is 100 MB, the number of computing units is 10, and the storage space required by the image data is 200 MB. The maximum storage space each computing unit can be allocated is then 100 MB/10, i.e. 10 MB, and the maximum storage space of the first data buffer module and the second data buffer module is 10 MB/2, i.e. 5 MB. The image data can accordingly be divided into 40, 50, 100, 200 or more target image data. After the division, it is only necessary to ensure that the storage space required by each target image data is less than or equal to the maximum allocatable storage space; the number of target image data is not otherwise limited.
  • In this embodiment, the number of target image data into which the image data is divided is determined according to the currently available storage space of the cache and the number of computing units, so the cache space can be fully utilized and the computing capability improved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Image Input (AREA)
  • Image Processing (AREA)

Abstract

A data scheduling method and system for a convolutional neural network, and a computer device. The method comprises: dividing image data into N target image data (101), where N is an integer greater than 1 and the N target image data include first target image data and second target image data; loading the first target image data into a first data buffer module, a computing unit reading the first target image data stored in the first data buffer module and then performing convolution calculation; while the computing unit reads the first target image data stored in the first data buffer module and performs the convolution calculation, loading the second target image data into a second data buffer module (102); and, after the computing unit obtains a calculation result, uploading the calculation result to external storage (103). The method can reduce the storage space required for processing image data as well as the loading and uploading time of the image data.

Description

Data scheduling method and system for a convolutional neural network, and computer device
This application claims priority to Chinese patent application No. 201611205487.2, entitled "Data scheduling method and system for a convolutional neural network, and computer device", filed with the Chinese Patent Office on December 23, 2016, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the technical field of convolutional neural networks, and in particular to a data scheduling method and system for a convolutional neural network and a computer device.
Background
The convolutional neural network (CNN) is a common deep-learning architecture inspired by the natural visual cognition mechanism of living organisms. In the 1990s, LeCun et al. published the papers that established the modern architecture of the CNN. A CNN can derive an effective representation of a raw image, which allows it to recognize visual patterns directly from raw pixels with very little preprocessing. However, because large-scale training data were lacking at the time and computer computing power could not keep up, CNN results on complex problems were unsatisfactory.
In the twenty-first century, with the great increase in computing power and the wide application of big data, the applicability of CNNs has achieved major breakthroughs; at the same time, CNN platforms have expanded from the central processing unit (CPU) to the graphics processing unit (GPU), the field-programmable gate array (FPGA) and the application-specific integrated circuit (ASIC). Common components of a CNN include convolutional layers, pooling layers and fully connected layers.
In hardware implementations of convolutional neural networks, it is common to compute convolutions with general-purpose computing units, and a block of storage space must be designed for each computing unit to store data and weight parameters. The size of the storage space is determined by the size of the data. As CNN models grow more complex, the storage-space limitation becomes increasingly prominent: it limits the number of computing units that can be added and thus limits the improvement of computing capability. In addition, this design places high demands on data bandwidth, since all the data must be loaded before computation can start, and intermediate data must also be saved.
Summary of the Invention
Embodiments of the present invention provide a data scheduling method and system for a convolutional neural network and a computer device, which are used to reduce the storage space required for processing image data as well as the loading and uploading time of the image data.
In a first aspect, an embodiment of the present invention provides a data scheduling method for a convolutional neural network, comprising:
dividing image data into N target image data, wherein N is an integer greater than 1, the N target image data include first target image data and second target image data, and the first target image data and the second target image data are adjacent target image data;
loading the first target image data into a first data buffer module, a computing unit reading the first target image data stored in the first data buffer module and then performing convolution calculation; and, while the computing unit reads the first target image data stored in the first data buffer module and performs the convolution calculation, loading the second target image data into a second data buffer module;
after the computing unit obtains a calculation result, uploading the calculation result to external storage.
In an optional implementation, the dividing of the image data into N target image data comprises:
dividing the image data into the N target image data requiring equal storage space;
and the loading of the first target image data into the first data buffer module comprises:
allocating, to the first data buffer module and the second data buffer module, storage space equal to the storage space required by one target image data, and loading the first target image data into the first data buffer module.
In an optional implementation, the uploading of the calculation result to external storage after the computing unit obtains the calculation result comprises:
the computing unit reading the first target image data stored in the first data buffer module, performing the convolution calculation, and uploading the calculation result to the external storage after obtaining it;
or, the computing unit storing the calculated results in a cache and uploading them to the external storage when the data amount of the calculation results stored in the cache reaches a preset condition.
In an optional implementation, the computing unit reading the first target image data stored in the first data buffer module and then performing the convolution calculation comprises:
the computing unit reading the weight parameters of the image data from a cache and reading the first target image data stored in the first data buffer module, then performing the convolution calculation to obtain a calculation result and storing the calculation result in the cache.
In an optional implementation, before the dividing of the image data into the N target image data, the method further comprises:
determining, according to the storage space currently available in the cache and the number of computing units, the maximum storage space allocatable to the first data buffer module and the second data buffer module, wherein the maximum storage spaces allocatable to the first data buffer module and the second data buffer module are the same;
and the dividing of the image data into N target image data comprises:
determining whether, after the image data is divided into the N target image data of equal data amount, the storage space required by the first target image data is less than or equal to the maximum storage space allocatable to the first data buffer module;
and if so, dividing the image data into the N target image data.
In a second aspect, an embodiment of the present invention further provides a data scheduling system for a convolutional neural network, comprising:
a dividing module, configured to divide image data into N target image data, wherein N is an integer greater than 1, the N target image data include first target image data and second target image data, and the first target image data and the second target image data are adjacent target image data;
a loading module, configured to load the first target image data into a first data buffer module and, while the computing unit reads the first target image data stored in the first data buffer module and performs convolution calculation, to load the second target image data into a second data buffer module;
a computing unit, configured to read the first target image data stored in the first data buffer module and then perform the convolution calculation;
the first data buffer module, configured to store the first target image data;
the second data buffer module, configured to store the second target image data;
and an uploading module, configured to upload the calculation result obtained by the computing unit to external storage.
In an optional implementation, the dividing module is specifically configured to divide the image data into the N target image data of equal data amount;
and the loading module is specifically configured to allocate, to the first data buffer module and the second data buffer module, storage space equal to the storage space required by one target image data, and to load the first target image data into the first data buffer module.
In an optional implementation, the uploading module is specifically configured to upload the calculation result to the external storage after the computing unit reads the first target image data stored in the first data buffer module, performs the convolution calculation, and obtains the calculation result;
or is specifically configured to upload the calculation results to the external storage after the data amount of the calculation results stored in the cache by the computing unit reaches a preset condition.
In an optional implementation, the computing unit is specifically configured to read the weight parameters of the image data from a cache and read the first target image data stored in the first data buffer module, then perform the convolution calculation to obtain a calculation result and store the calculation result in the cache.
In an optional implementation, the system further comprises:
a determining module, configured to determine, according to the storage space currently available in the cache and the number of computing units, the maximum storage space allocatable to the first data buffer module and the second data buffer module, the maximum storage spaces allocatable to the first data buffer module and the second data buffer module being the same; and to determine whether, after the image data is divided into the N target image data of equal data amount, the storage space required by the first target image data is less than or equal to the maximum storage space allocatable to the first data buffer module;
the dividing module being further configured to divide the image data into the N target image data after the determining module determines that the storage space required by the first target image data is less than or equal to the maximum storage space allocatable to the first data buffer module.
In a third aspect, an embodiment of the present invention further provides a computer device, comprising:
a memory, storing executable instructions and image data;
one or more processors in communication with the memory to execute the executable instructions so as to perform the following operations:
dividing image data into N target image data, wherein N is an integer greater than 1, the N target image data include first target image data and second target image data, and the first target image data and the second target image data are adjacent target image data;
loading the first target image data into a first data buffer module, a computing unit reading the first target image data stored in the first data buffer module and then performing convolution calculation; and, while the computing unit reads the first target image data stored in the first data buffer module and performs the convolution calculation, loading the second target image data into a second data buffer module;
after the computing unit obtains a calculation result, uploading the calculation result to external storage.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages: the image data is split into a plurality of target image data each requiring less storage space, so processing one target image data requires little storage space, which solves the storage-space limitation in convolutional-neural-network hardware design and can improve computing capability; during the convolution calculation, dynamically loading and uploading data effectively reduces data loading and uploading time.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required in the description of the embodiments are briefly introduced below. Evidently, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a data scheduling method for a convolutional neural network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the storage structure of image data according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of loading target image data according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of another data scheduling method for a convolutional neural network according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a data scheduling system for a convolutional neural network according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a data scheduling system for a convolutional neural network according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
An embodiment of the present invention provides a data scheduling method for a convolutional neural network, as shown in FIG. 1, comprising:
101. Divide image data into N target image data.
N is an integer greater than 1; the N target image data include first target image data and second target image data, and the first target image data and the second target image data are adjacent target image data. As shown in FIG. 2, the left half of the figure represents the storage of the whole image data: its storage addresses may or may not be contiguous, and the image data is stored in adjacent order. The right half represents the image data divided into 4 target image data, which are adjacent to one another: the first target image data is adjacent to the second, the second to the third, and the third to the fourth. The image data is obtained by preprocessing the collected raw image data, and each target image data may represent a contiguous portion of the preprocessed raw image data. The storage space required by each target image data is the same; after the image data is divided into the N target image data, each target image data occupies an equal amount of storage space.
102. Load the first target image data into a first data buffer module; a computing unit reads the first target image data stored in the first data buffer module and then performs convolution calculation; and, while the computing unit reads the first target image data stored in the first data buffer module and performs the convolution calculation, load the second target image data into a second data buffer module.
Both the first data buffer module and the second data buffer module are parts of the cache and occupy cache space of the same size; for example, each may be a 100 MB storage region in the cache. Because cache space is limited, the image data may be stored in an external storage device; when the image data needs to be processed, it is loaded from the external storage device into the first data buffer module and the second data buffer module. The first data buffer module and the second data buffer module are ping-pong buffers whose input and output roles can be switched. FIG. 3 shows the operations performed by the first data buffer module, the second data buffer module and the computing unit in three time periods. In the first period, the first target image data is loaded into the first data buffer module; during this period the second data buffer module performs no load operation and stores no target image data. In the second period, the second data buffer module loads the second target image data while, in parallel, the computing unit reads the first target image data from the first data buffer module, performs the convolution calculation, and uploads the calculation result to the cache. In the third period, the first data buffer module loads the third target image data while, in parallel, the computing unit reads the second target image data from the second data buffer module, performs the convolution calculation, and uploads the calculation result to the cache. The lengths of the three time periods may differ. As can be seen from FIG. 3, while the computing unit reads target image data and performs the convolution calculation, the first data buffer module or the second data buffer module can load the target image data needed for the computing unit's next calculation, saving the time of loading target image data.
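The three-period ping-pong schedule described above can be sketched with two alternating buffers and a loader thread. This is a minimal illustrative sketch in Python, not the hardware design; `load` and `compute` are placeholder callables standing in for the transfer from external storage and the convolution calculation.

```python
import threading

def schedule(tiles, load, compute):
    """Ping-pong sketch: while the computing unit processes the tile in one
    buffer module, a loader thread fills the other module with the next tile."""
    buffers = [None, None]
    loaded = [threading.Event(), threading.Event()]
    freed = [threading.Event(), threading.Event()]
    for e in freed:
        e.set()  # both buffer modules start empty

    def loader():
        for i, tile in enumerate(tiles):
            b = i % 2                      # alternate between the two modules
            freed[b].wait(); freed[b].clear()
            buffers[b] = load(tile)        # load next tile while compute runs
            loaded[b].set()

    t = threading.Thread(target=loader)
    t.start()
    results = []
    for i in range(len(tiles)):
        b = i % 2
        loaded[b].wait(); loaded[b].clear()
        results.append(compute(buffers[b]))  # convolution stand-in
        freed[b].set()                       # module may now be reloaded
    t.join()
    return results
```

Here tile i+1 is being loaded while tile i is being computed, which is exactly the overlap the three time periods of FIG. 3 describe.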
103. After the computing unit obtains a calculation result, upload the calculation result to external storage.
The computing unit may first upload each calculation result to the cache and then upload to the external storage when the calculation results stored in the cache reach a preset condition; it may also upload directly to the external storage. For example, when the number of calculation results in the cache reaches 100, those 100 calculation results are uploaded to the external storage.
In this embodiment of the present invention, the image data is split into a plurality of target image data each requiring less storage space, so processing one target image data requires little storage space, which solves the storage-space limitation in convolutional-neural-network hardware design and can improve computing capability; during the convolution calculation, dynamically loading and uploading data effectively reduces data loading and uploading time.
An embodiment of the present invention provides a method for dividing the image data into a plurality of target image data, as follows. The dividing of the image data into N target image data comprises:
dividing the image data into the N target image data requiring equal storage space;
and the loading of the first target image data into the first data buffer module comprises:
allocating, to the first data buffer module and the second data buffer module, storage space equal to the storage space required by one target image data, and loading the first target image data into the first data buffer module.
The storage space required by each target image data is equal, and the data within each target image data is contiguous. After the storage space required by the target image data is determined, the storage space available in the cache is checked, and storage space equal to that required by one target image data is allocated to the first data buffer module and the second data buffer module. The first data buffer module and the second data buffer module perform load operations alternately; that is, while the computing unit reads the target image data in the first data buffer module, the second data buffer module loads the next target image data to be read by the computing unit, which effectively saves data loading time. In addition, the computing task can be completed with only the first data buffer module and the second data buffer module, so the required storage space is small. Into how many target image data the image data is divided, i.e. the value of N, can be determined according to the data amount of the image data.
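Dividing the image data into N adjacent, equally sized target image data can be sketched as follows. This is an illustrative sketch; the flat-list representation of the image data and the divisibility assumption are simplifications.

```python
def split_image(data, n):
    # Divide a flat sequence of image data into n adjacent target image
    # data of equal size; assumes len(data) is divisible by n.
    size = len(data) // n
    assert size * n == len(data), "choose n so the tiles are equal-sized"
    return [data[i * size:(i + 1) * size] for i in range(n)]
```

Each returned tile is a contiguous slice, and consecutive tiles are adjacent, matching the adjacency requirement on the first and second target image data.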
In this embodiment, the image data is first divided into N target image data requiring the same storage space, and buffer space is then allocated to the first data buffer module and the second data buffer module according to the storage space required by one target image data, which effectively reduces the occupied storage space.
On the basis of the preceding embodiment, an embodiment of the present invention provides a method for uploading the calculation result to the external storage, as follows. The uploading of the calculation result to external storage after the computing unit obtains the calculation result comprises:
the computing unit reading the first target image data stored in the first data buffer module, performing the convolution calculation, and uploading the calculation result to the external storage after obtaining it;
or, the computing unit storing the calculated results in a cache and uploading them to the external storage when the data amount of the calculation results stored in the cache reaches a preset condition.
The computing unit may perform one upload each time a calculation result is obtained, which saves cache space. The calculation results may also be temporarily stored in the cache and uploaded to the external storage when their data amount reaches a preset condition. Since uploading each result as soon as it is obtained requires many uploads, the calculation results may first be accumulated in the cache and uploaded to the external storage in one batch when the preset condition is reached. The preset condition may be determined according to the speed at which calculation results are uploaded from the cache to the external storage and the speed at which the computing unit uploads calculation results to the cache. For example, if the computing unit uploads 5 calculation results to the cache every second and the cache can upload 500 calculation results to the external storage every second, the cache may perform one upload after the number of stored calculation results reaches 500.
This embodiment of the present invention provides two ways of uploading the calculation results obtained by the computing unit: one saves cache space, and the other reduces the number of uploads.
An embodiment of the present invention provides a method by which the computing unit performs the convolution calculation, as follows:
the computing unit reading the first target image data stored in the first data buffer module and then performing the convolution calculation comprises:
the computing unit reading the weight parameters of the image data from a cache and reading the first target image data stored in the first data buffer module, then performing the convolution calculation to obtain a calculation result and storing the calculation result in the cache.
The weight parameters are stored in the cache; the image data corresponds to only one set of weight parameters, whose data amount is small. The weight parameters and the target image data are matrices with the same numbers of rows and columns. After reading the target image data and the weight parameters, the computing unit performs matrix point-multiplication. If the target image data has multiple input layers, an intermediate result is calculated for each input layer, and a summation over these intermediate results yields the final result of one point in an output layer. Each intermediate result is obtained by matrix point-multiplication of one input layer with the weight parameters.
In this embodiment, calculating one target image data at a time effectively reduces the occupied storage space.
On the basis of the preceding embodiments, an embodiment of the present invention provides another method for dividing the image data into N target image data, as follows. Before the dividing of the image data into the N target image data, the method further comprises:
determining, according to the storage space currently available in the cache and the number of computing units, the maximum storage space allocatable to the first data buffer module and the second data buffer module, wherein the maximum storage spaces allocatable to the first data buffer module and the second data buffer module are the same;
and the dividing of the image data into N target image data comprises:
determining whether, after the image data is divided into the N target image data of equal data amount, the storage space required by the first target image data is less than or equal to the maximum storage space allocatable to the first data buffer module;
and if so, dividing the image data into the N target image data.
In this embodiment of the present invention, the storage space available in the current cache, that is, the maximum storage space that can be allocated to the first data buffer module and the second data buffer module, may first be determined; then how many target image data the image data is divided into is determined; and finally buffer space is allocated to the first data buffer module and the second data buffer module.
For example, suppose the available storage space of the current cache is 100 MB, the number of computing units is 10, and the storage space required by the image data is 200 MB. The maximum storage space each computing unit can be allocated is then 100 MB/10, i.e. 10 MB, and the maximum storage space of the first data buffer module and the second data buffer module is 10 MB/2, i.e. 5 MB. The image data can be divided into 40, 45, 50, 100, 200 or more target image data. If the image data is divided into 50 target image data, each requiring 4 MB of storage space, a 4 MB buffer needs to be allocated to both the first data buffer module and the second data buffer module. After the division, it is only necessary to ensure that the storage space required by each target image data is less than or equal to the maximum allocatable storage space; the number of target image data is not otherwise limited.
In this embodiment, the number of target image data into which the image data is divided is determined according to the currently available storage space of the cache and the number of computing units, so the cache space can be fully utilized and the computing capability improved.
An embodiment of the present invention provides a data scheduling method for a convolutional neural network which, as shown in FIG. 4, may comprise the following steps:
401. Determine, according to the storage space currently available in the cache and the number of computing units, the maximum storage space allocatable to the first data buffer module and the second data buffer module.
402. Determine whether, after the image data is divided into the N target image data of equal data amount, the storage space required by the first target image data is less than or equal to the maximum storage space allocatable to the first data buffer module.
The image data includes first target image data and second target image data, and the first target image data and the second target image data are adjacent target image data.
If the determination at 402 is yes, then 403: divide the image data into the N target image data.
404. Load the first target image data into the first data buffer module.
405. The computing unit reads the first target image data stored in the first data buffer module and then performs convolution calculation; while the computing unit reads the first target image data stored in the first data buffer module and performs the convolution calculation, load the second target image data into the second data buffer module.
406. Store the calculation result obtained by the computing unit in the cache.
407. Detect whether the data amount of the calculation results stored in the cache reaches a preset condition; if so, perform 408; otherwise, perform 406.
408. Upload the calculation results to external storage.
If the determination at 402 is no, perform 409: adjust N, then perform 402.
N may be increased; for example, if N was originally 8, it may be adjusted to 9, 10 or another integer.
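The 402/409 loop, which increases N until one target image data fits the buffer module's maximum allocatable space, can be sketched as follows (an illustrative sketch; the function name and the MB-based units are assumptions):

```python
def choose_n(image_mb, per_buffer_mb, n):
    # Step 402: check whether one tile fits the maximum allocatable space;
    # step 409: if not, increase N and check again.
    while image_mb / n > per_buffer_mb:
        n += 1
    return n
```

With the earlier figures, `choose_n(200, 5, 8)` returns 40: splitting 200 MB eight ways gives 25 MB tiles, so N grows until 200/N is at most 5 MB.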
In this embodiment of the present invention, the image data is split into a plurality of target image data each requiring less storage space, so processing one target image data requires little storage space, which solves the storage-space limitation in convolutional-neural-network hardware design and can improve computing capability; during the convolution calculation, dynamically loading and uploading data effectively reduces data loading and uploading time.
An embodiment of the present invention provides a data scheduling system for a convolutional neural network, as shown in FIG. 5, comprising:
a dividing module 501, configured to divide image data into N target image data, wherein N is an integer greater than 1, the N target image data include first target image data and second target image data, and the first target image data and the second target image data are adjacent target image data;
a loading module 502, configured to load the first target image data into a first data buffer module and, while the computing unit reads the first target image data stored in the first data buffer module and performs convolution calculation, to load the second target image data into a second data buffer module;
a computing unit 503, configured to read the first target image data stored in the first data buffer module and then perform the convolution calculation;
a first data buffer module 504, configured to store the first target image data;
a second data buffer module 505, configured to store the second target image data;
an uploading module 506, configured to upload the calculation result obtained by the computing unit to external storage.
The implementation is the same as the method in FIG. 1 and is not detailed here.
An embodiment of the present invention provides a method for dividing the image data into a plurality of target image data, as follows. Further, the dividing module 501 is specifically configured to divide the image data into the N target image data of equal data amount;
the loading module 502 is specifically configured to allocate, to the first data buffer module and the second data buffer module, storage space equal to the storage space required by one target image data, and to load the first target image data into the first data buffer module.
The storage space required by each target image data is equal, and the data within each target image data is contiguous. After the storage space required by the target image data is determined, the storage space available in the cache is checked, and storage space equal to that required by one target image data is allocated to the first data buffer module and the second data buffer module. The first data buffer module and the second data buffer module perform load operations alternately; that is, while the computing unit reads the target image data in the first data buffer module, the second data buffer module loads the next target image data to be read by the computing unit, which effectively saves data loading time. In addition, the computing task can be completed with only the first data buffer module and the second data buffer module, so the required storage space is small. Into how many target image data the image data is divided, i.e. the value of N, can be determined according to the data amount of the image data.
In this embodiment, the image data is first divided into N target image data requiring the same storage space, and buffer space is then allocated to the first data buffer module and the second data buffer module according to the storage space required by one target image data, which effectively reduces the occupied storage space.
On the basis of the preceding embodiments, an embodiment of the present invention provides a method for uploading the calculation result to the external storage, as follows. The uploading module 506 is specifically configured to upload the calculation result to the external storage after the computing unit reads the first target image data stored in the first data buffer module, performs the convolution calculation, and obtains the calculation result;
alternatively, the uploading module 506 is specifically configured to upload the calculation results to the external storage after the data amount of the calculation results stored in the cache by the computing unit reaches a preset condition.
The computing unit may perform one upload each time a calculation result is obtained, which saves cache space. The calculation results may also be temporarily stored in the cache and uploaded to the external storage when their data amount reaches a preset condition. Since uploading each result as soon as it is obtained requires many uploads, the calculation results may first be accumulated in the cache and uploaded to the external storage in one batch when the preset condition is reached. The preset condition may be determined according to the speed at which calculation results are uploaded from the cache to the external storage and the speed at which the computing unit uploads calculation results to the cache. For example, if the computing unit uploads 5 calculation results to the cache every second and the cache can upload 500 calculation results to the external storage every second, the cache may perform one upload after the number of stored calculation results reaches 500.
This embodiment of the present invention provides two ways of uploading the calculation results obtained by the computing unit: one saves cache space, and the other reduces the number of uploads.
An embodiment of the present invention provides a method by which the computing unit performs the convolution calculation, as follows:
the computing unit 503 is specifically configured to read the weight parameters of the image data from a cache and read the first target image data stored in the first data buffer module, then perform the convolution calculation to obtain a calculation result and store the calculation result in the cache.
The weight parameters are stored in the cache; the image data corresponds to only one set of weight parameters, whose data amount is small. The weight parameters and the target image data are matrices with the same numbers of rows and columns. After reading the target image data and the weight parameters, the computing unit performs matrix point-multiplication. If the target image data has multiple input layers, an intermediate result is calculated for each input layer, and a summation over these intermediate results yields the final result of one point in an output layer. Each intermediate result is obtained by matrix point-multiplication of one input layer with the weight parameters.
In this embodiment, calculating one target image data at a time effectively reduces the occupied storage space.
On the basis of the preceding embodiments, an embodiment of the present invention provides another method for dividing the image data into N target image data. Further, as shown in FIG. 6, the system further comprises:
a determining module 601, configured to determine, according to the storage space currently available in the cache and the number of computing units, the maximum storage space allocatable to the first data buffer module and the second data buffer module, the maximum storage spaces allocatable to the first data buffer module and the second data buffer module being the same; and to determine whether, after the image data is divided into the N target image data of equal data amount, the storage space required by the first target image data is less than or equal to the maximum storage space allocatable to the first data buffer module;
the dividing module 501 is further configured to divide the image data into the N target image data after the determining module 601 determines that the storage space required by the first target image data is less than or equal to the maximum storage space allocatable to the first data buffer module.
In this embodiment of the present invention, the storage space available in the current cache, that is, the maximum storage space that can be allocated to the first data buffer module and the second data buffer module, may first be determined; then how many target image data the image data is divided into is determined; and finally buffer space is allocated to the first data buffer module and the second data buffer module.
For example, suppose the available storage space of the current cache is 100 MB, the number of computing units is 10, and the storage space required by the image data is 200 MB. The maximum storage space each computing unit can be allocated is then 100 MB/10, i.e. 10 MB, and the maximum storage space of the first data buffer module and the second data buffer module is 10 MB/2, i.e. 5 MB. The image data can be divided into 40, 50, 100, 200 or more target image data. If the image data is divided into 50 target image data, each requiring 4 MB of storage space, a 4 MB buffer needs to be allocated to both the first data buffer module and the second data buffer module. After the division, it is only necessary to ensure that the storage space required by each target image data is less than or equal to the maximum allocatable storage space; the number of target image data is not otherwise limited.
In this embodiment, the number of target image data into which the image data is divided is determined according to the currently available storage space of the cache and the number of computing units, so the cache space can be fully utilized and the computing capability improved.
An embodiment of the present invention provides a computer device, as shown in FIG. 7, comprising:
a memory 701, storing executable instructions and image data;
a processor 702, in communication with the memory 701 to execute the executable instructions so as to perform the following operations:
dividing image data into N target image data, wherein N is an integer greater than 1, the N target image data include first target image data and second target image data, and the first target image data and the second target image data are adjacent target image data;
loading the first target image data into a first data buffer module, a computing unit reading the first target image data stored in the first data buffer module and then performing convolution calculation; and, while the computing unit reads the first target image data stored in the first data buffer module and performs the convolution calculation, loading the second target image data into a second data buffer module;
after the computing unit obtains a calculation result, uploading the calculation result to external storage.
The implementation is the same as the method in FIG. 1 and is not detailed here.
An embodiment of the present invention provides a method for dividing the image data into a plurality of target image data, as follows. Further, the processor 702 is specifically configured to divide the image data into the N target image data of equal data amount; to allocate, to the first data buffer module and the second data buffer module, storage space equal to the storage space required by one target image data; and to load the first target image data into the first data buffer module.
The storage space required by each target image data is equal, and the data within each target image data is contiguous. After the storage space required by the target image data is determined, the storage space available in the cache is checked, and storage space equal to that required by one target image data is allocated to the first data buffer module and the second data buffer module. The first data buffer module and the second data buffer module perform load operations alternately; that is, while the computing unit reads the target image data in the first data buffer module, the second data buffer module loads the next target image data to be read by the computing unit, which effectively saves data loading time. In addition, the computing task can be completed with only the first data buffer module and the second data buffer module, so the required storage space is small. Into how many target image data the image data is divided, i.e. the value of N, can be determined according to the data amount of the image data.
In this embodiment, the image data is first divided into N target image data requiring the same storage space, and buffer space is then allocated to the first data buffer module and the second data buffer module according to the storage space required by one target image data, which effectively reduces the occupied storage space.
On the basis of the preceding embodiments, an embodiment of the present invention provides a method for uploading the calculation result to the external storage, as follows. The processor 702 is specifically configured to upload the calculation result to the external storage after the computing unit reads the first target image data stored in the first data buffer module, performs the convolution calculation, and obtains the calculation result; or is specifically configured to upload the calculation results to the external storage after the data amount of the calculation results stored in the cache by the computing unit reaches a preset condition.
The computing unit may perform one upload each time a calculation result is obtained, which saves cache space. The calculation results may also be temporarily stored in the cache and uploaded to the external storage when their data amount reaches a preset condition. Since uploading each result as soon as it is obtained requires many uploads, the calculation results may first be accumulated in the cache and uploaded to the external storage in one batch when the preset condition is reached. The preset condition may be determined according to the speed at which calculation results are uploaded from the cache to the external storage and the speed at which the computing unit uploads calculation results to the cache. For example, if the computing unit uploads 5 calculation results to the cache every second and the cache can upload 500 calculation results to the external storage every second, the cache may perform one upload after the number of stored calculation results reaches 500.
This embodiment of the present invention provides two ways of uploading the calculation results obtained by the computing unit: one saves cache space, and the other reduces the number of uploads.
An embodiment of the present invention provides a method by which the computing unit performs the convolution calculation, as follows:
the processor 702 is specifically configured to read the weight parameters of the image data from a cache and read the first target image data stored in the first data buffer module, then perform the convolution calculation to obtain a calculation result and store the calculation result in the cache.
The weight parameters are stored in the cache; the image data corresponds to only one set of weight parameters, whose data amount is small. The weight parameters and the target image data are matrices with the same numbers of rows and columns. After reading the target image data and the weight parameters, the computing unit performs matrix point-multiplication. If the target image data has multiple input layers, an intermediate result is calculated for each input layer, and a summation over these intermediate results yields the final result of one point in an output layer. Each intermediate result is obtained by matrix point-multiplication of one input layer with the weight parameters.
In this embodiment, calculating one target image data at a time effectively reduces the occupied storage space.
On the basis of the preceding embodiments, an embodiment of the present invention provides another method for dividing the image data into N target image data, as follows. The processor 702 is further configured to determine, according to the storage space currently available in the cache and the number of computing units, the maximum storage space allocatable to the first data buffer module and the second data buffer module, the maximum storage spaces allocatable to the first data buffer module and the second data buffer module being the same; to determine whether, after the image data is divided into the N target image data of equal data amount, the storage space required by the first target image data is less than or equal to the maximum storage space allocatable to the first data buffer module; and, after it is determined that the storage space required by the first target image data is less than or equal to the maximum storage space allocatable to the first data buffer module, to divide the image data into the N target image data.
In this embodiment of the present invention, the storage space available in the current cache, that is, the maximum storage space that can be allocated to the first data buffer module and the second data buffer module, may first be determined; then how many target image data the image data is divided into is determined; and finally buffer space is allocated to the first data buffer module and the second data buffer module.
For example, suppose the available storage space of the current cache is 100 MB, the number of computing units is 10, and the storage space required by the image data is 200 MB. The maximum storage space each computing unit can be allocated is then 100 MB/10, i.e. 10 MB, and the maximum storage space of the first data buffer module and the second data buffer module is 10 MB/2, i.e. 5 MB. The image data can be divided into 40, 50, 100, 200 or more target image data. After the division, it is only necessary to ensure that the storage space required by each target image data is less than or equal to the maximum allocatable storage space; the number of target image data is not otherwise limited.
In this embodiment, the number of target image data into which the image data is divided is determined according to the currently available storage space of the cache and the number of computing units, so the cache space can be fully utilized and the computing capability improved.
The above are merely preferred specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the embodiments of the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (11)

  1. A data scheduling method for a convolutional neural network, characterized by comprising:
    dividing image data into N target image data, wherein N is an integer greater than 1, the N target image data include first target image data and second target image data, and the first target image data and the second target image data are adjacent target image data;
    loading the first target image data into a first data buffer module, a computing unit reading the first target image data stored in the first data buffer module and then performing convolution calculation; and, while the computing unit reads the first target image data stored in the first data buffer module and performs the convolution calculation, loading the second target image data into a second data buffer module;
    after the computing unit obtains a calculation result, uploading the calculation result to external storage.
  2. The method according to claim 1, wherein the dividing of the image data into N target image data comprises:
    dividing the image data into the N target image data requiring equal storage space;
    and the loading of the first target image data into the first data buffer module comprises:
    allocating, to the first data buffer module and the second data buffer module, storage space equal to the storage space required by the target image data, and loading the first target image data into the first data buffer module.
  3. The method according to claim 1 or 2, wherein the uploading of the calculation result to external storage after the computing unit obtains the calculation result comprises:
    the computing unit reading the first target image data stored in the first data buffer module, performing the convolution calculation, and uploading the calculation result to the external storage after obtaining it;
    or, the computing unit storing the calculated results in a cache and uploading them to the external storage when the data amount of the calculation results stored in the cache reaches a preset condition.
  4. The method according to claim 1, wherein the computing unit reading the first target image data stored in the first data buffer module and then performing the convolution calculation comprises:
    the computing unit reading the weight parameters of the image data from a cache and reading the first target image data stored in the first data buffer module, then performing the convolution calculation to obtain a calculation result and storing the calculation result in the cache.
  5. The method according to claim 1, wherein before the dividing of the image data into the N target image data, the method further comprises:
    determining, according to the storage space currently available in the cache and the number of computing units, the maximum storage space allocatable to the first data buffer module and the second data buffer module, wherein the maximum storage spaces allocatable to the first data buffer module and the second data buffer module are the same;
    and the dividing of the image data into N target image data comprises:
    determining whether, after the image data is divided into the N target image data of equal data amount, the storage space required by the first target image data is less than or equal to the maximum storage space allocatable to the first data buffer module;
    and if so, dividing the image data into the N target image data.
  6. A data scheduling system for a convolutional neural network, characterized by comprising:
    a dividing module, configured to divide image data into N target image data, wherein N is an integer greater than 1, the N target image data include first target image data and second target image data, and the first target image data and the second target image data are adjacent target image data;
    a loading module, configured to load the first target image data into a first data buffer module and, while the computing unit reads the first target image data stored in the first data buffer module and performs convolution calculation, to load the second target image data into a second data buffer module;
    a computing unit, configured to read the first target image data stored in the first data buffer module and then perform the convolution calculation;
    the first data buffer module, configured to store the first target image data;
    the second data buffer module, configured to store the second target image data;
    an uploading module, configured to upload the calculation result obtained by the computing unit to external storage.
  7. The system according to claim 6, characterized in that
    the dividing module is specifically configured to divide the image data into the N target image data of equal data amount;
    the loading module is specifically configured to allocate, to the first data buffer module and the second data buffer module, storage space equal to the storage space required by the target image data, and to load the first target image data into the first data buffer module.
  8. The system according to claim 6 or 7, characterized in that
    the uploading module is specifically configured to upload the calculation result to the external storage after the computing unit reads the first target image data stored in the first data buffer module, performs the convolution calculation, and obtains the calculation result;
    or is specifically configured to upload the calculation results to the external storage after the data amount of the calculation results stored in the cache by the computing unit reaches a preset condition.
  9. The system according to claim 6, characterized in that
    the computing unit is specifically configured to read the weight parameters of the image data from a cache and read the first target image data stored in the first data buffer module, then perform the convolution calculation to obtain a calculation result and store the calculation result in the cache.
  10. The system according to claim 6, characterized in that the system further comprises:
    a determining module, configured to determine, according to the currently available storage space and the number of computing units, the maximum storage space allocatable to the first data buffer module and the second data buffer module, the maximum storage spaces allocatable to the first data buffer module and the second data buffer module being the same; and to determine whether, after the image data is divided into the N target image data of equal data amount, the storage space required by the first target image data is less than or equal to the maximum storage space allocatable to the first data buffer module;
    the dividing module being further configured to divide the image data into the N target image data after the determining module determines that the storage space required by the first target image data is less than or equal to the maximum storage space allocatable to the first data buffer module.
  11. A computer device, comprising a memory and a processor, the memory storing executable instructions and image data, characterized in that the processor communicates with the memory to execute the executable instructions so as to perform the data scheduling method for a convolutional neural network according to any one of claims 1 to 5.
PCT/CN2017/090792 2016-12-23 2017-06-29 Data scheduling method and system for a convolutional neural network, and computer device WO2018113239A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611205487.2A CN106874219B (zh) 2016-12-23 2016-12-23 Data scheduling method and system for a convolutional neural network, and computer device
CN201611205487.2 2016-12-23

Publications (1)

Publication Number Publication Date
WO2018113239A1 true WO2018113239A1 (zh) 2018-06-28

Family

ID=59164919

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/090792 2016-12-23 Data scheduling method and system for a convolutional neural network, and computer device

Country Status (2)

Country Link
CN (1) CN106874219B (zh)
WO (1) WO2018113239A1 (zh)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874219B (zh) * 2016-12-23 2018-11-02 深圳云天励飞技术有限公司 一种卷积神经网络的数据调度方法、系统及计算机设备
CN108038815B (zh) * 2017-12-20 2019-12-17 深圳云天励飞技术有限公司 集成电路
CN108133270B (zh) * 2018-01-12 2020-08-04 清华大学 卷积神经网络加速方法及装置
CN108564524A (zh) * 2018-04-24 2018-09-21 开放智能机器(上海)有限公司 一种视觉图像的卷积计算优化方法
CN110032538B (zh) * 2019-03-06 2020-10-02 上海熠知电子科技有限公司 一种数据读取系统和方法
CN111832585B (zh) * 2019-04-16 2023-04-18 杭州海康威视数字技术股份有限公司 图像处理的方法和装置
CN110390626A (zh) * 2019-07-02 2019-10-29 深兰科技(上海)有限公司 一种卷积神经网络的图像处理方法及装置
WO2021179286A1 (zh) * 2020-03-13 2021-09-16 深圳市大疆创新科技有限公司 卷积神经网络的数据处理方法、预测方法、计算装置和存储介质
CN113537448A (zh) * 2020-04-22 2021-10-22 杭州智芯科微电子科技有限公司 流式数据处理的方法、装置、半导体芯片和计算机设备
CN111666150B (zh) * 2020-05-09 2022-01-11 深圳云天励飞技术股份有限公司 存储空间的分配方法、装置、终端及计算机可读存储介质
CN114090470B (zh) * 2020-07-29 2023-02-17 深圳市中科元物芯科技有限公司 数据预加载装置及其预加载方法、存储介质和计算机设备
CN112099943B (zh) * 2020-08-13 2024-05-03 深圳云天励飞技术股份有限公司 内存分配方法及相关设备
CN115271047A (zh) * 2021-04-29 2022-11-01 华为技术有限公司 一种数据处理方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101303668A (zh) * 2008-07-10 2008-11-12 北京海尔集成电路设计有限公司 一种数据转置的方法和系统
CN103236033A (zh) * 2013-04-16 2013-08-07 重庆绿色智能技术研究院 基于嵌入式处理器的积分图快速生成方法和装置
CN104077233A (zh) * 2014-06-18 2014-10-01 百度在线网络技术(北京)有限公司 单通道卷积层及多通道卷积层处理方法和装置
CN106156793A (zh) * 2016-06-27 2016-11-23 西北工业大学 结合深层特征提取和浅层特征提取的医学图像分类方法
CN106874219A (zh) * 2016-12-23 2017-06-20 深圳云天励飞技术有限公司 一种卷积神经网络的数据调度方法、系统及计算机设备

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2804144A1 (en) * 2013-05-16 2014-11-19 SMR Patents S.à.r.l. Method and device for processing input image data
JP6365102B2 (ja) * 2014-08-14 2018-08-01 富士ゼロックス株式会社 データ処理装置およびプログラム
CN105550222B (zh) * 2015-12-07 2019-04-05 中国电子科技网络信息安全有限公司 一种基于分布式存储的图像服务系统及方法
CN105528758B (zh) * 2016-01-12 2018-12-14 武汉精测电子集团股份有限公司 基于可编程逻辑器件的图像重映射方法及装置

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20210044669A (ko) * 2018-08-28 2021-04-23 캠브리콘 테크놀로지스 코퍼레이션 리미티드 데이터 전처리 방법, 장치, 컴퓨터 설비 및 저장 매체
CN112732601A (zh) * 2018-08-28 2021-04-30 中科寒武纪科技股份有限公司 数据预处理方法、装置、计算机设备和存储介质
EP3640810A4 (en) * 2018-08-28 2021-05-05 Cambricon Technologies Corporation Limited DATA PRE-PROCESSING PROCESS AND APPARATUS, COMPUTER DEVICE AND STORAGE MEDIA
US11243895B2 (en) 2018-08-28 2022-02-08 Cambricon Technologies Corporation Limited Data pre-processing method and device, and related computer device and storage medium
KR102519467B1 (ko) 2018-08-28 2023-04-06 캠브리콘 테크놀로지스 코퍼레이션 리미티드 데이터 전처리 방법, 장치, 컴퓨터 설비 및 저장 매체
US11966583B2 (en) 2018-08-28 2024-04-23 Cambricon Technologies Corporation Limited Data pre-processing method and device, and related computer device and storage medium
CN112732601B (zh) * 2018-08-28 2024-06-18 中科寒武纪科技股份有限公司 数据预处理方法、装置、计算机设备和存储介质
CN113536081A (zh) * 2021-06-25 2021-10-22 浙江海瑞网络科技有限公司 基于人工智能的数据中心数据管理方法及系统

Also Published As

Publication number Publication date
CN106874219A (zh) 2017-06-20
CN106874219B (zh) 2018-11-02

Similar Documents

Publication Publication Date Title
WO2018113239A1 (zh) 一种卷积神经网络的数据调度方法、系统及计算机设备
WO2018068533A1 (zh) 一种人脸检测的方法及装置
WO2019194465A1 (ko) 뉴럴 네트워크 프로세서
WO2019098538A1 (en) Device and method for processing convolution operation using kernel
WO2016099036A1 (ko) 메모리 접근 방법 및 장치
WO2019216513A1 (ko) 행 단위 연산 뉴럴 프로세서 및 이를 이용한 데이터 처리 방법
WO2021091022A1 (ko) 머신 러닝 시스템 및 머신 러닝 시스템의 동작 방법
WO2021153969A1 (en) Methods and systems for managing processing of neural network across heterogeneous processors
WO2021137415A1 (ko) 머신 러닝에 기반한 이미지 처리 방법 및 장치
WO2016195422A1 (en) Method and apparatus for managing memory
WO2019088470A1 (en) Processor and control methods thereof
WO2023080333A1 (ko) 인공지능 코어, 인공지능 코어 시스템 및 인공지능 코어 시스템의 로드/스토어 방법
WO2020159185A1 (en) Electronic device and control method thereof
WO2021072860A1 (zh) 视频解码方法、装置、设备及计算机可读存储介质
WO2020141720A1 (en) Apparatus and method for managing application program
WO2023003246A1 (ko) 멀티레벨 룩업테이블을 이용한 함수근사 장치 및 방법
WO2020138630A1 (en) Display apparatus and image processing method thereof
WO2021246586A1 (ko) 하드웨어 가속기를 위한 파라미터를 메모리로부터 액세스하는 방법 및 이를 이용한 장치
WO2022004970A1 (ko) 신경망 기반의 특징점 학습 장치 및 방법
WO2016208806A1 (ko) 블록 구조를 이용한 적분 영상 생성 장치 및 그 방법
WO2023013817A1 (ko) 브로드캐스팅 멀티플라이 최적화 방법 및 이를 이용한 하드웨어 가속기, 이를 이용한 컴퓨팅 장치
WO2024185924A1 (ko) 외부환경의 변화를 고려한 딥러닝 신경망 모델의 양자화 방법 및 장치
WO2022250211A1 (ko) 인티져 타입 데이터의 해상도를 증가시키는 연산방법 및 이를 적용한 장치
WO2023085535A1 (ko) 1차원 어레이 풀링 방법 및 이를 위한 장치
WO2021020762A1 (en) Processor and control method thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17885253

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 251019)

122 Ep: pct application non-entry in european phase

Ref document number: 17885253

Country of ref document: EP

Kind code of ref document: A1