WO2020248499A1 - Video memory processing method, device, and storage medium based on convolutional neural network

Info

Publication number
WO2020248499A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage space
data
temporary storage
video memory
neural network
Prior art date
Application number
PCT/CN2019/118467
Inventor
张萌
唐义君
高鹏
郑强
谢国彤
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Priority to JP2021506309A (patent JP7174831B2)
Publication of WO2020248499A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/14 Digital output to display device; Cooperation and interconnection of the display device with other functional units
    • G06F3/1407 General aspects irrespective of display type, e.g. determination of decimal point position, display with fixed or driving decimal point, suppression of non-significant zeros
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/4401 Bootstrapping
    • G06F9/4411 Configuring for operating with peripheral devices; Loading of device drivers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • This application relates to the technical field of convolutional neural networks, and in particular to a method, device and storage medium for video memory processing based on convolutional neural networks.
  • Video memory is the temporary memory of the GPU display core and temporarily stores the core data that needs to be processed.
  • Its role relative to the GPU is the same as that of main memory relative to the CPU.
  • The capacity of the video memory determines how much data it can temporarily store.
  • A large video memory reduces the number of data reads and therefore latency. The applicant realizes that, in the current training process of convolutional neural network models, the input and output data of the model are repeatedly stored in different video memory spaces. This causes unnecessary video memory overhead, reduces the batch size usable for model training, and thus affects the accuracy of model training.
  • The Concat layer and the Addition layer are commonly used in deep learning classification networks and target detection networks.
  • The Concat layer merges multiple input data along the feature dimension.
  • The Addition layer accumulates multiple input data.
  • Existing deep learning training frameworks such as Caffe and TensorFlow do not optimize the video memory of the Concat and Addition layers, so input and output data are repeatedly stored in different video memory spaces. This brings unnecessary video memory overhead and reduces the batch size usable for model training, thereby affecting the accuracy of model training.
  • The limited video memory space also restricts the search space for optimization solutions in automated machine learning (AutoML).
  • This application provides a video memory processing method, device, and computer-readable storage medium based on a convolutional neural network. Its main purpose is to create a shared temporary storage space and, according to the type of the data to be processed and the instruction, read the data into or write it out of the corresponding temporary storage space. Compared with existing frameworks, users can freely mix and match various modules to form new CNN structures, which can save a large amount of video memory and improve the parallelism of GPU computing.
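  • The present application provides a video memory processing method based on a convolutional neural network, which is applied to an electronic device. The method includes:
  • creating a temporary storage space, the temporary storage space being a storage space for temporarily storing input data, output data, input errors, and output errors;
  • retrieving, according to the type and direction of the data to be processed, the temporary storage space corresponding to the data to be processed, and reading the data to be processed into the retrieved temporary storage space;
  • performing preset processing on the data to be processed in the retrieved temporary storage space; and
  • writing, according to the type and direction of the processed data, the data in the retrieved temporary storage space into the designated external storage space.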
  • This application also provides a video memory processing system based on a convolutional neural network, including:
  • a space creation unit configured to create a temporary storage space, the temporary storage space being a storage space for temporarily storing input data, output data, input errors, and output errors;
  • a data retrieval unit configured to retrieve a temporary storage space corresponding to the data to be processed according to the type and direction of the data to be processed, and read the data to be processed into the retrieved temporary storage space;
  • a preprocessing unit configured to perform preset processing on the data to be processed in the retrieved temporary storage space; and
  • a data writing unit configured to write the data in the retrieved temporary storage space into the designated external storage space according to the type and direction of the processed data.
  • The present application also provides an electronic device including a memory and a processor. The memory includes a video memory processing program based on a convolutional neural network, and when the program is executed by the processor, the steps of the foregoing video memory processing method based on a convolutional neural network are implemented.
  • The present application also provides a computer-readable storage medium that includes a video memory processing program based on a convolutional neural network. When the program is executed by a processor, the steps of the video memory processing method based on a convolutional neural network described above are implemented.
  • The video memory processing method, system, electronic device, and computer-readable storage medium proposed in this application set up a shared temporary storage space, call the corresponding temporary storage space according to the type of the data to be processed and the instruction, and read the data into or write it out of that temporary storage space for calculation. The approach can be applied to CNN algorithms.
  • Dense, Residual, and Inception modules can be freely mixed to form new CNN structures, which can save about half of the video memory and also improve the parallelism of GPU computing.
  • FIG. 1 is a schematic diagram of an application environment of a video memory processing method based on a convolutional neural network according to the present application;
  • FIG. 2 is a schematic diagram of the modules of a specific embodiment of the video memory processing program based on a convolutional neural network in FIG. 1;
  • FIG. 3 is a schematic diagram of part of an existing CNN network structure;
  • FIG. 4 is a schematic diagram of the partial structure of FIG. 3 after video memory optimization;
  • FIG. 5 is a flowchart of a video memory processing method based on a convolutional neural network according to an embodiment of the present application;
  • FIG. 6 is a schematic diagram of the logical structure of a video memory processing system based on a convolutional neural network according to an embodiment of the present application.
  • This application provides a video memory processing method based on a convolutional neural network, which is applied to an electronic device 1.
  • FIG. 1 is a schematic diagram of the application environment of the preferred embodiment of the video memory processing method based on a convolutional neural network of this application.
  • the electronic device 1 may be a terminal device with arithmetic function, such as a server, a smart phone, a tablet computer, a portable computer, a desktop computer, and the like.
  • the electronic device 1 includes a processor 12, a memory 11, a network interface 14 and a communication bus 15.
  • The memory 11 includes at least one type of readable storage medium, which may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, or a card-type memory.
  • the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1.
  • The readable storage medium may also be an external memory of the electronic device 1, for example, a plug-in hard disk equipped on the electronic device 1, a smart media card (SMC), a Secure Digital (SD) card, a flash card, etc.
  • the readable storage medium of the memory 11 is generally used to store the video memory processing program 10 based on the convolutional neural network installed in the electronic device 1 and the like.
  • the memory 11 can also be used to temporarily store data that has been output or will be output.
  • The processor 12 may be a central processing unit (CPU), a microprocessor, or another data processing chip, and is used to run the program code or process the data stored in the memory 11, for example, to execute the video memory processing program 10 based on the convolutional neural network.
  • the network interface 14 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is generally used to establish a communication connection between the electronic device 1 and other electronic devices.
  • the communication bus 15 is used to realize the connection and communication between these components.
  • FIG. 1 only shows the electronic device 1 with the components 11-15, but it should be understood that it is not required to implement all the illustrated components, and more or fewer components may be implemented instead.
  • the electronic device 1 may include a user interface, a display, and a touch sensor.
  • The user interface may include an input unit such as a keyboard, a voice input device such as a microphone or another device with a voice recognition function, and a voice output device such as a speaker or earphones.
  • the display may be an LED display, a liquid crystal display, a touch liquid crystal display, an organic light-emitting diode (Organic Light-Emitting Diode, OLED) touch device, etc.
  • the touch sensor may be a resistive touch sensor, a capacitive touch sensor, etc.
  • the touch sensor includes not only a contact type touch sensor, but also a proximity type touch sensor and the like.
  • the touch sensor may be a single sensor, or may be, for example, a plurality of sensors arranged in an array.
  • the electronic device 1 may also include a radio frequency (RF) circuit, a sensor, an audio circuit, etc., which will not be repeated here.
  • The memory 11, as a computer storage medium, may include an operating system and the video memory processing program 10 based on the convolutional neural network. When the processor 12 executes the video memory processing program 10 stored in the memory 11, the following steps are implemented:
  • Step 1: Create a temporary storage space, the temporary storage space being a storage space for temporarily storing input data, output data, input errors, and output errors;
  • Step 2: According to the type and direction of the data to be processed, retrieve the temporary storage space corresponding to the data to be processed, and read the data to be processed into the retrieved temporary storage space;
  • Step 3: Perform preset processing on the data to be processed in the retrieved temporary storage space;
  • Step 4: Write the data in the retrieved temporary storage space into the designated external storage space according to the type and direction of the processed data.
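The four steps can be illustrated with a minimal Python sketch. It is not the implementation of this application: plain lists stand in for video memory buffers, and the function names (`create_temp_spaces`, `read_into_temp`, `process`, `write_out`) and the scalar multiplication used as "preset processing" are assumptions chosen only for illustration.

```python
# Hypothetical sketch of steps 1 to 4; plain lists stand in for GPU buffers.

def create_temp_spaces(size):
    """Step 1: one shared buffer per (type, direction) pair."""
    return {
        (kind, direction): [0.0] * size
        for kind in ("data", "error")
        for direction in ("input", "output")
    }

def read_into_temp(spaces, kind, direction, source):
    """Step 2: retrieve the matching buffer and copy the pending data into it."""
    temp = spaces[(kind, direction)]
    temp[:len(source)] = source
    return temp

def process(temp, length, scale):
    """Step 3: placeholder for the preset processing (here a scalar multiply)."""
    for i in range(length):
        temp[i] *= scale

def write_out(temp, length, destination):
    """Step 4: copy the processed values to the designated external storage."""
    destination[:length] = temp[:length]

spaces = create_temp_spaces(size=8)
output_error = [1.0, 2.0, 3.0]
temp = read_into_temp(spaces, "error", "output", output_error)
process(temp, len(output_error), scale=0.5)
external = [0.0] * 3
write_out(temp, 3, external)
print(external)  # [0.5, 1.0, 1.5]
```

Keying the buffers by (type, direction) mirrors the four temporary storage spaces named in the text: input data, output data, input error, and output error.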
  • The temporary storage space is the storage space for temporarily storing input data, output data, input errors, and output errors. Correspondingly, it includes an input data temporary storage space, an output data temporary storage space, an input error temporary storage space, and an output error temporary storage space.
  • the temporary storage space can be set in the video memory.
  • the video memory is used to store models or data.
  • Common graphics cards mainly include the following:

    Graphics card             Video memory (GB)   Tflops
    GeForce GTX 1080          8                   8.2
    GeForce GTX 1080 Ti       11                  10.6
    Nvidia TITAN X            12                  10.2
    Nvidia TITAN Xp           12                  10.8
    GeForce GTX 1080 Titan    12                  4.5
    K80 GPU Accelerator       12                  5.6-8.8
  • the storage units of video memory mainly include the following:
  • Int is an integer value
  • long is a long integer value
  • float is a floating-point value (single is a single-precision floating-point value, and double is a double-precision floating-point value).
  • In step 2, when the type of the data to be processed is error and the direction is output, the output error temporary storage space can be called according to the output error data, and the output error is read into it for processing.
  • Performing preset processing on the to-be-processed data includes: performing at least one of convolution processing, superposition processing, multiplication processing, or integral operation on the to-be-processed data.
  • When data is convolved, the result is mainly the sum of the products of two variables over a certain range. If the variables of the convolution are the sequences x(n) and h(n), the result of the convolution is

    $$y(n) = \sum_{i=-\infty}^{\infty} x(i)\,h(n-i) = x(n) * h(n)$$

  • where * denotes convolution.
  • This calculation of multiplying and then summing is called the convolution sum, or convolution for short.
  • Here n is the amount by which h(-i) is displaced, and different values of n correspond to different convolution results.
  • If the variables of the convolution are the functions x(t) and h(t), the convolution becomes

    $$y(t) = \int_{-\infty}^{\infty} x(p)\,h(t-p)\,dp = x(t) * h(t)$$

  • where p is the integration variable (integration is also a summation), t is the amount by which the function h(-p) is displaced, and * denotes convolution.
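For finite sequences, the convolution sum y(n) = Σ x(i)·h(n−i) can be evaluated directly. The sketch below is a plain Python illustration under the assumption that both sequences are zero outside their stored range; the function name `convolve` is chosen here and is not taken from the application.

```python
def convolve(x, h):
    """Full discrete convolution y(n) = sum_i x(i) * h(n - i) for finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for i in range(len(x)):
            # h is treated as zero outside its stored range
            if 0 <= n - i < len(h):
                y[n] += x[i] * h[n - i]
    return y

print(convolve([1.0, 2.0, 3.0], [1.0, 1.0]))  # [1.0, 3.0, 5.0, 3.0]
```

Each output index n corresponds to one displacement of the reversed sequence h(-i), which is why different n give different results.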
  • In step 4, writing the data in the retrieved temporary storage space into the designated external storage space includes: writing the processed data in the temporary storage space into the designated external storage space according to the configured writing mode.
  • the writing mode includes the Addition mode and the Concat mode.
  • the type of data includes input data, output data, input error, and output error; the direction of the data includes input and output.
  • Data can be written to the designated storage space in different ways according to the write mode (Addition or Concat) configured by the user. For example, when the user configures the Addition mode, the data in the corresponding temporary storage space is accumulated into the designated storage space; when the user configures the Concat mode, the data in the corresponding temporary storage space is written into the designated storage space sequentially, at intervals given by the data length information configured by the user.
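The difference between the two write modes can be sketched as follows. This is an illustrative reading of the description, not the application's code: lists stand in for video memory, and the helper names `write_addition` and `write_concat` as well as the `offset` parameter (derived from the configured data length) are assumptions.

```python
def write_addition(temp, length, destination):
    """Addition mode: accumulate the temporary values into the destination."""
    for i in range(length):
        destination[i] += temp[i]

def write_concat(temp, length, destination, offset):
    """Concat mode: copy the temporary values into the slot starting at offset."""
    destination[offset:offset + length] = temp[:length]
    return offset + length  # next free position in the destination

temp = [1.0, 2.0, 0.0, 0.0]  # shared temporary space, only 2 values in use

summed = [10.0, 20.0]
write_addition(temp, 2, summed)

merged = [0.0] * 4
next_offset = write_concat(temp, 2, merged, offset=0)
temp[:2] = [3.0, 4.0]  # a second producer reuses the same temporary space
write_concat(temp, 2, merged, offset=next_offset)

print(summed)  # [11.0, 22.0]
print(merged)  # [1.0, 2.0, 3.0, 4.0]
```

In both modes the producer never owns a dedicated output buffer; only the destination and the shared temporary space exist.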
  • The video memory occupancy of the entire neural network can be estimated as: model video memory + batch size * video memory occupancy of each sample. For a small model, this is approximately equal to batch size * video memory occupancy of each sample.
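As a quick numeric illustration of this estimate, the sketch below evaluates both the full formula and the small-model approximation. The model size, per-sample size, and batch size are hypothetical values picked for the example, not figures from this application.

```python
def estimate_video_memory(model_bytes, batch_size, per_sample_bytes):
    """Total = model video memory + batch size * per-sample video memory."""
    return model_bytes + batch_size * per_sample_bytes

KB = 1024
model_bytes = 10 * KB        # hypothetical small model
per_sample_bytes = 400 * KB  # hypothetical activations for one sample
batch_size = 5

total = estimate_video_memory(model_bytes, batch_size, per_sample_bytes)
approx = batch_size * per_sample_bytes  # small-model approximation
print(total // KB, approx // KB)  # 2010 2000
```

With these numbers the approximation is off by well under one percent, which is why the per-sample term dominates any optimization effort.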
  • The Concat and Addition layers can be optimized for video memory. For example, multiple input data can be merged or accumulated in the corresponding temporary storage space.
  • Figure 3 shows part of the structure of an existing CNN network that has not been optimized for video memory.
  • Backward propagation is not considered for the time being.
  • The input data size of the convolutional layer is 32*32*3. If the batch size is 5, the input data size of the layer is 32*32*3*5, and the size of every other input and output is calculated in the same way. Therefore, if each value is represented by a float, the part of the CNN network that has not been optimized consumes 1980 KB of video memory.
  • The video memory processing method based on the convolutional neural network of the present application is used to optimize the video memory of the above part, and the optimized structure diagram is shown in FIG. 4.
  • The size of the temporary storage space is set to the maximum size of the output data of the convolutional layers in the CNN network, which is 32*32*16 in this embodiment.
  • The output data of the convolutional layers in the dashed box in FIG. 4 is not allocated actual video memory space; instead, it uses the temporary storage space for output data.
  • After optimization, this part of the CNN network consumes 1340 KB of video memory, a saving of 32.3%.
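The figures quoted in this example can be checked with simple arithmetic. The sketch below assumes 4 bytes per float, 1 KB = 1024 bytes, and that the batch size of 5 also applies to the temporary storage space; the helper name `tensor_kb` is hypothetical, and the layer layout of FIG. 3 is not reproduced here.

```python
BYTES_PER_FLOAT = 4  # assumption: single-precision float
BATCH = 5

def tensor_kb(height, width, channels, batch=BATCH):
    """Size in KB of a float tensor of shape height*width*channels*batch."""
    return height * width * channels * batch * BYTES_PER_FLOAT / 1024

input_kb = tensor_kb(32, 32, 3)   # convolutional-layer input
temp_kb = tensor_kb(32, 32, 16)   # largest convolutional-layer output
saved_kb = 1980 - 1340            # totals quoted in the text
saving = saved_kb / 1980

print(input_kb, temp_kb)                 # 60.0 320.0
print(saved_kb, round(saving * 100, 1))  # 640 32.3
```

Under these assumptions the 640 KB saved equals two buffers of the maximum output size, which is consistent with layer outputs being redirected to shared temporary space, although the text does not state this breakdown explicitly.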
  • The electronic device 1 proposed in the above embodiment sets up a shared temporary storage space, retrieves the corresponding temporary storage space according to the type of the data to be processed and the instruction, and reads the data into or writes it out of that temporary storage space for calculation. The approach can be applied to CNN algorithms.
  • Dense, Residual, and Inception modules can be mixed and matched to form a new CNN structure, which can save about half of the video memory and improve the parallelism of GPU computing.
  • the video memory processing program 10 based on the convolutional neural network may also be provided with a shared temporary storage space manager, which contains temporary storage space for temporarily storing input data, output data, input errors, and output errors.
  • the manager provides some sub-modules for acquiring and operating the corresponding temporary storage space.
  • One or more modules are stored in the memory 11 and executed by the processor 12 to complete the application.
  • the module referred to in this application refers to a series of computer program instruction segments that can complete specific functions.
  • FIG. 2 is a program module diagram of a preferred embodiment of the video memory processing program 10 based on the convolutional neural network in FIG. 1.
  • the video memory processing program 10 based on the convolutional neural network can be divided into:
  • Temporary space acquisition submodule 210: returns the corresponding temporary storage space according to the data type (data or error) and direction (input or output) passed to the submodule.
  • For example, when the type is error and the direction is output, the submodule retrieves and returns the output error temporary storage space.
  • Data reading submodule 220: according to the data type (data or error) and direction (input or output) passed to the submodule, reads the data in the designated storage space into the corresponding temporary storage space and returns that temporary storage space.
  • The designated space mentioned above mainly refers to the storage space where the data to be processed currently resides; the data to be processed is read from the designated space into the temporary storage space for processing. The same applies below.
  • Data writing submodule 230: according to the data type (data or error) and direction (input or output) passed to the submodule, writes the data in the corresponding temporary storage space into the designated storage space.
  • In addition, the data writing submodule writes data to the designated storage space in different ways according to the write mode (Addition or Concat) configured by the user. For example, when the user configures the Addition mode, the submodule accumulates the data in the corresponding temporary storage space into the designated storage space; when the user configures the Concat mode, the submodule writes the data in the corresponding temporary storage space into the designated storage space sequentially, at intervals given by the data length information configured by the user.
  • This application also provides a video memory processing method based on convolutional neural network.
  • FIG. 5 is a flowchart of a preferred embodiment of the video memory processing method based on a convolutional neural network of this application.
  • the method can be executed by a device, and the device can be implemented by software and/or hardware.
  • the video memory processing method based on the convolutional neural network includes:
  • S110: Create a temporary storage space, where the temporary storage space is a storage space for temporarily storing input data, output data, input errors, and output errors.
  • Correspondingly, the temporary storage space includes an input data temporary storage space, an output data temporary storage space, an input error temporary storage space, and an output error temporary storage space.
  • the temporary storage space can be set in the video memory.
  • the video memory is used to store models or data.
  • Common graphics cards mainly include the following:

    Graphics card             Video memory (GB)   Tflops
    GeForce GTX 1080          8                   8.2
    GeForce GTX 1080 Ti       11                  10.6
    Nvidia TITAN X            12                  10.2
    Nvidia TITAN Xp           12                  10.8
    GeForce GTX 1080 Titan    12                  4.5
    K80 GPU Accelerator       12                  5.6-8.8
  • the storage units of video memory mainly include the following:
  • Int is an integer value
  • long is a long integer value
  • float is a floating-point value (single is a single-precision floating-point value, and double is a double-precision floating-point value).
  • S120: According to the type and direction of the data to be processed, retrieve the temporary storage space corresponding to the data to be processed, and read the data to be processed into the retrieved temporary storage space.
  • For example, when the type of the data to be processed is error and the direction is output, the output error temporary storage space can be called according to the output error data, and the output error can be read into it for processing.
  • S130: Perform preset processing on the data to be processed in the retrieved temporary storage space.
  • performing preset processing on the data to be processed includes: performing at least one of convolution processing, superposition processing, multiplication processing, or integration operation on the data to be processed.
  • When data is convolved, the result is mainly the sum of the products of two variables over a certain range. If the variables of the convolution are the sequences x(n) and h(n), the result of the convolution is

    $$y(n) = \sum_{i=-\infty}^{\infty} x(i)\,h(n-i) = x(n) * h(n)$$

  • where * denotes convolution.
  • This calculation of multiplying and then summing is called the convolution sum, or convolution for short.
  • Here n is the amount by which h(-i) is displaced, and different values of n correspond to different convolution results.
  • If the variables of the convolution are the functions x(t) and h(t), the convolution becomes

    $$y(t) = \int_{-\infty}^{\infty} x(p)\,h(t-p)\,dp = x(t) * h(t)$$

  • where p is the integration variable (integration is also a summation), t is the amount by which the function h(-p) is displaced, and * denotes convolution.
  • S140: According to the type and direction of the processed data, write the data in the retrieved temporary storage space into the designated external storage space.
  • Writing the data in the retrieved temporary storage space into the designated external storage space includes: writing the processed data in the temporary storage space into the designated external storage space according to the configured writing mode.
  • The writing modes include the Addition mode and the Concat mode.
  • the type of data includes input data, output data, input error, and output error; the direction of the data includes input and output.
  • Data can be written to the designated storage space in different ways according to the write mode (Addition or Concat) configured by the user. For example, when the user configures the Addition mode, the data in the corresponding temporary storage space is accumulated into the designated storage space; when the user configures the Concat mode, the data in the corresponding temporary storage space is written into the designated storage space sequentially, at intervals given by the data length information configured by the user.
  • a convolutional neural network will be used as an example to describe in detail the video memory processing method based on the convolutional neural network of the present application.
  • The video memory occupancy of the entire neural network can be estimated as: model video memory + batch size * video memory occupancy of each sample. For a small model, this is approximately equal to batch size * video memory occupancy of each sample.
  • The Concat and Addition layers can be optimized for video memory. For example, multiple input data can be merged or accumulated in the corresponding temporary storage space.
  • Figure 3 shows part of the structure of an existing CNN network that has not been optimized for video memory.
  • Backward propagation is not considered for the time being.
  • The input data size of the convolutional layer is 32*32*3. If the batch size is 5, the input data size of the layer is 32*32*3*5, and the size of every other input and output is calculated in the same way. Therefore, if each value is represented by a float, the part of the CNN network that has not been optimized consumes 1980 KB of video memory.
  • The video memory processing method based on the convolutional neural network of the present application is used to optimize the video memory of the above part, and the optimized structure diagram is shown in FIG. 4.
  • The size of the temporary storage space is set to the maximum size of the output data of the convolutional layers in the CNN network, which is 32*32*16 in this embodiment.
  • The output data of the convolutional layers in the dashed box in FIG. 4 is not allocated actual video memory space; instead, it uses the temporary storage space for output data.
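The effect of not allocating per-layer output buffers can be illustrated by counting allocations. The sketch below is a toy model, not the structure of FIG. 3 or FIG. 4: the `Allocator` class, the `fake_conv` placeholder, and the two-branch Concat layout are all assumptions made for the example.

```python
class Allocator:
    """Counts how many floats have been allocated (stand-in for video memory)."""
    def __init__(self):
        self.allocated = 0

    def alloc(self, n):
        self.allocated += n
        return [0.0] * n

def fake_conv(src, dst, n, gain):
    """Placeholder for a convolutional layer: writes n scaled values into dst."""
    for i in range(n):
        dst[i] = src[i] * gain

N = 4  # hypothetical per-layer output length
x = [1.0, 2.0, 3.0, 4.0]

# Unoptimized: every layer output gets its own buffer, plus the Concat result.
naive = Allocator()
a = naive.alloc(N)
fake_conv(x, a, N, 2.0)
b = naive.alloc(N)
fake_conv(x, b, N, 3.0)
naive_out = naive.alloc(2 * N)
naive_out[:N] = a
naive_out[N:] = b

# Optimized: both layers write into one shared temporary buffer in turn,
# and each result is copied into its slot of the Concat output.
shared = Allocator()
temp = shared.alloc(N)
shared_out = shared.alloc(2 * N)
fake_conv(x, temp, N, 2.0)
shared_out[:N] = temp
fake_conv(x, temp, N, 3.0)
shared_out[N:] = temp

print(naive.allocated, shared.allocated)  # 16 12
```

Both variants produce the same concatenated output; the shared variant simply never holds two branch outputs at once, and the gap grows with the number of branches.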
  • After optimization, this part of the CNN network consumes 1340 KB of video memory, a saving of 32.3%.
  • The video memory processing method based on a convolutional neural network proposed in the above embodiments sets up a shared temporary storage space, calls the corresponding temporary storage space according to the type of the data to be processed and the instruction, and reads the data into or writes it out of that temporary storage space for calculation. The approach can be applied to CNN algorithms.
  • Dense, Residual, and Inception modules can be freely mixed to form a new CNN structure, which can save about half of the video memory and improve the parallelism of GPU computing.
  • this application also provides a video memory processing system based on the convolutional neural network.
  • Fig. 6 shows the logical structure of a video memory processing system based on a convolutional neural network according to this embodiment.
  • the video memory processing system 600 based on the convolutional neural network provided by this embodiment includes a space creation unit 610, a data retrieval unit 620, a preprocessing unit 630, and a data writing unit 640.
  • The functions implemented by the space creation unit 610, the data retrieval unit 620, the preprocessing unit 630, and the data writing unit 640 correspond one-to-one to the steps of the convolutional neural network-based video memory processing method in the second embodiment above.
  • the space creating unit 610 is configured to create a temporary storage space, which is a storage space for temporarily storing input data, output data, input errors, and output errors;
  • the space creating unit 610 can create a temporary storage space in the video memory.
  • the video memory is used to store models or data.
  • The larger the video memory, the larger the network that can be run.
  • the created temporary storage space may include input data temporary storage space, output data temporary storage space, input error temporary storage space, and output error temporary storage space.
  • the data retrieval unit 620 is configured to retrieve a temporary storage space corresponding to the data to be processed according to the type and direction of the data to be processed, and read the data to be processed into the retrieved temporary storage space. For example, when the type of data to be processed is error and the direction is output, the corresponding output error temporary storage space can be called according to the output error data, and the output error can be read into the output error temporary storage space for processing.
  • the preprocessing unit 630 is configured to perform preset processing on the data to be processed in the temporary storage space retrieved by the data retrieval unit 620.
  • the preset processing may include at least one of convolution processing, superposition processing, multiplication processing, or integration operation on the data to be processed.
  • When the preprocessing unit 630 performs convolution processing on data, the result is mainly the sum of the products of two variables over a certain range. If the variables of the convolution are the sequences x(n) and h(n), the result of the convolution is

    $$y(n) = \sum_{i=-\infty}^{\infty} x(i)\,h(n-i) = x(n) * h(n)$$

  • where * denotes convolution.
  • This calculation of multiplying and then summing is called the convolution sum, or convolution for short.
  • Here n is the amount by which h(-i) is displaced, and different values of n correspond to different convolution results.
  • If the variables of the convolution are the functions x(t) and h(t), the convolution becomes

    $$y(t) = \int_{-\infty}^{\infty} x(p)\,h(t-p)\,dp = x(t) * h(t)$$

  • where p is the integration variable (integration is also a summation), t is the amount by which the function h(-p) is displaced, and * denotes convolution.
  • the data writing unit 640 is configured to write the retrieved data in the temporary storage space into the designated external storage space according to the type and direction of the processed data.
  • the data writing unit 640 can write the processed data in the temporary storage space into the designated external storage space according to the configured writing mode; the writing mode includes the Addition mode and the Concat mode.
  • When the user configures the Addition mode, the data in the corresponding temporary storage space is accumulated into the designated storage space; when the user configures the Concat mode, the data in the corresponding temporary storage space is written into the designated storage space sequentially, at intervals given by the data length information configured by the user.
  • The video memory processing system based on a convolutional neural network proposed in the above embodiment sets up a shared temporary storage space, calls the corresponding temporary storage space according to the type of the data to be processed and the instruction, and reads the data into or writes it out of that temporary storage space for calculation. The approach can be applied to CNN algorithms.
  • Dense, Residual, and Inception modules can be freely mixed to form a new CNN structure, which can save about half of the video memory and improve the parallelism of GPU computing.
  • the embodiment of the present application also proposes a computer-readable storage medium that includes a video memory processing program based on convolutional nerves, and the following operations are implemented when the video memory processing program based on convolutional nerves is executed by a processor:
  • the temporary storage space includes an input data temporary storage space, an output data temporary storage space, an input error temporary storage space, and an output error temporary storage space.
  • performing preset processing on the data to be processed includes: performing at least one of convolution processing, superimposition processing, multiplication processing, or integration operation on the data to be processed.
  • the step of writing the data in the retrieved temporary storage space into the designated external storage space includes: writing the processed data in the temporary storage space into the designated external storage space according to the configured writing mode Medium; the writing methods include Addition mode and Concat mode.
  • the type of data includes input data, output data, input error, and output error; the direction of the data includes input and output.
  • the specific implementation of the computer-readable storage medium of the present application is substantially the same as the specific implementation of the above-mentioned convolutional neural network-based display memory processing method, system, and electronic device, and will not be repeated here.

Abstract

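A video memory processing method, apparatus, and storage medium based on a convolutional neural network, relating to the field of neural networks. The method includes: creating a temporary storage space, the temporary storage space being storage for temporarily holding input data, output data, input errors, and output errors (S110); according to the type and direction of the data to be processed, retrieving the temporary storage space corresponding to that data and reading the data into the retrieved temporary storage space (S120); performing preset processing on the data within the retrieved temporary storage space (S130); and, according to the type and direction of the processed data, writing the data in the retrieved temporary storage space into a designated external storage space (S140). The method can save a large amount of video memory and improve the parallelism of GPU computation.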

Description

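Video memory processing method and apparatus based on a convolutional neural network, and storage medium
This application claims priority to the patent application No. 201910497396.8, filed on June 10, 2019 and entitled "Video memory processing method and apparatus based on a convolutional neural network, and storage medium".
Technical Field
This application relates to the technical field of convolutional neural networks, and in particular to a video memory processing method, apparatus, and storage medium based on a convolutional neural network.
Background
Video memory is the temporary memory of a GPU's display core and holds the core data awaiting processing. It plays the same role for the GPU that main memory plays for the CPU. The capacity of video memory determines how much data it can hold temporarily; provided the graphics core is powerful enough, a large video memory reduces the number of data reads and lowers latency. The applicant has realized that, in current convolutional neural network training, model input and output data are stored repeatedly in different video memory regions. This causes unnecessary video memory overhead and reduces the batch size that can be used for training, which in turn affects training accuracy.
For example, the Concat layer and the Addition layer are commonly used in current deep learning classification and object detection networks. The Concat layer merges multiple inputs along the feature dimension, and the Addition layer accumulates multiple inputs. Existing deep learning training frameworks such as Caffe and TensorFlow do not optimize video memory for the Concat and Addition layers, so input and output data are stored repeatedly in different video memory regions. This brings unnecessary video memory overhead, lowers the training batch size, and thereby affects training accuracy. Limited video memory also restricts, among other things, the search space of optimization schemes in automated machine learning (AutoML).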
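Summary
This application provides a video memory processing method, apparatus, and computer-readable storage medium based on a convolutional neural network. Its main purpose is to create shared temporary spaces and, according to the type of the data to be processed and the given indication, read or write the data into the corresponding temporary storage space. Compared with existing frameworks, users can freely mix and match modules to form new CNN structures, which saves a large amount of video memory and improves the parallelism of GPU computation.
To achieve the above purpose, this application provides a video memory processing method based on a convolutional neural network, applied to an electronic device, the method including:
creating a temporary storage space, the temporary storage space being storage for temporarily holding input data, output data, input errors, and output errors;
according to the type and direction of the data to be processed, retrieving the temporary storage space corresponding to the data to be processed, and reading the data to be processed into the retrieved temporary storage space;
performing preset processing on the data to be processed within the retrieved temporary storage space;
according to the type and direction of the processed data, writing the data in the retrieved temporary storage space into a designated external storage space.
This application also provides a video memory processing system based on a convolutional neural network, including:
a space creation unit, configured to create a temporary storage space, the temporary storage space being storage for temporarily holding input data, output data, input errors, and output errors;
a data retrieval unit, configured to retrieve, according to the type and direction of the data to be processed, the temporary storage space corresponding to the data to be processed, and to read the data to be processed into the retrieved temporary storage space;
a preprocessing unit, configured to perform preset processing on the data to be processed within the retrieved temporary storage space;
a data writing unit, configured to write, according to the type and direction of the processed data, the data in the retrieved temporary storage space into a designated external storage space.
In addition, to achieve the above purpose, this application also provides an electronic device including a memory and a processor. The memory contains a video memory processing program based on a convolutional neural network, and when the program is executed by the processor, the steps of the aforementioned video memory processing method based on a convolutional neural network are implemented.
In addition, to achieve the above purpose, this application also provides a computer-readable storage medium containing a video memory processing program based on a convolutional neural network. When the program is executed by a processor, the steps of the video memory processing method based on a convolutional neural network described above are implemented.
The video memory processing method, system, electronic device, and computer-readable storage medium based on a convolutional neural network proposed in this application set up shared temporary storage spaces, retrieve the corresponding temporary storage space according to the type of the data to be processed and the given indication, and read or write the data into that space for computation. The approach is applicable to CNN algorithms. Compared with other frameworks, Dense, Residual, and Inception modules can be freely mixed to form new CNN structures, about half of the video memory can be saved, and the parallelism of GPU computation is also improved.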
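Brief Description of the Drawings
FIG. 1 is a schematic diagram of the application environment of the video memory processing method based on a convolutional neural network according to an embodiment of this application;
FIG. 2 is a module diagram of a specific embodiment of the video memory processing program based on a convolutional neural network in FIG. 1;
FIG. 3 is a schematic diagram of part of the structure of an existing CNN network;
FIG. 4 is a schematic diagram of the partial structure of FIG. 3 after video memory optimization;
FIG. 5 is a flowchart of the video memory processing method based on a convolutional neural network according to an embodiment of this application;
FIG. 6 is a schematic diagram of the logical structure of the video memory processing system based on a convolutional neural network according to an embodiment of this application.
The realization of the purpose, the functional features, and the advantages of this application are further described below with reference to the embodiments and the drawings.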
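Detailed Description
It should be understood that the specific embodiments described here are only intended to explain this application and are not intended to limit it.
Embodiment 1
This application provides a video memory processing method based on a convolutional neural network, applied to an electronic device 1. FIG. 1 is a schematic diagram of the application environment of a preferred embodiment of the method.
In this embodiment, the electronic device 1 may be a terminal device with computing capability, such as a server, a smartphone, a tablet computer, a portable computer, or a desktop computer.
The electronic device 1 includes a processor 12, a memory 11, a network interface 14, and a communication bus 15.
The memory 11 includes at least one type of readable storage medium, which may be a non-volatile storage medium such as flash memory, a hard disk, a multimedia card, or a card-type memory. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, for example its hard disk. In other embodiments, the readable storage medium may be an external memory of the electronic device 1, for example a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic device 1.
In this embodiment, the readable storage medium of the memory 11 is typically used to store the video memory processing program 10 based on a convolutional neural network that is installed on the electronic device 1, among other things. The memory 11 can also be used to temporarily store data that has been output or is about to be output.
In some embodiments the processor 12 may be a Central Processing Unit (CPU), a microprocessor, or another data processing chip. It runs the program code stored in the memory 11 or processes data, for example by executing the video memory processing program 10 based on a convolutional neural network.
The network interface 14 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is typically used to establish a communication connection between the electronic device 1 and other electronic equipment.
The communication bus 15 provides the connection and communication between these components.
FIG. 1 shows only the electronic device 1 with components 11-15, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
The electronic device 1 may include a user interface, a display, and a touch sensor. The user interface may include an input unit such as a keyboard, a voice input device with speech recognition capability such as a microphone, and a voice output device such as a loudspeaker or headphones. The display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an Organic Light-Emitting Diode (OLED) touch display, and so on. The touch sensor may be a resistive touch sensor, a capacitive touch sensor, and so on. The touch sensor includes not only contact-type touch sensors but may also include proximity-type touch sensors. Furthermore, the touch sensor may be a single sensor or multiple sensors arranged, for example, in an array.
Optionally, the electronic device 1 may also include a Radio Frequency (RF) circuit, sensors, an audio circuit, and so on, which are not described further here.
In the device embodiment shown in FIG. 1, the memory 11, as a computer storage medium, may contain an operating system and the video memory processing program 10 based on a convolutional neural network. When the processor 12 executes the video memory processing program 10 stored in the memory 11, the following steps are implemented: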
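Step 1: create a temporary storage space, the temporary storage space being storage for temporarily holding input data, output data, input errors, and output errors;
Step 2: according to the type and direction of the data to be processed, retrieve the temporary storage space corresponding to the data to be processed, and read the data to be processed into the retrieved temporary storage space;
Step 3: perform preset processing on the data to be processed within the retrieved temporary storage space;
Step 4: according to the type and direction of the processed data, write the data in the retrieved temporary storage space into a designated external storage space.
In step 1, the temporary storage space is storage for temporarily holding input data, output data, input errors, and output errors. The corresponding temporary storage spaces are an input data temporary storage space, an output data temporary storage space, an input error temporary storage space, and an output error temporary storage space.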
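Step 1 can be illustrated with a minimal sketch. It assumes the four temporary storage spaces can be modeled as preallocated buffers keyed by data type and direction. The names `create_temp_spaces`, `KINDS`, and `DIRECTIONS` are hypothetical, and the sketch uses host-side `array` buffers purely for illustration, whereas the application places the spaces in video memory.

```python
from array import array
from itertools import product

KINDS = ("data", "error")          # type of the data (assumed labels)
DIRECTIONS = ("input", "output")   # direction of the data (assumed labels)

def create_temp_spaces(capacity):
    """Preallocate one float buffer per (kind, direction) pair."""
    return {
        (kind, direction): array("f", [0.0]) * capacity
        for kind, direction in product(KINDS, DIRECTIONS)
    }

spaces = create_temp_spaces(capacity=8)
print(sorted(spaces))  # the four (kind, direction) keys
```

Keying the buffers by type and direction mirrors how the later steps select a space, so a lookup such as `spaces[("error", "output")]` stands in for retrieving the output error temporary storage space.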
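The temporary storage space can be placed in video memory. Video memory holds the model and data; the larger the video memory, the larger the network that can be run. Common graphics cards include the following:
Graphics card             Video memory (G)   Processing power (TFLOPS)
GeForce GTX 1080          8                  8.2
GeForce GTX 1080 Ti       11                 10.6
Nvidia TITAN X            12                 10.2
Nvidia TITAN Xp           12                 10.8
GeForce GTX 1080 Titan    12                 4.5
K80 GPU Accelerator       12                 5.6-8.8
The storage units of video memory mainly include the following:
1Byte=8bit
1K=1024Byte
1KB=1000Byte
1M=1024K
1MB=1000KB
1G=1024M
1GB=1000MB
10K=10*1024Byte
10KB=10000Byte
Common numeric types and their sizes are shown in the following table:
Type      Size      Note
Int8      1 byte    also known as Byte
Int16     2 bytes   also known as short
Int32     4 bytes   also known as int
Int64     8 bytes   also known as long
Float32   4 bytes   single-precision floating point
Float16   2 bytes   half-precision floating point
In the lists above, Int denotes an integer value, long a long integer value, and float a floating-point value (single is single-precision floating point, double is double-precision floating point).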
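In step 2, when the type of the data to be processed is error and the direction is output, the corresponding output error temporary storage space can be retrieved according to the output error data, and the output error is read into that output error temporary storage space for processing.
Performing preset processing on the data to be processed includes performing at least one of convolution, superposition, multiplication, or integration on the data to be processed.
For example, convolution of data is essentially the result of multiplying two variables over a certain range and then summing. If the variables of the convolution are the sequences x(n) and h(n), the result of the convolution is
$$y(n) = \sum_{i=-\infty}^{\infty} x(i)\,h(n-i) = x(n) * h(n)$$
where * denotes convolution. When n = 0, the sequence h(-i) is h(i) with its index i negated; negating the index flips h(i) by 180 degrees about the vertical axis, which is why this multiply-then-sum computation is called the convolution sum, or convolution for short. In addition, n is the amount by which h(-i) is shifted, and different values of n correspond to different convolution results.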
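The convolution sum described above can be sketched for finite sequences as follows. This is an illustrative pure-Python version, assuming both sequences are zero outside their listed samples; the function name `conv_sum` is hypothetical and the code is not taken from the application.

```python
def conv_sum(x, h):
    """Full discrete convolution y(n) = sum over i of x(i) * h(n - i)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):            # n shifts the flipped sequence h(-i)
        for i in range(len(x)):
            if 0 <= n - i < len(h):    # h is treated as zero outside its samples
                y[n] += x[i] * h[n - i]
    return y

result = conv_sum([1, 2, 3], [0, 1, 0.5])
print(result)  # [0.0, 1.0, 2.5, 4.0, 1.5]
```

Each output index n corresponds to one shift of the flipped sequence, which is why different values of n give different results.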
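If the variables of the convolution are two functions x(t) and h(t), the convolution becomes
$$y(t) = \int_{-\infty}^{\infty} x(p)\,h(t-p)\,dp = x(t) * h(t)$$
where p is the integration variable (integration is also a form of summation), t is the amount by which the function h(-p) is shifted, and * denotes convolution.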
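A short worked example may help make the continuous case concrete. The choice of functions here is an assumption made only for illustration: take x(t) = h(t) = e^{-t} for t ≥ 0 and zero otherwise. The product x(p)h(t-p) is then nonzero only for 0 ≤ p ≤ t, so for t ≥ 0:

```latex
\begin{aligned}
y(t) &= \int_{-\infty}^{\infty} x(p)\,h(t-p)\,dp \\
     &= \int_{0}^{t} e^{-p}\,e^{-(t-p)}\,dp \\
     &= e^{-t}\int_{0}^{t} dp \\
     &= t\,e^{-t}, \qquad t \ge 0 ,
\end{aligned}
```

and y(t) = 0 for t < 0. Here p plays the role of the integration variable and t the role of the shift applied to h(-p), exactly as in the definition.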
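Operations such as these can all be carried out within the temporary storage space in order to save video memory.
In step 4, the step of writing the data in the retrieved temporary storage space into the designated external storage space includes: writing the processed data in the temporary storage space into the designated external storage space according to the configured writing mode, where the writing modes include the Addition mode and the Concat mode.
In addition, the type of data includes input data, output data, input error, and output error; the direction of the data includes input and output.
Specifically, data can be written to the designated memory space in different ways according to the writing mode (Addition or Concat) configured by the user. For example, when the user configures the Addition mode, the data in the corresponding temporary storage space is written into the designated storage space cumulatively; when the user configures the Concat mode, the data in the corresponding temporary storage space is written into the designated storage space in order and at intervals, according to the data length information configured by the user.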
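The two writing modes can be sketched as below. This is one possible reading rather than the application's implementation: Addition mode is taken to mean element-wise accumulation into the destination, and Concat mode is taken to mean copying chunks of a configured `length` into the destination at a fixed `stride`, starting from an `offset`. The function and parameter names are all hypothetical.

```python
def write_addition(temp, dest):
    """Addition mode: accumulate temp into dest element by element."""
    for k, value in enumerate(temp):
        dest[k] += value

def write_concat(temp, dest, length, stride, offset):
    """Concat mode: copy consecutive chunks of `length` values from temp
    into dest, one chunk every `stride` positions, starting at `offset`."""
    for chunk in range(len(temp) // length):
        src = chunk * length
        dst = chunk * stride + offset
        dest[dst:dst + length] = temp[src:src + length]

acc = [1.0, 1.0, 1.0, 1.0]
write_addition([0.5, 0.5, 0.5, 0.5], acc)

# Two samples; each destination row holds 2 values from branch A, then 1 from branch B.
merged = [0.0] * 6
write_concat([1.0, 2.0, 3.0, 4.0], merged, length=2, stride=3, offset=0)
write_concat([9.0, 8.0], merged, length=1, stride=3, offset=2)
print(acc, merged)
```

Under this reading, the interval between chunks is what lets several inputs interleave along the feature dimension of each sample without an intermediate merged copy.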
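The execution of the video memory processing program based on a convolutional neural network of this application is described in detail below, taking a convolutional neural network as an example.
To obtain the video memory usage of each layer's output in a neural network, the shape of each layer's feature map must be computed, and the gradients must be kept for backpropagation; video memory usage is proportional to the batch size. The video memory usage of the whole network is given by: model video memory + batch size * video memory usage per sample. When the model is small, this is approximately batch size * video memory usage per sample.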
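The usage formula above can be written as a small helper. The byte counts in the example are made-up values chosen only to show the approximation for a small model; `estimate_memory_bytes` is a hypothetical name.

```python
def estimate_memory_bytes(model_bytes, batch_size, per_sample_bytes):
    """Total usage = model memory + batch size * per-sample memory."""
    return model_bytes + batch_size * per_sample_bytes

# Hypothetical numbers: a 4 KiB model and 400 KiB of activations per sample.
total = estimate_memory_bytes(model_bytes=4096, batch_size=5,
                              per_sample_bytes=400 * 1024)
approx = 5 * 400 * 1024            # ignores the model term
print(total, approx, approx / total)
```

With these assumed numbers the model term contributes well under one percent, which is the sense in which the total is approximately batch size times the per-sample usage.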
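To reduce the video memory used by a convolutional neural network model during training, video memory optimization can be applied to layers such as Concat and Addition. For example, multiple inputs can be merged within the corresponding temporary storage space, or accumulated within the corresponding temporary storage space.
For example, FIG. 3 shows part of the structure of an existing CNN network without video memory optimization.
As shown in FIG. 3, backpropagation is set aside for now and forward propagation is taken as the example. The input data size of the convolutional layer is 32*32*3; if the batch size is 5, the input data size of this layer is 32*32*3*5, and the sizes of the other inputs and outputs are calculated in the same way. Therefore, if the data is represented as float, this part of the CNN network consumes 1980 KB of video memory without optimization.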
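The size calculation for the input tensor can be sketched as follows, assuming that "float" means a 4-byte Float32 value. The helper `tensor_bytes` and the lookup table `BYTES_PER_ELEMENT` are hypothetical names; the sketch covers only the input tensor, since the other layer shapes of FIG. 3 are not given in the text.

```python
from math import prod

# Element sizes in bytes for a few common numeric types.
BYTES_PER_ELEMENT = {"int8": 1, "float16": 2, "float32": 4, "int64": 8}

def tensor_bytes(shape, dtype="float32"):
    """Bytes needed to store a dense tensor of the given shape."""
    return prod(shape) * BYTES_PER_ELEMENT[dtype]

input_bytes = tensor_bytes((5, 32, 32, 3))   # batch size 5, 32*32*3 per sample
print(input_bytes, input_bytes / 1024)       # 61440 bytes, 60.0 when divided by 1024
```

Summing the same quantity over every input and output tensor in the figure would give the total quoted for the unoptimized structure.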
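The video memory processing method based on a convolutional neural network of this application is used to optimize the video memory of the above part; the optimized structure is shown in FIG. 4.
Because backpropagation is set aside for now, only the output data temporary storage space needs to be considered. The size of this temporary storage space is set to the largest output size of the convolutional layers in the CNN network, which in this embodiment is 32*32*16. The outputs of the convolutional layers inside the dashed box in FIG. 4 are not allocated actual video memory; they use the output data temporary storage space instead.
It follows that, with a batch size of 5 and data represented as float, this part of the CNN network consumes 1340 KB of video memory after optimization, a saving of 32.3%.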
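The stated saving can be checked with a few lines of arithmetic. The 1980 KB and 1340 KB figures come from the text; the buffer size additionally assumes 4-byte floats and 1024 bytes per KB, which the text does not state explicitly.

```python
before_kb = 1980                      # unoptimized usage, from the text
after_kb = 1340                       # optimized usage, from the text
saved_kb = before_kb - after_kb
saving_percent = round(100 * saved_kb / before_kb, 1)

# Shared output buffer: 32*32*16 values per sample, batch size 5, 4 bytes each.
shared_buffer_kb = 32 * 32 * 16 * 5 * 4 / 1024

print(saved_kb, saving_percent, shared_buffer_kb)
```

Under these assumptions the 640 KB difference equals two buffers of 320 KB. That would be consistent with two layer outputs no longer being allocated separately, although the figures are not reproduced here, so this is only an observation about the numbers.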
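The electronic device 1 proposed in the above embodiment sets up shared temporary storage spaces, retrieves the corresponding temporary storage space according to the type of the data to be processed and the given indication, and reads or writes the data into that space for computation. The approach is applicable to CNN algorithms. Compared with other frameworks, Dense, Residual, and Inception modules can be freely mixed to form new CNN structures, about half of the video memory can be saved, and the parallelism of GPU computation is also improved.
In other embodiments, the video memory processing program 10 based on a convolutional neural network may also provide a shared temporary storage space manager, which contains the temporary storage spaces for temporarily holding input data, output data, input errors, and output errors. The manager provides several submodules for obtaining and operating on the corresponding temporary storage space. One or more modules are stored in the memory 11 and executed by the processor 12 to carry out this application. A module in this application refers to a series of computer program instruction segments capable of performing a specific function. FIG. 2 is a program module diagram of a preferred embodiment of the video memory processing program 10 based on a convolutional neural network in FIG. 1. The program 10 can be divided into:
Temporary space acquisition submodule 210: returns the corresponding temporary storage space according to the data type (data or error) and direction (input or output) passed to the module.
For example, given the input "error, output", this submodule retrieves and returns the corresponding output error temporary storage space.
Data reading submodule 220: according to the data type (data or error) and direction (input or output) passed to it, reads the data in the designated storage space into the corresponding temporary storage space and returns that temporary storage space.
For example, given the input "error, output", this submodule reads the data in the designated storage space into the output error temporary storage space and returns the output error temporary storage space.
The designated space above mainly refers to the storage space where the data to be processed currently resides; the data is read from this designated space into the temporary storage space for processing. The same applies below.
Data writing submodule 230: according to the data type (data or error) and direction (input or output) passed to it, writes the data in the corresponding temporary storage space into the designated storage space.
For example, given the input "error, input", this submodule writes the data in the input error temporary storage space into the designated storage space.
Note that the data writing submodule also writes data to the designated memory space in different ways according to the writing mode (Addition or Concat) configured by the user. For example, when the user configures the Addition mode, the submodule writes the data in the corresponding temporary storage space into the designated storage space cumulatively; when the user configures the Concat mode, the submodule writes the data in the corresponding temporary storage space into the designated storage space in order and at intervals, according to the data length information configured by the user.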
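The three submodules can be pictured as methods of one manager object. The sketch below is a toy stand-in under several assumptions: Python lists replace video memory, the Concat mode is simplified to a contiguous copy at an offset, and the class name, method names, and parameters are hypothetical rather than the application's API.

```python
class SharedTempSpaceManager:
    """Toy stand-in for the shared temporary storage space manager."""

    def __init__(self, capacity):
        self._spaces = {
            (kind, direction): [0.0] * capacity
            for kind in ("data", "error")
            for direction in ("input", "output")
        }

    def get(self, kind, direction):
        """Submodule 210: return the space matching type and direction."""
        return self._spaces[(kind, direction)]

    def read(self, kind, direction, source):
        """Submodule 220: copy `source` into the matching space and return it."""
        space = self.get(kind, direction)
        space[:len(source)] = source
        return space

    def write(self, kind, direction, dest, count, mode="addition", offset=0):
        """Submodule 230: write `count` values from the matching space to `dest`."""
        space = self.get(kind, direction)
        for k in range(count):
            if mode == "addition":
                dest[offset + k] += space[k]    # accumulate
            elif mode == "concat":
                dest[offset + k] = space[k]     # simplified contiguous copy
            else:
                raise ValueError(f"unknown write mode: {mode}")

manager = SharedTempSpaceManager(capacity=4)
manager.read("error", "output", [0.5, 0.25])
grad = [1.0, 1.0]
manager.write("error", "output", grad, count=2, mode="addition")
print(grad)  # [1.5, 1.25]
```

Routing every access through `(kind, direction)` keeps callers from holding their own copies, which is the property the shared spaces rely on.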
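Embodiment 2
This application also provides a video memory processing method based on a convolutional neural network. FIG. 5 is a flowchart of a preferred embodiment of the method. The method can be executed by an apparatus, and the apparatus can be implemented in software and/or hardware.
In this embodiment, the video memory processing method based on a convolutional neural network includes:
S110: create a temporary storage space, the temporary storage space being storage for temporarily holding input data, output data, input errors, and output errors.
In this step, the temporary storage space is storage for temporarily holding input data, output data, input errors, and output errors. The corresponding temporary storage spaces are an input data temporary storage space, an output data temporary storage space, an input error temporary storage space, and an output error temporary storage space.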
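The temporary storage space can be placed in video memory. Video memory holds the model and data; the larger the video memory, the larger the network that can be run. Common graphics cards include the following:
Graphics card             Video memory (G)   Processing power (TFLOPS)
GeForce GTX 1080          8                  8.2
GeForce GTX 1080 Ti       11                 10.6
Nvidia TITAN X            12                 10.2
Nvidia TITAN Xp           12                 10.8
GeForce GTX 1080 Titan    12                 4.5
K80 GPU Accelerator       12                 5.6-8.8
The storage units of video memory mainly include the following:
1Byte=8bit
1K=1024Byte
1KB=1000Byte
1M=1024K
1MB=1000KB
1G=1024M
1GB=1000MB
10K=10*1024Byte
10KB=10000Byte
Common numeric types and their sizes are shown in the following table:
Type      Size      Note
Int8      1 byte    also known as Byte
Int16     2 bytes   also known as short
Int32     4 bytes   also known as int
Int64     8 bytes   also known as long
Float32   4 bytes   single-precision floating point
Float16   2 bytes   half-precision floating point
In the lists above, Int denotes an integer value, long a long integer value, and float a floating-point value (single is single-precision floating point, double is double-precision floating point).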
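S120: according to the type and direction of the data to be processed, retrieve the temporary storage space corresponding to the data to be processed, and read the data to be processed into the retrieved temporary storage space.
For example, when the type of the data to be processed is error and the direction is output, the corresponding output error temporary storage space can be retrieved according to the output error data, and the output error is read into that output error temporary storage space for processing.
S130: perform preset processing on the data to be processed within the retrieved temporary storage space.
Performing preset processing on the data to be processed includes performing at least one of convolution, superposition, multiplication, or integration on the data to be processed.
For example, convolution of data is essentially the result of multiplying two variables over a certain range and then summing. If the variables of the convolution are the sequences x(n) and h(n), the result of the convolution is
$$y(n) = \sum_{i=-\infty}^{\infty} x(i)\,h(n-i) = x(n) * h(n)$$
where * denotes convolution. When n = 0, the sequence h(-i) is h(i) with its index i negated; negating the index flips h(i) by 180 degrees about the vertical axis, which is why this multiply-then-sum computation is called the convolution sum, or convolution for short. In addition, n is the amount by which h(-i) is shifted, and different values of n correspond to different convolution results.
If the variables of the convolution are two functions x(t) and h(t), the convolution becomes
$$y(t) = \int_{-\infty}^{\infty} x(p)\,h(t-p)\,dp = x(t) * h(t)$$
where p is the integration variable (integration is also a form of summation), t is the amount by which the function h(-p) is shifted, and * denotes convolution.
Operations such as these can all be carried out within the temporary storage space in order to save video memory.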
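S140: according to the type and direction of the processed data, write the data in the retrieved temporary storage space into a designated external storage space.
In this step, writing the data in the retrieved temporary storage space into the designated external storage space includes: writing the processed data in the temporary storage space into the designated external storage space according to the configured writing mode; the writing modes include the Addition mode and the Concat mode.
In addition, the type of data includes input data, output data, input error, and output error; the direction of the data includes input and output.
Specifically, data can be written to the designated memory space in different ways according to the writing mode (Addition or Concat) configured by the user. For example, when the user configures the Addition mode, the data in the corresponding temporary storage space is written into the designated storage space cumulatively; when the user configures the Concat mode, the data in the corresponding temporary storage space is written into the designated storage space in order and at intervals, according to the data length information configured by the user.
The video memory processing method based on a convolutional neural network of this application is described in detail below, taking a convolutional neural network as an example.
To obtain the video memory usage of each layer's output in a neural network, the shape of each layer's feature map must be computed, and the gradients must be kept for backpropagation; video memory usage is proportional to the batch size. The video memory usage of the whole network is given by: model video memory + batch size * video memory usage per sample. When the model is small, this is approximately batch size * video memory usage per sample.
To reduce the video memory used by a convolutional neural network model during training, video memory optimization can be applied to layers such as Concat and Addition. For example, multiple inputs can be merged within the corresponding temporary storage space, or accumulated within the corresponding temporary storage space.
For example, FIG. 3 shows part of the structure of an existing CNN network without video memory optimization.
As shown in FIG. 3, backpropagation is set aside for now and forward propagation is taken as the example. The input data size of the convolutional layer is 32*32*3; if the batch size is 5, the input data size of this layer is 32*32*3*5, and the sizes of the other inputs and outputs are calculated in the same way. Therefore, if the data is represented as float, this part of the CNN network consumes 1980 KB of video memory without optimization.
The video memory processing method based on a convolutional neural network of this application is used to optimize the video memory of the above part; the optimized structure is shown in FIG. 4.
Because backpropagation is set aside for now, only the output data temporary storage space needs to be considered. The size of this temporary storage space is set to the largest output size of the convolutional layers in the CNN network, which in this embodiment is 32*32*16. The outputs of the convolutional layers inside the dashed box in FIG. 4 are not allocated actual video memory; they use the output data temporary storage space instead.
It follows that, with a batch size of 5 and data represented as float, this part of the CNN network consumes 1340 KB of video memory after optimization, a saving of 32.3%.
The video memory processing method based on a convolutional neural network proposed in the above embodiment sets up shared temporary storage spaces, retrieves the corresponding temporary storage space according to the type of the data to be processed and the given indication, and reads or writes the data into that space for computation. The approach is applicable to CNN algorithms. Compared with other frameworks, Dense, Residual, and Inception modules can be freely mixed to form new CNN structures, about half of the video memory can be saved, and the parallelism of GPU computation is also improved.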
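Embodiment 3
Corresponding to the video memory processing method based on a convolutional neural network provided in Embodiment 2, this application also provides a video memory processing system based on a convolutional neural network. FIG. 6 shows the logical structure of the system according to this embodiment.
As shown in FIG. 6, the video memory processing system 600 based on a convolutional neural network provided in this embodiment includes a space creation unit 610, a data retrieval unit 620, a preprocessing unit 630, and a data writing unit 640. The functions implemented by these four units correspond one-to-one to the steps of the video memory processing method based on a convolutional neural network in Embodiment 2.
Specifically, the space creation unit 610 is configured to create a temporary storage space, the temporary storage space being storage for temporarily holding input data, output data, input errors, and output errors;
The space creation unit 610 can create the temporary storage space in video memory. Video memory holds the model and data; the larger the video memory, the larger the network that can be run. The created temporary storage spaces may include an input data temporary storage space, an output data temporary storage space, an input error temporary storage space, and an output error temporary storage space.
The data retrieval unit 620 is configured to retrieve, according to the type and direction of the data to be processed, the temporary storage space corresponding to that data, and to read the data into the retrieved temporary storage space. For example, when the type of the data to be processed is error and the direction is output, the corresponding output error temporary storage space can be retrieved according to the output error data, and the output error is read into that output error temporary storage space for processing.
The preprocessing unit 630 is configured to perform preset processing on the data to be processed within the temporary storage space retrieved by the data retrieval unit 620. The preset processing may include at least one of convolution, superposition, multiplication, or integration applied to the data to be processed.
For example, when the preprocessing unit 630 performs convolution on data, the result is essentially that of multiplying two variables over a certain range and then summing. If the variables of the convolution are the sequences x(n) and h(n), the result of the convolution is
$$y(n) = \sum_{i=-\infty}^{\infty} x(i)\,h(n-i) = x(n) * h(n)$$
where * denotes convolution. When n = 0, the sequence h(-i) is h(i) with its index i negated; negating the index flips h(i) by 180 degrees about the vertical axis, which is why this multiply-then-sum computation is called the convolution sum, or convolution for short. In addition, n is the amount by which h(-i) is shifted, and different values of n correspond to different convolution results.
If the variables of the convolution are two functions x(t) and h(t), the convolution becomes
$$y(t) = \int_{-\infty}^{\infty} x(p)\,h(t-p)\,dp = x(t) * h(t)$$
where p is the integration variable (integration is also a form of summation), t is the amount by which the function h(-p) is shifted, and * denotes convolution.
Operations such as these can all be carried out within the temporary storage space in order to save video memory.
The data writing unit 640 is configured to write, according to the type and direction of the processed data, the data in the retrieved temporary storage space into the designated external storage space.
The data writing unit 640 can write the processed data in the temporary storage space into the designated external storage space according to the configured writing mode; the writing modes include the Addition mode and the Concat mode. When the user configures the Addition mode, the data in the corresponding temporary storage space is written into the designated storage space cumulatively; when the user configures the Concat mode, the data in the corresponding temporary storage space is written into the designated storage space in order and at intervals, according to the data length information configured by the user.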
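How the four units cooperate can be sketched end to end. This is a simplified illustration under stated assumptions: plain Python lists stand in for video memory, the preset processing is a simple multiplication, only the Addition mode is shown, and the function names are hypothetical rather than the system's actual interfaces.

```python
def create_spaces(capacity):
    """Unit 610: create the four temporary storage spaces."""
    return {
        (kind, direction): [0.0] * capacity
        for kind in ("data", "error")
        for direction in ("input", "output")
    }

def retrieve_and_read(spaces, kind, direction, source):
    """Unit 620: pick the space by type and direction, then read data into it."""
    temp = spaces[(kind, direction)]
    temp[:len(source)] = source
    return temp

def preprocess(temp, count, op):
    """Unit 630: apply the preset processing in place inside the space."""
    for k in range(count):
        temp[k] = op(temp[k])

def write_out(temp, count, dest):
    """Unit 640 in Addition mode: accumulate the result into external storage."""
    for k in range(count):
        dest[k] += temp[k]

spaces = create_spaces(capacity=4)
external = [0.0, 0.0, 0.0]
for branch in ([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]):
    temp = retrieve_and_read(spaces, "data", "output", branch)
    preprocess(temp, len(branch), lambda v: 2 * v)   # multiplication processing
    write_out(temp, len(branch), external)
print(external)  # [22.0, 44.0, 66.0]
```

Both branches pass through the same output data buffer, so the intermediate results never occupy more than one buffer at a time; that reuse is the source of the memory saving the embodiment describes.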
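The video memory processing system based on a convolutional neural network proposed in the above embodiment sets up shared temporary storage spaces, retrieves the corresponding temporary storage space according to the type of the data to be processed and the given indication, and reads or writes the data into that space for computation. The approach is applicable to CNN algorithms. Compared with other frameworks, Dense, Residual, and Inception modules can be freely mixed to form new CNN structures, about half of the video memory can be saved, and the parallelism of GPU computation is also improved.
Embodiment 4
An embodiment of this application also proposes a computer-readable storage medium containing a video memory processing program based on a convolutional neural network. When the program is executed by a processor, the following operations are implemented:
creating a temporary storage space, the temporary storage space being storage for temporarily holding input data, output data, input errors, and output errors;
according to the type and direction of the data to be processed, retrieving the temporary storage space corresponding to the data to be processed, and reading the data to be processed into the retrieved temporary storage space;
performing preset processing on the data to be processed within the retrieved temporary storage space;
according to the type and direction of the processed data, writing the data in the retrieved temporary storage space into a designated external storage space.
Preferably, the temporary storage space includes an input data temporary storage space, an output data temporary storage space, an input error temporary storage space, and an output error temporary storage space.
Preferably, performing preset processing on the data to be processed includes performing at least one of convolution, superposition, multiplication, or integration on the data to be processed.
Preferably, the step of writing the data in the retrieved temporary storage space into the designated external storage space includes: writing the processed data in the temporary storage space into the designated external storage space according to the configured writing mode, where the writing modes include the Addition mode and the Concat mode.
Preferably, the type of data includes input data, output data, input error, and output error; the direction of the data includes input and output.
The specific implementation of the computer-readable storage medium of this application is substantially the same as that of the above video memory processing method, system, and electronic device based on a convolutional neural network, and is not repeated here.
It should be noted that, in this document, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, apparatus, article, or method that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, apparatus, article, or method. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, apparatus, article, or method that includes the element.
The sequence numbers of the above embodiments of this application are for description only and do not indicate the relative merits of the embodiments. From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or alternatively by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of this application.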

Claims (20)

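  1. A video memory processing method based on a convolutional neural network, applied to an electronic device, characterized in that the method comprises:
    creating a temporary storage space, the temporary storage space being storage for temporarily holding input data, output data, input errors, and output errors;
    according to the type and direction of the data to be processed, retrieving the temporary storage space corresponding to the data to be processed, and reading the data to be processed into the retrieved temporary storage space;
    performing preset processing on the data to be processed within the retrieved temporary storage space;
    according to the type and direction of the processed data, writing the data in the retrieved temporary storage space into a designated external storage space.
  2. The video memory processing method based on a convolutional neural network according to claim 1, characterized in that
    the temporary storage space comprises an input data temporary storage space, an output data temporary storage space, an input error temporary storage space, and an output error temporary storage space.
  3. The video memory processing method based on a convolutional neural network according to claim 1, characterized in that
    the temporary storage space is placed in video memory.
  4. The video memory processing method based on a convolutional neural network according to claim 1, characterized in that
    performing preset processing on the data to be processed comprises: performing at least one of convolution, superposition, multiplication, or integration on the data to be processed.
  5. The video memory processing method based on a convolutional neural network according to claim 1, characterized in that
    the preset processing performed on the data to be processed is convolution, used to obtain the result of multiplying two variables over a certain range and then summing.
  6. The video memory processing method based on a convolutional neural network according to claim 5, characterized in that
    if the variables of the convolution are the sequences x(n) and h(n), the result of the convolution is
    $$y(n) = \sum_{i=-\infty}^{\infty} x(i)\,h(n-i) = x(n) * h(n)$$
    where * denotes convolution, n is the amount by which h(-i) is shifted, and different values of n correspond to different convolution results; when n = 0, the sequence h(-i) is h(i) with its index i negated; negating the index flips h(i) by 180 degrees about the vertical axis.
  7. The video memory processing method based on a convolutional neural network according to claim 5, characterized in that
    if the variables of the convolution are two functions x(t) and h(t), the result of the convolution is
    $$y(t) = \int_{-\infty}^{\infty} x(p)\,h(t-p)\,dp = x(t) * h(t)$$
    where * denotes convolution, t is the amount by which the function h(-p) is shifted, p is the integration variable, and integration is a form of summation.
  8. The video memory processing method based on a convolutional neural network according to claim 1, characterized in that the step of writing the data in the retrieved temporary storage space into the designated external storage space comprises:
    writing the processed data in the temporary storage space into the designated external storage space according to the configured writing mode, wherein the writing modes include the Addition mode and the Concat mode.
  9. The video memory processing method based on a convolutional neural network according to claim 8, characterized in that
    when the Addition mode is configured, the data in the corresponding temporary storage space is written into the external storage space cumulatively;
    when the Concat mode is configured, the data in the corresponding temporary storage space is written into the external storage space in order and at intervals, according to the configured data length information.
  10. The video memory processing method based on a convolutional neural network according to any one of claims 1 to 9, characterized in that
    the type of the data includes input data, output data, input error, and output error;
    the direction of the data includes input and output.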
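  11. A video memory processing system based on a convolutional neural network, characterized by comprising:
    a space creation unit, configured to create a temporary storage space, the temporary storage space being storage for temporarily holding input data, output data, input errors, and output errors;
    a data retrieval unit, configured to retrieve, according to the type and direction of the data to be processed, the temporary storage space corresponding to the data to be processed, and to read the data to be processed into the retrieved temporary storage space;
    a preprocessing unit, configured to perform preset processing on the data to be processed within the retrieved temporary storage space;
    a data writing unit, configured to write, according to the type and direction of the processed data, the data in the retrieved temporary storage space into a designated external storage space.
  12. The video memory processing system based on a convolutional neural network according to claim 11, characterized in that
    the temporary storage space created by the space creation unit comprises an input data temporary storage space, an output data temporary storage space, an input error temporary storage space, and an output error temporary storage space.
  13. The video memory processing system based on a convolutional neural network according to claim 11, characterized in that
    the space creation unit creates the temporary storage space in video memory.
  14. The video memory processing system based on a convolutional neural network according to claim 11, characterized in that
    the preset processing performed by the preprocessing unit on the data to be processed comprises: performing at least one of convolution, superposition, multiplication, or integration on the data to be processed.
  15. The video memory processing system based on a convolutional neural network according to claim 11, characterized in that the preset processing performed by the preprocessing unit on the data to be processed is convolution, used to obtain the result of multiplying two variables over a certain range and then summing; wherein
    if the variables of the convolution are the sequences x(n) and h(n), the result of the convolution is
    $$y(n) = \sum_{i=-\infty}^{\infty} x(i)\,h(n-i) = x(n) * h(n)$$
    where * denotes convolution, n is the amount by which h(-i) is shifted, and different values of n correspond to different convolution results; when n = 0, the sequence h(-i) is h(i) with its index i negated; negating the index flips h(i) by 180 degrees about the vertical axis.
  16. The video memory processing system based on a convolutional neural network according to claim 11, characterized in that the preset processing performed by the preprocessing unit on the data to be processed is convolution, used to obtain the result of multiplying two variables over a certain range and then summing; wherein
    if the variables of the convolution are two functions x(t) and h(t), the result of the convolution is
    $$y(t) = \int_{-\infty}^{\infty} x(p)\,h(t-p)\,dp = x(t) * h(t)$$
    where * denotes convolution, t is the amount by which the function h(-p) is shifted, p is the integration variable, and integration is a form of summation.
  17. The video memory processing system based on a convolutional neural network according to claim 11, characterized in that
    the data writing unit writes the processed data in the temporary storage space into the designated external storage space according to the configured writing mode, wherein the writing modes include the Addition mode and the Concat mode.
  18. The video memory processing system based on a convolutional neural network according to claim 11, characterized in that
    when the Addition mode is configured, the data writing unit writes the data in the corresponding temporary storage space into the external storage space cumulatively;
    when the Concat mode is configured, the data writing unit writes the data in the corresponding temporary storage space into the external storage space in order and at intervals, according to the configured data length information.
  19. An electronic device, characterized in that the electronic device comprises a memory and a processor, the memory containing a video memory processing program based on a convolutional neural network, and the program, when executed by the processor, implementing the steps of the video memory processing method based on a convolutional neural network according to any one of claims 1 to 11.
  20. A computer-readable storage medium, characterized in that the computer-readable storage medium contains a video memory processing program based on a convolutional neural network, and the program, when executed by a processor, implements the steps of the video memory processing method based on a convolutional neural network according to any one of claims 1 to 11.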
PCT/CN2019/118467 2019-06-10 2019-11-14 基于卷积神经网络的显存处理方法、装置及存储介质 WO2020248499A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2021506309A JP7174831B2 (ja) 2019-06-10 2019-11-14 畳み込みニューラルネットワークに基づくビデオメモリ処理方法、装置及び記録媒体

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910497396.8A CN110377342B (zh) 2019-06-10 2019-06-10 基于卷积神经网络的显存处理方法、装置及存储介质
CN201910497396.8 2019-06-10

Publications (1)

Publication Number Publication Date
WO2020248499A1 true WO2020248499A1 (zh) 2020-12-17

Family

ID=68249933

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118467 WO2020248499A1 (zh) 2019-06-10 2019-11-14 基于卷积神经网络的显存处理方法、装置及存储介质

Country Status (3)

Country Link
JP (1) JP7174831B2 (zh)
CN (1) CN110377342B (zh)
WO (1) WO2020248499A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112957068A (zh) * 2021-01-29 2021-06-15 青岛海信医疗设备股份有限公司 超声信号处理方法及终端设备
CN114330755A (zh) * 2022-03-11 2022-04-12 深圳鹏行智能研究有限公司 数据集的生成方法、装置、机器人和存储介质

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110377342B (zh) * 2019-06-10 2022-08-30 平安科技(深圳)有限公司 基于卷积神经网络的显存处理方法、装置及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7526634B1 (en) * 2005-12-19 2009-04-28 Nvidia Corporation Counter-based delay of dependent thread group execution
CN103136724A (zh) * 2011-11-30 2013-06-05 北大方正集团有限公司 加网方法和装置
CN106779057A (zh) * 2016-11-11 2017-05-31 北京旷视科技有限公司 基于gpu的计算二值神经网络卷积的方法及装置
CN108197705A (zh) * 2017-12-29 2018-06-22 国民技术股份有限公司 卷积神经网络硬件加速装置及卷积计算方法及存储介质
CN108229687A (zh) * 2016-12-14 2018-06-29 腾讯科技(深圳)有限公司 数据处理方法、数据处理装置及电子设备
CN110377342A (zh) * 2019-06-10 2019-10-25 平安科技(深圳)有限公司 基于卷积神经网络的显存处理方法、装置及存储介质

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103207841B (zh) * 2013-03-06 2016-01-20 青岛海信传媒网络技术有限公司 基于键值对缓存的数据读写方法及装置
CN104090938A (zh) * 2014-06-26 2014-10-08 广州金山网络科技有限公司 一种提交数据的方法及装置
KR102158683B1 (ko) 2015-12-10 2020-09-22 딥마인드 테크놀로지스 리미티드 외부 메모리로 신경망들 증강
JP2018067154A (ja) 2016-10-19 2018-04-26 ソニーセミコンダクタソリューションズ株式会社 演算処理回路および認識システム
CN107832839B (zh) * 2017-10-31 2020-02-14 南京地平线机器人技术有限公司 执行卷积神经网络中的运算的方法和装置
JP6839641B2 (ja) 2017-11-17 2021-03-10 株式会社東芝 演算処理装置
CN108182469A (zh) * 2017-12-27 2018-06-19 郑州云海信息技术有限公司 一种神经网络模型训练方法、系统、装置及存储介质
CN109657793B (zh) * 2018-12-26 2020-09-22 广州小狗机器人技术有限公司 模型训练方法及装置、存储介质及电子设备

Also Published As

Publication number Publication date
CN110377342A (zh) 2019-10-25
JP7174831B2 (ja) 2022-11-17
JP2021532498A (ja) 2021-11-25
CN110377342B (zh) 2022-08-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19932796

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021506309

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19932796

Country of ref document: EP

Kind code of ref document: A1