WO2020135602A1 - Image processing method and apparatus, intelligent driving system, and vehicle-mounted computing platform - Google Patents


Info

Publication number
WO2020135602A1
WO2020135602A1 (PCT/CN2019/128764)
Authority
WO
WIPO (PCT)
Prior art keywords
fixed-point, network parameters, image, processing
Prior art date
Application number
PCT/CN2019/128764
Other languages
English (en)
French (fr)
Inventor
温拓朴
程光亮
石建萍
Original Assignee
北京市商汤科技开发有限公司 (Beijing SenseTime Technology Development Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Priority to JP2021533181A (published as JP2022515343A)
Priority to KR1020217018122A (published as KR20210092254A)
Publication of WO2020135602A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G06T1/0007 Image acquisition
    • G06T1/60 Memory management
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • the present application relates to fixed-point technology, in particular to an image processing method and device, an intelligent driving system, and a vehicle-mounted computing platform.
  • convolutional neural network technology is more and more frequently applied to products such as image processing, unmanned driving systems, and assisted driving systems. Since convolutional neural networks process image data, convolutional neural network applications usually rely on high-performance graphics processing units (GPUs), require a huge amount of computation, and consume a large amount of memory.
  • Embodiments of the present application provide an image processing method, device, intelligent driving system, and vehicle-mounted computing platform.
  • an image processing method provided includes:
  • the floating-point network parameters of the convolutional neural network are fixed-point processed according to the fixed-point bit-width hardware resources of the arithmetic unit, and the fixed-point network parameters are values expressed as powers of 2;
  • an image processing apparatus includes:
  • a parameter fixed-point module, used for performing fixed-point processing on the floating-point network parameters of the convolutional neural network according to the fixed-point bit-width hardware resources of the arithmetic unit;
  • the fixed-point network parameters are values expressed as powers of 2;
  • Image acquisition module for acquiring images to be processed
  • An image processing module is used to control the arithmetic unit to process the image according to the network parameters after the fixed-point processing of the convolutional neural network to obtain the processing result of the image.
  • an intelligent driving system including: a vehicle-mounted camera, a convolutional neural network subsystem, and a control subsystem; wherein, the control subsystem is used for:
  • the floating-point network parameters of the convolutional neural network are fixed-point processed, and the fixed-point network parameters are values expressed as powers of 2.
  • an on-board computing platform based on FPGA including: a processor, an external memory, a memory, and an FPGA computing unit;
  • the external memory stores the fixed-point network parameters of the neural network, or stores the binary values and look-up table corresponding to the fixed-point network parameters of the neural network; the look-up table is used to indicate the binary values corresponding to the power values of different network parameters; the fixed-point network parameters are values expressed as powers of 2;
  • the processor reads the fixed-point network parameters of the neural network into the memory, and inputs the data in the memory and the image information to be processed to the FPGA arithmetic unit; or, the processor determines the fixed-point network parameters according to the binary values and the look-up table, reads them into the memory, and inputs the data in the memory and the image information to be processed to the FPGA arithmetic unit;
  • the FPGA arithmetic unit obtains shift operation results according to the image information to be processed and the fixed-point network parameters, and accumulates the multiple shift results to obtain the processing result of the image.
  • an electronic device includes:
  • the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform the operation corresponding to the image processing method according to any one of the embodiments.
  • an electronic device includes:
  • the processor and the image processing device according to any one of the embodiments; when the processor runs the image processing device, the modules in the image processing device according to any one of the embodiments are executed.
  • a computer-readable storage medium in which a computer program is stored, where the computer program is used to execute the steps of the image processing method in any feasible implementation of the first aspect above.
  • Embodiments of the present application provide an image processing method and device, an intelligent driving system, and an on-board computing platform.
  • the image processing method includes: performing fixed-point processing on the floating-point network parameters of the convolutional neural network according to the fixed-point bit-width hardware resources of the computing unit; acquiring the image to be processed; and controlling the operation unit to process the image according to the fixed-point network parameters of the convolutional neural network to obtain the image processing result.
  • the use of a power of 2 to represent network parameters can reduce the complexity of the operation, increase the operation speed, achieve fast real-time response, and reduce the power consumption during the operation.
  • when the operation unit is an FPGA or other hardware with limited hardware resources, this solves the problem that the convolutional neural network cannot be applied to the hardware or that accelerated calculation cannot be implemented on the hardware.
  • FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 2 is another schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • FIG. 4 is another schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an intelligent driving system provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an on-board computing platform based on FPGA provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the embodiments of the present application can be applied to a computer system/server, which can operate together with many other general-purpose or special-purpose computing system environments or configurations.
  • Examples of well-known computing systems, environments, and/or configurations suitable for use with computer systems/servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems.
  • the computer system/server may be described in the general context of computer system executable instructions (such as program modules) executed by the computer system.
  • program modules may include routines, programs, target programs, components, logic, data structures, etc., which perform specific tasks or implement specific abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment. In the distributed cloud computing environment, tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
  • FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • Exemplary execution subjects of the embodiments of the present application may be electronic devices such as image processing apparatuses and processors, or any devices and systems applying image processing methods, such as monitoring systems and intelligent driving systems.
  • the image processing method includes:
  • the floating-point network parameters of the convolutional neural network are fixed-point processed.
  • the network parameters after fixed-point processing are values expressed by powers of two.
  • the computing unit in the embodiments of the present application may be a computing unit that supports fixed-point operations, such as a field-programmable gate array (Field Programmable Gate Array, FPGA for short), a digital signal processor (Digital Signal Processor, DSP for short), etc.
  • the fixed-point width resources of the computing unit are usually relatively limited.
  • the fixed-point bit-width resource is chosen to be as small as possible, for example 8 bits, 4 bits, or even fewer, to implement fixed-point operations.
  • however, small fixed-point bit-width resources often limit the calculation speed.
  • the embodiments of the present application therefore optimize the convolutional neural network to fit the fixed-point bit-width hardware resources, realizing high-speed operation on a resource-limited platform.
  • Convolutional neural networks usually include multiple convolutional layers for feature extraction of the image to be processed, and classification of the extracted features to achieve various functions of the convolutional neural network.
  • the convolutional neural network contains multiple network parameters, and the value of the network parameters in the convolutional neural network determines the performance of the convolutional neural network.
  • the network parameters of convolutional neural networks are usually expressed as floating point numbers.
  • the convolutional neural network in the embodiment of the present application may be a trained convolutional neural network.
  • floating-point network parameters occupy storage space and have high computational complexity.
  • fixed-point processing is performed on the floating-point network parameters of the convolutional neural network, so that the fixed-point network parameters are values represented by powers of 2. This reduces the storage space occupied by the network parameters on the one hand, and simplifies the operations required for convolutional-neural-network-based image processing on the other hand.
  • powers of 2 are used to represent the network parameters in fixed-point form.
  • the operation unit can then replace multiplication, which has slower processing speed and greater power consumption, with shift and addition operations to achieve image processing.
  • the use of a power of 2 to represent network parameters can greatly reduce the complexity of the operation, increase the operation speed, achieve fast real-time response, and reduce the power consumption during the operation.
  • when the operation unit is an FPGA or other hardware with limited hardware resources, this solves the problem that the convolutional neural network cannot be applied to the hardware or cannot realize accelerated calculation on the hardware.
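As a concrete illustration of the shift-and-add substitution described above, the following Python sketch (an illustrative model, not part of the application; the function name is invented here, and non-negative exponents are assumed so that plain left shifts suffice) multiplies an integer activation by a weight fixed-pointed as 2^k + 2^j:

```python
def mul_by_pow2_sum(x: int, k: int, j: int) -> int:
    # Multiply x by the fixed-point weight 2^k + 2^j using only
    # two shifts and one addition -- no hardware multiplier needed.
    return (x << k) + (x << j)

# x * 36, with 36 fixed-pointed as 2^5 + 2^2
print(mul_by_pow2_sum(7, 5, 2))  # 252, i.e. 7 * 36
```

On an FPGA the two shifts are just wire routing, which is why this substitution saves both time and power.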
  • for each network parameter of the convolutional neural network, it may be detected whether the parameter is in floating-point form, and if so, the value of the network parameter needs to be fixed-pointed.
  • the network parameter after fixed-point processing is a sum of M powers of 2, where M is an integer greater than 1.
  • a sum of M powers of 2 may be used to approximate a floating-point number.
  • for the floating-point number 36.11, it can be approximated by 2^5 + 2^2.
  • for the floating-point number 21.42, 2^4 + 2^2 + 2^1 can be used.
  • the floating-point number 16.25 can be expressed exactly by 2^4 + 2^(-2).
  • in this way, the multiplication operations involving the network parameters in the convolutional neural network can be simplified to shift operations, which simplifies the computation of the convolutional neural network and improves its operation speed.
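The power-of-2 approximations in the examples above can be produced by repeatedly snapping the residual to its nearest power of 2. The following Python sketch is one plausible way to do this (the function name and the nearest-rounding rule are assumptions; the application does not spell out its exact fixed-point algorithm):

```python
import math

def quantize_pow2_sum(x: float, m: int = 2):
    """Approximate x by a sum of at most m signed powers of 2.
    Returns the list of (sign, exponent) terms and the quantized value."""
    terms, r = [], x
    for _ in range(m):
        if r == 0:
            break
        sign = 1 if r > 0 else -1
        e = round(math.log2(abs(r)))      # nearest power of 2
        terms.append((sign, e))
        r -= sign * 2.0 ** e              # quantization residual
    value = sum(s * 2.0 ** e for s, e in terms)
    return terms, value

# 36.11 ~ 2^5 + 2^2 = 36, and 16.25 = 2^4 + 2^(-2) exactly
print(quantize_pow2_sum(36.11))  # ([(1, 5), (1, 2)], 36.0)
print(quantize_pow2_sum(16.25))  # ([(1, 4), (1, -2)], 16.25)
```

With m = 3, the same routine reproduces the 21.42 example as 2^4 + 2^2 + 2^1 = 22.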
  • the fixed-point network parameters can also be expressed by a sum of fewer than M powers of 2.
  • for the floating-point number 32, it can be represented by 2^5 alone.
  • optionally, M is equal to 2.
  • when M equals 2, the fixed-point result of a network parameter can be expressed as 2^k + 2^j.
  • in this way, the accuracy of the network parameters is ensured, the amount of calculation is not increased, and the occupation of memory and storage resources is reduced.
  • the embodiments of the present application provide a possible storage method of the network parameters after the fixed-point processing. In this method:
  • the fixed-point network parameter is the sum of 2 to the power k and 2 to the power j, where k and j are both integers, k is greater than j, and the difference between k and j is less than the preset threshold.
  • that is, 2^k + 2^j can be used as the fixed-point representation, where k and j are both integers and k is greater than j.
  • the difference between k and j may be limited to be less than a preset threshold.
  • to determine the preset threshold, the initial values of all network parameters of the convolutional neural network can first be determined, and the minimum precision value s can then be determined according to the precision of all the initial values.
  • the minimum precision value s is the minimum possible value of j.
  • the maximum initial value can be rounded up to obtain the maximum value.
  • a single power of 2, for example 2^t, may be used to represent the maximum value, or M powers of 2 may be used, such as 2^p + 2^q, where t, p, and q are integers and p is greater than q; t (or p) is the maximum possible value of k.
  • if the difference between k and j would be greater than the preset threshold, the smallest possible value of j can be increased, that is, the fixed-point precision of the network parameters is reduced.
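The exponent bounds described above might be computed roughly as follows. This is a sketch under assumptions: the text does not define the minimum precision value s exactly, so here it is taken as the floor of log2 of the smallest parameter magnitude, and the function name is invented:

```python
import math

def exponent_bounds(params):
    # j's smallest useful value comes from the finest parameter magnitude;
    # k's largest comes from the maximum value rounded up to a power of 2
    mags = [abs(p) for p in params if p != 0]
    j_min = math.floor(math.log2(min(mags)))   # minimum precision value s
    k_max = math.ceil(math.log2(max(mags)))    # 2^t >= max initial value
    return j_min, k_max

print(exponent_bounds([36.11, 0.26, -5.0]))  # (-2, 6)
```

The gap k_max - j_min then suggests how large the preset threshold on k - j may need to be.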
  • an optional way of storing the fixed-point network parameters may include:
  • the binary value mapping table is used to indicate the binary values corresponding to different combinations of k and j values;
  • since the fixed-point network parameter is the sum of 2^k and 2^j, where k and j are distinct integers and the difference between k and j is less than the preset threshold d, the network parameter can take only a limited number of values, for example 28.
  • for 28 different values, 5 bits are sufficient to distinguish them; therefore, a binary value mapping table can be established.
  • the binary value mapping table indicates the binary values corresponding to different combinations of k and j values, thereby reducing the storage space occupied by the fixed-point network parameters.
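A binary value mapping table of this kind can be sketched as below. The exponent range (j down to -2, k up to 5) and threshold d = 8 are assumptions chosen for illustration; they happen to yield exactly 28 (k, j) pairs, which fit in 5-bit codes as the text describes:

```python
def build_kj_table(k_min: int, k_max: int, d: int):
    # enumerate valid pairs: k > j, both in range, difference below threshold d
    pairs = [(k, j)
             for k in range(k_min, k_max + 1)
             for j in range(k_min, k)
             if k - j < d]
    bits = max(1, (len(pairs) - 1).bit_length())  # bits needed to tell them apart
    return {pair: format(code, f"0{bits}b") for code, pair in enumerate(pairs)}

table = build_kj_table(k_min=-2, k_max=5, d=8)
print(len(table), len(next(iter(table.values()))))  # 28 pairs, 5-bit codes
```

Each weight 2^k + 2^j is then stored as one short code instead of a full floating-point value.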
  • optionally, 2·log bits may be used to encode a network parameter after fixed-point processing: the first log bits represent the first power of 2, and the last log bits represent the second power of 2.
  • this step S101 may be executed by the processor invoking the corresponding instruction stored in the memory, or may be executed by the parameter fixed-point module 301 executed by the processor.
  • the image to be processed may be an image captured by a surveillance camera, an image captured by a car camera, or an image pre-stored in an image library.
  • the embodiment of the present application does not limit the manner in which the image to be processed is acquired.
  • this step S102 may be executed by the processor invoking the corresponding instruction stored in the memory, or may be executed by the image acquisition module 302 executed by the processor.
  • the operation unit is controlled to process the image according to the fixed-point network parameters of the convolutional neural network to obtain an image processing result.
  • this step S103 may be executed by the processor invoking the corresponding instruction stored in the memory, or may be executed by the image processing module 303 executed by the processor.
  • the image processing result includes but is not limited to at least one of the following: object detection/tracking result, feature extraction result, segmentation result, and classification result.
  • controlling the arithmetic unit to process the image through the fixed-point convolutional neural network according to the network parameters may include:
  • the control operation unit determines the network parameters after the fixed-point processing of the convolutional neural network according to the binary value and the binary mapping table.
  • the operation unit determines the fixed-point network parameters of the convolutional neural network by searching the binary value mapping table according to the stored binary values, or determines the powers k and j by searching the table, from which the fixed-point network parameter 2^k + 2^j can be determined.
  • the image processing method provided by the embodiment of the present application includes: performing fixed-point processing on the floating-point network parameters of the convolutional neural network according to the fixed-point bit-width hardware resources of the arithmetic unit, acquiring the image to be processed, and controlling the arithmetic unit to process the image according to the fixed-point network parameters of the convolutional neural network to obtain the image processing result.
  • the use of a power of 2 to represent network parameters reduces the complexity of the operation, improves the speed of the operation, achieves fast real-time response, and reduces the power consumption during the operation.
  • when the operation unit is hardware with limited hardware resources such as an FPGA, this solves the problem that convolutional neural networks cannot be applied to the hardware or cannot be accelerated on the hardware.
  • FIG. 2 is another schematic flowchart of the image processing method provided by the embodiment of the present application.
  • the image processing method includes:
  • the floating-point network parameters of the convolutional neural network are fixed-point processed.
  • the training data may be labeled data.
  • the training data is an image labeling a face area or a driving area.
  • the training, fixed-point processing, and retraining of the network parameters of the convolutional neural network may be performed multiple times to improve the accuracy of the network parameters.
  • the control operation unit processes the image according to the revised network parameters of the convolutional neural network to obtain the image processing result.
  • the training data is used to retrain the convolutional neural network containing the fixed-point network parameters so as to correct the fixed-point network parameters, which can improve the accuracy of the network parameters of the convolutional neural network.
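The train / fixed-point / retrain cycle can be pictured with the toy sketch below. Here each weight is snapped to a single nearest power of 2 for brevity (the application uses sums of M powers), and grad_step is a placeholder callable standing in for a real training step; none of these names come from the application:

```python
import math

def nearest_pow2(w: float) -> float:
    # snap a weight to the nearest signed power of 2 (0 stays 0)
    if w == 0:
        return 0.0
    return math.copysign(2.0 ** round(math.log2(abs(w))), w)

def fixpoint_and_retrain(weights, grad_step, rounds=2):
    for _ in range(rounds):
        weights = [nearest_pow2(w) for w in weights]  # fixed-point pass
        weights = [grad_step(w) for w in weights]     # retraining pass
    return [nearest_pow2(w) for w in weights]

# with an identity "training" step the weights simply settle on powers of 2
print(fixpoint_and_retrain([36.11, -0.3], grad_step=lambda w: w))
```

In a real system the retraining pass would run gradient descent on labeled data, letting the other weights compensate for the quantization error introduced by the fixed-point pass.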
  • FIG. 3 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application. As shown in FIG. 3, the image processing apparatus includes:
  • the parameter fixed-point module 301 is used to perform fixed-point processing on the network parameters of the convolutional neural network that are expressed in floating point, according to the fixed-point bit-width hardware resources of the arithmetic unit.
  • the network parameters after fixed-point processing are values expressed by powers of two.
  • the image acquisition module 302 is used to acquire an image to be processed.
  • the image processing module 303 is used to control the arithmetic unit to process the image according to the network parameters after the fixed-point processing of the convolutional neural network to obtain the image processing result.
  • the network parameters after the fixed-point processing are a sum of M powers of 2, where M is an integer greater than 1.
  • M is equal to 2.
  • the network parameter after the fixed-point processing is the sum of 2^k and 2^j, where k and j are both integers, k is greater than j, and the difference between k and j is less than a preset threshold.
  • an embodiment of the present application further provides an image processing apparatus.
  • 4 is a schematic structural diagram of an image processing apparatus provided in Embodiment 2 of the present application. As shown in FIG. 4, the image processing device further includes:
  • the storage module 304 is used to obtain the binary values corresponding to the fixed-point network parameters according to k and j and the binary value mapping table, the binary value mapping table being used to indicate the binary values corresponding to different combinations of k and j values, and to store the binary values corresponding to the fixed-point network parameters.
  • the image processing module 303 in the embodiment shown in FIG. 4 is used to:
  • the control operation unit determines the network parameters after the fixed-point processing of the convolutional neural network according to the binary value and the binary mapping table;
  • the image is processed according to the network parameters after convolutional neural network fixed-point processing.
  • the image processing device further includes:
  • the training module 305 is used to train, using the training data, the convolutional neural network containing the fixed-point network parameters before the operation unit processes the image according to those parameters, so as to correct the network parameters.
  • the image processing results include but are not limited to at least one of the following: object detection/tracking results, feature extraction results, segmentation results, and classification results.
  • Another aspect of the embodiments of the present application also provides an intelligent driving system that uses the image processing method in the above embodiments and has the same or similar technical features and technical effects.
  • the intelligent driving system includes: an on-board camera 501, a convolutional neural network subsystem 502, and a control subsystem 503; where the control subsystem 503 is used to:
  • the floating-point network parameters of the convolutional neural network are fixed-point processed, and the fixed-point network parameters are values expressed as powers of 2.
  • intelligent driving includes, but is not limited to: assisted driving, automatic driving, and switching between multiple driving modes such as assisted driving and automatic driving.
  • the network parameters after the fixed-point processing are a sum of M powers of 2, where M is an integer greater than 1.
  • M is equal to 2.
  • the network parameter after the fixed-point processing is the sum of 2^k and 2^j, where k and j are both integers, k is greater than j, and the difference between k and j is less than a preset threshold.
  • the control subsystem 503 is also used to obtain the binary values corresponding to the fixed-point network parameters according to k and j and the binary value mapping table, where the binary value mapping table is used to indicate the binary values corresponding to different combinations of k and j values.
  • the intelligent driving system also includes: a storage subsystem 504;
  • the storage subsystem 504 is used to store the binary value corresponding to the network parameter after the fixed-point processing.
  • the control subsystem 503 is used to:
  • Control the convolutional neural network subsystem to determine the network parameters after the fixed-point processing of the convolutional neural network according to the binary values and binary value mapping tables stored in the storage subsystem;
  • the image of the road surface of the vehicle collected by the on-board camera is processed to obtain the image processing result.
  • the intelligent driving system further includes: a training subsystem 505;
  • the training subsystem 505 is used to train the convolutional neural network including the network parameters after the fixed-point processing by using the training data to correct the network parameters after the fixed-point processing.
  • the image processing result includes but is not limited to at least one of the following: license plate recognition result, drivable area detection result, lane line detection result, lane line attribute detection result, and vehicle camera attitude detection result.
  • Another aspect of the embodiments of the present application also provides an on-board computing platform based on FPGA, which uses the image processing method in the above embodiments, and has the same or similar technical features and technical effects.
  • the FPGA-based vehicle computing platform includes: a processor 601, an external memory 602, a memory 603, and an FPGA computing unit 604; wherein,
  • the external memory 602 stores the fixed-point network parameters of the neural network, or stores the binary values and the look-up table corresponding to the fixed-point network parameters of the neural network.
  • the look-up table is used to indicate the binary values corresponding to the power values of different network parameters; the fixed-point network parameters are values expressed as powers of 2;
  • the processor 601 reads the fixed-point network parameters of the neural network into the memory 603, and inputs the data in the memory and the image information to be processed to the FPGA arithmetic unit 604; or, the processor 601 determines the fixed-point network parameters according to the binary values and the look-up table, reads them into the memory 603, and inputs the data in the memory 603 and the image information to be processed to the FPGA arithmetic unit 604;
  • the FPGA arithmetic unit 604 obtains shift operation results according to the image information to be processed and the fixed-point network parameters, and accumulates the multiple shift results to obtain the image processing result.
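The shift-and-accumulate behaviour attributed to the FPGA arithmetic unit 604 can be modelled in Python as below (an illustrative software model, not HDL; the function name is invented, and each weight is assumed to be a pair of non-negative exponents (k, j)):

```python
def shift_accumulate(pixels, weight_exponents):
    # each multiply-accumulate becomes two shifts plus additions
    acc = 0
    for x, (k, j) in zip(pixels, weight_exponents):
        acc += (x << k) + (x << j)   # x * (2^k + 2^j)
    return acc

# dot product of [3, 1] with weights 36 (2^5 + 2^2) and 6 (2^2 + 2^1)
print(shift_accumulate([3, 1], [(5, 2), (2, 1)]))  # 114
```

On the FPGA the same pattern repeats for every convolution window, with the accumulator summing all the shift results into one output pixel.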
  • the network parameters after the fixed-point processing are a sum of M powers of 2, where M is an integer greater than 1.
  • M is equal to 2.
  • the network parameter after the fixed-point processing is the sum of 2^k and 2^j, where k and j are both integers, k is greater than j, and the difference between k and j is less than a preset threshold.
  • k and j are stored in the external memory 602;
  • the lookup table indicates the binary values corresponding to different combinations of k and j values.
  • the image processing results include but are not limited to at least one of the following: object detection/tracking results, feature extraction results, segmentation results, and classification results.
  • FIG. 7 is a schematic structural diagram of the electronic device provided by the embodiment of the present application. As shown in FIG. 7, the electronic device includes: a processor 702 and a memory 701;
  • the memory 701 is used to store at least one executable instruction, and the executable instruction causes the processor 702 to perform the operation corresponding to the image processing method provided in any one of the foregoing embodiments.
  • the electronic device further includes an operation unit 703, and the operation unit 703 is configured to implement the operation of the convolutional neural network in any of the foregoing embodiments.
  • Another aspect of the embodiments of the present application further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program is used to execute the steps of the image processing method provided in any of the foregoing embodiments.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of units is only a division of logical functions; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical, or other forms.
  • the method and apparatus of the present application may be implemented in many ways.
  • the method and apparatus of the present application may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware.
  • the above sequence of steps for the method is for illustration only, and the steps of the method of the present application are not limited to the sequence specifically described above unless otherwise specifically stated.
  • the present application may also be implemented as programs recorded in a recording medium, and these programs include machine-readable instructions for implementing the method according to the present application.
  • the present application also covers a recording medium storing a program for executing the method according to the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)
  • Image Processing (AREA)

Abstract

An image processing method and apparatus, an intelligent driving system, and an in-vehicle computing platform. The method includes: performing fixed-point conversion, according to the fixed-point bit-width hardware resource amount of a computing unit (703), on network parameters of a convolutional neural network represented in floating point (S101, S201); acquiring an image to be processed (S102, S203); and controlling the computing unit (703) to process the image according to the fixed-point-converted network parameters of the convolutional neural network, to obtain a processing result of the image (S103).

Description

Image processing method and apparatus, intelligent driving system, and in-vehicle computing platform
This application claims priority to Chinese Patent Application No. CN201811643406.6, filed with the China National Intellectual Property Administration on December 29, 2018 and entitled "Image processing method and apparatus, intelligent driving system, and in-vehicle computing platform", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to fixed-point conversion technology, and in particular to an image processing method and apparatus, an intelligent driving system, and an in-vehicle computing platform.
Background
As deep convolutional neural networks have improved the performance of computer vision recognition, convolutional neural network technology is increasingly applied in products such as image processing, driverless systems, and driver-assistance systems. Because convolutional neural networks process image data, their applications typically depend on high-performance graphics processing units (GPUs), require an enormous amount of computation, and consume a large amount of memory.
Summary
The embodiments of this application provide an image processing method and apparatus, an intelligent driving system, and an in-vehicle computing platform.
According to one aspect of the embodiments of this application, an image processing method is provided, including:
performing fixed-point conversion, according to the fixed-point bit-width hardware resource amount of a computing unit, on network parameters of a convolutional neural network represented in floating point, where the fixed-point-converted network parameters are values represented as powers of 2;
acquiring an image to be processed; and
controlling the computing unit to process the image according to the fixed-point-converted network parameters of the convolutional neural network, to obtain a processing result of the image.
According to another aspect of the embodiments of this application, an image processing apparatus is provided, including:
a parameter fixed-point module, configured to perform fixed-point conversion, according to the fixed-point bit-width hardware resource amount of a computing unit, on network parameters of a convolutional neural network represented in floating point, where the fixed-point-converted network parameters are values represented as powers of 2;
an image acquisition module, configured to acquire an image to be processed; and
an image processing module, configured to control the computing unit to process the image according to the fixed-point-converted network parameters of the convolutional neural network, to obtain a processing result of the image.
According to yet another aspect of the embodiments of this application, an intelligent driving system is provided, including: an in-vehicle camera, a convolutional neural network subsystem, and a control subsystem, where the control subsystem is configured to:
perform fixed-point conversion, according to the fixed-point bit-width hardware resource amount of the computing unit running the convolutional neural network subsystem, on network parameters of a convolutional neural network represented in floating point, where the fixed-point-converted network parameters are values represented as powers of 2;
control the convolutional neural network subsystem to process, according to the fixed-point-converted network parameters of the convolutional neural network, an image of the road surface on which the vehicle is travelling captured by the in-vehicle camera, to obtain a processing result of the image; and
perform intelligent driving of the vehicle according to the processing result of the image.
According to still another aspect of the embodiments of this application, an FPGA-based in-vehicle computing platform is provided, including: a processor, an external memory, an internal memory, and an FPGA computing unit;
the external memory stores the fixed-point-converted network parameters of the neural network, or stores binary values corresponding to the fixed-point-converted network parameters of the neural network and a lookup table, where the lookup table indicates the binary values corresponding to the power exponents of different network parameters, and the fixed-point-converted network parameters are values represented as powers of 2;
the processor reads the fixed-point-converted network parameters of the neural network into the internal memory, and inputs the data in the internal memory and information of an image to be processed into the FPGA computing unit; or the processor looks up the fixed-point-converted network parameters according to the binary values and the lookup table, reads them into the internal memory, and inputs the data in the internal memory and the information of the image to be processed into the FPGA computing unit; and
the FPGA computing unit obtains shift operation results according to the information of the image to be processed and the fixed-point-converted network parameters, and sums the results of multiple operations to obtain the processing result of the image.
According to still another aspect of the embodiments of this application, an electronic device is provided, including:
a processor and a memory;
the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the image processing method according to any one of the embodiments.
According to still another aspect of the embodiments of this application, an electronic device is provided, including:
a processor and the image processing apparatus according to any one of the embodiments; when the processor runs the image processing apparatus, the modules of the image processing apparatus according to any one of the embodiments are run.
According to still another aspect of the embodiments of this application, a computer-readable storage medium is provided, the computer-readable storage medium storing a computer program for performing the steps of the image processing method in any feasible implementation of the first aspect above.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
The embodiments of this application provide an image processing method and apparatus, an intelligent driving system, and an in-vehicle computing platform. The image processing method includes: performing fixed-point conversion, according to the fixed-point bit-width hardware resource amount of a computing unit, on network parameters of a convolutional neural network represented in floating point; acquiring an image to be processed; and controlling the computing unit to process the image according to the fixed-point-converted network parameters of the convolutional neural network to obtain a processing result of the image. By converting the network parameters of the convolutional neural network to fixed point, representing them as powers of 2, the storage space occupied and the memory used during computation can be reduced, saving the resources of hardware platforms such as FPGAs. Meanwhile, representing the network parameters as powers of 2 reduces computational complexity, increases computation speed, enables fast real-time response, and lowers power consumption during computation; in particular, when the computing unit is hardware with limited resources such as an FPGA, this solves the problem that the convolutional neural network cannot be deployed on the hardware or cannot be accelerated on the hardware.
The technical solutions of this application are described in further detail below with reference to the accompanying drawings and embodiments.
Brief Description of the Drawings
The accompanying drawings, which form a part of the specification, describe the embodiments of this application and, together with the description, serve to explain the principles of the invention.
With reference to the accompanying drawings, this application can be understood more clearly from the following detailed description, in which:
FIG. 1 is a schematic flowchart of the image processing method provided by an embodiment of this application.
FIG. 2 is another schematic flowchart of the image processing method provided by an embodiment of this application.
FIG. 3 is a schematic structural diagram of the image processing apparatus provided by an embodiment of this application.
FIG. 4 is another schematic structural diagram of the image processing apparatus provided by an embodiment of this application.
FIG. 5 is a schematic structural diagram of the intelligent driving system provided by an embodiment of this application.
FIG. 6 is a schematic structural diagram of the FPGA-based in-vehicle computing platform provided by an embodiment of this application.
FIG. 7 is a schematic structural diagram of the electronic device provided by an embodiment of this application.
Detailed Description
Various exemplary embodiments of this application will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of this application.
It should also be understood that, for ease of description, the sizes of the parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative and in no way limits this application or its application or use.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be regarded as part of the specification.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.
The embodiments of this application can be applied to computer systems/servers, which can operate together with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations suitable for use with computer systems/servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the above systems, and so on.
Computer systems/servers can be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and the like, which perform specific tasks or implement specific abstract data types. Computer systems/servers can be implemented in distributed cloud computing environments, in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
FIG. 1 is a schematic flowchart of the image processing method provided by an embodiment of this application. The execution subject of the embodiments of this application may, for example, be an electronic device such as an image processing apparatus or a processor, or any apparatus or system applying the image processing method, such as a surveillance system or an intelligent driving system. As shown in FIG. 1, the image processing method includes:
S101: performing fixed-point conversion, according to the fixed-point bit-width hardware resource amount of a computing unit, on the network parameters of a convolutional neural network represented in floating point.
The fixed-point-converted network parameters are values represented as powers of 2.
Exemplarily, the computing unit in the embodiments of this application may be a computing unit supporting fixed-point operations, such as a digital signal processor (DSP) in a field programmable gate array (FPGA). When the computing unit is a hardware platform such as an FPGA, in order to exploit the combined advantages of such platforms in low power consumption and accelerated computation, the fixed-point bit-width resources of the computing unit are usually rather limited. In some cases, to achieve lower power consumption, as small a fixed-point bit width as possible is chosen, for example 8 bits, 4 bits, or even fewer, to implement fixed-point operations. However, a small amount of fixed-point bit-width resources often affects computation speed. For platforms that require fast or even real-time response, such as in-vehicle computing platforms for autonomous driving, the embodiments of this application achieve high-speed computation on resource-limited platforms by optimizing the convolutional neural network to fit the fixed-point bit-width resources of the hardware.
A convolutional neural network usually includes multiple convolutional layers, used to extract features from the image to be processed and to classify the extracted features, so as to implement the various functions of the convolutional neural network. A convolutional neural network contains many network parameters, and the values of these network parameters determine the performance of the network. To improve performance, the network parameters of a convolutional neural network are usually expressed as floating-point numbers. Optionally, the convolutional neural network in the embodiments of this application may be a trained convolutional neural network. However, network parameters in floating-point form occupy storage space and entail high computational complexity. In the embodiments of this application, the floating-point network parameters of the convolutional neural network are converted to fixed point so that the fixed-point-converted network parameters are values represented as powers of 2, which on the one hand reduces the storage space occupied by the network parameters, and on the other hand simplifies the operations required when processing images with the convolutional neural network.
For example, when powers of 2 are used to represent the network parameters in fixed point, only the power exponents corresponding to the network parameters need to be stored, which reduces the storage space occupied as well as the memory used during computation, saving the resources of hardware platforms such as FPGAs. Moreover, when the network parameters are represented as powers of 2, the large number of multiplication operations in the convolutional neural network can be replaced in the computing unit by shift and addition operations, avoiding multiplications that are slow and power-hungry, to implement the image processing. Representing the network parameters as powers of 2 can greatly reduce computational complexity, increase computation speed, achieve fast real-time response, and lower power consumption during computation; in particular, when the computing unit is hardware with limited resources such as an FPGA, it can solve the problem that the convolutional neural network cannot be deployed on the hardware or cannot be accelerated on the hardware.
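As a minimal illustration of the shift-and-add substitution described above (not part of the patent text; the function name and the restriction to non-negative exponents are assumptions of this sketch), a multiplication by a weight quantized as 2^k + 2^j reduces to two shifts and one addition:

```python
def shift_add_multiply(x: int, k: int, j: int) -> int:
    """Multiply an integer activation x by a weight quantized as 2**k + 2**j,
    using only left shifts and one addition (no hardware multiplier).
    Assumes k >= j >= 0 so plain integer shifts apply."""
    return (x << k) + (x << j)

# x * (2**5 + 2**2) = x * 36, computed without a multiply
print(shift_add_multiply(3, 5, 2))  # → 108
```

On an FPGA the same substitution is done in logic: the shifter and adder replace a DSP multiplier, which is the power and area saving the text refers to.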
Exemplarily, before converting the network parameters of the convolutional neural network to fixed point, it may first be checked whether the network parameters are in floating-point form; if so, the values of the network parameters need to be converted to fixed point.
Exemplarily, in one possible implementation, the fixed-point-converted network parameter is a sum of M powers of 2, M being an integer greater than 1.
Optionally, in this embodiment, a floating-point number may be represented by M powers of 2. For example, the floating-point number 36.11 may be represented as 2^5 + 2^2; the floating-point number 21.42 may be represented as 2^4 + 2^2 + 2^1; and the floating-point number 16.25 may be represented as 2^4 + 2^(-2). By using a sum of M powers of 2 to represent the value of a network parameter in fixed point, on the one hand, at the cost of only a small deviation, the storage space occupied by the network parameters is reduced while the performance of the convolutional neural network is preserved; on the other hand, the multiplication operations related to the network parameters in the convolutional neural network can be simplified into shift operations, which simplifies the computation of the network and increases its computation speed.
It can be understood that a fixed-point-converted network parameter may also be represented by a sum of fewer than M powers of 2. For example, the floating-point number 32 may be represented as 2^5.
Optionally, in one possible implementation, M is equal to 2. For example, the fixed-point result of a given network parameter may be represented as 2^k + 2^j.
Exemplarily, by representing every parameter with two powers of 2, the accuracy of the network parameters can be ensured without adding too much computation, and the memory and storage footprint is reduced.
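A greedy sketch of such a two-power quantization (a hypothetical helper, not taken from the patent: it picks the largest power of 2 not exceeding the weight, then fits the residual with a second power):

```python
import math

def quantize_two_powers(w: float):
    """Approximate a positive float weight as 2**k + 2**j (k > j), greedily:
    2**k covers the bulk of w, 2**j covers the residual.
    Returns (k, None) when w is exactly a power of two."""
    if w <= 0:
        raise ValueError("this sketch handles positive weights only")
    k = math.floor(math.log2(w))
    residual = w - 2.0 ** k
    if residual == 0:
        return k, None
    j = math.floor(math.log2(residual))
    return k, j

# 36.11 ≈ 2**5 + 2**2 and 16.25 = 2**4 + 2**(-2), as in the examples above
print(quantize_two_powers(36.11))  # → (5, 2)
print(quantize_two_powers(16.25))  # → (4, -2)
```

The greedy choice reproduces the patent's numeric examples, but other exponent pairs (e.g. chosen to minimize absolute error) are equally compatible with the claimed representation.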
Exemplarily, when storing the values of the network parameters of the convolutional neural network, since the number of network parameters is large and the values differ from parameter to parameter, storing the values directly occupies considerable storage space. To further reduce the storage footprint, an embodiment of this application provides a possible way of storing the fixed-point-converted network parameters, in which:
the fixed-point-converted network parameter is the sum of 2 to the power k and 2 to the power j, k and j are both integers, k is greater than j, and the difference between k and j is less than a preset threshold.
Optionally, any floating-point number may be represented in fixed point as 2^k + 2^j, where k and j are both integers and k is greater than j. To simplify the fixed-point conversion of the network parameters and to reduce the space occupied when storing the fixed-point-converted network parameters, the difference between k and j may be restricted to be less than a preset threshold.
Exemplarily, the preset threshold may be determined as follows. First, the initial values of all network parameters of the convolutional neural network are determined; then, according to the precision of all the initial values, a minimum precision value s is determined, which is the minimum possible value of j. Then, according to the largest of the initial values, a maximum value is determined, which may be the largest initial value rounded up. Exemplarily, the maximum value may be represented by a single power of 2, such as 2^t, or by M powers of 2, such as 2^p + 2^q, where t, p, and q are integers and p is greater than q. Then t (or p) is the maximum possible value of k. When the difference between k and j is greater than the preset threshold, the minimum possible value of j can be increased, that is, the fixed-point precision of the network parameters is reduced.
Optionally, in this embodiment, when the fixed-point-converted network parameter is the sum of 2^k and 2^j, an optional way of storing the fixed-point-converted network parameter may include:
obtaining, according to k and j and a binary value mapping table, the binary value corresponding to the fixed-point-converted network parameter, the binary value mapping table indicating the binary values corresponding to different combinations of k and j values; and
storing the binary value corresponding to the fixed-point-converted network parameter.
Exemplarily, the fixed-point-converted network parameter is the sum of 2^k and 2^j. Considering that k and j are distinct integers and that the difference between k and j is less than the preset threshold d, the total number of possible values of the fixed-point-converted network parameters of the convolutional neural network is d(d+1)/2, where d = k − j + 1. For example, when the preset threshold d on the difference between k and j is 7, the network parameters can take 28 possible values. These 28 distinct values can be distinguished using 5 bits. Therefore, a binary value mapping table can be built that indicates the binary value corresponding to each combination of k and j values, thereby reducing the storage space occupied by the fixed-point-converted network parameters.
Exemplarily, when determining the binary value corresponding to a fixed-point-converted network parameter, 2·log d bits may be used to encode the fixed-point-converted network parameter: the first log d bits may represent the first power of 2, and the last log d bits may represent the second power of 2.
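A small sketch of building such a mapping table (helper names and the set of admissible exponents are assumptions for illustration): enumerate every admissible (k, j) pair and assign each a compact binary code.

```python
import math
from itertools import combinations

def build_code_table(exponents):
    """Map each admissible (k, j) exponent pair with k > j to a compact
    binary code. With 8 admissible exponents there are C(8, 2) = 28 pairs,
    which fit in ceil(log2(28)) = 5 bits."""
    ordered = sorted(exponents, reverse=True)
    pairs = list(combinations(ordered, 2))  # descending order, so k > j
    bits = math.ceil(math.log2(len(pairs)))
    return {pair: format(code, f"0{bits}b") for code, pair in enumerate(pairs)}

table = build_code_table(range(-2, 6))   # admissible exponents -2 .. 5
print(len(table), table[(5, 2)])         # 28 pairs; the code for 2^5 + 2^2
```

At inference time the stored 5-bit code is decoded back to (k, j) through the inverse table, recovering the parameter 2^k + 2^j.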
In an optional example, step S101 may be performed by the processor invoking corresponding instructions stored in a memory, or may be performed by a parameter fixed-point module 301 run by the processor.
S102: acquiring an image to be processed.
Exemplarily, the image to be processed may be an image captured by a surveillance camera, an image captured by an in-vehicle camera, or an image pre-stored in an image library, etc.; the embodiments of this application do not limit the way the image to be processed is acquired.
In an optional example, step S102 may be performed by the processor invoking corresponding instructions stored in a memory, or may be performed by an image acquisition module 302 run by the processor.
S103: controlling the computing unit to process the image according to the fixed-point-converted network parameters of the convolutional neural network, to obtain a processing result of the image.
In an optional example, step S103 may be performed by the processor invoking corresponding instructions stored in a memory, or may be performed by an image processing module 303 run by the processor.
Exemplarily, the processing result of the image includes but is not limited to at least one of the following: an object detection/tracking result, a feature extraction result, a segmentation result, and a classification result.
Exemplarily, when stored binary values are used to indicate the fixed-point-converted network parameters, controlling the computing unit to process the image with the convolutional neural network whose network parameters have undergone fixed-point conversion may include:
S11: controlling the computing unit to determine the fixed-point-converted network parameters of the convolutional neural network according to the binary values and the binary mapping table.
Exemplarily, the computing unit is controlled to determine, by lookup in the binary mapping table according to the stored binary values, the fixed-point-converted network parameters of the convolutional neural network; or to determine the exponents k and j by lookup in the binary mapping table, from which the fixed-point-converted network parameter 2^k + 2^j can be determined.
S12: processing the image according to the fixed-point-converted network parameters of the convolutional neural network.
The image processing method provided by the embodiments of this application includes: performing fixed-point conversion, according to the fixed-point bit-width hardware resource amount of a computing unit, on network parameters of a convolutional neural network represented in floating point; acquiring an image to be processed; and controlling the computing unit to process the image according to the fixed-point-converted network parameters of the convolutional neural network, to obtain a processing result of the image. By converting the network parameters of the convolutional neural network to fixed point, representing them as powers of 2, the storage space occupied and the memory used during computation can be reduced, saving the resources of hardware platforms such as FPGAs. Meanwhile, representing the network parameters as powers of 2 reduces computational complexity, increases computation speed, achieves fast real-time response, and lowers power consumption during computation; in particular, when the computing unit is hardware with limited resources such as an FPGA, it solves the problem that the convolutional neural network cannot be deployed on the hardware or cannot be accelerated on the hardware.
Exemplarily, on the basis of the embodiment shown in FIG. 1, an embodiment of this application further provides an image processing method. FIG. 2 is another schematic flowchart of the image processing method provided by an embodiment of this application. As shown in FIG. 2, the image processing method includes:
S201: performing fixed-point conversion, according to the fixed-point bit-width hardware resource amount of a computing unit, on the network parameters of a convolutional neural network represented in floating point.
S202: training, with training data, the convolutional neural network including the fixed-point-converted network parameters, so as to correct the fixed-point-converted network parameters.
Exemplarily, the training data may be labeled data. For example, when the convolutional neural network is used for face detection or drivable-area detection in images, the training data are images annotated with face regions or drivable areas.
Exemplarily, the process of training, fixed-point conversion, and retraining of the network parameters of the convolutional neural network may be performed multiple times to improve the accuracy of the network parameters.
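A toy sketch of the correction idea (illustrative only; the one-weight model, the data, and the least-squares bias update are assumptions, not the patent's training procedure): after a weight is frozen at its fixed-point value, a remaining free parameter is retrained to absorb the quantization error.

```python
# Targets produced by the original float model y = w_float * x
xs = [1.0, 2.0, 3.0]
w_float = 21.42
ys = [w_float * x for x in xs]

# Freeze the weight at its fixed-point value 2**4 + 2**2 = 20
w_q = 2 ** 4 + 2 ** 2

# "Retraining": with w_q frozen, the least-squares bias is the mean residual
b = sum(y - w_q * x for x, y in zip(xs, ys)) / len(xs)
print(round(b, 2))  # → 2.84
```

In a real network the analogous step is gradient-based fine-tuning of the still-trainable parameters with the quantized weights held fixed, repeated over the train/quantize/retrain cycle described above.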
S203: acquiring an image to be processed.
S204: controlling the computing unit to process the image according to the corrected network parameters of the convolutional neural network, to obtain a processing result of the image.
In the image processing method provided by this embodiment, after the floating-point network parameters of the convolutional neural network are converted to fixed point, the convolutional neural network including the fixed-point-converted network parameters is trained again with training data to correct the fixed-point-converted network parameters, which can improve the accuracy of the network parameters of the convolutional neural network.
Those of ordinary skill in the art can understand that all or part of the steps of the above method embodiments can be completed by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
FIG. 3 is a schematic structural diagram of the image processing apparatus provided by an embodiment of this application. As shown in FIG. 3, the image processing apparatus includes:
a parameter fixed-point module 301, configured to perform fixed-point conversion, according to the fixed-point bit-width hardware resource amount of a computing unit, on the network parameters of a convolutional neural network represented in floating point.
The fixed-point-converted network parameters are values represented as powers of 2.
an image acquisition module 302, configured to acquire an image to be processed; and
an image processing module 303, configured to control the computing unit to process the image according to the fixed-point-converted network parameters of the convolutional neural network, to obtain a processing result of the image.
Optionally, the fixed-point-converted network parameter is a sum of M powers of 2, M being an integer greater than 1.
Optionally, M is equal to 2.
Optionally, the fixed-point-converted network parameter is the sum of 2 to the power k and 2 to the power j, k and j are both integers, k is greater than j, and the difference between k and j is less than a preset threshold.
Exemplarily, on the basis of the embodiment shown in FIG. 3, an embodiment of this application further provides an image processing apparatus. FIG. 4 is a schematic structural diagram of the image processing apparatus provided by Embodiment 2 of this application. As shown in FIG. 4, the image processing apparatus further includes:
a storage module 304, configured to obtain, according to k and j and a binary value mapping table, the binary value corresponding to the fixed-point-converted network parameter, the binary value mapping table indicating the binary values corresponding to different combinations of k and j values; and to store the binary value corresponding to the fixed-point-converted network parameter.
Optionally, the image processing module 303 in the embodiment shown in FIG. 4 is configured to:
control the computing unit to determine the fixed-point-converted network parameters of the convolutional neural network according to the binary values and the binary mapping table; and
process the image according to the fixed-point-converted network parameters of the convolutional neural network.
Exemplarily, as shown in FIG. 4, the image processing apparatus further includes:
a training module 305, configured to, before the computing unit is controlled to process the image according to the fixed-point-converted network parameters of the convolutional neural network, train, with training data, the convolutional neural network including the fixed-point-converted network parameters, so as to correct the fixed-point-converted network parameters.
Optionally, the processing result of the image includes but is not limited to at least one of the following: an object detection/tracking result, a feature extraction result, a segmentation result, and a classification result.
For the working process, configuration, and corresponding technical effects of any embodiment of the image processing apparatus provided by the embodiments of the present disclosure, reference may be made to the specific descriptions of the corresponding method embodiments above; for brevity, they are not repeated here.
Another aspect of the embodiments of this application further provides an intelligent driving system, which adopts the image processing method in the above embodiments and has the same or similar technical features and technical effects.
FIG. 5 is a schematic structural diagram of the intelligent driving system provided by an embodiment of this application. As shown in FIG. 5, the intelligent driving system includes: an in-vehicle camera 501, a convolutional neural network subsystem 502, and a control subsystem 503, where the control subsystem 503 is configured to:
perform fixed-point conversion, according to the fixed-point bit-width hardware resource amount of the computing unit running the convolutional neural network subsystem 502, on the network parameters of a convolutional neural network represented in floating point, the fixed-point-converted network parameters being values represented as powers of 2;
control the convolutional neural network subsystem 502 to process, according to the fixed-point-converted network parameters of the convolutional neural network, an image of the road surface on which the vehicle is travelling captured by the in-vehicle camera 501, to obtain a processing result of the image; and
perform intelligent driving of the vehicle according to the processing result of the image.
Exemplarily, intelligent driving includes but is not limited to: assisted driving, automatic driving, and switching between multiple driving modes such as assisted driving and automatic driving.
Optionally, the fixed-point-converted network parameter is a sum of M powers of 2, M being an integer greater than 1.
Optionally, M is equal to 2.
Optionally, the fixed-point-converted network parameter is the sum of 2 to the power k and 2 to the power j, k and j are both integers, k is greater than j, and the difference between k and j is less than a preset threshold.
Optionally, the control subsystem 503 is further configured to obtain, according to k and j and a binary value mapping table, the binary value corresponding to the fixed-point-converted network parameter, the binary value mapping table indicating the binary values corresponding to different combinations of k and j values.
Correspondingly, the intelligent driving system further includes: a storage subsystem 504;
the storage subsystem 504 is configured to store the binary value corresponding to the fixed-point-converted network parameter.
Optionally, the control subsystem 503 is configured to:
control the convolutional neural network subsystem to determine the fixed-point-converted network parameters of the convolutional neural network according to the binary values stored in the storage subsystem and the binary value mapping table; and
process, according to the fixed-point-converted network parameters of the convolutional neural network, the image of the road surface on which the vehicle is travelling captured by the in-vehicle camera, to obtain the processing result of the image.
Optionally, the intelligent driving system further includes: a training subsystem 505;
the training subsystem 505 is configured to train, with training data, the convolutional neural network including the fixed-point-converted network parameters, so as to correct the fixed-point-converted network parameters.
Optionally, the processing result of the image includes but is not limited to at least one of the following: a license plate recognition result, a drivable-area detection result, a lane line detection result, a lane line attribute detection result, and an in-vehicle camera pose detection result.
Another aspect of the embodiments of this application further provides an FPGA-based in-vehicle computing platform, which adopts the image processing method in the above embodiments and has the same or similar technical features and technical effects.
FIG. 6 is a schematic structural diagram of the FPGA-based in-vehicle computing platform provided by an embodiment of this application. As shown in FIG. 6, the FPGA-based in-vehicle computing platform includes: a processor 601, an external memory 602, an internal memory 603, and an FPGA computing unit 604.
The external memory 602 stores the fixed-point-converted network parameters of the neural network, or stores the binary values corresponding to the fixed-point-converted network parameters of the neural network and a lookup table, the lookup table indicating the binary values corresponding to the power exponents of different network parameters; the fixed-point-converted network parameters are values represented as powers of 2.
The processor 601 reads the fixed-point-converted network parameters of the neural network into the internal memory 603, and inputs the data in the internal memory and information of an image to be processed into the FPGA computing unit 604; or the processor 601 looks up the fixed-point-converted network parameters according to the binary values and the lookup table, reads them into the internal memory 603, and inputs the data in the internal memory 603 and the information of the image to be processed into the FPGA computing unit 604.
The FPGA computing unit 604 obtains shift operation results according to the information of the image to be processed and the fixed-point-converted network parameters, and sums the results of multiple operations to obtain the processing result of the image.
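A software sketch of this shift-and-accumulate data path (illustrative only; function and variable names are assumptions, and exponents are taken as non-negative so plain integer shifts apply): each weight is a (k, j) exponent pair, each product becomes two shifts, and the partial results are accumulated.

```python
def shift_accumulate(activations, weight_exps):
    """Simulate the FPGA computing unit: a multiply-accumulate in which
    each weight has the fixed-point value 2**k + 2**j, so every product
    is computed as two left shifts, and partial results are summed."""
    acc = 0
    for x, (k, j) in zip(activations, weight_exps):
        acc += (x << k) + (x << j)  # shift-and-add replaces the multiply
    return acc

# 3 * (2**5 + 2**2) + 1 * (2**4 + 2**1) = 3*36 + 1*18
print(shift_accumulate([3, 1], [(5, 2), (4, 1)]))  # → 126
```

In hardware the loop body maps to parallel shifters feeding an adder tree; the accumulation over many such terms is exactly the "summing the results of multiple operations" step above.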
Optionally, the fixed-point-converted network parameter is a sum of M powers of 2, M being an integer greater than 1.
Optionally, M is equal to 2.
Optionally, the fixed-point-converted network parameter is the sum of 2 to the power k and 2 to the power j, k and j are both integers, k is greater than j, and the difference between k and j is less than a preset threshold.
Optionally, the external memory 602 stores k and j;
the lookup table indicates the binary values corresponding to different combinations of k and j values.
Optionally, the processing result of the image includes but is not limited to at least one of the following: an object detection/tracking result, a feature extraction result, a segmentation result, and a classification result.
Another aspect of the embodiments of this application further provides an electronic device. FIG. 7 is a schematic structural diagram of the electronic device provided by an embodiment of this application. As shown in FIG. 7, the electronic device includes: a processor 702 and a memory 701.
The memory 701 is configured to store at least one executable instruction, and the executable instruction causes the processor 702 to perform the operations corresponding to the image processing method provided by any one of the above embodiments.
Exemplarily, as shown in FIG. 7, the electronic device further includes a computing unit 703, configured to implement the operations of the convolutional neural network in any of the above embodiments.
Another aspect of the embodiments of this application further provides a computer-readable storage medium storing a computer program for performing the steps of the image processing method provided by any one of the above embodiments.
The apparatus in this embodiment and the method in the preceding embodiments are two aspects of the same inventive concept. Since the implementation process of the method has been described in detail above, those skilled in the art can clearly understand the structure and implementation process of the system in this embodiment from the foregoing description; for brevity of the specification, it is not repeated here.
In the several embodiments provided by this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely schematic; for example, the division of units is merely a division by logical function, and in actual implementation there may be other ways of division; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The terms "first", "second", "third", "fourth", etc. in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments described here can be implemented in an order other than that illustrated or described here. Furthermore, the terms "include" and "have" and any of their variants are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units clearly listed, but may include other steps or units that are not clearly listed or that are inherent to the process, method, product, or device.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts among the embodiments, reference may be made to one another. As for the system embodiments, since they basically correspond to the method embodiments, the descriptions are relatively simple; for related parts, reference may be made to the descriptions of the method embodiments.
The method and apparatus of this application may be implemented in many ways. For example, the method and apparatus of this application may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the method is for illustration only, and the steps of the method of this application are not limited to the order specifically described above unless otherwise specifically stated. Furthermore, in some embodiments, this application may also be implemented as programs recorded in a recording medium, and these programs include machine-readable instructions for implementing the method according to this application. Thus, this application also covers a recording medium storing a program for performing the method according to this application.
The description of this application is given for the sake of example and description and is not exhaustive or intended to limit this application to the disclosed form. Many modifications and variations are obvious to those of ordinary skill in the art. The embodiments are selected and described to better explain the principles and practical applications of this application, and to enable those of ordinary skill in the art to understand this application so as to design various embodiments with various modifications suited to particular uses.

Claims (33)

  1. An image processing method, comprising:
    performing fixed-point conversion, according to a fixed-point bit-width hardware resource amount of a computing unit, on network parameters of a convolutional neural network represented in floating point, wherein the fixed-point-converted network parameters are values represented as powers of 2;
    acquiring an image to be processed; and
    controlling the computing unit to process the image according to the fixed-point-converted network parameters of the convolutional neural network, to obtain a processing result of the image.
  2. The method according to claim 1, wherein each fixed-point-converted network parameter is a sum of M powers of 2, M being an integer greater than 1.
  3. The method according to claim 2, wherein M is equal to 2.
  4. The method according to any one of claims 1-3, wherein the fixed-point-converted network parameter is a sum of 2 to the power k and 2 to the power j, k and j are both integers, k is greater than j, and a difference between k and j is less than a preset threshold.
  5. The method according to claim 4, wherein after the fixed-point conversion of the network parameters of the convolutional neural network represented in floating point, the method further comprises:
    obtaining, according to k and j and a binary value mapping table, a binary value corresponding to the fixed-point-converted network parameter, the binary value mapping table indicating binary values corresponding to different combinations of k and j values; and
    storing the binary value corresponding to the fixed-point-converted network parameter.
  6. The method according to claim 5, wherein the controlling the computing unit to process the image according to the fixed-point-converted network parameters of the convolutional neural network comprises:
    controlling the computing unit to determine the fixed-point-converted network parameters of the convolutional neural network according to the binary values and the binary mapping table; and
    processing the image according to the fixed-point-converted network parameters of the convolutional neural network.
  7. The method according to any one of claims 1-6, wherein before the controlling the computing unit to process the image according to the fixed-point-converted network parameters, the method further comprises:
    training, with training data, the convolutional neural network comprising the fixed-point-converted network parameters, so as to correct the fixed-point-converted network parameters.
  8. The method according to any one of claims 1-7, wherein the processing result of the image comprises at least one of the following: an object detection/tracking result, a feature extraction result, a segmentation result, and a classification result.
  9. An image processing apparatus, comprising:
    a parameter fixed-point module, configured to perform fixed-point conversion, according to a fixed-point bit-width hardware resource amount of a computing unit, on network parameters of a convolutional neural network represented in floating point, wherein the fixed-point-converted network parameters are values represented as powers of 2;
    an image acquisition module, configured to acquire an image to be processed; and
    an image processing module, configured to control the computing unit to process the image according to the fixed-point-converted network parameters of the convolutional neural network, to obtain a processing result of the image.
  10. The apparatus according to claim 9, wherein each fixed-point-converted network parameter is a sum of M powers of 2, M being an integer greater than 1.
  11. The apparatus according to claim 10, wherein M is equal to 2.
  12. The apparatus according to any one of claims 9-11, wherein the fixed-point-converted network parameter is a sum of 2 to the power k and 2 to the power j, k and j are both integers, k is greater than j, and a difference between k and j is less than a preset threshold.
  13. The apparatus according to claim 12, further comprising:
    a storage module, configured to obtain, according to k and j and a binary value mapping table, a binary value corresponding to the fixed-point-converted network parameter, the binary value mapping table indicating binary values corresponding to different combinations of k and j values; and to store the binary value corresponding to the fixed-point-converted network parameter.
  14. The apparatus according to claim 13, wherein the image processing module is configured to:
    control the computing unit to determine the fixed-point-converted network parameters of the convolutional neural network according to the binary values and the binary mapping table; and
    process the image according to the fixed-point-converted network parameters of the convolutional neural network.
  15. The apparatus according to any one of claims 9-14, further comprising:
    a training module, configured to, before the computing unit is controlled to process the image according to the fixed-point-converted network parameters of the convolutional neural network, train, with training data, the convolutional neural network comprising the fixed-point-converted network parameters, so as to correct the fixed-point-converted network parameters.
  16. The apparatus according to any one of claims 9-15, wherein the processing result of the image comprises at least one of the following: an object detection/tracking result, a feature extraction result, a segmentation result, and a classification result.
  17. An intelligent driving system, comprising: an in-vehicle camera, a convolutional neural network subsystem, and a control subsystem, wherein the control subsystem is configured to:
    perform fixed-point conversion, according to a fixed-point bit-width hardware resource amount of a computing unit running the convolutional neural network subsystem, on network parameters of a convolutional neural network represented in floating point, wherein the fixed-point-converted network parameters are values represented as powers of 2;
    control the convolutional neural network subsystem to process, according to the fixed-point-converted network parameters of the convolutional neural network, an image of a road surface on which a vehicle is travelling captured by the in-vehicle camera, to obtain a processing result of the image; and
    perform intelligent driving of the vehicle according to the processing result of the image.
  18. The system according to claim 17, wherein each fixed-point-converted network parameter is a sum of M powers of 2, M being an integer greater than 1.
  19. The system according to claim 18, wherein M is equal to 2.
  20. The system according to any one of claims 17-19, wherein the fixed-point-converted network parameter is a sum of 2 to the power k and 2 to the power j, k and j are both integers, k is greater than j, and a difference between k and j is less than a preset threshold.
  21. The system according to claim 20, wherein the control subsystem is further configured to:
    obtain, according to k and j and a binary value mapping table, a binary value corresponding to the fixed-point-converted network parameter, the binary value mapping table indicating binary values corresponding to different combinations of k and j values;
    the intelligent driving system further comprises: a storage subsystem; and
    the storage subsystem is configured to store the binary value corresponding to the fixed-point-converted network parameter.
  22. The system according to claim 21, wherein the control subsystem is configured to:
    control the convolutional neural network subsystem to determine the fixed-point-converted network parameters of the convolutional neural network according to the binary values stored in the storage subsystem and the binary value mapping table; and
    process, according to the fixed-point-converted network parameters of the convolutional neural network, the image of the road surface on which the vehicle is travelling captured by the in-vehicle camera, to obtain the processing result of the image.
  23. The system according to any one of claims 17-22, wherein the intelligent driving system further comprises: a training subsystem;
    the training subsystem is configured to train, with training data, the convolutional neural network comprising the fixed-point-converted network parameters, so as to correct the fixed-point-converted network parameters.
  24. The system according to any one of claims 17-21, wherein the processing result of the image comprises at least one of the following: a license plate recognition result, a drivable-area detection result, a lane line detection result, a lane line attribute detection result, and an in-vehicle camera pose detection result.
  25. An FPGA-based in-vehicle computing platform, comprising: a processor, an external memory, an internal memory, and an FPGA computing unit; wherein
    the external memory stores fixed-point-converted network parameters of a neural network, or stores binary values corresponding to the fixed-point-converted network parameters of the neural network and a lookup table, the lookup table indicating binary values corresponding to power exponents of different network parameters, and the fixed-point-converted network parameters being values represented as powers of 2;
    the processor reads the fixed-point-converted network parameters of the neural network into the internal memory, and inputs the data in the internal memory and information of an image to be processed into the FPGA computing unit; or the processor looks up the fixed-point-converted network parameters according to the binary values and the lookup table, reads them into the internal memory, and inputs the data in the internal memory and the information of the image to be processed into the FPGA computing unit; and
    the FPGA computing unit obtains shift operation results according to the information of the image to be processed and the fixed-point-converted network parameters, and sums the results of multiple operations to obtain a processing result of the image.
  26. The platform according to claim 25, wherein each fixed-point-converted network parameter is a sum of M powers of 2, M being an integer greater than 1.
  27. The platform according to claim 26, wherein M is equal to 2.
  28. The platform according to any one of claims 25-27, wherein the fixed-point-converted network parameter is a sum of 2 to the power k and 2 to the power j, k and j are both integers, k is greater than j, and a difference between k and j is less than a preset threshold.
  29. The platform according to claim 28, wherein the external memory stores k and j; and
    the lookup table indicates the binary values corresponding to different combinations of k and j values.
  30. The platform according to any one of claims 25-29, wherein the processing result of the image comprises at least one of the following: an object detection/tracking result, a feature extraction result, a segmentation result, and a classification result.
  31. An electronic device, comprising: a processor and a memory;
    the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the image processing method according to any one of claims 1-8.
  32. An electronic device, comprising:
    a processor and the image processing apparatus according to any one of claims 9-16; when the processor runs the image processing apparatus, the modules of the image processing apparatus according to any one of claims 9-16 are run.
  33. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program is used to perform the steps of the image processing method according to any one of claims 1-8.
PCT/CN2019/128764 2018-12-29 2019-12-26 图像处理方法、装置、智能驾驶系统和车载运算平台 WO2020135602A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2021533181A JP2022515343A (ja) 2018-12-29 2019-12-26 画像処理方法、装置、インテリジェント運転システム及び車載演算プラットフォーム
KR1020217018122A KR20210092254A (ko) 2018-12-29 2019-12-26 이미지 처리 방법, 장치, 지능형 주행 시스템 및 차량 탑재 연산 플랫폼

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811643406.6A CN111383156B (zh) 2018-12-29 2018-12-29 图像处理方法、装置、智能驾驶系统和车载运算平台
CN201811643406.6 2018-12-29

Publications (1)

Publication Number Publication Date
WO2020135602A1 true WO2020135602A1 (zh) 2020-07-02

Family

ID=71126827

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/128764 WO2020135602A1 (zh) 2018-12-29 2019-12-26 图像处理方法、装置、智能驾驶系统和车载运算平台

Country Status (4)

Country Link
JP (1) JP2022515343A (zh)
KR (1) KR20210092254A (zh)
CN (1) CN111383156B (zh)
WO (1) WO2020135602A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704921A (zh) * 2017-10-19 2018-02-16 北京智芯原动科技有限公司 基于Neon指令的卷积神经网络的算法优化方法及装置
CN107885214A (zh) * 2017-11-22 2018-04-06 济南浪潮高新科技投资发展有限公司 一种基于fpga的加速自动驾驶视觉感知的方法及装置
US20180157969A1 (en) * 2016-12-05 2018-06-07 Beijing Deephi Technology Co., Ltd. Apparatus and Method for Achieving Accelerator of Sparse Convolutional Neural Network
CN108985449A (zh) * 2018-06-28 2018-12-11 中国科学院计算技术研究所 一种对卷积神经网络处理器的控制方法及装置
CN109002881A (zh) * 2018-06-28 2018-12-14 郑州云海信息技术有限公司 基于fpga的深度神经网络的定点化计算方法及装置

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2862337B2 (ja) * 1990-06-19 1999-03-03 キヤノン株式会社 ニューラルネットワークの構築方法
US10373050B2 (en) * 2015-05-08 2019-08-06 Qualcomm Incorporated Fixed point neural network based on floating point neural network quantization
JP2018173672A (ja) * 2015-09-03 2018-11-08 株式会社Preferred Networks 実装装置
JP6734938B2 (ja) * 2017-01-10 2020-08-05 株式会社日立製作所 ニューラルネットワーク回路
CN108830288A (zh) * 2018-04-25 2018-11-16 北京市商汤科技开发有限公司 图像处理方法、神经网络的训练方法、装置、设备及介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180157969A1 (en) * 2016-12-05 2018-06-07 Beijing Deephi Technology Co., Ltd. Apparatus and Method for Achieving Accelerator of Sparse Convolutional Neural Network
CN107704921A (zh) * 2017-10-19 2018-02-16 北京智芯原动科技有限公司 基于Neon指令的卷积神经网络的算法优化方法及装置
CN107885214A (zh) * 2017-11-22 2018-04-06 济南浪潮高新科技投资发展有限公司 一种基于fpga的加速自动驾驶视觉感知的方法及装置
CN108985449A (zh) * 2018-06-28 2018-12-11 中国科学院计算技术研究所 一种对卷积神经网络处理器的控制方法及装置
CN109002881A (zh) * 2018-06-28 2018-12-14 郑州云海信息技术有限公司 基于fpga的深度神经网络的定点化计算方法及装置

Also Published As

Publication number Publication date
JP2022515343A (ja) 2022-02-18
KR20210092254A (ko) 2021-07-23
CN111383156A (zh) 2020-07-07
CN111383156B (zh) 2022-08-02

Similar Documents

Publication Publication Date Title
US10929746B2 (en) Low-power hardware acceleration method and system for convolution neural network computation
CN113033537B (zh) 用于训练模型的方法、装置、设备、介质和程序产品
US10762373B2 (en) Image recognition method and device
CN113920307A (zh) 模型的训练方法、装置、设备、存储介质及图像检测方法
WO2022227770A1 (zh) 目标对象检测模型的训练方法、目标对象检测方法和设备
CN113642583B (zh) 用于文本检测的深度学习模型训练方法及文本检测方法
CN108229658B (zh) 基于有限样本的物体检测器的实现方法及装置
US11699240B2 (en) Target tracking method and apparatus, and storage medium
US20240153240A1 (en) Image processing method, apparatus, computing device, and medium
WO2020135601A1 (zh) 图像处理方法、装置、车载运算平台、电子设备及系统
CN109902588B (zh) 一种手势识别方法、装置及计算机可读存储介质
WO2024040954A1 (zh) 点云语义分割网络训练方法、点云语义分割方法及装置
WO2019177731A1 (en) Cluster compression for compressing weights in neural networks
CN114724133B (zh) 文字检测和模型训练方法、装置、设备及存储介质
CN114187459A (zh) 目标检测模型的训练方法、装置、电子设备以及存储介质
CN113348472A (zh) 具有软内核选择的卷积神经网络
Wasala et al. Real-time HOG+ SVM based object detection using SoC FPGA for a UHD video stream
WO2020135602A1 (zh) 图像处理方法、装置、智能驾驶系统和车载运算平台
CN115082598B (zh) 文本图像生成、训练、文本图像处理方法以及电子设备
CN113139463B (zh) 用于训练模型的方法、装置、设备、介质和程序产品
CN114282664A (zh) 自反馈模型训练方法、装置、路侧设备及云控平台
CN112861940A (zh) 双目视差估计方法、模型训练方法以及相关设备
JP2022539554A (ja) 高精度のニューラル処理要素
CN113343979B (zh) 用于训练模型的方法、装置、设备、介质和程序产品
CN115578613B (zh) 目标再识别模型的训练方法和目标再识别方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19901983

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021533181

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20217018122

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 01.10.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19901983

Country of ref document: EP

Kind code of ref document: A1