WO2021093780A1 - Target recognition method and device - Google Patents

Target recognition method and device

Info

Publication number
WO2021093780A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
picture
feature
recognized
bit width
Prior art date
Application number
PCT/CN2020/128171
Other languages
English (en)
French (fr)
Inventor
杨希超
张渊
谢迪
浦世亮
Original Assignee
杭州海康威视数字技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州海康威视数字技术股份有限公司
Publication of WO2021093780A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Definitions

  • This application relates to the field of machine learning technology, and in particular to a target recognition method and device.
  • Deep neural networks, an emerging field in machine learning research, analyze data by imitating the mechanism of the human brain; they are intelligent models that analyze and learn by establishing a simulation of the human brain.
  • Deep learning models, such as convolutional neural network models, recurrent neural network models, and long short-term memory network models, have become mainstream in applications such as image classification, target detection, and speech recognition.
  • In target recognition, the picture to be recognized is input into a trained target deep learning model, the network layers in the target deep learning model perform their operations, and the target in the picture to be recognized is identified based on the result of those operations.
  • In the related art, the data involved in the operations is single-precision floating-point data. Because single-precision floating-point data has a high bit width, the amount of data involved in the operations is large, which makes the efficiency of target recognition low.
  • the purpose of the embodiments of the present application is to provide a target recognition method and device to improve the efficiency of target recognition.
  • the specific technical solutions are as follows:
  • an embodiment of the present application provides a target recognition method, which includes:
  • obtaining a picture to be recognized; inputting the picture to be recognized into a pre-trained target deep learning model to obtain a target feature in the picture to be recognized, wherein, for each network layer in the target deep learning model, at least one of the input features input to the network layer, the network weights of the network layer, and the output features output by the network layer is quantized into integer data with a bit width less than 16 bits;
  • the target feature is compared with the pre-calibrated target feature to obtain the target recognition result of the picture to be recognized.
  • before the step of inputting the picture to be recognized into a pre-trained target deep learning model to obtain the target feature in the picture to be recognized, the method further includes:
  • a preprocessing operation is performed on the picture to be recognized to obtain a preprocessed picture to be recognized, wherein the preprocessing operation includes at least cropping the picture to be recognized.
  • the network layer in the target deep learning model includes: a convolutional layer, a fully connected layer, a pooling layer, a batch normalization layer, a merging layer, and a splicing layer.
  • the step of quantizing the network weights of the network layer into integer data with a bit width less than 16 bits includes:
  • for each filter in the network layer, reading the network weight with the largest absolute value in the filter; calculating the quantization step size corresponding to the filter according to that network weight and a preset bit width less than 16 bits; and using the quantization step size to quantize each network weight in the filter into integer data with the preset bit width.
  • the step of quantizing the input features input to the network layer into integer data with a bit width less than 16 bits includes:
  • the input feature is quantized into integer data with a bit width less than 16 bits.
  • the step of quantizing the output feature output by the network layer into integer data with a bit width less than 16 bits includes:
  • the output feature is quantized into integer data with a bit width less than 16 bits.
  • the step of obtaining the picture to be recognized includes: obtaining a face picture collected by a face collection device, or obtaining a vehicle picture collected by a vehicle collection device;
  • the step of inputting the picture to be recognized into the pre-trained target deep learning model to obtain the target feature in the picture to be recognized includes: inputting the face picture into the pre-trained target deep learning model to obtain the target face features in the face picture, or inputting the vehicle picture into the pre-trained target deep learning model to obtain the target vehicle features in the vehicle picture;
  • the step of comparing the target feature with the pre-calibrated target feature to obtain the target recognition result of the picture to be recognized includes:
  • comparing the target face features with preset face features to obtain a face recognition result, or comparing the target vehicle features with preset vehicle features to obtain a vehicle recognition result.
  • an embodiment of the present application provides a target recognition device, which includes:
  • the obtaining module is used to obtain the picture to be recognized
  • the calculation module is used to input the picture to be recognized into the pre-trained target deep learning model to obtain the target features in the picture to be recognized, wherein, for each network layer in the target deep learning model, at least one of the input features input to the network layer, the network weights of the network layer, and the output features output by the network layer is quantized into integer data with a bit width less than 16 bits;
  • the comparison module is used to compare the target feature with the pre-calibrated target feature to obtain the target recognition result of the picture to be recognized.
  • the device further includes:
  • the preprocessing module is used to perform a preprocessing operation on the picture to be recognized to obtain the preprocessed picture to be recognized, wherein the preprocessing operation includes at least cropping the picture to be recognized.
  • the network layer in the target deep learning model includes: a convolutional layer, a fully connected layer, a pooling layer, a batch normalization layer, a merging layer, and a splicing layer.
  • the calculation module, when used to quantize the network weights of the network layer into integer data with a bit width less than 16 bits, is specifically used to:
  • for each filter in the network layer, read the network weight with the largest absolute value in the filter; calculate the quantization step size corresponding to the filter according to that network weight and a preset bit width less than 16 bits; and use the quantization step size to quantize each network weight in the filter into integer data with the preset bit width.
  • the calculation module, when used to quantize the input features input to the network layer into integer data with a bit width less than 16 bits, is specifically used to:
  • the input feature is quantized into integer data with a bit width less than 16 bits.
  • the calculation module, when used to quantize the output features output by the network layer into integer data with a bit width less than 16 bits, is specifically used to:
  • the output feature is quantized into integer data with a bit width less than 16 bits.
  • the calculation module is specifically used to: input the face picture into the pre-trained target deep learning model to obtain the target face features in the face picture, or input the vehicle picture into the pre-trained target deep learning model to obtain the target vehicle features in the vehicle picture;
  • the comparison module is specifically used to:
  • the target face feature is compared with the preset face feature to obtain the face recognition result, or the target vehicle feature is compared with the preset vehicle feature to obtain the vehicle recognition result.
  • an embodiment of the present application provides a computer device, including a processor and a machine-readable storage medium.
  • the machine-readable storage medium stores machine-executable instructions that can be executed by the processor, and the machine-executable instructions prompt the processor to implement the method provided in the first aspect of the embodiments of the present application.
  • an embodiment of the present application provides a machine-readable storage medium that stores machine-executable instructions that, when called and executed by a processor, implement the method provided in the first aspect of the embodiments of the present application.
  • an embodiment of the present application provides a computer program product for executing the method provided in the first aspect of the embodiment of the present application at runtime.
  • With the target recognition method and device provided by the embodiments of the present application, a picture to be recognized is obtained and input into a pre-trained target deep learning model to obtain the target feature in the picture to be recognized, and the target feature is compared with a pre-calibrated target feature to obtain the target recognition result of the picture to be recognized.
  • When the target deep learning model performs operations on the input picture to be recognized, for each network layer in the target deep learning model, at least one of the input features input to the network layer, the network weights of the network layer, and the output features output by the network layer is quantized into integer data with a bit width less than 16 bits.
  • Since the data involved in the operations is quantized into integer data with a bit width less than 16 bits, the bit width and amount of data involved in the operations are reduced compared with single-precision floating-point data, which improves the efficiency of target recognition.
  • FIG. 1 is a schematic flowchart of a target recognition method according to an embodiment of the application.
  • FIG. 2a is a schematic structural diagram of a convolutional layer according to an embodiment of the application.
  • FIG. 2b is a schematic structural diagram of a fully connected layer according to an embodiment of the application.
  • FIG. 2c is a schematic structural diagram of a pooling layer according to an embodiment of the application.
  • FIG. 2d is a schematic structural diagram of a batch normalization layer according to an embodiment of the application.
  • FIG. 2e is a schematic structural diagram of a merging layer according to an embodiment of the application.
  • FIG. 2f is a schematic structural diagram of a splicing layer according to an embodiment of the application.
  • FIG. 3 is a schematic structural diagram of a target recognition device according to an embodiment of the application.
  • FIG. 4 is a schematic structural diagram of a computer device according to an embodiment of the application.
  • embodiments of the present application provide a target recognition method, device, computer equipment, and machine-readable storage medium.
  • the target recognition method provided by the embodiment of the present application will be introduced first.
  • the execution subject of the target recognition method provided by the embodiments of the present application may be a computer device with a target recognition function, such as a smart camera, a target recognizer, etc., and the execution subject includes at least a core processing chip with data processing capabilities.
  • the manner of implementing the target recognition method provided by the embodiment of the present application may be at least one of software, a hardware circuit, and a logic circuit provided in the execution subject.
  • Fig. 1 is a schematic flowchart of the target recognition method according to an embodiment of this application.
  • the method may include the following steps.
  • the picture to be recognized is a picture that contains a target that needs to be recognized.
  • the picture to be recognized may be a picture captured by a smart camera shooting a surveillance area.
  • when a pedestrian enters the surveillance area, the captured picture contains a face target.
  • the picture may also be a picture containing a face target that is entered by the user according to requirements.
  • the targets mentioned in the embodiments of the present application are not limited to face targets, and may also be targets such as cars, bicycles, and buildings.
  • the target deep learning model is a deep learning network model, such as a convolutional neural network model, a recurrent neural network model, or a long short-term memory network model. After the operation of each network layer in the target deep learning model, the target deep learning model can output the target features in the picture to be recognized.
  • the target deep learning model is pre-trained based on training samples.
  • the training samples can be sample pictures with specified targets pre-marked.
  • The training samples are input into the initial network model, and a BP (Back Propagation) algorithm or another model training algorithm performs calculations on the training samples; the calculation results are compared with the set nominal values, and the network weights of the network model are adjusted based on the comparison results.
  • for each network layer in the target deep learning model, at least one of the input features input to the network layer, the network weights of the network layer, and the output features output by the network layer is quantized into integer data with a bit width less than 16 bits, that is, quantized into low-bit integer data with a bit width less than 16 bits.
  • the data involved in the operation changes from single-precision floating point data to low-bit integer data with a bit width less than 16 bits, which reduces the bit width and data volume of the data involved in the operation.
  • the network layer in the target deep learning model may include: a convolutional layer, a fully connected layer, a pooling layer, a batch normalization layer, a merging layer, and a splicing layer.
  • the deep learning model may specifically include the following network layers: a Convolution layer (convolutional layer, shown in Figure 2a), an InnerProduct layer (fully connected layer, shown in Figure 2b), a Pooling layer (pooling layer, shown in Figure 2c), a BN (Batch Normalization) layer used to adjust the scale of a channel (shown in Figure 2d), an Eltwise layer (merging layer) used to add and merge two inputs (shown in Figure 2e), and a Concat layer (splicing layer) used to splice two inputs (shown in Figure 2f); the number of each type of network layer in the deep learning model is not limited.
  • I_INTn represents an n-bit integer input feature;
  • W_INTn represents an n-bit integer network weight;
  • O_INTn represents an n-bit integer output feature;
  • I1_INTn and I2_INTn represent the n-bit integer input features of two branches, where n is less than 16.
  • the network weight, input feature, and output feature may all be n-bit integer data.
  • the target feature can be compared with the pre-calibrated target feature, feature value by feature value, to determine whether the target feature is the calibrated target feature. In this way, recognition results can be obtained, such as whether the target in the picture to be recognized is the calibrated target, how likely the target in the picture to be recognized is the calibrated target, and the position of the target in the picture to be recognized.
  • the specific comparison process may compare the features point by point to determine whether each feature point is the same as the corresponding feature point in the calibrated target feature; if the number of identical feature points exceeds a threshold, the target in the picture to be recognized is considered to be the calibrated target.
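  • The point-by-point comparison described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes features are plain lists of quantized integer values, exact equality as the per-point match rule, and a hypothetical threshold.

```python
def compare_features(target_feature, calibrated_feature, threshold):
    """Compare two features point by point.

    Counts the feature points identical to the corresponding point in the
    calibrated feature; if the count exceeds the threshold, the target in
    the picture to be recognized is considered the calibrated target.
    """
    same = sum(1 for a, b in zip(target_feature, calibrated_feature) if a == b)
    return same > threshold

# 6 of 8 feature points match; with a threshold of 5 the target is recognized.
result = compare_features([3, 1, 4, 1, 5, 9, 2, 6],
                          [3, 1, 4, 1, 5, 9, 0, 0],
                          threshold=5)  # True
```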
  • the target deep learning model performs operations on the input image to be recognized, for each network layer in the target deep learning model, it will input the input characteristics of the network layer, the network weight of the network layer, and the output of the network layer. At least one item of the output characteristics is quantized into integer data with a bit width less than 16 bits.
  • the input characteristics of the input network layer, the network weight of the network layer, or the output characteristics of the network layer output are quantized into integer data with a bit width less than 16 bits.
  • the execution process of face recognition mainly includes: obtaining a face picture collected by a face collection device; inputting the face picture into a pre-trained target deep learning model to obtain The target face features in the face picture; compare the target face features with the preset face features to obtain the face recognition result.
  • the execution process of vehicle recognition mainly includes: obtaining a vehicle picture collected by a vehicle collection device; inputting the vehicle picture into the pre-trained target deep learning model to obtain the target vehicle features in the vehicle picture; and comparing the target vehicle features with the preset vehicle features to obtain the vehicle recognition result.
  • the step of converting the network weight of the network layer into integer data with a bit width less than 16 bits may specifically be:
  • for each filter in the network layer, read the network weight with the largest absolute value in the filter; calculate the quantization step size corresponding to the filter according to that network weight and a preset bit width less than 16 bits; and use the quantization step size to quantize each network weight in the filter into integer data with the preset bit width.
  • a network layer is composed of multiple filters.
  • a filter is a convolution kernel.
  • a filter includes multiple network weights. For each filter, the network weight W_max with the largest absolute value can be read from the filter.
  • the preset bit width is the desired bit width of the network weights, and it is less than 16 bits.
  • the quantization step size corresponding to the filter can be calculated based on the network weight W_max with the largest absolute value and the preset bit width less than 16 bits.
  • the quantization step size step_W corresponding to the filter can be calculated using formula (1):
  • step_W = W_max / 2^(bitwidth-1)  (1)
  • the quantization step size step_W can then be used to quantize each network weight in the filter into low-bit integer data with the preset bit width.
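  • The per-filter weight quantization above can be sketched as follows, under stated assumptions: formula (1) is taken as step_W = W_max / 2^(bitwidth-1), with rounding to the nearest integer and clamping to the signed n-bit range; the patent does not spell out the rounding or clamping behavior.

```python
def quantize_filter_weights(weights, bitwidth):
    """Quantize one filter's weights into signed `bitwidth`-bit integers.

    Reads the weight with the largest absolute value, derives the
    quantization step size per formula (1), then quantizes every weight
    in the filter with that step. Assumes at least one non-zero weight.
    """
    w_max = max(abs(w) for w in weights)      # largest absolute weight
    step_w = w_max / (2 ** (bitwidth - 1))    # formula (1)
    qmin, qmax = -(2 ** (bitwidth - 1)), 2 ** (bitwidth - 1) - 1
    quantized = [max(qmin, min(qmax, round(w / step_w))) for w in weights]
    return step_w, quantized

step_w, q = quantize_filter_weights([0.5, -1.0, 0.25], bitwidth=8)
# step_w == 1.0 / 128; q == [64, -128, 32]
```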
  • the step of quantizing the input features input to the network layer into integer data with a bit width less than 16 bits may specifically be: obtaining multiple preset undetermined step sizes, where the bit width corresponding to each undetermined step size is less than 16 bits; calculating, for each undetermined step size, the quantization error of quantizing the input feature with that step size; and using the undetermined step size corresponding to the smallest quantization error to quantize the input feature into integer data with a bit width less than 16 bits.
  • the step of quantizing the output features output by the network layer into integer data with a bit width less than 16 bits may specifically be: obtaining multiple preset undetermined step sizes, where the bit width corresponding to each undetermined step size is less than 16 bits; calculating, for each undetermined step size, the quantization error of quantizing the output feature with that step size; and using the undetermined step size corresponding to the smallest quantization error to quantize the output feature into integer data with a bit width less than 16 bits.
  • each of the multiple preset undetermined step sizes step can be used to quantize the feature, obtaining the quantized feature integer value A_q under that undetermined step size; the quantization error under that undetermined step size is then obtained by subtracting the product of the undetermined step size step and the feature integer value A_q from the original feature floating-point value A_float.
  • Multiple quantization errors can be obtained.
  • the undetermined step size corresponding to the smallest quantization error is taken as the quantization step size step_A of the input feature/output feature; step_A is then used to quantize the input feature/output feature into low-bit integer data with the preset bit width, as expressed in formula (2):
  • step_A = argmin_step |A_float - step * A_q|  (2)
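  • The step-size search of formula (2) can be sketched as follows. This is a minimal illustration under stated assumptions: A_q is obtained by rounding and clamping to the signed n-bit range, the quantization error is summed as |A_float - step * A_q| over the feature, and the candidate step sizes are hypothetical.

```python
def select_step(feature, candidate_steps, bitwidth):
    """Pick the undetermined step size with the smallest quantization error.

    For each candidate step, quantize the feature to signed `bitwidth`-bit
    integers, reconstruct it, and sum the absolute errors; return the step
    minimizing that error, per formula (2).
    """
    qmin, qmax = -(2 ** (bitwidth - 1)), 2 ** (bitwidth - 1) - 1

    def quant_error(step):
        a_q = [max(qmin, min(qmax, round(a / step))) for a in feature]
        return sum(abs(a - step * q) for a, q in zip(feature, a_q))

    return min(candidate_steps, key=quant_error)

step_a = select_step([0.25, 0.5, -0.75], candidate_steps=[0.25, 0.5, 1.0],
                     bitwidth=8)
# The step 0.25 reconstructs this feature exactly, so step_a == 0.25.
```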
  • the quantized data can be specifically calculated using formula (3):
  • Q_INTn = round(Q_float / step)  (3)
  • where Q_float is the floating-point value of the original input feature, network weight, or output feature; step is the step size used for quantization; and Q_INTn is the quantized data.
  • when quantizing the network weights of the network layer, the quantization parameter can be pre-calculated or calculated in real time during quantization.
  • the above formulas can be used whether the quantization parameter is pre-calculated or calculated in real time.
  • in the case of pre-calculating the quantization parameter, the calculated quantization parameter is recorded in a buffer in advance, and during quantization the quantization parameter is read directly from the buffer.
  • before the picture to be recognized is input into the target deep learning model, the embodiment of the present application may also perform a preprocessing operation on the picture to be recognized to obtain a preprocessed picture to be recognized, wherein the preprocessing operation may at least include cropping the picture to be recognized.
  • the preprocessing operation includes at least cropping the picture to be recognized, and may also include operations such as graying and pixel-value normalization on the picture to be recognized.
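  • The cropping preprocessing mentioned above can be sketched as a simple center crop. This is a minimal illustration, not the patent's procedure: the picture is represented as a nested list of pixel rows, and the output size is a hypothetical parameter (the patent does not specify the crop geometry).

```python
def center_crop(picture, out_h, out_w):
    """Center-crop a picture (a list of rows of pixel values) to out_h x out_w."""
    h, w = len(picture), len(picture[0])
    top = (h - out_h) // 2    # rows to skip above the crop
    left = (w - out_w) // 2   # columns to skip left of the crop
    return [row[left:left + out_w] for row in picture[top:top + out_h]]

# Crop a 4x4 picture to its central 2x2 block.
picture = [[r * 4 + c for c in range(4)] for r in range(4)]
cropped = center_crop(picture, 2, 2)  # [[5, 6], [9, 10]]
```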
  • an embodiment of the present application provides a target recognition device.
  • the device may include:
  • the obtaining module 310 is used to obtain the picture to be recognized
  • the calculation module 320 is used to input the picture to be recognized into the pre-trained target deep learning model to obtain the target features in the picture to be recognized, wherein, for each network layer in the target deep learning model, at least one of the input features input to the network layer, the network weights of the network layer, and the output features output by the network layer is quantized into integer data with a bit width less than 16 bits;
  • the comparison module 330 is configured to compare the target feature with the pre-calibrated target feature to obtain the target recognition result of the picture to be recognized.
  • the device may further include:
  • the preprocessing module is used to perform a preprocessing operation on the picture to be recognized to obtain the preprocessed picture to be recognized, wherein the preprocessing operation includes at least cropping the picture to be recognized.
  • the network layer in the target deep learning model may include: a convolutional layer, a fully connected layer, a pooling layer, a batch normalization layer, a merging layer, and a splicing layer.
  • the calculation module 320, when used to quantize the network weights of the network layer into integer data with a bit width less than 16 bits, can be specifically used to: for each filter in the network layer, read the network weight with the largest absolute value in the filter; calculate the quantization step size corresponding to the filter according to that network weight and the preset bit width less than 16 bits; and use the quantization step size to quantize each network weight in the filter into integer data with the preset bit width.
  • the calculation module 320, when used to quantize the input features input to the network layer into integer data with a bit width less than 16 bits, can be specifically used to: obtain multiple preset undetermined step sizes, where the bit width corresponding to each undetermined step size is less than 16 bits; calculate, for each undetermined step size, the quantization error of quantizing the input feature with that step size; and use the undetermined step size corresponding to the smallest quantization error to quantize the input feature into integer data with a bit width less than 16 bits.
  • the calculation module 320, when used to quantize the output features output by the network layer into integer data with a bit width less than 16 bits, can be specifically used to: obtain multiple preset undetermined step sizes, where the bit width corresponding to each undetermined step size is less than 16 bits; calculate, for each undetermined step size, the quantization error of quantizing the output feature with that step size; and use the undetermined step size corresponding to the smallest quantization error to quantize the output feature into integer data with a bit width less than 16 bits.
  • the obtaining module 310 may be specifically used to: obtain a face picture collected by a face collection device, or obtain a vehicle picture collected by a vehicle collection device;
  • the calculation module 320 can be specifically used to: input the face picture into the pre-trained target deep learning model to obtain the target face features in the face picture, or input the vehicle picture into the pre-trained target deep learning model to obtain the target vehicle features in the vehicle picture;
  • the comparison module 330 may be specifically used to compare the target face feature with a preset face feature to obtain a face recognition result, or compare the target vehicle feature with a preset vehicle feature to obtain a vehicle recognition result.
  • when the target deep learning model performs operations on the input picture to be recognized, for each network layer in the target deep learning model, at least one of the input features input to the network layer, the network weights of the network layer, and the output features output by the network layer is quantized into integer data with a bit width less than 16 bits.
  • since the data involved in the operations is quantized into integer data with a bit width less than 16 bits, the bit width and amount of data involved in the operations are reduced, which improves the efficiency of target recognition.
  • An embodiment of the present application provides a computer device. As shown in FIG. 4, it may include a processor 401 and a machine-readable storage medium 402.
  • the machine-readable storage medium 402 stores machine executable instructions that can be executed by the processor 401.
  • the processor 401 is prompted by machine-executable instructions to implement the steps of the above-mentioned target recognition method.
  • the above-mentioned machine-readable storage medium may include RAM (Random Access Memory, random access memory), and may also include NVM (Non-Volatile Memory, non-volatile memory), for example, at least one disk storage.
  • the machine-readable storage medium may also be at least one storage device located far away from the foregoing processor.
  • the above-mentioned processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), etc.; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the machine-readable storage medium 402 and the processor 401 may perform data transmission through a wired connection or a wireless connection, and the computer device may communicate with other devices through a wired communication interface or a wireless communication interface. What is shown in FIG. 4 is only an example of data transmission between the processor 401 and the machine-readable storage medium 402 via a bus, and is not intended to limit the specific connection manner.
  • the processor 401 reads the machine-executable instructions stored in the machine-readable storage medium 402 and runs them to achieve: obtaining the picture to be recognized, inputting the picture to be recognized into the pre-trained target deep learning model to obtain the target feature in the picture to be recognized, comparing the target feature with the pre-calibrated target feature, and obtaining the target recognition result of the picture to be recognized.
  • when the target deep learning model performs operations on the input picture to be recognized, for each network layer in the target deep learning model, at least one of the input features input to the network layer, the network weights of the network layer, and the output features output by the network layer is quantized into integer data with a bit width less than 16 bits.
  • since the data involved in the operations is quantized into integer data with a bit width less than 16 bits, the bit width and amount of data involved in the operations are reduced, which improves the efficiency of target recognition.
  • the embodiment of the present application also provides a machine-readable storage medium that stores machine-executable instructions, which, when called and executed by a processor, implement the steps of the above-mentioned target identification method.
  • the machine-readable storage medium stores machine-executable instructions that execute the target recognition method provided by the embodiment of this application at runtime, so it can achieve: obtaining a picture to be recognized, inputting the picture to be recognized into a pre-trained target deep learning model to obtain the target feature in the picture to be recognized, comparing the target feature with the pre-calibrated target feature, and obtaining the target recognition result of the picture to be recognized.
  • when the target deep learning model performs operations on the input picture to be recognized, for each network layer in the target deep learning model, at least one of the input features input to the network layer, the network weights of the network layer, and the output features output by the network layer is quantized into integer data with a bit width less than 16 bits.
  • since the data involved in the operations is quantized into integer data with a bit width less than 16 bits, the bit width and amount of data involved in the operations are reduced, which improves the efficiency of target recognition.
  • the embodiment of the present application also provides a computer program product, which is used to execute the steps of the above-mentioned target recognition method at runtime.
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof.
  • when implemented by software, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable apparatus.
  • the computer instructions may be stored in a machine-readable storage medium, or transmitted from one machine-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired or wireless means.
  • the machine-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center integrating one or more available media.
  • the usable medium may be a magnetic medium (such as a floppy disk, a hard disk or a magnetic tape), an optical medium (such as a DVD (Digital Versatile Disc)) or a semiconductor medium (such as an SSD (Solid State Disk)), etc.
  • the program can be stored in a computer-readable storage medium, for example: ROM/RAM, a magnetic disk, an optical disc, etc.


Abstract

A target recognition method and apparatus. The method includes: obtaining a picture to be recognized (S101); inputting the picture to be recognized into a pre-trained target deep learning model to obtain a target feature in the picture to be recognized, where, for each network layer in the target deep learning model, at least one of the input feature input into the network layer, the network weight of the network layer and the output feature output by the network layer is quantized into integer data with a bit width of less than 16 bits (S102); and comparing the target feature with a pre-calibrated target feature to obtain a target recognition result of the picture to be recognized (S103). For each network layer, the data participating in the operations is low-bit integer data with a relatively low bit width, which reduces the bit width and amount of the data participating in the operations, can increase the operation speed of target recognition by the target deep learning model, and thereby improves the efficiency of target recognition.

Description

Target recognition method and apparatus
This application claims priority to Chinese patent application No. 201911108141.4, filed with the China National Intellectual Property Administration on November 13, 2019 and entitled "Target recognition method and apparatus", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the technical field of machine learning, and in particular to a target recognition method and apparatus.
Background
As an emerging field of machine learning research, deep neural networks parse data by imitating the mechanism of the human brain, and are intelligent models that analyze and learn by building and simulating the human brain. At present, deep learning models such as convolutional neural network models, recurrent neural network models and long short-term memory network models have become mainstream methods in image classification, target detection, speech recognition and other applications.
At present, in a target recognition scenario, a picture to be recognized is input into a trained target deep learning model, the network layers in the target deep learning model perform operations, and the target in the picture to be recognized can be identified based on the operation results. When the network layers in the target deep learning model perform operations, the data participating in the operations is single-precision floating-point data. Since single-precision floating-point data has a relatively high bit width, the amount of data participating in the operations is large, resulting in low target recognition efficiency.
Summary
The purpose of the embodiments of this application is to provide a target recognition method and apparatus to improve the efficiency of target recognition. The specific technical solutions are as follows:
In a first aspect, an embodiment of this application provides a target recognition method, the method including:
obtaining a picture to be recognized;
inputting the picture to be recognized into a pre-trained target deep learning model to obtain a target feature in the picture to be recognized, where, for each network layer in the target deep learning model, at least one of the input feature input into the network layer, the network weight of the network layer and the output feature output by the network layer is quantized into integer data with a bit width of less than 16 bits;
comparing the target feature with a pre-calibrated target feature to obtain a target recognition result of the picture to be recognized.
Optionally, before the step of inputting the picture to be recognized into the pre-trained target deep learning model to obtain the target feature in the picture to be recognized, the method further includes:
performing a preprocessing operation on the picture to be recognized to obtain a preprocessed picture to be recognized, where the preprocessing operation at least includes cropping the picture to be recognized.
Optionally, the network layers in the target deep learning model include: a convolutional layer, a fully connected layer, a pooling layer, a batch normalization layer, a merging layer and a concatenation layer.
Optionally, the step of quantizing the network weights of the network layer into integer data with a bit width of less than 16 bits includes:
for each filter of the network layer, reading the network weight with the largest absolute value in the filter;
calculating the quantization step corresponding to the filter according to the network weight with the largest absolute value and a preset bit width of less than 16 bits;
quantizing each network weight in the filter into integer data of the preset bit width using the quantization step.
Optionally, the step of quantizing the input feature input into the network layer into integer data with a bit width of less than 16 bits includes:
obtaining a plurality of preset candidate steps, where the bit width corresponding to a candidate step is less than 16 bits;
calculating, for each candidate step, the quantization error of quantizing the input feature with that candidate step;
quantizing the input feature into integer data with a bit width of less than 16 bits using the candidate step corresponding to the smallest quantization error.
Optionally, the step of quantizing the output feature output by the network layer into integer data with a bit width of less than 16 bits includes:
obtaining a plurality of preset candidate steps, where the bit width corresponding to a candidate step is less than 16 bits;
calculating, for each candidate step, the quantization error of quantizing the output feature with that candidate step;
quantizing the output feature into integer data with a bit width of less than 16 bits using the candidate step corresponding to the smallest quantization error.
Optionally, the step of obtaining the picture to be recognized includes:
obtaining a face picture captured by a face capture device, or obtaining a vehicle picture captured by a vehicle capture device;
the step of inputting the picture to be recognized into the pre-trained target deep learning model to obtain the target feature in the picture to be recognized includes:
inputting the face picture into the pre-trained target deep learning model to obtain a target face feature in the face picture, or inputting the vehicle picture into the pre-trained target deep learning model to obtain a target vehicle feature in the vehicle picture;
the step of comparing the target feature with the pre-calibrated target feature to obtain the target recognition result of the picture to be recognized includes:
comparing the target face feature with a preset face feature to obtain a face recognition result, or comparing the target vehicle feature with a preset vehicle feature to obtain a vehicle recognition result.
In a second aspect, an embodiment of this application provides a target recognition apparatus, the apparatus including:
an obtaining module, configured to obtain a picture to be recognized;
a calculation module, configured to input the picture to be recognized into a pre-trained target deep learning model to obtain a target feature in the picture to be recognized, where, for each network layer in the target deep learning model, at least one of the input feature input into the network layer, the network weight of the network layer and the output feature output by the network layer is quantized into integer data with a bit width of less than 16 bits;
a comparison module, configured to compare the target feature with a pre-calibrated target feature to obtain a target recognition result of the picture to be recognized.
Optionally, the apparatus further includes:
a preprocessing module, configured to perform a preprocessing operation on the picture to be recognized to obtain a preprocessed picture to be recognized, where the preprocessing operation at least includes cropping the picture to be recognized.
Optionally, the network layers in the target deep learning model include: a convolutional layer, a fully connected layer, a pooling layer, a batch normalization layer, a merging layer and a concatenation layer.
Optionally, when quantizing the network weights of the network layer into integer data with a bit width of less than 16 bits, the calculation module is specifically configured to:
for each filter of the network layer, read the network weight with the largest absolute value in the filter;
calculate the quantization step corresponding to the filter according to the network weight with the largest absolute value and a preset bit width of less than 16 bits;
quantize each network weight in the filter into integer data of the preset bit width using the quantization step.
Optionally, when quantizing the input feature input into the network layer into integer data with a bit width of less than 16 bits, the calculation module is specifically configured to:
obtain a plurality of preset candidate steps, where the bit width corresponding to a candidate step is less than 16 bits;
calculate, for each candidate step, the quantization error of quantizing the input feature with that candidate step;
quantize the input feature into integer data with a bit width of less than 16 bits using the candidate step corresponding to the smallest quantization error.
Optionally, when quantizing the output feature output by the network layer into integer data with a bit width of less than 16 bits, the calculation module is specifically configured to:
obtain a plurality of preset candidate steps, where the bit width corresponding to a candidate step is less than 16 bits;
calculate, for each candidate step, the quantization error of quantizing the output feature with that candidate step;
quantize the output feature into integer data with a bit width of less than 16 bits using the candidate step corresponding to the smallest quantization error.
Optionally, the obtaining module is specifically configured to:
obtain a face picture captured by a face capture device, or obtain a vehicle picture captured by a vehicle capture device;
the calculation module is specifically configured to:
input the face picture into the pre-trained target deep learning model to obtain a target face feature in the face picture, or input the vehicle picture into the pre-trained target deep learning model to obtain a target vehicle feature in the vehicle picture;
the comparison module is specifically configured to:
compare the target face feature with a preset face feature to obtain a face recognition result, or compare the target vehicle feature with a preset vehicle feature to obtain a vehicle recognition result.
In a third aspect, an embodiment of this application provides a computer device, including a processor and a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions that can be executed by the processor, and the processor is caused by the machine-executable instructions to implement the method provided in the first aspect of the embodiments of this application.
In a fourth aspect, an embodiment of this application provides a machine-readable storage medium storing machine-executable instructions which, when called and executed by a processor, implement the method provided in the first aspect of the embodiments of this application.
In a fifth aspect, an embodiment of this application provides a computer program product for executing, at runtime, the method provided in the first aspect of the embodiments of this application.
According to the target recognition method and apparatus provided by the embodiments of this application, a picture to be recognized is obtained and input into a pre-trained target deep learning model to obtain a target feature in the picture to be recognized, and the target feature is compared with a pre-calibrated target feature to obtain a target recognition result of the picture to be recognized. When the target deep learning model performs operations on the input picture to be recognized, for each network layer in the target deep learning model, at least one of the input feature input into the network layer, the network weight of the network layer and the output feature output by the network layer is quantized into integer data with a bit width of less than 16 bits. For each network layer in the target deep learning model, the input feature input into the network layer, the network weight of the network layer or the output feature output by the network layer is quantized into integer data with a bit width of less than 16 bits. In this way, for each network layer, the data participating in the operations is low-bit integer data with a relatively low bit width, which reduces the bit width and amount of the data participating in the operations, can increase the operation speed of target recognition by the target deep learning model, and thereby improves the efficiency of target recognition.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of this application and of the prior art more clearly, the drawings needed for the embodiments and the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a target recognition method according to an embodiment of this application;
Fig. 2a is a schematic structural diagram of a convolutional layer according to an embodiment of this application;
Fig. 2b is a schematic structural diagram of a fully connected layer according to an embodiment of this application;
Fig. 2c is a schematic structural diagram of a pooling layer according to an embodiment of this application;
Fig. 2d is a schematic structural diagram of a batch normalization layer according to an embodiment of this application;
Fig. 2e is a schematic structural diagram of a merging layer according to an embodiment of this application;
Fig. 2f is a schematic structural diagram of a concatenation layer according to an embodiment of this application;
Fig. 3 is a schematic structural diagram of a target recognition apparatus according to an embodiment of this application;
Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of this application.
Detailed Description
To make the purposes, technical solutions and advantages of this application clearer, this application is further described in detail below with reference to the drawings and embodiments. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application.
To improve the efficiency of target recognition, the embodiments of this application provide a target recognition method, apparatus, computer device and machine-readable storage medium. The target recognition method provided by the embodiments of this application is introduced first.
The execution subject of the target recognition method provided by the embodiments of this application may be a computer device with a target recognition function, such as a smart camera or a target recognizer, and the execution subject includes at least a core processing chip with data processing capability. The target recognition method provided by the embodiments of this application may be implemented by at least one of software, a hardware circuit and a logic circuit provided in the execution subject.
As shown in Fig. 1, which is a schematic flowchart of the target recognition method of an embodiment of this application, the method may include the following steps.
S101: obtaining a picture to be recognized.
The picture to be recognized is a picture containing a target that needs to be recognized. For example, if face target recognition is required, the picture to be recognized may be a picture containing a face target captured by a smart camera shooting a monitored area when a pedestrian enters the monitored area, or may be a picture containing a face target input by a user as needed. The targets mentioned in the embodiments of this application are not limited to face targets, and may also be targets such as cars, bicycles and buildings.
S102: inputting the picture to be recognized into a pre-trained target deep learning model to obtain a target feature in the picture to be recognized, where, for each network layer in the target deep learning model, at least one of the input feature input into the network layer, the network weight of the network layer and the output feature output by the network layer is quantized into integer data with a bit width of less than 16 bits.
After the picture to be recognized is obtained, it is input into the target deep learning model. The target deep learning model is a deep learning network model, such as a convolutional neural network model, a recurrent neural network model or a long short-term memory network model. After the operations of the network layers in the target deep learning model, the target deep learning model can output the target feature in the picture to be recognized.
The target deep learning model is trained in advance based on training samples. A training sample may be a sample picture in which a designated target has been labeled in advance. The training sample is input into an initial network model, the training sample is operated on using the BP (Back Propagation) algorithm or another model training algorithm, the operation result is compared with a set nominal value, and the network weights of the network model are adjusted based on the comparison result. By inputting different training samples into the neural network model in turn, iteratively performing the above steps and continuously adjusting the network weights, the output of the network model gets closer and closer to the nominal value. When the difference between the output of the network model and the nominal value is sufficiently small (less than a preset threshold), or the output of the network model converges, the final network model is determined as the target deep learning model.
In the process of operating on the input picture to be recognized using the target deep learning model, for each network layer in the target deep learning model, at least one of the input feature input into the network layer, the network weight of the network layer and the output feature output by the network layer is quantized into integer data with a bit width of less than 16 bits, that is, quantized into low-bit integer data with a bit width of less than 16 bits. Through such quantization, the data participating in the operations changes from single-precision floating-point data to low-bit integer data with a bit width of less than 16 bits, reducing the bit width and amount of the data participating in the operations.
In an implementation of the embodiments of this application, the network layers in the target deep learning model may include: a convolutional layer, a fully connected layer, a pooling layer, a batch normalization layer, a merging layer and a concatenation layer.
Specifically, a deep learning model may include the following network layers: a Convolution layer (convolutional layer, as shown in Fig. 2a), an InnerProduct layer (fully connected layer, as shown in Fig. 2b), a Pooling layer (pooling layer, as shown in Fig. 2c), a BN layer (Batch Normalization layer, as shown in Fig. 2d) for adjusting the scale of channels, an Eltwise layer (merging layer, as shown in Fig. 2e) for adding and merging two inputs, and a Concat layer (concatenation layer, as shown in Fig. 2f) for concatenating two inputs. The number of network layers of each type in the deep learning model is not limited. In Fig. 2a to Fig. 2f, I_INTn denotes an n-bit integer input feature, W_INTn denotes an n-bit integer network weight, O_INTn denotes an n-bit integer output feature, and I1_INTn and I2_INTn denote the n-bit integer input features of two branches, where n is less than 16. In an implementable manner, in each of the above network layer types, the network weights, input features and output features may all be n-bit integer data.
S103: comparing the target feature with a pre-calibrated target feature to obtain a target recognition result of the picture to be recognized.
After the target feature in the picture to be recognized is calculated using the target deep learning model, the target feature may be compared with a pre-calibrated target feature. By comparing the feature values one by one, it is determined whether the target feature is the calibrated target feature, so that recognition results such as whether the target in the picture to be recognized is the calibrated target, how likely the target in the picture to be recognized is the calibrated target, and the position of the target in the picture to be recognized can be obtained. The specific comparison process may be a feature-point-by-feature-point comparison, determining whether each feature point is the same as the corresponding feature point in the calibrated target feature; if the number of identical feature points exceeds a threshold, the target in the picture to be recognized is considered to be the calibrated target.
By applying the embodiments of this application, a picture to be recognized is obtained and input into a pre-trained target deep learning model to obtain a target feature in the picture to be recognized, and the target feature is compared with a pre-calibrated target feature to obtain a target recognition result of the picture to be recognized. When the target deep learning model performs operations on the input picture to be recognized, for each network layer in the target deep learning model, at least one of the input feature input into the network layer, the network weight of the network layer and the output feature output by the network layer is quantized into integer data with a bit width of less than 16 bits. For each network layer in the target deep learning model, the input feature input into the network layer, the network weight of the network layer or the output feature output by the network layer is quantized into integer data with a bit width of less than 16 bits. In this way, for each network layer, the data participating in the operations is low-bit integer data with a relatively low bit width, which reduces the bit width and amount of the data participating in the operations, can increase the operation speed of target recognition by the target deep learning model, and thereby improves the efficiency of target recognition.
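The feature-point-by-feature-point comparison with a threshold described above can be sketched as follows. This is a hypothetical illustration only: the threshold value, the equality test and the returned score are assumptions, since the text does not fix them.

```python
import numpy as np

def match_target(feature, calibrated_feature, threshold=0.9):
    """Compare an extracted feature with a pre-calibrated feature point by
    point; the target is considered matched when the fraction of identical
    feature points meets the threshold (threshold value is an assumption)."""
    feature = np.asarray(feature)
    calibrated_feature = np.asarray(calibrated_feature)
    assert feature.shape == calibrated_feature.shape
    same = np.count_nonzero(feature == calibrated_feature)
    score = same / feature.size          # fraction of identical points
    return score >= threshold, score

# e.g. 3 of 4 points match -> score 0.75
ok, score = match_target([1, 2, 3, 4], [1, 2, 3, 5], threshold=0.7)
```

With quantized (integer) features, exact equality per point is well defined; for floating-point features a tolerance-based comparison would be the natural variant.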
Based on the embodiment shown in Fig. 1, in a face recognition scenario, the face recognition process mainly includes: obtaining a face picture captured by a face capture device; inputting the face picture into a pre-trained target deep learning model to obtain a target face feature in the face picture; and comparing the target face feature with a preset face feature to obtain a face recognition result.
In a vehicle recognition scenario, the vehicle recognition process mainly includes: obtaining a vehicle picture captured by a vehicle capture device; inputting the vehicle picture into a pre-trained target deep learning model to obtain a target vehicle feature in the vehicle picture; and comparing the target vehicle feature with a preset vehicle feature to obtain a vehicle recognition result.
In an implementation of the embodiments of this application, in the embodiment shown in Fig. 1, the step of quantizing the network weights of a network layer into integer data with a bit width of less than 16 bits may specifically be:
for each filter of the network layer, reading the network weight with the largest absolute value in the filter; calculating the quantization step corresponding to the filter according to the network weight with the largest absolute value and a preset bit width of less than 16 bits; and quantizing each network weight in the filter into integer data of the preset bit width using the quantization step.
A network layer is composed of multiple filters; a filter is a convolution kernel and includes multiple network weights. For each filter, the network weight W_max with the largest absolute value can be read from the filter. The preset bit width is the bit width to which the network weights are expected to be quantized, and is less than 16 bits. According to the network weight W_max with the largest absolute value and the preset bit width bitwidth of less than 16 bits, the quantization step step_W corresponding to the filter can be calculated, specifically using formula (1):
step_W = W_max / 2^(bitwidth−1)       (1)
After the quantization step step_W is calculated, each network weight in the filter can be quantized using the quantization step step_W, yielding low-bit integer data of the preset bit width.
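The per-filter weight quantization above (max-absolute weight, step from formula (1), then rounding to integers) can be sketched like this. Symmetric signed quantization with clipping to the representable range is an assumption made for illustration; the text specifies only the step formula.

```python
import numpy as np

def quantize_filter_weights(weights, bitwidth=8):
    """Quantize one filter's weights to `bitwidth`-bit integers:
    step_W = W_max / 2^(bitwidth-1)  (formula (1)), then round each
    weight to the nearest integer multiple of step_W and clip to the
    signed `bitwidth`-bit range (range/clipping are assumptions)."""
    w = np.asarray(weights, dtype=np.float32)
    w_max = float(np.max(np.abs(w)))
    step = w_max / (2 ** (bitwidth - 1))        # formula (1)
    lo, hi = -(2 ** (bitwidth - 1)), 2 ** (bitwidth - 1) - 1
    q = np.clip(np.round(w / step), lo, hi).astype(np.int32)
    return q, step
```

For an 8-bit width, `quantize_filter_weights([0.5, -1.0, 0.25])` gives step 1/128 and the integers [64, -128, 32].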
In an implementation of the embodiments of this application, in the embodiment shown in Fig. 1, the step of quantizing the input feature input into a network layer into integer data with a bit width of less than 16 bits may specifically be: obtaining a plurality of preset candidate steps, where the bit width corresponding to a candidate step is less than 16 bits; calculating, for each candidate step, the quantization error of quantizing the input feature with that candidate step; and quantizing the input feature into integer data with a bit width of less than 16 bits using the candidate step corresponding to the smallest quantization error.
In an implementation of the embodiments of this application, in the embodiment shown in Fig. 1, the step of quantizing the output feature output by a network layer into integer data with a bit width of less than 16 bits may specifically be: obtaining a plurality of preset candidate steps, where the bit width corresponding to a candidate step is less than 16 bits; calculating, for each candidate step, the quantization error of quantizing the output feature with that candidate step; and quantizing the output feature into integer data with a bit width of less than 16 bits using the candidate step corresponding to the smallest quantization error.
For an input feature or an output feature, the feature can be quantized with each of multiple preset candidate steps step, obtaining the quantized integer feature value A_q under that candidate step. Subtracting the product of the candidate step step and the integer feature value A_q from the original floating-point feature value A_float gives the quantization error under that candidate step, so that multiple quantization errors can be obtained. Using formula (2), the minimum of the quantization errors is obtained; the candidate step corresponding to this minimum is the quantization step step_a of the input/output feature, and the input/output feature is quantized with this quantization step step_a, yielding low-bit integer data of the preset bit width.
step_a = argmin_step ||A_float − step·A_q||_n       (2)
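The candidate-step search of formula (2) can be sketched as below. The choice of the L2 norm and of a symmetric signed integer range are assumptions for illustration; the text leaves the norm order n and the rounding rule open.

```python
import numpy as np

def choose_feature_step(a_float, candidate_steps, bitwidth=8):
    """For each preset candidate step, quantize the feature, measure the
    reconstruction error ||A_float - step * A_q|| (formula (2)), and keep
    the step with the smallest error together with its quantized feature."""
    a = np.asarray(a_float, dtype=np.float32)
    lo, hi = -(2 ** (bitwidth - 1)), 2 ** (bitwidth - 1) - 1
    best_step, best_err, best_q = None, np.inf, None
    for step in candidate_steps:
        q = np.clip(np.round(a / step), lo, hi)     # quantize with this step
        err = np.linalg.norm(a - step * q)          # quantization error
        if err < best_err:
            best_step, best_err, best_q = step, err, q.astype(np.int32)
    return best_step, best_q
```

For example, for the feature [0.5, 1.0] and candidate steps [0.5, 0.3], the step 0.5 reconstructs the feature exactly and is selected.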
When quantizing the input features, network weights or output features using a step, the correspondence between the quantized data and the original data can specifically be expressed by formula (3).
Q_float = step · Q_INTn       (3)
where Q_float is the floating-point value of the original input feature, network weight or output feature, step is the step used for quantization, and Q_INTn is the quantized data.
When quantizing the network weights of a network layer, the quantization parameters may be calculated in advance or calculated in real time during quantization. Whether the quantization parameters are calculated in advance or in real time, the above formulas can be used. When the quantization parameters are calculated in advance, the calculated quantization parameters are recorded in a cache in advance, and during quantization the quantization parameters are read directly from the cache for quantization.
In an implementation of the embodiments of this application, before performing S102, the following may also be performed: performing a preprocessing operation on the picture to be recognized to obtain a preprocessed picture to be recognized, where the preprocessing operation may at least include cropping the picture to be recognized.
After the picture to be recognized is obtained, because of problems such as the original picture being too large or of poor quality, the target deep learning model cannot operate directly on the original picture to be recognized, so the picture to be recognized needs to be preprocessed first. The preprocessing operation at least includes cropping the picture to be recognized, and may also include operations such as graying the picture to be recognized and normalizing pixel values. Through the preprocessing operation, the picture to be recognized is preprocessed into a picture that is easier for the target deep learning model to operate on; the preprocessed picture to be recognized is then input into the target deep learning model, which operates on it.
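A minimal sketch of such a preprocessing step is shown below. The center-crop strategy, the 224-pixel crop size and the [0, 1] pixel normalization are assumptions chosen for illustration; the text requires only that cropping is included.

```python
import numpy as np

def preprocess(image, size=224):
    """Center-crop the picture to a fixed square no larger than `size`
    and normalize pixel values from [0, 255] to [0, 1]."""
    h, w = image.shape[:2]
    side = min(h, w, size)
    top, left = (h - side) // 2, (w - side) // 2
    cropped = image[top:top + side, left:left + side]
    return cropped.astype(np.float32) / 255.0
```

The resulting fixed-size, normalized array can then be fed to the model in place of the raw capture.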
Corresponding to the above method embodiments, an embodiment of this application provides a target recognition apparatus. As shown in Fig. 3, the apparatus may include:
an obtaining module 310, configured to obtain a picture to be recognized;
a calculation module 320, configured to input the picture to be recognized into a pre-trained target deep learning model to obtain a target feature in the picture to be recognized, where, for each network layer in the target deep learning model, at least one of the input feature input into the network layer, the network weight of the network layer and the output feature output by the network layer is quantized into integer data with a bit width of less than 16 bits;
a comparison module 330, configured to compare the target feature with a pre-calibrated target feature to obtain a target recognition result of the picture to be recognized.
In an implementation of the embodiments of this application, the apparatus may further include:
a preprocessing module, configured to perform a preprocessing operation on the picture to be recognized to obtain a preprocessed picture to be recognized, where the preprocessing operation at least includes cropping the picture to be recognized.
In an implementation of the embodiments of this application, the network layers in the target deep learning model may include: a convolutional layer, a fully connected layer, a pooling layer, a batch normalization layer, a merging layer and a concatenation layer.
In an implementation of the embodiments of this application, when quantizing the network weights of a network layer into integer data with a bit width of less than 16 bits, the calculation module 320 may specifically be configured to: for each filter of the network layer, read the network weight with the largest absolute value in the filter; calculate the quantization step corresponding to the filter according to the network weight with the largest absolute value and a preset bit width of less than 16 bits; and quantize each network weight in the filter into integer data of the preset bit width using the quantization step.
In an implementation of the embodiments of this application, when quantizing the input feature input into a network layer into integer data with a bit width of less than 16 bits, the calculation module 320 may specifically be configured to: obtain a plurality of preset candidate steps, where the bit width corresponding to a candidate step is less than 16 bits; calculate, for each candidate step, the quantization error of quantizing the input feature with that candidate step; and quantize the input feature into integer data with a bit width of less than 16 bits using the candidate step corresponding to the smallest quantization error.
In an implementation of the embodiments of this application, when quantizing the output feature output by a network layer into integer data with a bit width of less than 16 bits, the calculation module 320 may specifically be configured to: obtain a plurality of preset candidate steps, where the bit width corresponding to a candidate step is less than 16 bits; calculate, for each candidate step, the quantization error of quantizing the output feature with that candidate step; and quantize the output feature into integer data with a bit width of less than 16 bits using the candidate step corresponding to the smallest quantization error.
In an implementation of the embodiments of this application, the obtaining module 310 may specifically be configured to: obtain a face picture captured by a face capture device, or obtain a vehicle picture captured by a vehicle capture device;
the calculation module 320 may specifically be configured to: input the face picture into the pre-trained target deep learning model to obtain a target face feature in the face picture, or input the vehicle picture into the pre-trained target deep learning model to obtain a target vehicle feature in the vehicle picture;
the comparison module 330 may specifically be configured to: compare the target face feature with a preset face feature to obtain a face recognition result, or compare the target vehicle feature with a preset vehicle feature to obtain a vehicle recognition result.
By applying the embodiments of this application, a picture to be recognized is obtained and input into a pre-trained target deep learning model to obtain a target feature in the picture to be recognized, and the target feature is compared with a pre-calibrated target feature to obtain a target recognition result of the picture to be recognized. When the target deep learning model performs operations on the input picture to be recognized, for each network layer in the target deep learning model, at least one of the input feature input into the network layer, the network weight of the network layer and the output feature output by the network layer is quantized into integer data with a bit width of less than 16 bits. For each network layer in the target deep learning model, the input feature input into the network layer, the network weight of the network layer or the output feature output by the network layer is quantized into integer data with a bit width of less than 16 bits. In this way, for each network layer, the data participating in the operations is low-bit integer data with a relatively low bit width, which reduces the bit width and amount of the data participating in the operations, can increase the operation speed of target recognition by the target deep learning model, and thereby improves the efficiency of target recognition.
An embodiment of this application provides a computer device. As shown in Fig. 4, it may include a processor 401 and a machine-readable storage medium 402, where the machine-readable storage medium 402 stores machine-executable instructions that can be executed by the processor 401, and the processor 401 is caused by the machine-executable instructions to implement the steps of the above target recognition method.
The above machine-readable storage medium may include RAM (Random Access Memory), and may also include NVM (Non-Volatile Memory), for example at least one disk memory. Optionally, the machine-readable storage medium may also be at least one storage device located remotely from the above processor.
The above processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), etc.; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Data may be transmitted between the machine-readable storage medium 402 and the processor 401 through a wired or wireless connection, and the computer device may communicate with other devices through a wired or wireless communication interface. What is shown in Fig. 4 is only an example of data transmission between the processor 401 and the machine-readable storage medium 402 via a bus, and is not intended to limit the specific connection manner.
In this embodiment, by reading the machine-executable instructions stored in the machine-readable storage medium 402 and running them, the processor 401 can achieve the following: obtaining a picture to be recognized, inputting the picture to be recognized into a pre-trained target deep learning model to obtain a target feature in the picture to be recognized, and comparing the target feature with a pre-calibrated target feature to obtain a target recognition result of the picture to be recognized. When the target deep learning model performs operations on the input picture to be recognized, for each network layer in the target deep learning model, at least one of the input feature input into the network layer, the network weight of the network layer and the output feature output by the network layer is quantized into integer data with a bit width of less than 16 bits. For each network layer in the target deep learning model, the input feature input into the network layer, the network weight of the network layer or the output feature output by the network layer is quantized into integer data with a bit width of less than 16 bits. In this way, for each network layer, the data participating in the operations is low-bit integer data with a relatively low bit width, which reduces the bit width and amount of the data participating in the operations, can increase the operation speed of target recognition by the target deep learning model, and thereby improves the efficiency of target recognition.
An embodiment of this application also provides a machine-readable storage medium storing machine-executable instructions which, when called and executed by a processor, implement the steps of the above target recognition method.
In this embodiment, the machine-readable storage medium stores machine-executable instructions that, at runtime, execute the target recognition method provided by the embodiments of this application, so the following can be achieved: obtaining a picture to be recognized, inputting the picture to be recognized into a pre-trained target deep learning model to obtain a target feature in the picture to be recognized, and comparing the target feature with a pre-calibrated target feature to obtain a target recognition result of the picture to be recognized. When the target deep learning model performs operations on the input picture to be recognized, for each network layer in the target deep learning model, at least one of the input feature input into the network layer, the network weight of the network layer and the output feature output by the network layer is quantized into integer data with a bit width of less than 16 bits. For each network layer in the target deep learning model, the input feature input into the network layer, the network weight of the network layer or the output feature output by the network layer is quantized into integer data with a bit width of less than 16 bits. In this way, for each network layer, the data participating in the operations is low-bit integer data with a relatively low bit width, which reduces the bit width and amount of the data participating in the operations, can increase the operation speed of target recognition by the target deep learning model, and thereby improves the efficiency of target recognition.
An embodiment of this application also provides a computer program product for executing, at runtime, the steps of the above target recognition method.
The above embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable apparatus. The computer instructions may be stored in a machine-readable storage medium, or transmitted from one machine-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired (e.g. coaxial cable, optical fiber, DSL (Digital Subscriber Line)) or wireless (e.g. infrared, radio, microwave) means. The machine-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g. floppy disk, hard disk, magnetic tape), an optical medium (e.g. DVD (Digital Versatile Disc)) or a semiconductor medium (e.g. SSD (Solid State Disk)), etc.
For the apparatus, computer device, machine-readable storage medium and computer program product embodiments, since they are basically similar to the method embodiments, their descriptions are relatively brief; for relevant parts, refer to the description of the method embodiments.
It should be noted that, in this document, the terms "comprise", "include" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the statement "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
Those of ordinary skill in the art can understand that all or part of the steps in the above method implementations can be completed by instructing the relevant hardware through a program, and the program can be stored in a computer-readable storage medium; the storage medium referred to here is, for example: ROM/RAM, a magnetic disk, an optical disc, etc.
The above are only preferred embodiments of this application and are not intended to limit the scope of protection of this application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of this application are included in the scope of protection of this application.

Claims (17)

  1. A target recognition method, characterized in that the method comprises:
    obtaining a picture to be recognized;
    inputting the picture to be recognized into a pre-trained target deep learning model to obtain a target feature in the picture to be recognized, wherein, for each network layer in the target deep learning model, at least one of an input feature input into the network layer, a network weight of the network layer and an output feature output by the network layer is quantized into integer data with a bit width of less than 16 bits;
    comparing the target feature with a pre-calibrated target feature to obtain a target recognition result of the picture to be recognized.
  2. The method according to claim 1, characterized in that, before the inputting the picture to be recognized into the pre-trained target deep learning model to obtain the target feature in the picture to be recognized, the method further comprises:
    performing a preprocessing operation on the picture to be recognized to obtain the preprocessed picture to be recognized, the preprocessing operation at least comprising cropping the picture to be recognized.
  3. The method according to claim 1, characterized in that the network layers in the target deep learning model comprise: a convolutional layer, a fully connected layer, a pooling layer, a batch normalization layer, a merging layer and a concatenation layer.
  4. The method according to claim 1, characterized in that the quantizing the network weights of the network layer into integer data with a bit width of less than 16 bits comprises:
    for each filter of the network layer, reading the network weight with the largest absolute value in the filter;
    calculating the quantization step corresponding to the filter according to the network weight with the largest absolute value and a preset bit width of less than 16 bits;
    quantizing each network weight in the filter into integer data of the preset bit width using the quantization step.
  5. The method according to claim 1, characterized in that the quantizing the input feature input into the network layer into integer data with a bit width of less than 16 bits comprises:
    obtaining a plurality of preset candidate steps, a bit width corresponding to the candidate steps being less than 16 bits;
    calculating, for each candidate step, a quantization error of quantizing the input feature with the candidate step;
    quantizing the input feature into integer data with a bit width of less than 16 bits using the candidate step corresponding to the smallest quantization error.
  6. The method according to claim 1, characterized in that the quantizing the output feature output by the network layer into integer data with a bit width of less than 16 bits comprises:
    obtaining a plurality of preset candidate steps, a bit width corresponding to the candidate steps being less than 16 bits;
    calculating, for each candidate step, a quantization error of quantizing the output feature with the candidate step;
    quantizing the output feature into integer data with a bit width of less than 16 bits using the candidate step corresponding to the smallest quantization error.
  7. The method according to claim 1, characterized in that the obtaining the picture to be recognized comprises:
    obtaining a face picture captured by a face capture device, or obtaining a vehicle picture captured by a vehicle capture device;
    the inputting the picture to be recognized into the pre-trained target deep learning model to obtain the target feature in the picture to be recognized comprises:
    inputting the face picture into the pre-trained target deep learning model to obtain a target face feature in the face picture, or inputting the vehicle picture into the pre-trained target deep learning model to obtain a target vehicle feature in the vehicle picture;
    the comparing the target feature with the pre-calibrated target feature to obtain the target recognition result of the picture to be recognized comprises:
    comparing the target face feature with a preset face feature to obtain a face recognition result, or comparing the target vehicle feature with a preset vehicle feature to obtain a vehicle recognition result.
  8. A target recognition apparatus, characterized in that the apparatus comprises:
    an obtaining module, configured to obtain a picture to be recognized;
    a calculation module, configured to input the picture to be recognized into a pre-trained target deep learning model to obtain a target feature in the picture to be recognized, wherein, for each network layer in the target deep learning model, at least one of an input feature input into the network layer, a network weight of the network layer and an output feature output by the network layer is quantized into integer data with a bit width of less than 16 bits;
    a comparison module, configured to compare the target feature with a pre-calibrated target feature to obtain a target recognition result of the picture to be recognized.
  9. The apparatus according to claim 8, characterized in that the apparatus further comprises:
    a preprocessing module, configured to perform a preprocessing operation on the picture to be recognized to obtain the preprocessed picture to be recognized, the preprocessing operation at least comprising cropping the picture to be recognized.
  10. The apparatus according to claim 8, characterized in that the network layers in the target deep learning model comprise: a convolutional layer, a fully connected layer, a pooling layer, a batch normalization layer, a merging layer and a concatenation layer.
  11. The apparatus according to claim 8, characterized in that, when quantizing the network weights of the network layer into integer data with a bit width of less than 16 bits, the calculation module is specifically configured to:
    for each filter of the network layer, read the network weight with the largest absolute value in the filter;
    calculate the quantization step corresponding to the filter according to the network weight with the largest absolute value and a preset bit width of less than 16 bits;
    quantize each network weight in the filter into integer data of the preset bit width using the quantization step.
  12. The apparatus according to claim 8, characterized in that, when quantizing the input feature input into the network layer into integer data with a bit width of less than 16 bits, the calculation module is specifically configured to:
    obtain a plurality of preset candidate steps, a bit width corresponding to the candidate steps being less than 16 bits;
    calculate, for each candidate step, a quantization error of quantizing the input feature with the candidate step;
    quantize the input feature into integer data with a bit width of less than 16 bits using the candidate step corresponding to the smallest quantization error.
  13. The apparatus according to claim 8, characterized in that, when quantizing the output feature output by the network layer into integer data with a bit width of less than 16 bits, the calculation module is specifically configured to:
    obtain a plurality of preset candidate steps, a bit width corresponding to the candidate steps being less than 16 bits;
    calculate, for each candidate step, a quantization error of quantizing the output feature with the candidate step;
    quantize the output feature into integer data with a bit width of less than 16 bits using the candidate step corresponding to the smallest quantization error.
  14. The apparatus according to claim 8, characterized in that the obtaining module is specifically configured to:
    obtain a face picture captured by a face capture device, or obtain a vehicle picture captured by a vehicle capture device;
    the calculation module is specifically configured to:
    input the face picture into the pre-trained target deep learning model to obtain a target face feature in the face picture, or input the vehicle picture into the pre-trained target deep learning model to obtain a target vehicle feature in the vehicle picture;
    the comparison module is specifically configured to:
    compare the target face feature with a preset face feature to obtain a face recognition result, or compare the target vehicle feature with a preset vehicle feature to obtain a vehicle recognition result.
  15. A computer device, characterized by comprising a processor and a machine-readable storage medium, the machine-readable storage medium storing machine-executable instructions that can be executed by the processor, the processor being caused by the machine-executable instructions to implement the method according to any one of claims 1 to 7.
  16. A machine-readable storage medium, characterized in that the machine-readable storage medium stores machine-executable instructions which, when called and executed by a processor, implement the method according to any one of claims 1 to 7.
  17. A computer program product, characterized by being configured to execute, at runtime, the method according to any one of claims 1 to 7.
PCT/CN2020/128171 2019-11-13 2020-11-11 一种目标识别方法及装置 WO2021093780A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911108141.4A CN112800813B (zh) 2019-11-13 2019-11-13 一种目标识别方法及装置
CN201911108141.4 2019-11-13

Publications (1)

Publication Number Publication Date
WO2021093780A1 true WO2021093780A1 (zh) 2021-05-20

Family

ID=75803382

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/128171 WO2021093780A1 (zh) 2019-11-13 2020-11-11 一种目标识别方法及装置

Country Status (2)

Country Link
CN (1) CN112800813B (zh)
WO (1) WO2021093780A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114140754A (zh) * 2021-11-30 2022-03-04 北京比特易湃信息技术有限公司 一种基于深度学习的改装车识别方法

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408704A (zh) * 2021-06-29 2021-09-17 深圳市商汤科技有限公司 数据处理方法、装置、设备及计算机可读存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104992167A (zh) * 2015-07-28 2015-10-21 中国科学院自动化研究所 一种基于卷积神经网络的人脸检测方法及装置
US20180018555A1 (en) * 2016-07-15 2018-01-18 Alexander Sheung Lai Wong System and method for building artificial neural network architectures
CN110245577A (zh) * 2019-05-23 2019-09-17 复钧智能科技(苏州)有限公司 目标车辆识别方法、装置及车辆实时监控系统
CN110309692A (zh) * 2018-03-27 2019-10-08 杭州海康威视数字技术股份有限公司 人脸识别方法、装置及系统、模型训练方法及装置


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JACOB BENOIT; KLIGYS SKIRMANTAS; CHEN BO; ZHU MENGLONG; TANG MATTHEW; HOWARD ANDREW; ADAM HARTWIG; KALENICHENKO DMITRY: "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference", 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, IEEE, 18 June 2018 (2018-06-18), pages 2704 - 2713, XP033476237, DOI: 10.1109/CVPR.2018.00286 *


Also Published As

Publication number Publication date
CN112800813A (zh) 2021-05-14
CN112800813B (zh) 2024-06-07

Similar Documents

Publication Publication Date Title
US11373087B2 (en) Method and apparatus for generating fixed-point type neural network
TWI682325B (zh) 辨識系統及辨識方法
CN110799994B (zh) 神经网络的自适应位宽缩减
CN108197652B (zh) 用于生成信息的方法和装置
US10726573B2 (en) Object detection method and system based on machine learning
KR20190125141A (ko) 뉴럴 네트워크의 파라미터들을 양자화하는 방법 및 장치
WO2022078002A1 (zh) 一种图像处理方法、装置、设备及可读存储介质
WO2021093780A1 (zh) 一种目标识别方法及装置
WO2019062721A1 (zh) 语音身份特征提取器、分类器训练方法及相关设备
US11741708B2 (en) Image recognition method and system based on deep learning
CN108962231B (zh) 一种语音分类方法、装置、服务器及存储介质
CN106778910B (zh) 基于本地训练的深度学习系统和方法
CN110941964A (zh) 双语语料筛选方法、装置及存储介质
KR20210083935A (ko) 뉴럴 네트워크의 파라미터들을 양자화하는 방법 및 장치
CN109378014A (zh) 一种基于卷积神经网络的移动设备源识别方法及系统
KR20220130565A (ko) 키워드 검출 방법 및 장치
US20220366262A1 (en) Method and apparatus for training neural network model
WO2019091401A1 (zh) 深度神经网络的网络模型压缩方法、装置及计算机设备
WO2022213825A1 (zh) 基于神经网络的端到端语音增强方法、装置
WO2021037174A1 (zh) 一种神经网络模型训练方法及装置
US20230078246A1 (en) Centralized Management of Distributed Data Sources
WO2022246986A1 (zh) 数据处理方法、装置、设备及计算机可读存储介质
CN108847251B (zh) 一种语音去重方法、装置、服务器及存储介质
CN116992946B (zh) 模型压缩方法、装置、存储介质和程序产品
CN117173269A (zh) 一种人脸图像生成方法、装置、电子设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20886779

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20886779

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19.05.2023)
