WO2021093780A1 - Target recognition method and device - Google Patents
Target recognition method and device
- Publication number
- WO2021093780A1 PCT/CN2020/128171 (application CN2020128171W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- picture
- feature
- recognized
- bit width
- Prior art date
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Definitions
- This application relates to the field of machine learning technology, and in particular to a target recognition method and device.
- As an emerging field in machine learning research, the deep neural network analyzes data by imitating the mechanism of the human brain; it is an intelligent model that analyzes and learns by establishing and simulating the human brain.
- deep learning models, such as convolutional neural network models, recurrent neural network models, and long short-term memory (LSTM) network models, have become mainstream approaches in image classification, target detection, and speech recognition.
- the picture to be recognized is input into the trained target deep learning model, the network layers in the target deep learning model perform their operations, and the target in the picture to be recognized can be identified based on the results of the operations.
- the data involved in the operation is single-precision floating-point data; because single-precision floating-point data has a higher bit width, the amount of data involved in the operation is large, which leads to low efficiency of target recognition.
- the purpose of the embodiments of the present application is to provide a target recognition method and device to improve the efficiency of target recognition.
- the specific technical solutions are as follows:
- an embodiment of the present application provides a target recognition method, which includes:
- for each network layer in the target deep learning model, at least one of the input features input to the network layer, the network weights of the network layer, and the output features output by the network layer is quantized into integer data with a bit width less than 16 bits;
- the target feature is compared with the pre-calibrated target feature to obtain the target recognition result of the picture to be recognized.
- before the step of inputting the picture to be recognized into a pre-trained target deep learning model to obtain target features in the picture to be recognized, the method further includes:
- a preprocessing operation is performed on the picture to be recognized to obtain a preprocessed picture to be recognized, wherein the preprocessing operation includes at least cropping the picture to be recognized.
- the network layer in the target deep learning model includes: a convolutional layer, a fully connected layer, a pooling layer, a batch normalization layer, a merging layer, and a splicing layer.
- the step of quantizing the network weights of the network layer into integer data with a bit width less than 16 bits includes:
- using the quantization step size, quantizing each network weight in the filter into integer data with the preset bit width.
- the step of quantizing the input features input to the network layer into integer data with a bit width less than 16 bits includes:
- the input feature is quantized into integer data with a bit width less than 16 bits.
- the step of quantizing the output feature output by the network layer into integer data with a bit width less than 16 bits includes:
- the output feature is quantized into integer data with a bit width less than 16 bits.
- the step of obtaining the picture to be recognized includes:
- the step of inputting the picture to be recognized into the pre-trained target deep learning model to obtain the target feature in the picture to be recognized includes:
- the steps of comparing the target feature with the pre-calibrated target feature to obtain the target identification result of the picture to be recognized include:
- the target face feature is compared with the preset face feature to obtain the face recognition result, or the target vehicle feature is compared with the preset vehicle feature to obtain the vehicle recognition result.
- an embodiment of the present application provides a target recognition device, which includes:
- the obtaining module is used to obtain the picture to be recognized
- the calculation module is used to input the picture to be recognized into the pre-trained target deep learning model to obtain the target features in the picture to be recognized, where, for each network layer in the target deep learning model, at least one of the input features input to the network layer, the network weights of the network layer, and the output features output by the network layer is quantized into integer data with a bit width less than 16 bits;
- the comparison module is used to compare the target feature with the pre-calibrated target feature to obtain the target recognition result of the picture to be recognized.
- the device further includes:
- the preprocessing module is used to perform a preprocessing operation on the picture to be recognized to obtain the preprocessed picture to be recognized, wherein the preprocessing operation includes at least cropping the picture to be recognized.
- the network layer in the target deep learning model includes: a convolutional layer, a fully connected layer, a pooling layer, a batch normalization layer, a merging layer, and a splicing layer.
- when the calculation module is used to quantize the network weights of the network layer into integer data with a bit width less than 16 bits, it is specifically used to:
- quantize, using the quantization step size, each network weight in the filter into integer data with the preset bit width.
- when the calculation module is used to quantize the input features input to the network layer into integer data with a bit width less than 16 bits, it is specifically used to:
- the input feature is quantized into integer data with a bit width less than 16 bits.
- when the calculation module is used to quantize the output features output by the network layer into integer data with a bit width less than 16 bits, it is specifically used to:
- the output feature is quantized into integer data with a bit width less than 16 bits.
- Calculation module specifically used for:
- Comparison module specifically used for:
- the target face feature is compared with the preset face feature to obtain the face recognition result, or the target vehicle feature is compared with the preset vehicle feature to obtain the vehicle recognition result.
- an embodiment of the present application provides a computer device, including a processor and a machine-readable storage medium.
- the machine-readable storage medium stores machine-executable instructions that can be executed by the processor, and the machine-executable instructions prompt the processor to implement the method provided in the first aspect of the embodiments of the present application.
- an embodiment of the present application provides a machine-readable storage medium that stores machine-executable instructions that, when called and executed by a processor, implement the method provided in the first aspect of the embodiments of the present application.
- an embodiment of the present application provides a computer program product for executing the method provided in the first aspect of the embodiment of the present application at runtime.
- in the target recognition method and device provided by the embodiments of the present application, a picture to be recognized is obtained and input into a pre-trained target deep learning model to obtain the target feature in the picture to be recognized; the target feature is compared with a pre-calibrated target feature to obtain the target recognition result of the picture to be recognized.
- when the target deep learning model performs operations on the input picture to be recognized, for each network layer in the target deep learning model, at least one of the input features input to the network layer, the network weights of the network layer, and the output features output by the network layer is quantized into integer data with a bit width less than 16 bits.
- since the input features, the network weights, or the output features are quantized into integer data with a bit width less than 16 bits, the bit width and the amount of data involved in the operations are reduced, which improves the efficiency of target recognition.
- FIG. 1 is a schematic flowchart of a target recognition method according to an embodiment of the application
- FIG. 2a is a schematic diagram of the structure of a convolutional layer according to an embodiment of the application.
- FIG. 2b is a schematic diagram of the structure of a fully connected layer according to an embodiment of the application.
- FIG. 2c is a schematic structural diagram of a pooling layer according to an embodiment of the application.
- FIG. 2d is a schematic diagram of the structure of a batch normalization layer according to an embodiment of the application.
- FIG. 2e is a schematic diagram of the structure of a merge layer in an embodiment of the application.
- FIG. 2f is a schematic diagram of the structure of the splicing layer in an embodiment of the application.
- FIG. 3 is a schematic structural diagram of a target recognition device according to an embodiment of the application.
- Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the application.
- embodiments of the present application provide a target recognition method, device, computer equipment, and machine-readable storage medium.
- the target recognition method provided by the embodiment of the present application will be introduced first.
- the execution subject of the target recognition method provided by the embodiments of the present application may be a computer device with a target recognition function, such as a smart camera, a target recognizer, etc., and the execution subject includes at least a core processing chip with data processing capabilities.
- the manner of implementing the target recognition method provided by the embodiment of the present application may be at least one of software, a hardware circuit, and a logic circuit provided in the execution subject.
- Fig. 1 is a schematic flow chart of the target recognition method according to the embodiment of this application.
- the method may include the following steps.
- the picture to be recognized is a picture that contains a target that needs to be recognized.
- the picture to be recognized may be captured by a smart camera shooting a surveillance area; for example, when a pedestrian enters the surveillance area, the captured picture contains a face target.
- the picture can also be a picture containing a human face target entered by the user according to requirements.
- the targets mentioned in the embodiments of the present application are not limited to face targets, and may also be targets such as cars, bicycles, and buildings.
- the target deep learning model is a deep learning network model, such as a convolutional neural network model, a recurrent neural network model, or a long short-term memory network model; after the operations of each network layer in the target deep learning model, the model can output the target features in the picture to be recognized.
- the target deep learning model is pre-trained based on training samples.
- the training samples can be sample pictures with specified targets pre-marked.
- the training samples are input into the initial network model; the BP (Back Propagation) algorithm or another model training algorithm performs calculations on the training samples, the calculation results are compared with the set nominal values, and the network weights of the network model are adjusted based on the comparison results.
- for each network layer, at least one of the input features input to the network layer, the network weights of the network layer, and the output features output by the network layer is quantized into integer data with a bit width less than 16 bits; that is, at least one of them is quantized into low-bit integer data whose bit width is less than 16 bits.
- the data involved in the operation changes from single-precision floating point data to low-bit integer data with a bit width less than 16 bits, which reduces the bit width and data volume of the data involved in the operation.
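The bit-width reduction can be made concrete with a short sketch (the tensor shape is illustrative, and 8-bit is one possible bit width below 16):

```python
import numpy as np

# The same feature map stored as single-precision floats vs. 8-bit integers.
# The shape 64 x 56 x 56 is only an illustrative example.
feat_fp32 = np.zeros((64, 56, 56), dtype=np.float32)  # 32 bits per value
feat_int8 = np.zeros((64, 56, 56), dtype=np.int8)     # 8 bits per value

# Quantizing from 32-bit floats to 8-bit integers cuts the data volume to a quarter.
ratio = feat_fp32.nbytes / feat_int8.nbytes  # 4.0
```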
- the network layer in the target deep learning model may include: a convolutional layer, a fully connected layer, a pooling layer, a batch normalization layer, a merging layer, and a splicing layer.
- the deep learning model may specifically include the following network layers: a Convolution layer (convolutional layer, as shown in Figure 2a), an InnerProduct layer (fully connected layer, as shown in Figure 2b), a Pooling layer (pooling layer, as shown in Figure 2c), a BN (Batch Normalization) layer used to adjust the scale of a channel (as shown in Figure 2d), an Eltwise layer (merging layer) used to add and merge two inputs (as shown in Figure 2e), and a Concat layer (splicing layer) used to splice two inputs (as shown in Figure 2f); the number of each type of network layer in the deep learning model is not limited.
- I_INTn represents an n-bit integer input feature
- W_INTn represents an n-bit integer network weight
- O_INTn represents an n-bit integer output feature
- I1_INTn and I2_INTn represent the n-bit integer input features of two branches, where n is less than 16.
- the target feature can be compared with the pre-calibrated target feature, comparing the feature values one by one to determine whether the target feature is the calibrated target feature; in this way, recognition results can be obtained, such as whether the target in the picture to be recognized is a calibrated target, how likely the target in the picture to be recognized is the calibrated target, and the position of the target in the picture to be recognized.
- the specific comparison process can compare the features point by point to determine whether each feature point is the same as the corresponding feature point in the calibrated target feature; if the number of identical feature points exceeds a threshold, the target in the picture to be recognized is considered to be the calibrated target.
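As a rough illustration of this point-by-point comparison, the following Python sketch declares a match when the fraction of matching feature points exceeds a threshold (the tolerance and threshold values are hypothetical; the patent does not fix concrete values):

```python
import numpy as np

def is_calibrated_target(target_feat, calibrated_feat, tol=1e-3, threshold=0.9):
    """Compare features point by point; declare a match when the fraction of
    matching feature points exceeds the threshold. tol and threshold are
    illustrative values, not taken from the patent."""
    matches = np.abs(np.asarray(target_feat) - np.asarray(calibrated_feat)) < tol
    return np.mean(matches) > threshold

# Example: identical features match; a feature differing in 1 of 4 points does not.
a = np.array([0.1, 0.2, 0.3, 0.4])
b = np.array([0.1, 0.2, 0.3, 0.4])
c = np.array([0.9, 0.2, 0.3, 0.4])
```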
- the target deep learning model performs operations on the input image to be recognized, for each network layer in the target deep learning model, it will input the input characteristics of the network layer, the network weight of the network layer, and the output of the network layer. At least one item of the output characteristics is quantized into integer data with a bit width less than 16 bits.
- the input characteristics of the input network layer, the network weight of the network layer, or the output characteristics of the network layer output are quantized into integer data with a bit width less than 16 bits.
- the execution process of face recognition mainly includes: obtaining a face picture collected by a face collection device; inputting the face picture into a pre-trained target deep learning model to obtain the target face features in the face picture; and comparing the target face features with the preset face features to obtain the face recognition result.
- the execution process of vehicle recognition mainly includes: obtaining a vehicle picture collected by a vehicle collection device; inputting the vehicle picture into the pre-trained target deep learning model to obtain the target vehicle features in the vehicle picture; and comparing the target vehicle features with the preset vehicle features to obtain the vehicle recognition result.
- the step of quantizing the network weights of the network layer into integer data with a bit width less than 16 bits may specifically be:
- for each filter in the network layer, reading the network weight with the largest absolute value in the filter; calculating the quantization step size corresponding to the filter according to the network weight with the largest absolute value and a preset bit width less than 16 bits; and, using the quantization step size, quantizing each network weight in the filter into integer data with the preset bit width.
- a network layer is composed of multiple filters.
- a filter is a convolution kernel.
- a filter includes multiple network weights; for each filter, the network weight W_max with the largest absolute value can be read from the filter.
- the preset bit width is the desired bit width of the network weights and is less than 16 bits; the quantization step size corresponding to the filter can be calculated based on the network weight W_max with the largest absolute value and the preset bit width less than 16 bits.
- the quantization step size step_W corresponding to the filter can be calculated using formula (1):
- step_W = W_max / (2^bitwidth − 1)   (1)
- using the quantization step size step_W, each network weight in the filter is quantized into low-bit integer data with the preset bit width.
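The per-filter weight quantization described above (read the largest absolute weight, derive the step size via formula (1), then quantize) can be sketched in Python; this is an illustrative reading of the scheme, not the patent's implementation, and the 8-bit width is just one possible choice below 16:

```python
import numpy as np

def quantize_filter_weights(weights, bitwidth=8):
    """Quantize one filter's weights to integers of the given bit width.

    Follows the description above: the quantization step size is derived from
    the weight with the largest absolute value in the filter (formula (1)).
    """
    w_max = np.max(np.abs(weights))        # network weight with largest absolute value
    step_w = w_max / (2 ** bitwidth - 1)   # formula (1)
    q = np.round(weights / step_w).astype(np.int32)
    return q, step_w

# Example: quantize a small 3x3 filter; the largest-magnitude weight (0.75)
# maps to the top of the integer range.
filt = np.array([[0.5, -0.25, 0.1],
                 [0.0,  0.75, -0.5],
                 [0.2, -0.1,  0.3]])
q, step = quantize_filter_weights(filt, bitwidth=8)
```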
- the step of quantizing the input features input to the network layer into integer data with a bit width less than 16 bits may specifically be: obtaining multiple preset undetermined step sizes, where the bit width corresponding to each undetermined step size is less than 16 bits; separately calculating the quantization error of quantizing the input feature with each undetermined step size; and using the undetermined step size corresponding to the smallest quantization error to quantize the input feature into integer data with a bit width less than 16 bits.
- the step of quantizing the output features output by the network layer into integer data with a bit width less than 16 bits may specifically be: obtaining multiple preset undetermined step sizes, where the bit width corresponding to each undetermined step size is less than 16 bits; separately calculating the quantization error of quantizing the output feature with each undetermined step size; and using the undetermined step size corresponding to the smallest quantization error to quantize the output feature into integer data with a bit width less than 16 bits.
- multiple preset undetermined step sizes step can be used to quantize the feature separately, obtaining the quantized feature integer value A_q under each undetermined step size step; the quantization error under an undetermined step size step can be obtained by subtracting the product of the undetermined step size step and the feature integer value A_q from the original feature floating-point value A_float.
- Multiple quantization errors can be obtained.
- the undetermined step size corresponding to the smallest quantization error is determined as the quantization step size step_a for the input feature/output feature; using the quantization step size step_a, the input feature/output feature is quantized into low-bit integer data with the preset bit width, as in formula (2):
- step_a = argmin_step |A_float − step × A_q|   (2)
- the quantized data can be specifically calculated using formula (3):
- Q_INTn = round(Q_float / step)   (3)
- where Q_float is the floating-point value of the original input feature, network weight, or output feature, step is the step size used for quantization, and Q_INTn is the quantized data.
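Formulas (2) and (3) together describe a small search: try each preset undetermined step size, keep the one with the smallest quantization error, then quantize with it. A minimal Python sketch, with hypothetical candidate step sizes and a sum-of-absolute-errors metric based on the description above:

```python
import numpy as np

def choose_step(a_float, candidate_steps):
    """Pick the undetermined step size with the smallest quantization error (formula (2))."""
    best_step, best_err = None, float("inf")
    for step in candidate_steps:
        a_q = np.round(a_float / step)               # quantized integer values A_q
        err = np.sum(np.abs(a_float - step * a_q))   # error: A_float minus step * A_q
        if err < best_err:
            best_step, best_err = step, err
    return best_step

def quantize(q_float, step):
    """Formula (3): quantize float data with the chosen step size."""
    return np.round(q_float / step).astype(np.int32)

# Example with hypothetical preset undetermined step sizes.
features = np.array([0.12, -0.5, 0.33, 0.91])
steps = [0.01, 0.02, 0.05]
step_a = choose_step(features, steps)   # 0.01 reproduces these values exactly
q = quantize(features, step_a)
```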
- when quantizing the network weights of the network layer, the quantization parameter can be pre-calculated, or calculated in real time during quantization.
- the above formula can be used whether the quantization parameter is pre-calculated or calculated in real time.
- in the case of pre-calculating the quantization parameter, the calculated quantization parameter is recorded in a buffer in advance, and during quantization the quantization parameter is read directly from the buffer.
- the embodiment of the present application may also perform a preprocessing operation on the picture to be recognized to obtain a preprocessed picture to be recognized, where the preprocessing operation includes at least cropping the picture to be recognized.
- the preprocessing operation includes at least cropping the image to be recognized, and may also include operations such as graying and pixel value normalization on the image to be recognized.
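The preprocessing operations named here (cropping, graying, pixel-value normalization) might be sketched as follows, treating the picture as an H x W x 3 array; the crop box and the [0, 1] normalization range are illustrative assumptions, not values from the patent:

```python
import numpy as np

def preprocess(picture, box=(0, 0, 224, 224)):
    """Crop, gray, and normalize a picture (H x W x 3, uint8).

    box = (top, left, height, width) is an assumed crop region.
    """
    t, l, h, w = box
    cropped = picture[t:t + h, l:l + w]              # cropping
    gray = cropped.mean(axis=2)                      # graying (channel average)
    normalized = gray.astype(np.float32) / 255.0     # pixel-value normalization
    return normalized

# Example: preprocess a random 480x640 RGB picture.
img = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
out = preprocess(img)
```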
- an embodiment of the present application provides a target recognition device.
- the device may include:
- the obtaining module 310 is used to obtain the picture to be recognized
- the calculation module 320 is used to input the picture to be recognized into the pre-trained target deep learning model to obtain the target features in the picture to be recognized, where, for each network layer in the target deep learning model, at least one of the input features input to the network layer, the network weights of the network layer, and the output features output by the network layer is quantized into integer data with a bit width less than 16 bits;
- the comparison module 330 is configured to compare the target feature with the pre-calibrated target feature to obtain the target recognition result of the picture to be recognized.
- the device may further include:
- the preprocessing module is used to perform a preprocessing operation on the picture to be recognized to obtain the preprocessed picture to be recognized, wherein the preprocessing operation includes at least cropping the picture to be recognized.
- the network layer in the target deep learning model may include: a convolutional layer, a fully connected layer, a pooling layer, a batch normalization layer, a merging layer, and a splicing layer.
- when the calculation module 320 is used to quantize the network weights of the network layer into integer data with a bit width less than 16 bits, it can be specifically used to: for each filter in the network layer, read the network weight with the largest absolute value in the filter; calculate the quantization step size corresponding to the filter according to the network weight with the largest absolute value and the preset bit width less than 16 bits; and, using the quantization step size, quantize each network weight in the filter into integer data with the preset bit width.
- when the calculation module 320 is used to quantize the input features input to the network layer into integer data with a bit width less than 16 bits, it can be specifically used to: obtain multiple preset undetermined step sizes, where the bit width corresponding to each undetermined step size is less than 16 bits; separately calculate the quantization error of quantizing the input feature with each undetermined step size; and use the undetermined step size corresponding to the smallest quantization error to quantize the input feature into integer data with a bit width less than 16 bits.
- when the calculation module 320 is used to quantize the output features output by the network layer into integer data with a bit width less than 16 bits, it can be specifically used to: obtain multiple preset undetermined step sizes, where the bit width corresponding to each undetermined step size is less than 16 bits; separately calculate the quantization error of quantizing the output feature with each undetermined step size; and use the undetermined step size corresponding to the smallest quantization error to quantize the output feature into integer data with a bit width less than 16 bits.
- the obtaining module 310 may be specifically used to: obtain a face picture collected by a face collection device, or obtain a vehicle picture collected by a vehicle collection device;
- the calculation module 320 can be specifically used to: input the face picture into the pre-trained target deep learning model to obtain the target face features in the face picture, or input the vehicle picture into the pre-trained target deep learning model to obtain the target vehicle features in the vehicle picture;
- the comparison module 330 may be specifically used to compare the target face feature with a preset face feature to obtain a face recognition result, or compare the target vehicle feature with a preset vehicle feature to obtain a vehicle recognition result.
- when the target deep learning model performs operations on the input picture to be recognized, for each network layer in the target deep learning model, at least one of the input features input to the network layer, the network weights of the network layer, and the output features output by the network layer is quantized into integer data with a bit width less than 16 bits.
- since the input features, the network weights, or the output features are quantized into integer data with a bit width less than 16 bits, the bit width and the amount of data involved in the operations are reduced, which improves the efficiency of target recognition.
- An embodiment of the present application provides a computer device. As shown in FIG. 4, it may include a processor 401 and a machine-readable storage medium 402.
- the machine-readable storage medium 402 stores machine executable instructions that can be executed by the processor 401.
- the processor 401 is prompted by machine-executable instructions to implement the steps of the above-mentioned target recognition method.
- the above-mentioned machine-readable storage medium may include RAM (Random Access Memory, random access memory), and may also include NVM (Non-Volatile Memory, non-volatile memory), for example, at least one disk storage.
- the machine-readable storage medium may also be at least one storage device located far away from the foregoing processor.
- the above-mentioned processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), etc.; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- the machine-readable storage medium 402 and the processor 401 may perform data transmission through a wired connection or a wireless connection, and the computer device may communicate with other devices through a wired communication interface or a wireless communication interface. What is shown in FIG. 4 is only an example of data transmission between the processor 401 and the machine-readable storage medium 402 via a bus, and is not intended to limit the specific connection manner.
- the processor 401 reads the machine-executable instructions stored in the machine-readable storage medium 402 and runs them to achieve: obtaining the picture to be recognized, inputting the picture to be recognized into the pre-trained target deep learning model to obtain the target feature in the picture to be recognized, and comparing the target feature with the pre-calibrated target feature to obtain the target recognition result of the picture to be recognized.
- when the target deep learning model performs operations on the input picture to be recognized, for each network layer in the target deep learning model, at least one of the input feature input to the network layer, the network weight of the network layer, and the output feature output by the network layer is quantized into integer data with a bit width of less than 16 bits.
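The quantization described above can be sketched as follows. This is an illustrative example only, not the patented implementation: it assumes symmetric uniform quantization to a preset bit width (8 bits here, satisfying the "less than 16 bits" condition), with the step size derived from the tensor's largest absolute value.

```python
import numpy as np

def quantize_symmetric(x, bit_width=8):
    """Quantize a float tensor to signed integers of the given bit width
    using a symmetric uniform quantizer (illustrative sketch only)."""
    qmax = 2 ** (bit_width - 1) - 1           # e.g. 127 for 8 bits
    max_abs = np.max(np.abs(x))
    if max_abs == 0:
        return np.zeros_like(x, dtype=np.int32), 1.0
    step = max_abs / qmax                     # quantization step size
    q = np.clip(np.round(x / step), -qmax, qmax).astype(np.int32)
    return q, step                            # dequantize with q * step

features = np.array([0.5, -1.27, 0.03, 1.27])
q, step = quantize_symmetric(features, bit_width=8)
# q is int32 data in [-127, 127]; q * step approximates the input
```

Integer data of this form lets the subsequent layer operate on narrow integers instead of 32-bit floats, which is the efficiency motivation behind the claims.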
- the embodiment of the present application also provides a machine-readable storage medium storing machine-executable instructions which, when called and executed by a processor, implement the steps of the above-mentioned target recognition method.
- since the machine-readable storage medium stores machine-executable instructions that execute the target recognition method provided by the embodiment of this application at runtime, the following can be achieved: obtaining a picture to be recognized; inputting the picture to be recognized into a pre-trained target deep learning model to obtain the target feature in the picture to be recognized; and comparing the target feature with the pre-calibrated target feature to obtain the target recognition result of the picture to be recognized.
- when the target deep learning model performs operations on the input picture to be recognized, for each network layer in the target deep learning model, at least one of the input feature input to the network layer, the network weight of the network layer, and the output feature output by the network layer is quantized into integer data with a bit width of less than 16 bits.
- the embodiment of the present application also provides a computer program product, which is used to execute the steps of the above-mentioned target recognition method at runtime.
- the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
- when implemented by software, they can be implemented in whole or in part in the form of a computer program product.
- the computer program product includes one or more computer instructions.
- the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
- the computer instructions may be stored in a machine-readable storage medium, or transmitted from one machine-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center.
- the machine-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media.
- the usable medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a DVD (Digital Versatile Disc)), or a semiconductor medium (such as an SSD (Solid State Disk)).
- the program can be stored in a computer-readable storage medium, referred to herein as a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
Description
Claims (17)
- A target recognition method, characterized in that the method comprises: obtaining a picture to be recognized; inputting the picture to be recognized into a pre-trained target deep learning model to obtain a target feature in the picture to be recognized, wherein, for each network layer in the target deep learning model, at least one of the input feature input to the network layer, the network weight of the network layer, and the output feature output by the network layer is quantized into integer data with a bit width of less than 16 bits; and comparing the target feature with a pre-calibrated target feature to obtain a target recognition result of the picture to be recognized.
- The method according to claim 1, characterized in that, before inputting the picture to be recognized into the pre-trained target deep learning model to obtain the target feature in the picture to be recognized, the method further comprises: performing a preprocessing operation on the picture to be recognized to obtain the preprocessed picture to be recognized, wherein the preprocessing operation at least comprises cropping the picture to be recognized.
- The method according to claim 1, characterized in that the network layers in the target deep learning model comprise: a convolutional layer, a fully connected layer, a pooling layer, a batch normalization layer, a merging layer, and a concatenation layer.
- The method according to claim 1, characterized in that quantizing the network weights of the network layer into integer data with a bit width of less than 16 bits comprises: for each filter of the network layer, reading the network weight with the largest absolute value in the filter; calculating a quantization step size corresponding to the filter according to the network weight with the largest absolute value and a preset bit width of less than 16 bits; and quantizing each network weight in the filter into integer data of the preset bit width by using the quantization step size.
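The per-filter weight quantization of this claim can be sketched as follows; this is a hedged illustration under the assumption of a symmetric quantizer (the claim specifies only that the step size is derived from the largest-magnitude weight and the preset bit width):

```python
import numpy as np

def quantize_filter_weights(weights, bit_width=8):
    """Per-filter weight quantization sketch: for each filter, derive the
    step size from its largest-magnitude weight and the preset bit width
    (< 16 bits), then quantize every weight in that filter with it."""
    qmax = 2 ** (bit_width - 1) - 1
    quantized, steps = [], []
    for w in weights:                      # one entry per filter
        max_abs = np.max(np.abs(w))        # largest-magnitude weight
        step = max_abs / qmax if max_abs > 0 else 1.0
        q = np.clip(np.round(w / step), -qmax, qmax).astype(np.int32)
        quantized.append(q)
        steps.append(step)
    return quantized, steps

filters = [np.array([0.2, -0.8]), np.array([1.5, 0.3])]
q, steps = quantize_filter_weights(filters, bit_width=8)
```

Computing one step size per filter, rather than one per layer, keeps small-magnitude filters from losing precision to a single large outlier weight elsewhere in the layer.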
- The method according to claim 1, characterized in that quantizing the input feature input to the network layer into integer data with a bit width of less than 16 bits comprises: obtaining a plurality of preset candidate step sizes, wherein the bit width corresponding to each candidate step size is less than 16 bits; separately calculating the quantization error of quantizing the input feature with each candidate step size; and quantizing the input feature into integer data with a bit width of less than 16 bits by using the candidate step size corresponding to the smallest quantization error.
- The method according to claim 1, characterized in that quantizing the output feature output by the network layer into integer data with a bit width of less than 16 bits comprises: obtaining a plurality of preset candidate step sizes, wherein the bit width corresponding to each candidate step size is less than 16 bits; separately calculating the quantization error of quantizing the output feature with each candidate step size; and quantizing the output feature into integer data with a bit width of less than 16 bits by using the candidate step size corresponding to the smallest quantization error.
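The candidate-step-size search of these two claims can be sketched as below. The mean-squared reconstruction error used here is an assumption for illustration; the claims do not fix a particular error metric:

```python
import numpy as np

def best_step_quantize(x, candidate_steps, bit_width=8):
    """Try each preset candidate step size, measure the quantization
    error it produces, and keep the step size with the smallest error
    (sketch; error metric chosen as MSE for illustration)."""
    qmax = 2 ** (bit_width - 1) - 1
    best = None
    for step in candidate_steps:
        q = np.clip(np.round(x / step), -qmax, qmax)
        err = np.mean((q * step - x) ** 2)   # reconstruction error
        if best is None or err < best[0]:
            best = (err, step, q.astype(np.int32))
    err, step, q = best
    return q, step, err

x = np.array([0.11, -0.42, 0.87, -0.05])
q, step, err = best_step_quantize(x, candidate_steps=[0.005, 0.01, 0.02])
```

A too-small candidate step clips large activations, while a too-large one wastes resolution; searching over preset candidates trades off these two error sources per feature tensor.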
- The method according to claim 1, characterized in that obtaining the picture to be recognized comprises: obtaining a face picture captured by a face capture device, or obtaining a vehicle picture captured by a vehicle capture device; inputting the picture to be recognized into the pre-trained target deep learning model to obtain the target feature in the picture to be recognized comprises: inputting the face picture into the pre-trained target deep learning model to obtain a target face feature in the face picture, or inputting the vehicle picture into the pre-trained target deep learning model to obtain a target vehicle feature in the vehicle picture; and comparing the target feature with the pre-calibrated target feature to obtain the target recognition result of the picture to be recognized comprises: comparing the target face feature with a preset face feature to obtain a face recognition result, or comparing the target vehicle feature with a preset vehicle feature to obtain a vehicle recognition result.
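The feature comparison step can be sketched as follows. Cosine similarity and the threshold value are assumptions for illustration; the claims only state that the extracted feature is compared with a pre-calibrated one:

```python
import numpy as np

def compare_features(target_feature, calibrated_feature, threshold=0.8):
    """Compare an extracted feature vector against a pre-calibrated one
    via cosine similarity (illustrative metric; the claims do not fix
    the comparison method or the threshold)."""
    a = np.asarray(target_feature, dtype=np.float64)
    b = np.asarray(calibrated_feature, dtype=np.float64)
    sim = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sim, sim >= threshold

# identical vectors -> similarity of 1.0, i.e. a match
sim, is_match = compare_features([0.6, 0.8], [0.6, 0.8])
```

The same comparison applies whether the features come from a face picture or a vehicle picture; only the gallery of pre-calibrated features differs.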
- A target recognition apparatus, characterized in that the apparatus comprises: an obtaining module configured to obtain a picture to be recognized; a calculation module configured to input the picture to be recognized into a pre-trained target deep learning model to obtain a target feature in the picture to be recognized, wherein, for each network layer in the target deep learning model, at least one of the input feature input to the network layer, the network weight of the network layer, and the output feature output by the network layer is quantized into integer data with a bit width of less than 16 bits; and a comparison module configured to compare the target feature with a pre-calibrated target feature to obtain a target recognition result of the picture to be recognized.
- The apparatus according to claim 8, characterized in that the apparatus further comprises: a preprocessing module configured to perform a preprocessing operation on the picture to be recognized to obtain the preprocessed picture to be recognized, wherein the preprocessing operation at least comprises cropping the picture to be recognized.
- The apparatus according to claim 8, characterized in that the network layers in the target deep learning model comprise: a convolutional layer, a fully connected layer, a pooling layer, a batch normalization layer, a merging layer, and a concatenation layer.
- The apparatus according to claim 8, characterized in that, when quantizing the network weights of the network layer into integer data with a bit width of less than 16 bits, the calculation module is specifically configured to: for each filter of the network layer, read the network weight with the largest absolute value in the filter; calculate a quantization step size corresponding to the filter according to the network weight with the largest absolute value and a preset bit width of less than 16 bits; and quantize each network weight in the filter into integer data of the preset bit width by using the quantization step size.
- The apparatus according to claim 8, characterized in that, when quantizing the input feature input to the network layer into integer data with a bit width of less than 16 bits, the calculation module is specifically configured to: obtain a plurality of preset candidate step sizes, wherein the bit width corresponding to each candidate step size is less than 16 bits; separately calculate the quantization error of quantizing the input feature with each candidate step size; and quantize the input feature into integer data with a bit width of less than 16 bits by using the candidate step size corresponding to the smallest quantization error.
- The apparatus according to claim 8, characterized in that, when quantizing the output feature output by the network layer into integer data with a bit width of less than 16 bits, the calculation module is specifically configured to: obtain a plurality of preset candidate step sizes, wherein the bit width corresponding to each candidate step size is less than 16 bits; separately calculate the quantization error of quantizing the output feature with each candidate step size; and quantize the output feature into integer data with a bit width of less than 16 bits by using the candidate step size corresponding to the smallest quantization error.
- The apparatus according to claim 8, characterized in that the obtaining module is specifically configured to: obtain a face picture captured by a face capture device, or obtain a vehicle picture captured by a vehicle capture device; the calculation module is specifically configured to: input the face picture into the pre-trained target deep learning model to obtain a target face feature in the face picture, or input the vehicle picture into the pre-trained target deep learning model to obtain a target vehicle feature in the vehicle picture; and the comparison module is specifically configured to: compare the target face feature with a preset face feature to obtain a face recognition result, or compare the target vehicle feature with a preset vehicle feature to obtain a vehicle recognition result.
- A computer device, characterized by comprising a processor and a machine-readable storage medium, wherein the machine-readable storage medium stores machine-executable instructions executable by the processor, and the processor is caused by the machine-executable instructions to: implement the method according to any one of claims 1 to 7.
- A machine-readable storage medium, characterized in that the machine-readable storage medium stores machine-executable instructions which, when called and executed by a processor, implement the method according to any one of claims 1 to 7.
- A computer program product, characterized by being configured to execute, at runtime: the method according to any one of claims 1 to 7.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911108141.4A CN112800813B (zh) | 2019-11-13 | 2019-11-13 | 一种目标识别方法及装置 |
CN201911108141.4 | 2019-11-13 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021093780A1 true WO2021093780A1 (zh) | 2021-05-20 |
Family
ID=75803382
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/128171 WO2021093780A1 (zh) | 2019-11-13 | 2020-11-11 | 一种目标识别方法及装置 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112800813B (zh) |
WO (1) | WO2021093780A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114140754A (zh) * | 2021-11-30 | 2022-03-04 | 北京比特易湃信息技术有限公司 | 一种基于深度学习的改装车识别方法 |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113408704A (zh) * | 2021-06-29 | 2021-09-17 | 深圳市商汤科技有限公司 | 数据处理方法、装置、设备及计算机可读存储介质 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104992167A (zh) * | 2015-07-28 | 2015-10-21 | 中国科学院自动化研究所 | 一种基于卷积神经网络的人脸检测方法及装置 |
US20180018555A1 (en) * | 2016-07-15 | 2018-01-18 | Alexander Sheung Lai Wong | System and method for building artificial neural network architectures |
CN110245577A (zh) * | 2019-05-23 | 2019-09-17 | 复钧智能科技(苏州)有限公司 | 目标车辆识别方法、装置及车辆实时监控系统 |
CN110309692A (zh) * | 2018-03-27 | 2019-10-08 | 杭州海康威视数字技术股份有限公司 | 人脸识别方法、装置及系统、模型训练方法及装置 |
- 2019
  - 2019-11-13 CN CN201911108141.4A patent/CN112800813B/zh active Active
- 2020
  - 2020-11-11 WO PCT/CN2020/128171 patent/WO2021093780A1/zh active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104992167A (zh) * | 2015-07-28 | 2015-10-21 | 中国科学院自动化研究所 | 一种基于卷积神经网络的人脸检测方法及装置 |
US20180018555A1 (en) * | 2016-07-15 | 2018-01-18 | Alexander Sheung Lai Wong | System and method for building artificial neural network architectures |
CN110309692A (zh) * | 2018-03-27 | 2019-10-08 | 杭州海康威视数字技术股份有限公司 | 人脸识别方法、装置及系统、模型训练方法及装置 |
CN110245577A (zh) * | 2019-05-23 | 2019-09-17 | 复钧智能科技(苏州)有限公司 | 目标车辆识别方法、装置及车辆实时监控系统 |
Non-Patent Citations (1)
Title |
---|
JACOB BENOIT; KLIGYS SKIRMANTAS; CHEN BO; ZHU MENGLONG; TANG MATTHEW; HOWARD ANDREW; ADAM HARTWIG; KALENICHENKO DMITRY: "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference", 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, IEEE, 18 June 2018 (2018-06-18), pages 2704 - 2713, XP033476237, DOI: 10.1109/CVPR.2018.00286 * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114140754A (zh) * | 2021-11-30 | 2022-03-04 | 北京比特易湃信息技术有限公司 | 一种基于深度学习的改装车识别方法 |
Also Published As
Publication number | Publication date |
---|---|
CN112800813A (zh) | 2021-05-14 |
CN112800813B (zh) | 2024-06-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11373087B2 (en) | Method and apparatus for generating fixed-point type neural network | |
TWI682325B (zh) | 辨識系統及辨識方法 | |
CN110799994B (zh) | 神经网络的自适应位宽缩减 | |
CN108197652B (zh) | 用于生成信息的方法和装置 | |
US10726573B2 (en) | Object detection method and system based on machine learning | |
KR20190125141A (ko) | 뉴럴 네트워크의 파라미터들을 양자화하는 방법 및 장치 | |
WO2022078002A1 (zh) | 一种图像处理方法、装置、设备及可读存储介质 | |
WO2021093780A1 (zh) | 一种目标识别方法及装置 | |
WO2019062721A1 (zh) | 语音身份特征提取器、分类器训练方法及相关设备 | |
US11741708B2 (en) | Image recognition method and system based on deep learning | |
CN108962231B (zh) | 一种语音分类方法、装置、服务器及存储介质 | |
CN106778910B (zh) | 基于本地训练的深度学习系统和方法 | |
CN110941964A (zh) | 双语语料筛选方法、装置及存储介质 | |
KR20210083935A (ko) | 뉴럴 네트워크의 파라미터들을 양자화하는 방법 및 장치 | |
CN109378014A (zh) | 一种基于卷积神经网络的移动设备源识别方法及系统 | |
KR20220130565A (ko) | 키워드 검출 방법 및 장치 | |
US20220366262A1 (en) | Method and apparatus for training neural network model | |
WO2019091401A1 (zh) | 深度神经网络的网络模型压缩方法、装置及计算机设备 | |
WO2022213825A1 (zh) | 基于神经网络的端到端语音增强方法、装置 | |
WO2021037174A1 (zh) | 一种神经网络模型训练方法及装置 | |
US20230078246A1 (en) | Centralized Management of Distributed Data Sources | |
WO2022246986A1 (zh) | 数据处理方法、装置、设备及计算机可读存储介质 | |
CN108847251B (zh) | 一种语音去重方法、装置、服务器及存储介质 | |
CN116992946B (zh) | 模型压缩方法、装置、存储介质和程序产品 | |
CN117173269A (zh) | 一种人脸图像生成方法、装置、电子设备和存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20886779 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 20886779 Country of ref document: EP Kind code of ref document: A1 |
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19.05.2023) |