CN112800813B - Target identification method and device - Google Patents

Target identification method and device

Info

Publication number
CN112800813B
CN112800813B
Authority
CN
China
Prior art keywords
target
picture
network layer
bits
bit width
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911108141.4A
Other languages
Chinese (zh)
Other versions
CN112800813A (en)
Inventor
杨希超
张渊
谢迪
浦世亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201911108141.4A priority Critical patent/CN112800813B/en
Priority to PCT/CN2020/128171 priority patent/WO2021093780A1/en
Publication of CN112800813A publication Critical patent/CN112800813A/en
Application granted granted Critical
Publication of CN112800813B publication Critical patent/CN112800813B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the application provides a target recognition method and device: a picture to be recognized is acquired and input into a pre-trained target deep learning model to obtain the target features in the picture, and these features are compared with pre-calibrated target features to obtain a target recognition result for the picture. When the target deep learning model operates on the input picture, for each network layer in the model, at least one of the input features of the network layer, the network weights of the network layer, and the output features of the network layer is quantized into low-bit integer data with a bit width smaller than 16 bits. Because each network layer then operates on integer data of lower bit width, the bit width and amount of data participating in the operation are reduced, which raises the operation rate of the target deep learning model and thus improves the efficiency of target recognition.

Description

Target identification method and device
Technical Field
The present application relates to the field of machine learning technologies, and in particular, to a target recognition method and apparatus.
Background
Deep neural networks, an emerging field of machine learning research, analyze data by building models that simulate mechanisms of the human brain; they are intelligent models for analysis and learning. At present, deep learning models such as convolutional neural network models, recurrent neural network models, and long short-term memory (LSTM) network models have become mainstream methods in image classification, target detection, target tracking, speech recognition, face recognition, and other applications.
At present, in a target recognition scenario, a picture to be recognized is input into a trained target deep learning model, each network layer in the model performs its operation, and the target in the picture can be recognized based on the operation result. When each network layer performs its operation, the data participating in the operation is single-precision floating-point data; because single-precision floating-point data has a high bit width, the amount of data participating in the operation is large, and the efficiency of target recognition is therefore low.
Disclosure of Invention
The embodiment of the application aims to provide a target identification method and device so as to improve the efficiency of target identification. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present application provides a target recognition method, including:
acquiring a picture to be identified;
Inputting the picture to be identified into a pre-trained target deep learning model to obtain target features in the picture to be identified, wherein, for each network layer in the target deep learning model, at least one of the input features input to the network layer, the network weights of the network layer, and the output features output by the network layer is quantized into low-bit integer data with a bit width smaller than 16 bits;
And comparing the target characteristics with target characteristics calibrated in advance to obtain a target recognition result of the picture to be recognized.
Optionally, before the step of inputting the picture to be identified into the pre-trained target deep learning model to obtain the target feature in the picture to be identified, the method further includes:
and carrying out preprocessing operation on the picture to be identified to obtain the preprocessed picture to be identified, wherein the preprocessing operation at least comprises cutting the picture to be identified.
Optionally, the network layer in the target deep learning model includes: convolution layer, full connection layer, pooling layer, batch normalization layer, merging layer and splicing layer.
Optionally, the step of quantizing the network weights of the network layer into low-bit integer data with a bit width smaller than 16 bits includes:
for each filter of the network layer, reading the network weight with the largest absolute value in the filter;
Calculating a quantization step length corresponding to the filter according to the network weight with the maximum absolute value and a preset bit width smaller than 16 bits;
And quantizing each network weight in the filter into low-bit integer data with a preset bit width by using a quantization step length.
Optionally, the step of quantizing the input features input to the network layer into low-bit integer data with a bit width smaller than 16 bits includes:
obtaining a plurality of preset undetermined step sizes, wherein the bit width corresponding to the undetermined step sizes is smaller than 16 bits;
respectively calculating quantization errors for quantizing the input features by using each undetermined step length;
and quantizing the input characteristic into low-bit integer data with bit width smaller than 16 bits by using the undetermined step length corresponding to the minimum quantization error.
Optionally, the step of quantizing the output features output by the network layer into low-bit integer data with a bit width smaller than 16 bits includes:
obtaining a plurality of preset undetermined step sizes, wherein the bit width corresponding to the undetermined step sizes is smaller than 16 bits;
Respectively calculating quantization errors for quantizing the output characteristics by using each undetermined step length;
And quantizing the output characteristic into low-bit integer data with bit width smaller than 16 bits by using the undetermined step length corresponding to the minimum quantization error.
Optionally, the step of acquiring the picture to be identified includes:
acquiring a face picture acquired by a face acquisition device or acquiring a vehicle picture acquired by a vehicle acquisition device;
Inputting a picture to be identified into a pre-trained target deep learning model to obtain target features in the picture to be identified, wherein the method comprises the following steps of:
Inputting a face picture into a pre-trained target deep learning model to obtain target face features in the face picture, or inputting a vehicle picture into the pre-trained target deep learning model to obtain target vehicle features in the vehicle picture;
Comparing the target characteristics with target characteristics calibrated in advance to obtain a target recognition result of the picture to be recognized, wherein the method comprises the following steps:
And comparing the target face characteristics with preset face characteristics to obtain a face recognition result, or comparing the target vehicle characteristics with preset vehicle characteristics to obtain a vehicle recognition result.
In a second aspect, an embodiment of the present application provides an object recognition apparatus, including:
The acquisition module is used for acquiring the picture to be identified;
the computing module is used for inputting the picture to be identified into a pre-trained target deep learning model to obtain target features in the picture to be identified, wherein, for each network layer in the target deep learning model, at least one of the input features input to the network layer, the network weights of the network layer, and the output features output by the network layer is quantized into low-bit integer data with a bit width smaller than 16 bits;
and the comparison module is used for comparing the target characteristics with target characteristics calibrated in advance to obtain a target recognition result of the picture to be recognized.
Optionally, the apparatus further comprises:
The preprocessing module is used for preprocessing the picture to be recognized to obtain a preprocessed picture to be recognized, wherein the preprocessing operation at least comprises cutting the picture to be recognized.
Optionally, the network layer in the target deep learning model includes: convolution layer, full connection layer, pooling layer, batch normalization layer, merging layer and splicing layer.
Optionally, the calculating module, when used for quantizing the network weights of the network layer into low-bit integer data with a bit width smaller than 16 bits, is specifically used for:
for each filter of the network layer, reading the network weight with the largest absolute value in the filter;
Calculating a quantization step length corresponding to the filter according to the network weight with the maximum absolute value and a preset bit width smaller than 16 bits;
And quantizing each network weight in the filter into low-bit integer data with a preset bit width by using a quantization step length.
Optionally, the calculating module, when configured to quantize the input feature input to the network layer into low-bit integer data with a bit width of less than 16 bits, is specifically configured to:
obtaining a plurality of preset undetermined step sizes, wherein the bit width corresponding to the undetermined step sizes is smaller than 16 bits;
respectively calculating quantization errors for quantizing the input features by using each undetermined step length;
and quantizing the input characteristic into low-bit integer data with bit width smaller than 16 bits by using the undetermined step length corresponding to the minimum quantization error.
Optionally, the calculating module, when configured to quantize the output characteristic output by the network layer into low-bit integer data with a bit width smaller than 16 bits, is specifically configured to:
obtaining a plurality of preset undetermined step sizes, wherein the bit width corresponding to the undetermined step sizes is smaller than 16 bits;
Respectively calculating quantization errors for quantizing the output characteristics by using each undetermined step length;
And quantizing the output characteristic into low-bit integer data with bit width smaller than 16 bits by using the undetermined step length corresponding to the minimum quantization error.
Optionally, the acquiring module is specifically configured to:
acquiring a face picture acquired by a face acquisition device or acquiring a vehicle picture acquired by a vehicle acquisition device;
the computing module is specifically used for:
Inputting a face picture into a pre-trained target deep learning model to obtain target face features in the face picture, or inputting a vehicle picture into the pre-trained target deep learning model to obtain target vehicle features in the vehicle picture;
the comparison module is specifically used for:
And comparing the target face characteristics with preset face characteristics to obtain a face recognition result, or comparing the target vehicle characteristics with preset vehicle characteristics to obtain a vehicle recognition result.
In a third aspect, embodiments of the present application provide a computer device comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the processor being caused by the machine-executable instructions to: the method provided by the first aspect of the embodiment of the application is realized.
In a fourth aspect, embodiments of the present application provide a machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, implement the method provided by the first aspect of embodiments of the present application.
According to the target recognition method and device provided by the embodiment of the application, a picture to be recognized is acquired and input into a pre-trained target deep learning model to obtain the target features in the picture, and these features are compared with pre-calibrated target features to obtain a target recognition result for the picture. When the target deep learning model operates on the input picture, for each network layer in the model, at least one of the input features input to the network layer, the network weights of the network layer, and the output features output by the network layer is quantized into low-bit integer data with a bit width smaller than 16 bits. Because each network layer then operates on integer data of lower bit width, the bit width and amount of data participating in the operation are reduced, which raises the operation rate of target recognition by the target deep learning model and thus improves the efficiency of target recognition.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a target recognition method according to an embodiment of the application;
FIG. 2a is a schematic diagram of a convolutional layer according to an embodiment of the present application;
FIG. 2b is a schematic diagram of a fully connected layer according to an embodiment of the present application;
FIG. 2c is a schematic diagram of a pooling layer according to an embodiment of the present application;
FIG. 2d is a schematic diagram of a batch normalization layer according to an embodiment of the present application;
FIG. 2e is a schematic diagram of a merging layer according to an embodiment of the present application;
FIG. 2f is a schematic diagram of a splice layer according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a target recognition device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In order to improve the efficiency of target recognition, the embodiments of the application provide a target recognition method, a target recognition apparatus, a computer device, and a machine-readable storage medium. The target recognition method provided by the embodiments of the application is described first.
The target recognition method provided by the embodiments of the application can be executed by a computer device with a target recognition function, such as an intelligent camera or a target recognizer, which at least includes a core processing chip with data-processing capability. The method can be implemented by at least one of software, a hardware circuit, and a logic circuit arranged in the executing device.
As shown in fig. 1, the method for identifying an object according to an embodiment of the present application may include the following steps.
S101, acquiring a picture to be identified.
The picture to be identified is a picture containing the target to be identified. For example, when a face target needs to be identified, the picture to be identified can be a picture containing a face target captured by an intelligent camera when a pedestrian enters the monitored area, or a picture containing a face target supplied by a user as required. The targets mentioned in the embodiments of the application are not limited to face targets and can also be targets such as automobiles, bicycles, and buildings.
S102, inputting the picture to be identified into a pre-trained target deep learning model to obtain target features in the picture to be identified, wherein, for each network layer in the target deep learning model, at least one of the input features input to the network layer, the network weights of the network layer, and the output features output by the network layer is quantized into low-bit integer data with a bit width smaller than 16 bits.
After the picture to be identified is obtained, it is input into the target deep learning model. The target deep learning model is a deep learning network model, such as a convolutional neural network model, a recurrent neural network model, or a long short-term memory (LSTM) network model, and it can output the target features in the picture to be identified through the operation of each of its network layers.
The target deep learning model is trained in advance on training samples, which can be sample pictures marked with the specified target in advance. A training sample is input into an initial network model and processed; using a back-propagation (BP) algorithm or another model-training algorithm, the operation result is compared with a set nominal value, and the network weights of the model are adjusted based on the comparison result. By inputting different training samples into the network model in turn, iterating these steps, and continuously adjusting the network weights, the output of the network model gradually approaches the nominal values; when the difference between the output and the nominal values is small enough, or the output converges, the resulting network model is taken as the target deep learning model.
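The iterate-compare-adjust training loop described above can be sketched with a toy one-layer model and plain gradient updates (a minimal illustration of the idea, not the patent's actual network or BP implementation; the model, learning rate, and data are all assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)        # toy "network weights" to be adjusted
lr = 0.1                      # learning rate (assumed)

for _ in range(200):
    x = rng.normal(size=3)                 # one training sample
    nominal = 2.0 * x[0] - 1.0 * x[1]      # nominal (calibrated) value for this sample
    output = x @ w                         # forward pass of the toy model
    err = output - nominal                 # compare output with the nominal value
    w -= lr * err * x                      # adjust weights based on the comparison

# w converges toward the generating weights [2, -1, 0]
```

Each iteration runs one sample forward, measures the difference from the nominal value, and nudges the weights to shrink that difference, which is the loop the paragraph describes.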
In the process of operating on the input picture to be identified with the target deep learning model, for each network layer in the model, at least one of the input features of the network layer, the network weights of the network layer, and the output features of the network layer is quantized into low-bit integer data with a bit width smaller than 16 bits. Through such quantization, the data participating in the operation changes from single-precision floating-point data to low-bit integer data with a bit width smaller than 16 bits, reducing the bit width and amount of data participating in the operation.
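The reduction in bit width and data amount can be illustrated with a quick sketch (the tensor shape and the 8-bit target are illustrative assumptions):

```python
import numpy as np

# Single-precision features vs. 8-bit quantized features: 4x less data moves
# through each network layer, which is the source of the claimed speed-up.
features = np.random.rand(1, 64, 56, 56).astype(np.float32)  # hypothetical layer input
step = features.max() / 127                                   # example quantization step
q = np.clip(np.round(features / step), -128, 127).astype(np.int8)

print(features.nbytes // q.nbytes)  # → 4 (bit width drops from 32 to 8)
```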
Optionally, the network layer in the target deep learning model may include: convolution layer, full connection layer, pooling layer, batch normalization layer, merging layer and splicing layer.
Specifically, the deep learning model may include the following network layer types: Convolution layers (fig. 2a), InnerProduct (fully connected) layers (fig. 2b), Pooling layers (fig. 2c), BN (Batch Normalization) layers, which adjust the scale of a channel (fig. 2d), Eltwise (merging) layers, which additively merge two inputs (fig. 2e), and Concat (splicing) layers, which concatenate two inputs (fig. 2f); the number of network layers of each type in the deep learning model is not limited. In fig. 2a to 2f, I_INTn denotes an n-bit integer input feature, W_INTn an n-bit integer network weight, O_INTn an n-bit integer output feature, and I1_INTn and I2_INTn the n-bit integer input features of two branches, where n is less than 16. In one implementation, in each network layer type, the network weights, input features, and output features may all be n-bit integer data.
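How integer data could flow through one such quantized layer can be sketched as follows (a minimal dot product stands in for a layer; the step sizes and values are assumptions, not from the patent):

```python
import numpy as np

# Integer arithmetic inside one quantized layer: n-bit inputs and weights are
# multiplied and accumulated as integers; only the per-tensor step sizes stay float.
def int_dot(i_int, w_int, step_i, step_w):
    acc = np.dot(i_int.astype(np.int32), w_int.astype(np.int32))  # integer multiply-accumulate
    return acc * (step_i * step_w)   # rescale to the real-valued result

i_int = np.array([12, -7, 30], dtype=np.int8)   # I_INTn
w_int = np.array([5, 9, -4], dtype=np.int8)     # W_INTn
y = int_dot(i_int, w_int, step_i=0.02, step_w=0.01)
```

The heavy arithmetic (the multiply-accumulate) happens entirely on narrow integers; the two floating-point step sizes are applied once at the end.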
And S103, comparing the target characteristics with target characteristics calibrated in advance to obtain a target recognition result of the picture to be recognized.
After the target feature in the picture to be identified is computed by the target deep learning model, it can be compared with a pre-calibrated target feature; by comparing feature values, it is judged whether the extracted feature matches the calibrated target feature, yielding recognition results such as whether the target in the picture to be identified is the calibrated target, how likely it is to be the calibrated target, and the position of the target in the picture. A specific comparison process may compare the feature points one by one, determining whether each feature point is the same as the corresponding feature point in the calibrated target feature; if the number of identical feature points exceeds a threshold, the target in the picture to be identified is considered to be the calibrated target.
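The point-by-point comparison just described might look like the following sketch (the tolerance `tol`, the threshold value, and the example features are illustrative assumptions):

```python
import numpy as np

def matches_calibrated_target(features, calibrated, match_threshold, tol=1e-6):
    """Compare feature points one by one and count how many agree with the
    calibrated target feature; declare a match if the count exceeds the
    threshold. (tol is an illustrative tolerance, not specified in the text.)"""
    same = np.sum(np.abs(features - calibrated) <= tol)
    return bool(same > match_threshold)

calib = np.array([0.1, 0.5, -0.3, 0.9])   # pre-calibrated target feature
probe = np.array([0.1, 0.5, -0.3, 0.7])   # feature extracted from the picture
result = matches_calibrated_target(probe, calib, match_threshold=2)
```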
By applying the embodiment of the application, a picture to be identified is acquired and input into a pre-trained target deep learning model to obtain the target features in the picture, and these features are compared with pre-calibrated target features to obtain a target recognition result for the picture. When the target deep learning model operates on the input picture, for each network layer in the model, at least one of the input features input to the network layer, the network weights of the network layer, and the output features output by the network layer is quantized into low-bit integer data with a bit width smaller than 16 bits. As a result, low-bit integer data of lower bit width participates in the operation of each network layer, reducing the bit width and amount of data involved, which raises the operation rate of target recognition by the target deep learning model and thus improves the efficiency of target recognition.
Based on the embodiment shown in fig. 1, in the face recognition scenario, the face recognition implementation process mainly includes: acquiring a face picture acquired by face acquisition equipment; inputting the face picture into a pre-trained target deep learning model to obtain target face characteristics in the face picture; and comparing the target face characteristics with preset face characteristics to obtain a face recognition result.
In the scene of vehicle identification, the execution process of vehicle identification mainly comprises the following steps: acquiring a vehicle picture acquired by a vehicle acquisition device; inputting the vehicle picture into a pre-trained target deep learning model to obtain target vehicle characteristics in the vehicle picture; and comparing the target vehicle characteristics with preset vehicle characteristics to obtain a vehicle identification result.
Optionally, in the embodiment shown in fig. 1, the step of quantizing the network weights of the network layer into low-bit integer data with a bit width smaller than 16 bits may specifically be:
For each filter of the network layer, reading the network weight with the largest absolute value in the filter; calculating a quantization step length corresponding to the filter according to the network weight with the maximum absolute value and a preset bit width smaller than 16 bits; and quantizing each network weight in the filter into low-bit integer data with a preset bit width by using a quantization step length.
One network layer is composed of a plurality of filters; one filter is a convolution kernel and includes a plurality of network weights. For each filter, the network weight W_max with the largest absolute value can be read from the filter. The preset bit width, which is smaller than 16 bits, specifies how wide the quantized network weights should be. According to W_max and the preset bit width bitwidth, the quantization step size step_W corresponding to the filter can be calculated, specifically using formula (1):
step_W = W_max / (2^(bitwidth-1) - 1)    (1)
After the quantization step size step_W is calculated, it may be used to quantize each network weight in the filter, yielding low-bit integer data of the preset bit width.
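A minimal NumPy sketch of this per-filter scheme, assuming the signed-range denominator 2^(bitwidth-1) - 1 as the reading of formula (1):

```python
import numpy as np

def quantize_filter_weights(weights, bitwidth):
    """Quantize one filter's weights to signed low-bit integers using the
    per-filter scheme above: the step is derived from the largest-magnitude
    weight and the preset bit width (< 16 bits). The signed-range denominator
    2**(bitwidth-1) - 1 is an assumed reading of formula (1)."""
    w_max = np.max(np.abs(weights))                  # network weight with the largest absolute value
    step = w_max / (2 ** (bitwidth - 1) - 1)         # quantization step for this filter
    q = np.round(weights / step).astype(np.int32)    # low-bit integer data
    return q, step

# Example: quantize a 3x3 filter to 8-bit integers.
filt = np.array([[0.5, -1.2, 0.03],
                 [0.9, -0.4, 1.27],
                 [0.0,  0.6, -0.8]])
q, step = quantize_filter_weights(filt, bitwidth=8)  # step = 1.27 / 127 = 0.01
```

The largest-magnitude weight (1.27) maps to the largest representable integer (127), so the full signed range of the preset bit width is used.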
Optionally, in the embodiment shown in fig. 1, the step of quantizing the input features input to the network layer into low-bit integer data with a bit width smaller than 16 bits may specifically be:
Obtaining a plurality of preset undetermined step sizes, wherein the bit width corresponding to the undetermined step sizes is smaller than 16 bits; respectively calculating quantization errors for quantizing the input features by using each undetermined step length; and quantizing the input characteristic into low-bit integer data with bit width smaller than 16 bits by using the undetermined step length corresponding to the minimum quantization error.
Optionally, in the embodiment shown in fig. 1, the step of quantizing the output features output by the network layer into low-bit integer data with a bit width smaller than 16 bits may specifically be:
Obtaining a plurality of preset undetermined step sizes, wherein the bit width corresponding to the undetermined step sizes is smaller than 16 bits; respectively calculating quantization errors for quantizing the output characteristics by using each undetermined step length; and quantizing the output characteristic into low-bit integer data with bit width smaller than 16 bits by using the undetermined step length corresponding to the minimum quantization error.
For an input feature or an output feature, the feature can be quantized with each of a plurality of preset candidate step sizes, yielding a quantized feature integer value A_q under each candidate step. Subtracting the product of the candidate step and A_q from the original floating-point feature value A_float gives the quantization error under that candidate step, so a set of quantization errors is obtained. Formula (2) selects the minimum of these quantization errors; the candidate step corresponding to the minimum is the quantization step size step_a of the input/output feature, and quantizing the input/output feature with step_a yields low-bit integer data of the preset bit width.
Stepa = argmin_step ||Afloat - step × Aq||_n (2)
When a step size is used to quantize the input features, the network weights, or the output features, the quantized data and the original floating-point data satisfy the relationship in formula (3).
Qfloat = step × QINTn (3)
Where Qfloat is the floating-point value of the original input feature, network weight, or output feature, step is the step size used for quantization, and QINTn is the quantized low-bit integer data.
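As an illustrative sketch of formulas (2) and (3) — the 8-bit width and the L2 norm are assumptions here, since the embodiment only requires a bit width smaller than 16 and leaves n open:

```python
import numpy as np

def quantize(a_float, step, bits=8):
    # Map floating-point values to signed integers of the given bit width.
    q_max = 2 ** (bits - 1) - 1
    return np.clip(np.round(a_float / step), -q_max - 1, q_max).astype(np.int32)

def best_step(a_float, candidate_steps, bits=8, n=2):
    # Formula (2): Stepa = argmin_step ||Afloat - step * Aq||_n,
    # evaluated over the preset undetermined step sizes.
    errors = [np.linalg.norm((a_float - s * quantize(a_float, s, bits)).ravel(), ord=n)
              for s in candidate_steps]
    return candidate_steps[int(np.argmin(errors))]

# Formula (3): the dequantized value is Qfloat = step * QINTn.
features = np.array([0.5, -1.2, 3.3])
step = best_step(features, [0.01, 0.05, 0.5])
restored = step * quantize(features, step)
```

Here 0.01 clips the largest feature (large error) and 0.5 is too coarse, so the search settles on 0.05, under which the example features dequantize with essentially no error.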
When the network weights of a network layer are quantized, the quantization parameters may be calculated in advance or calculated in real time when the electronic device of the embodiment of the present invention performs quantization; in both cases the formulas above are used. When calculated in advance, the quantization parameters are recorded in a buffer beforehand, and are read directly from the buffer when quantization is performed.
Optionally, before executing S102, the embodiment of the present application may further execute: preprocessing the picture to be identified to obtain a preprocessed picture to be identified, wherein the preprocessing may at least comprise cropping the picture to be identified.
After the picture to be identified is obtained, the original picture may be oversized, of poor quality, or otherwise unsuitable for the target deep learning model to operate on directly, so the picture needs to be preprocessed. The preprocessing at least comprises cropping the picture to be identified, and may further comprise operations such as graying the picture and normalizing pixel values. The preprocessing turns the picture to be identified into one that the target deep learning model can operate on more easily; the preprocessed picture is then input into the target deep learning model for operation.
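A minimal sketch of such a preprocessing step — the 112-pixel crop size, the luma grayscale weights, and the [0, 1] normalization are illustrative assumptions, not values given by the embodiment:

```python
import numpy as np

def preprocess(img, size=112):
    # Center-crop to size x size (the crop size 112 is a hypothetical choice).
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    img = img[top:top + size, left:left + size]
    if img.ndim == 3:
        # Gray the picture using standard luma weights (an assumption).
        img = img @ np.array([0.299, 0.587, 0.114])
    # Normalize pixel values from [0, 255] to [0, 1].
    return (img / 255.0).astype(np.float32)
```

Any real deployment would match the crop size and normalization to what the target deep learning model was trained with.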
Corresponding to the above method embodiment, the embodiment of the present application provides an object recognition device, as shown in fig. 3, which may include:
an obtaining module 310, configured to obtain a picture to be identified;
the computing module 320 is configured to input a picture to be identified into a pre-trained target deep learning model to obtain target features in the picture to be identified, where, for each network layer in the target deep learning model, at least one of the input features input into the network layer, the network weights of the network layer, and the output features output from the network layer is quantized into low-bit integer data with a bit width less than 16 bits;
And the comparison module 330 is configured to compare the target feature with a target feature calibrated in advance to obtain a target recognition result of the picture to be recognized.
Optionally, the apparatus may further include:
The preprocessing module is used for preprocessing the picture to be recognized to obtain a preprocessed picture to be recognized, wherein the preprocessing operation at least comprises cropping the picture to be recognized.
Optionally, the network layer in the target deep learning model may include: convolution layer, full connection layer, pooling layer, batch normalization layer, merging layer and splicing layer.
Optionally, the computing module 320, when configured to quantize the network weights of the network layer into low-bit integer data with a bit width smaller than 16 bits, may specifically be configured to:
For each filter of the network layer, reading the network weight with the largest absolute value in the filter; calculating a quantization step length corresponding to the filter according to the network weight with the largest absolute value and a preset bit width smaller than 16 bits; and quantizing each network weight in the filter into low-bit integer data with the preset bit width by using the quantization step length.
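The embodiment does not spell out how the step length is derived from the largest-magnitude weight and the bit width; a common symmetric scheme consistent with the description — hypothetical here — maps that extreme weight to the largest representable integer:

```python
import numpy as np

def quantize_filter(weights, bits=8):
    # Read the network weight with the largest absolute value in the filter.
    max_abs = np.abs(weights).max()
    # Assumed formula: step = max|w| / (2^(bits-1) - 1), so the extreme
    # weight maps to the largest signed integer at the preset bit width.
    step = max_abs / (2 ** (bits - 1) - 1)
    # Quantize every weight in the filter with that step length.
    return np.round(weights / step).astype(np.int32), step
```

Because the step is computed per filter, each filter's weights use the full integer range regardless of its scale relative to the other filters.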
Optionally, the computing module 320, when configured to quantize the input features input to the network layer into low-bit integer data with a bit width smaller than 16 bits, may specifically be configured to:
Obtaining a plurality of preset undetermined step sizes, wherein the bit width corresponding to each undetermined step size is smaller than 16 bits; calculating, for each undetermined step size, the quantization error of quantizing the input features with that step size; and quantizing the input features into low-bit integer data with a bit width smaller than 16 bits by using the undetermined step size corresponding to the minimum quantization error.
Optionally, the computing module 320, when configured to quantize the output features output by the network layer into low-bit integer data with a bit width smaller than 16 bits, may specifically be configured to:
Obtaining a plurality of preset undetermined step sizes, wherein the bit width corresponding to each undetermined step size is smaller than 16 bits; calculating, for each undetermined step size, the quantization error of quantizing the output features with that step size; and quantizing the output features into low-bit integer data with a bit width smaller than 16 bits by using the undetermined step size corresponding to the minimum quantization error.
Optionally, the obtaining module 310 may specifically be configured to: acquiring a face picture acquired by a face acquisition device or acquiring a vehicle picture acquired by a vehicle acquisition device;
The computing module 320 may be specifically configured to: inputting a face picture into a pre-trained target deep learning model to obtain target face features in the face picture, or inputting a vehicle picture into the pre-trained target deep learning model to obtain target vehicle features in the vehicle picture;
The comparison module 330 may specifically be configured to: and comparing the target face characteristics with preset face characteristics to obtain a face recognition result, or comparing the target vehicle characteristics with preset vehicle characteristics to obtain a vehicle recognition result.
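The comparison itself is not detailed in the embodiment; one plausible sketch — the cosine-similarity metric and the 0.8 threshold are assumptions, not part of the disclosure — is:

```python
import numpy as np

def compare(target_feature, calibrated_feature, threshold=0.8):
    # Cosine similarity between the extracted target feature and a
    # pre-calibrated feature; the target is considered recognized when
    # the similarity reaches the (assumed) threshold.
    a = target_feature / np.linalg.norm(target_feature)
    b = calibrated_feature / np.linalg.norm(calibrated_feature)
    similarity = float(a @ b)
    return similarity >= threshold, similarity
```

In a face- or vehicle-recognition deployment, the calibrated feature would be the stored feature of each enrolled face or vehicle, and the threshold would be tuned on validation data.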
By applying the embodiment of the present application, a picture to be identified is obtained, the picture is input into a pre-trained target deep learning model to obtain target features in the picture, and the target features are compared with target features calibrated in advance to obtain a target recognition result of the picture. When the target deep learning model operates on the input picture to be identified, for each network layer in the model, at least one of the input features of the network layer, the network weights of the network layer, and the output features of the network layer is quantized into low-bit integer data with a bit width smaller than 16 bits. Because low-bit integer data of lower bit width thus participates in the operation of each network layer, the bit width and the amount of data participating in the operation are reduced, so that the operation speed of the target deep learning model, and hence the efficiency of target recognition, is improved.
An embodiment of the present application provides a computer device which, as shown in fig. 4, may include a processor 401 and a machine-readable storage medium 402, the machine-readable storage medium 402 storing machine-executable instructions executable by the processor 401, the processor 401 being caused by the machine-executable instructions to implement all steps of the object recognition method described above.
The machine-readable storage medium may include RAM (Random Access Memory) or NVM (Non-Volatile Memory), for example at least one disk memory. Alternatively, the machine-readable storage medium may be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The machine-readable storage medium 402 and the processor 401 may transmit data via a wired or wireless connection, and the computer device may communicate with other devices through a wired or wireless communication interface. Fig. 4 shows data transfer between the processor 401 and the machine-readable storage medium 402 via a bus as an example, which is not intended to be limiting.
In this embodiment, the processor 401, by reading and executing the machine-executable instructions stored in the machine-readable storage medium 402, can implement: obtaining a picture to be identified, inputting the picture into a pre-trained target deep learning model to obtain target features in the picture, and comparing the target features with target features calibrated in advance to obtain a target recognition result of the picture. When the target deep learning model operates on the input picture to be identified, for each network layer in the model, at least one of the input features of the network layer, the network weights of the network layer, and the output features of the network layer is quantized into low-bit integer data with a bit width smaller than 16 bits. Because low-bit integer data of lower bit width thus participates in the operation of each network layer, the bit width and the amount of data participating in the operation are reduced, so that the operation speed of the target deep learning model, and hence the efficiency of target recognition, is improved.
The embodiments of the present application also provide a machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, perform all the steps of the object recognition method described above.
In this embodiment, the machine-readable storage medium stores machine-executable instructions which, at runtime, execute the object recognition method provided by the embodiment of the present application, and can thereby implement: obtaining a picture to be identified, inputting the picture into a pre-trained target deep learning model to obtain target features in the picture, and comparing the target features with target features calibrated in advance to obtain a target recognition result of the picture. When the target deep learning model operates on the input picture to be identified, for each network layer in the model, at least one of the input features of the network layer, the network weights of the network layer, and the output features of the network layer is quantized into low-bit integer data with a bit width smaller than 16 bits. Because low-bit integer data of lower bit width thus participates in the operation of each network layer, the bit width and the amount of data participating in the operation are reduced, so that the operation speed of the target deep learning model, and hence the efficiency of target recognition, is improved.
For the computer device and machine-readable storage medium embodiments, since the content involved is substantially similar to the method embodiments described above, the description is relatively brief; for relevant parts, refer to the description of the method embodiments.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a related manner; for identical or similar parts between the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus, computer device, and machine-readable storage medium embodiments are described relatively briefly since they are substantially similar to the method embodiments; for relevant parts, refer to the corresponding description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (10)

1. A method of target identification, the method comprising:
acquiring a picture to be identified;
Inputting the picture to be identified into a pre-trained target deep learning model to obtain target features in the picture to be identified, wherein, for each network layer in the target deep learning model, at least one of the input features input to the network layer and the output features output by the network layer is quantized into low-bit integer data with a bit width smaller than 16 bits;
Comparing the target characteristics with target characteristics calibrated in advance to obtain a target recognition result of the picture to be recognized;
The quantizing of the input features input to the network layer into low-bit integer data with a bit width smaller than 16 bits comprises: obtaining a plurality of preset undetermined step sizes, wherein the bit width corresponding to each undetermined step size is smaller than 16 bits; calculating, for each undetermined step size, a quantization error of quantizing the input features with that step size; and quantizing the input features into low-bit integer data with a bit width smaller than 16 bits by using the undetermined step size corresponding to the minimum quantization error;
and/or,
The quantizing of the output features output by the network layer into low-bit integer data with a bit width smaller than 16 bits comprises: obtaining a plurality of preset undetermined step sizes, wherein the bit width corresponding to each undetermined step size is smaller than 16 bits; calculating, for each undetermined step size, a quantization error of quantizing the output features with that step size; and quantizing the output features into low-bit integer data with a bit width smaller than 16 bits by using the undetermined step size corresponding to the minimum quantization error.
2. The method of claim 1, wherein prior to said inputting the picture to be identified into a pre-trained target deep learning model to obtain target features in the picture to be identified, the method further comprises:
and preprocessing the picture to be identified to obtain a preprocessed picture to be identified, wherein the preprocessing at least comprises cropping the picture to be identified.
3. The method of claim 1, wherein the network layer in the target deep learning model comprises: convolution layer, full connection layer, pooling layer, batch normalization layer, merging layer and splicing layer.
4. The method of claim 1, wherein, for each network layer in the target deep learning model, the network weights of that network layer are also quantized into low-bit integer data with a bit width smaller than 16 bits; the quantizing of the network weights of the network layer into low-bit integer data with a bit width smaller than 16 bits comprises:
for each filter of the network layer, reading the network weight with the largest absolute value in the filter;
Calculating a quantization step length corresponding to the filter according to the network weight with the maximum absolute value and a preset bit width smaller than 16 bits;
and quantizing each network weight in the filter into low-bit integer data with the preset bit width by using the quantization step length.
5. The method according to claim 1, wherein the obtaining the picture to be identified comprises:
acquiring a face picture acquired by a face acquisition device or acquiring a vehicle picture acquired by a vehicle acquisition device;
Inputting the picture to be identified into a pre-trained target deep learning model to obtain target features in the picture to be identified, wherein the method comprises the following steps:
Inputting the face picture into a pre-trained target deep learning model to obtain target face features in the face picture, or inputting the vehicle picture into a pre-trained target deep learning model to obtain target vehicle features in the vehicle picture;
Comparing the target feature with a target feature calibrated in advance to obtain a target recognition result of the picture to be recognized, wherein the method comprises the following steps:
and comparing the target face characteristics with preset face characteristics to obtain a face recognition result, or comparing the target vehicle characteristics with preset vehicle characteristics to obtain a vehicle recognition result.
6. An object recognition apparatus, characterized in that the apparatus comprises:
The acquisition module is used for acquiring the picture to be identified;
the computing module is used for inputting the picture to be identified into a pre-trained target deep learning model to obtain target features in the picture to be identified, wherein for each network layer in the target deep learning model, at least one of the input features input into the network layer and the output features output by the network layer is quantized into low-bit integer data with the bit width smaller than 16 bits;
the comparison module is used for comparing the target characteristics with target characteristics calibrated in advance to obtain a target recognition result of the picture to be recognized;
The computing module, when configured to quantize the input features input to the network layer into low-bit integer data with a bit width smaller than 16 bits, is specifically configured to:
obtaining a plurality of preset undetermined step sizes, wherein the bit width corresponding to each undetermined step size is smaller than 16 bits;
calculating, for each undetermined step size, a quantization error of quantizing the input features with that step size;
quantizing the input features into low-bit integer data with a bit width smaller than 16 bits by using the undetermined step size corresponding to the minimum quantization error;
the computing module, when configured to quantize the output features output by the network layer into low-bit integer data with a bit width smaller than 16 bits, is specifically configured to:
obtaining a plurality of preset undetermined step sizes, wherein the bit width corresponding to each undetermined step size is smaller than 16 bits;
calculating, for each undetermined step size, a quantization error of quantizing the output features with that step size;
and quantizing the output features into low-bit integer data with a bit width smaller than 16 bits by using the undetermined step size corresponding to the minimum quantization error.
7. The apparatus of claim 6, wherein the apparatus further comprises:
The preprocessing module is used for preprocessing the picture to be identified to obtain a preprocessed picture to be identified, and the preprocessing operation at least comprises cropping the picture to be identified.
8. The apparatus of claim 6, wherein the network layer in the target deep learning model comprises: convolution layer, full connection layer, pooling layer, batch normalization layer, merging layer and splicing layer.
9. The apparatus of claim 6, wherein the computing module is further configured to, for each network layer in the target deep learning model, quantize the network weights of that network layer into low-bit integer data with a bit width smaller than 16 bits; the computing module, when configured to quantize the network weights of the network layer into low-bit integer data with a bit width smaller than 16 bits, is specifically configured to:
for each filter of the network layer, reading the network weight with the largest absolute value in the filter;
Calculating a quantization step length corresponding to the filter according to the network weight with the maximum absolute value and a preset bit width smaller than 16 bits;
and quantizing each network weight in the filter into low-bit integer data with the preset bit width by using the quantization step length.
10. The apparatus of claim 6, wherein the obtaining module is specifically configured to:
acquiring a face picture acquired by a face acquisition device or acquiring a vehicle picture acquired by a vehicle acquisition device;
the computing module is specifically configured to:
Inputting the face picture into a pre-trained target deep learning model to obtain target face features in the face picture, or inputting the vehicle picture into a pre-trained target deep learning model to obtain target vehicle features in the vehicle picture;
the comparison module is specifically configured to:
and comparing the target face characteristics with preset face characteristics to obtain a face recognition result, or comparing the target vehicle characteristics with preset vehicle characteristics to obtain a vehicle recognition result.
CN201911108141.4A 2019-11-13 2019-11-13 Target identification method and device Active CN112800813B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911108141.4A CN112800813B (en) 2019-11-13 2019-11-13 Target identification method and device
PCT/CN2020/128171 WO2021093780A1 (en) 2019-11-13 2020-11-11 Target identification method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911108141.4A CN112800813B (en) 2019-11-13 2019-11-13 Target identification method and device

Publications (2)

Publication Number Publication Date
CN112800813A CN112800813A (en) 2021-05-14
CN112800813B true CN112800813B (en) 2024-06-07

Family

ID=75803382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911108141.4A Active CN112800813B (en) 2019-11-13 2019-11-13 Target identification method and device

Country Status (2)

Country Link
CN (1) CN112800813B (en)
WO (1) WO2021093780A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104992167A (en) * 2015-07-28 2015-10-21 中国科学院自动化研究所 Convolution neural network based face detection method and apparatus
CN110245577A (en) * 2019-05-23 2019-09-17 复钧智能科技(苏州)有限公司 Target vehicle recognition methods, device and Vehicular real time monitoring system
CN110309692A (en) * 2018-03-27 2019-10-08 杭州海康威视数字技术股份有限公司 Face identification method, apparatus and system, model training method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180018555A1 (en) * 2016-07-15 2018-01-18 Alexander Sheung Lai Wong System and method for building artificial neural network architectures


Also Published As

Publication number Publication date
CN112800813A (en) 2021-05-14
WO2021093780A1 (en) 2021-05-20

Similar Documents

Publication Publication Date Title
CN109754066B (en) Method and apparatus for generating a fixed-point neural network
CN107690660B (en) Image recognition method and device
CN108171203B (en) Method and device for identifying vehicle
CN110909784B (en) Training method and device of image recognition model and electronic equipment
CN110929785B (en) Data classification method, device, terminal equipment and readable storage medium
CN110135505B (en) Image classification method and device, computer equipment and computer readable storage medium
CN113065525B (en) Age identification model training method, face age identification method and related device
CN111105017A (en) Neural network quantization method and device and electronic equipment
CN113011532A (en) Classification model training method and device, computing equipment and storage medium
CN113487223B (en) Risk assessment method and system based on information fusion
CN110889290B (en) Text encoding method and apparatus, text encoding validity checking method and apparatus
CN112561050B (en) Neural network model training method and device
CN112800813B (en) Target identification method and device
CN105740916B (en) Characteristics of image coding method and device
CN116258873A (en) Position information determining method, training method and device of object recognition model
CN112699809B (en) Vaccinia category identification method, device, computer equipment and storage medium
CN111882046B (en) Multimedia data identification method, device, equipment and computer storage medium
CN116258190A (en) Quantization method, quantization device and related equipment
CN113989632A (en) Bridge detection method and device for remote sensing image, electronic equipment and storage medium
CN111382761B (en) CNN-based detector, image detection method and terminal
CN113239075A (en) Construction data self-checking method and system
CN114648646B (en) Image classification method and device
CN110942179A (en) Automatic driving route planning method and device and vehicle
CN113762403B (en) Image processing model quantization method, device, electronic equipment and storage medium
CN112668702B (en) Fixed-point parameter optimization method, system, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant