CN110991229B - Three-dimensional object identification method based on DSP chip and quantization model

Three-dimensional object identification method based on DSP chip and quantization model

Info

Publication number
CN110991229B
CN110991229B
Authority
CN
China
Prior art keywords: quantization, model, data, training, dsp
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911018559.6A
Other languages
Chinese (zh)
Other versions
CN110991229A (en)
Inventor
王资
朝红阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN201911018559.6A
Publication of CN110991229A
Application granted
Publication of CN110991229B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/94Hardware or software architectures specially adapted for image or video understanding
    • G06V10/955Hardware or software architectures specially adapted for image or video understanding using specific electronic processors
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a three-dimensional object identification method based on a DSP chip and a quantization model. The three-dimensional data collector is an RGB-D camera: after shooting, the depth information of the objects in the picture is obtained and synthesized into point cloud data. The point cloud data are input into a three-dimensional data feature extractor, in which a quantization parameter model storage module stores the parameters of the quantization model, and the acceleration of the DSP parallel computing acceleration module rapidly completes the convolution, pooling and residual operations of the deep neural network to obtain the features of the input data. The feature decoder then decodes the features, reversing the encoding applied during model training, to obtain the required feature format. The feature extractor of the invention can extract features from three-dimensional data and accelerates feature extraction through data structure optimization and hardware acceleration.

Description

Three-dimensional object identification method based on DSP chip and quantization model
Technical Field
The invention belongs to the field of computer vision, and in particular relates to a three-dimensional object identification method based on a DSP chip and a quantization model.
Background
A model whose parameters are stored in quantized representation is called a quantization model. Quantization reduces the size of the model file by about 3/4 and converts floating-point operations into integer operations, which in theory improves the speed of feature extraction.
Compared with an ARM-architecture CPU, an existing DSP-based feature extractor can complete the operations of the various layers of a deep network, including convolution, pooling and residual layers, more quickly by exploiting the parallel computation of the DSP chip.
The invention uses point cloud data for object recognition. Many object recognition methods exist, using either two-dimensional data or three-dimensional point cloud data; they apply pattern recognition or deep learning to extract features from the corresponding object data and finally complete the recognition task.
The input of three-dimensional data is not traditional image data (xy plane coordinates plus the color of each pixel) but three-dimensional data such as point clouds (xyz coordinates, possibly with color and other information). Unlike planar data, point cloud data are continuous: discretizing them yields very sparse data, so existing efficient methods usually operate directly on the raw point cloud to obtain the feature value or feature vector of the input.
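For concreteness, the following minimal sketch (illustrative only, not taken from the patent; array shapes and sizes are assumptions) contrasts the dense grid layout of a two-dimensional image with the unordered point-set layout of a point cloud:

    import numpy as np

    # Illustrative only: a 2D image is a dense grid of pixels, while a
    # point cloud is an unordered set of 3D coordinates, optionally with
    # per-point colour.
    image = np.zeros((480, 640, 3), dtype=np.uint8)        # H x W x RGB, dense grid

    n_points = 2048
    xyz = np.random.rand(n_points, 3).astype(np.float32)   # continuous xyz coordinates
    rgb = np.random.randint(0, 256, (n_points, 3), dtype=np.uint8)
    point_cloud = np.hstack([xyz, rgb.astype(np.float32)]) # N x 6: sparse, unordered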
The prior inventions contain only the above parts separately; they have not been used in combination, as the fields and directions of these technologies differ. The prior art has neither a quantized three-dimensional data feature extractor nor a DSP-based three-dimensional data feature extractor. There is no quantized three-dimensional data feature extractor because existing network quantization schemes are mostly designed for traditional two-dimensional images, while quantizing a network that processes three-dimensional data requires some variants, such as a calculation mode that mixes 32-bit and 8-bit integers and quantization of general operations besides convolution, for which no unified treatment exists today. There is no DSP-based three-dimensional data feature extractor because no general, mature, open-source deep learning framework exists for DSP chips, so the development threshold and difficulty are high; the hardware acceleration of DSP parallel operation can only be exploited by precisely writing the memory data to be accelerated into the DSP cache.
Disclosure of Invention
The invention provides a three-dimensional object recognition method based on a DSP chip and a quantization model, which aims to overcome the defects of the prior art, so that the feature extractor can extract the features of three-dimensional data and can accelerate feature extraction through data structure optimization and hardware acceleration.
In order to solve the technical problems, the invention adopts the following technical scheme: the three-dimensional object recognition method based on the DSP chip and the quantization model uses a three-dimensional data collector, a three-dimensional data feature extractor and a feature decoder, wherein the three-dimensional data feature extractor comprises a quantization parameter model storage module and a DSP parallel computing acceleration module. The three-dimensional data collector is an RGB-D camera; after shooting, the depth information of the object in the picture is obtained and finally synthesized into point cloud data. The point cloud data are input into the three-dimensional data feature extractor, in which the quantization parameter model storage module stores the parameters of the quantization model as 8-bit integers. The DSP parallel computing acceleration module, which can complete the work of up to 32 x86 instructions in parallel, rapidly completes the convolution, pooling and residual operations of the deep neural network, finally yielding the features of the input data. The feature decoder decodes the features by reversing the encoding applied during model training to obtain the required feature format; for example, for a recognition task the features are decoded into a feature vector of length K, the index of the maximum feature value is taken, and the index-label table is consulted to obtain the label of the data.
In the invention, the three-dimensional data collector acquires three-dimensional data, the data are input into the three-dimensional data feature extractor to obtain features, and the features are finally input into the feature decoder to complete computer vision tasks such as recognition, detection and segmentation. The feature extractor adds the hardware acceleration of DSP parallel computation and the quantization of the feature extractor model, which accelerates feature extraction and shrinks the parameter model file, so that the feature extractor can run directly on mobile terminals such as mobile phones. The purpose of using three-dimensional point cloud data is that the method inputs and stores three-dimensional data of the object instead of two-dimensional pixel data, which protects user privacy: two-dimensional image data can easily reveal private content to a human reader, such as texture information of the object and private content in the background, whereas three-dimensional object information carries fewer details, and a collector with a restricted acquisition range does not capture background information.

Further, when the quantization parameter model storage module stores parameters, a quantization model needs to be acquired; the acquisition methods are quantization during training and quantization after training.
The steps of quantization during training include: first, during model training, the tf.contrib.quantize.create_training_graph interface of TensorFlow is called and the function of adding a pseudo quantization node after every general operation is implemented; after each operation, the pseudo quantization node stores the maximum and minimum values of the preceding node, and with these two values the mapping formula Q = R/scale + zero_pt converts the parameters from 32-bit floating-point storage to 8-bit integer storage, where R represents the 32-bit floating-point number, Q represents the 8-bit integer, scale is the mapping scaling, scale = (Vmax - Vmin)/255, zero_pt is the mapping zero point, zero_pt = -255 × Vmin/(Vmax - Vmin) = -Vmin/scale, where Vmax and Vmin are obtained from the pseudo quantization node and 255 is the maximum value representable by an 8-bit unsigned integer (this mapping is illustrated in the sketch after these steps);
then, when the model parameters are to be fixed, the tf.contrib.quantize.create_eval_graph interface is called and the function of adding pseudo quantization nodes after the general operations not supported by the interface is implemented; the model parameters are then fixed, the toco_convert script is called after the parameters are fixed, and the information of each general node is merged with that of its corresponding pseudo quantization node to obtain the quantization model;
The steps of quantization after training include: first, a general training method is used in the model training stage;
then, when the model parameters are to be fixed, pseudo quantization nodes are manually added after the weight, activation, matrix multiplication, addition and similar nodes, and part of the training data is fed into the model to obtain the maximum and minimum values of the preceding node that each pseudo quantization node must record; the model parameters are then fixed, the toco_convert script is called after the parameters are fixed, and the information of the general nodes and their corresponding pseudo quantization nodes is merged to obtain the quantization model.
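The mapping above can be made concrete with a short sketch. The following Python code is a minimal illustration, not the patent's implementation; the sample value range is an assumption:

    import numpy as np

    def quantize(r, v_min, v_max):
        # Q = R/scale + zero_pt, with scale = (Vmax - Vmin)/255 and
        # zero_pt = -Vmin/scale, as in the mapping formula above.
        # Vmin/Vmax are the values recorded by the pseudo quantization node.
        scale = (v_max - v_min) / 255.0
        zero_pt = -v_min / scale
        q = np.clip(np.round(r / scale + zero_pt), 0, 255).astype(np.uint8)
        return q, scale, zero_pt

    def dequantize(q, scale, zero_pt):
        # Inverse mapping R = scale * (Q - zero_pt).
        return scale * (q.astype(np.float32) - zero_pt)

    # Example with an assumed activation range [-1.0, 3.0].
    r = np.array([-1.0, 0.0, 1.5, 3.0], dtype=np.float32)
    q, scale, zero_pt = quantize(r, v_min=-1.0, v_max=3.0)
    print(q)                              # [  0  64 159 255]
    print(dequantize(q, scale, zero_pt))  # approximately the original values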
Further, the method of DSP parallel acceleration in the DSP parallel computing acceleration module is as follows: when DSP instructions are used, direct memory access (DMA) places the data to be operated on directly into the DSP chip memory, and after the operation finishes, DMA again extracts the data from the DSP memory into device memory. The ARM-architecture CPU and the DSP chip work simultaneously: while the DSP computes, the CPU extracts the result of the previous DSP operation over DMA and prepares to write the data needed by the next DSP operation into the DSP memory; as soon as the DSP chip finishes its computation, the DMA write is executed immediately, so the next computation can start at once and the utilization rate of the DSP chip is improved.
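The double-buffered schedule described above can be simulated in miniature. The following sketch uses Python threads and queues as stand-ins for the DMA engine and the DSP kernel (an assumed structure, not actual driver code); the point is that staging the next tile overlaps with draining the previous result, so the DSP-side worker never waits:

    import threading
    import queue

    def dsp_side(to_dsp, from_dsp):
        # Stand-in for the DSP kernel: consume staged tiles, emit results.
        while True:
            tile = to_dsp.get()
            if tile is None:
                break
            from_dsp.put([x * 2 for x in tile])  # placeholder computation

    def cpu_side(tiles, to_dsp, from_dsp):
        # ARM CPU side: keep the DSP fed. Staging the next tile (DMA write)
        # overlaps with draining the previous result (DMA read).
        results = []
        to_dsp.put(tiles[0])                 # stage the first tile
        for nxt in tiles[1:]:
            to_dsp.put(nxt)                  # DMA write of the next tile ...
            results.append(from_dsp.get())   # ... overlaps draining the previous result
        results.append(from_dsp.get())       # drain the final result
        to_dsp.put(None)                     # stop the DSP side
        return results

    to_dsp, from_dsp = queue.Queue(maxsize=2), queue.Queue()
    worker = threading.Thread(target=dsp_side, args=(to_dsp, from_dsp))
    worker.start()
    print(cpu_side([[1, 2], [3, 4], [5, 6]], to_dsp, from_dsp))
    worker.join()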
Further, the three-dimensional object identification method based on the DSP chip and the quantization model specifically comprises the following steps:
s1, acquiring training data, wherein the acquisition method is to acquire depth information of an object by using a three-dimensional data acquisition device, and acquire point cloud data of the object after background and noise information are removed;
s2, training a model, namely training the training data by using a deep learning network structure capable of processing point cloud data end to obtain the model; the training data are the data collected in the step S1, and the network structure is a baseline variant of DGCNN;
s3, quantizing the model, wherein a pseudo quantizing node is used in the step S2, the nature of the pseudo quantizing node is floating point operation, namely a non-quantizing model, the information of the pseudo quantizing node is combined to the previous node in the step S3, the model is truly quantized, the representing method of model parameters is changed from 32-bit floating point numbers to 8-bit integers, the size of the model is compressed, and meanwhile the forward propagation calculation efficiency is improved;
s4, establishing an index-label table, wherein the category of the data is identified at the same time when the data is collected in the step S1, and the label is represented by using a natural number index to obtain a table with one-to-one correspondence between indexes and labels;
s5, object point cloud data acquisition is basically consistent with the step S1, but does not have category labels of the data, and single data are acquired one by one;
s6, based on the feature extraction of the quantized deep learning model of the DSP chip, the model is quantized in the step S3, the weight is represented by using an 8-bit integer, the feature value of each layer is represented by using an 8-bit integer in the forward propagation of the model, the mapping relation between the model and the original floating point representation is R=S (Q-Z), R represents 32-bit floating point numbers, Q represents 8-bit integer numbers, S is mapping scaling, and Z is mapping zero offset;
convolution multiplication, matrix multiplication and addition operation are carried out in the calculation of model forward propagation inference, and a vsswmac5 instruction in a DSP instruction set is called to complete matrix multiplication with the parallelism of 32 times; the 1x1 convolution can be directly converted into a matrix multiplication operation, so this instruction is also used to implement the 1x1 convolution; other convolution operations use the vswmac5 instruction, with 8 times parallelism; residual addition and general addition use vadd instruction, have 32 times parallelism; in general, a DSP parallel computing acceleration module is applied to improve the computing parallelism of forward propagation of the model;
inputting the data acquired in the step S5 into a three-dimensional data feature extractor, and performing forward propagation calculation with 8-32 times of parallelism to obtain the features of the data;
s7, decoding the features calculated in the step S6 by using a feature decoder according to a computer vision recognition task method, namely obtaining indexes of elements with maximum feature values;
s8, inquiring the index-label table to obtain the label of the identified object, and inquiring the index value obtained in the step S7 in the index-label table established in the step S4 to obtain the label of the object acquired in the step S5.
Further, in the step S2, if the quantization-during-training method is used, the tf.contrib.quantize.create_training_graph interface is called; this interface automatically adds pseudo quantization nodes after certain specific structures, including convolution, matrix multiplication, activation and residual addition. For other operations, including general addition, multiplication and parameter-free matrix multiplication, the interface does not add pseudo quantization nodes, so its implementation must be modified so that it correctly adds pseudo quantization nodes after all operation nodes; the interface currently supports only convolution, full connection, residual and similar operations, and the quantized implementations of general addition, matrix multiplication and the like must be custom-modified by following the implementation of quantized convolution. If the quantization-after-training method is used, a general training method without pseudo quantization nodes is used.
Further, in the step S3, if the quantization-during-training method is used, the tf.contrib.quantize.create_eval_graph interface is called, and the quantized implementations of general addition and matrix multiplication that the interface does not support must be custom-modified; the weights are then fixed, and the toco_convert script converts the model into the TFLITE format with parameter type QUANTIZED_UINT8, yielding the quantized model. If the quantization-after-training method is used, pseudo quantization nodes are manually added after the weight, activation, residual addition, general addition and matrix multiplication nodes, a number of data items are fed into the model to obtain the maximum and minimum values of the activation and residual addition nodes, these values are written into the corresponding pseudo quantization nodes, the weights are then fixed, and the toco_convert script converts the model into the TFLITE format with parameter type QUANTIZED_UINT8, yielding the quantized model.
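The interface calls named above fit together roughly as follows. This sketch assumes TensorFlow 1.x, where tf.contrib.quantize is available; the network body, tensor shapes and converter arguments are placeholders rather than the patent's actual code:

    import tensorflow as tf  # assumes TensorFlow 1.x, where tf.contrib.quantize exists

    def build_model(points):
        # Placeholder network; the patent's actual model is a DGCNN baseline variant.
        net = tf.layers.conv2d(points, 64, 1, activation=tf.nn.relu)
        return tf.layers.dense(tf.layers.flatten(net), 40)

    # Quantization during training: rewrite the graph with pseudo quantization
    # nodes that record per-node min/max ranges while training proceeds as usual.
    train_graph = tf.Graph()
    with train_graph.as_default():
        points = tf.placeholder(tf.float32, [None, 1024, 3, 1], name="input")
        logits = build_model(points)
        tf.contrib.quantize.create_training_graph(input_graph=train_graph)
        # ... build the loss and train ...

    # When fixing the parameters: rewrite the graph for inference, then freeze it.
    eval_graph = tf.Graph()
    with eval_graph.as_default():
        points = tf.placeholder(tf.float32, [None, 1024, 3, 1], name="input")
        logits = build_model(points)
        tf.contrib.quantize.create_eval_graph(input_graph=eval_graph)
        # ... restore the trained weights and freeze them into frozen_graph.pb ...

    # Conversion to a quantized TFLITE model with QUANTIZED_UINT8 parameters,
    # shown with the TF 1.x converter; the (mean, stddev) stats are illustrative:
    # converter = tf.lite.TFLiteConverter.from_frozen_graph(
    #     "frozen_graph.pb", ["input"], ["dense/BiasAdd"])
    # converter.inference_type = tf.lite.constants.QUANTIZED_UINT8
    # converter.quantized_input_stats = {"input": (128, 127)}
    # open("model.tflite", "wb").write(converter.convert())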
Compared with the prior art, the beneficial effects are as follows: the three-dimensional object identification method based on the DSP chip and the quantization model combines the advantages of existing model quantization and of a DSP-based feature extractor, and applies them to the three-dimensional point cloud data of an object. The model file storing the parameters shrinks by about 3/4, forward-propagation inference is accelerated, and instructions are parallelized by up to 32 times, i.e. one DSP instruction can complete the work of 32 x86-architecture CPU instructions, so the theoretical maximum speed-up is 32 times; the method can be ported to mobile terminals such as mobile phones or to embedded chip devices. Applying the method, a device equipped only with a point cloud collector, a simple CPU and a DSP chip can complete the feature extraction of three-dimensional point cloud data on the device itself and decode the features to complete the vision task. In addition, using three-dimensional point cloud data of the object rather than two-dimensional image data protects privacy, because point cloud data cannot restore the texture details of the object and a collector with a restricted acquisition distance does not capture background information, whereas a two-dimensional image contains such information and the collection of redundant private information is difficult to avoid.
Drawings
FIG. 1 is a flow chart of the overall method of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the invention; for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent actual product dimensions; it will be appreciated by those skilled in the art that certain well-known structures in the drawings, and descriptions thereof, may be omitted. The positional relationships described in the drawings are for illustrative purposes only and are not to be construed as limiting the invention.
As shown in fig. 1, a three-dimensional object recognition method based on a DSP chip and a quantization model includes the following steps:
step 1, acquiring training data, wherein the acquisition method is to acquire depth information of an object by using a camera with RGB-D function, and obtain point cloud data of the object after background and noise information are removed.
Step 2, training the model: the training data are trained with a deep learning network structure capable of processing point cloud data end-to-end to obtain the model. The training data are the data collected in step 1, and the network structure is a baseline variant of DGCNN.
Step 2.1. If the quantization-during-training method is used, the tf.contrib.quantize.create_training_graph interface is called; this interface automatically adds pseudo quantization nodes after certain specific structures, including convolution, matrix multiplication, activation and residual addition. For other operations, such as general addition, multiplication and parameter-free matrix multiplication, the interface does not add pseudo quantization nodes, so its implementation must be modified so that it correctly adds pseudo quantization nodes after all operation nodes. The interface currently supports only convolution, full connection, residual and similar operations; the quantized implementations of general addition, matrix multiplication and the like are not provided and must be custom-modified by following the implementation of quantized convolution.
If the method of quantization after training is used, a general training mode without pseudo quantization nodes is used.
Step 3, quantizing the model: step 2 used pseudo quantization nodes, whose nature is still floating-point operation, i.e. a non-quantized model; in this step the information of each pseudo quantization node is merged into the preceding node, so the model is truly quantized, the representation of the model parameters changes from 32-bit floating-point numbers to 8-bit integers, the model size is compressed, and the forward-propagation efficiency is improved.
Step 3.1. If the quantization-during-training method is used, the tf.contrib.quantize.create_eval_graph interface is called; in addition, the quantized implementations of general addition, matrix multiplication and the like, which the interface does not support, must be custom-modified. The weights are then fixed, and the toco_convert script converts the model into the TFLITE format with parameter type QUANTIZED_UINT8, yielding the quantized model.
Step 3.2. If the quantization-after-training method is used, pseudo quantization nodes are manually added after the weight, activation, residual addition, general addition, matrix multiplication and similar nodes; a number of data items are fed into the model to obtain the maximum and minimum values of the activation, residual addition and similar nodes, and these values are written into the corresponding pseudo quantization nodes. The weights are then fixed, and the toco_convert script converts the model into the TFLITE format with parameter type QUANTIZED_UINT8, yielding the quantized model.
Step 4, establishing an index-label table: the category of the data is identified at the same time as the data is collected in step 1, and each label is represented by a natural number index, giving a table with one-to-one correspondence between indexes and labels.
Step 5, acquiring object point cloud data: basically consistent with step 1, but without category labels of the data; single data items are acquired one by one.
Step 6, feature extraction with the quantized deep learning model on the DSP chip: the model was quantized in step 3, the weights are represented as 8-bit integers, and every layer's feature values are represented as 8-bit integers during forward propagation; the mapping to the original floating-point representation is R = S(Q - Z), where R represents the 32-bit floating-point number, Q represents the 8-bit integer, S is the mapping scaling and Z is the mapping zero offset.
The forward-propagation inference involves convolution multiplication, matrix multiplication, addition and similar operations; the vsswmac5 instruction of the DSP instruction set completes matrix multiplication with 32-fold parallelism; a 1x1 convolution can be converted directly into a matrix multiplication, so this instruction also implements the 1x1 convolution; other convolutions use the vswmac5 instruction, with 8-fold parallelism; residual addition and general addition use the vadd instruction, with 32-fold parallelism. Overall, the DSP parallel computing acceleration module improves the computing parallelism of the model's forward propagation by 8 to 32 times.
The data acquired in step 5 are input into the feature extractor, and the features of the data are obtained through forward-propagation calculation with 8- to 32-fold parallelism.
Step 7, decoding the features calculated in step 6 according to the computer vision recognition task, i.e. obtaining the index of the element with the largest feature value.
Step 8, querying the index-label table to obtain the label of the identified object: the index value obtained in step 7 is looked up in the index-label table established in step 4 to obtain the label of the object acquired in step 5.
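Steps 7 and 8 amount to an argmax followed by a table lookup; a minimal sketch with hypothetical labels:

    import numpy as np

    index_label_table = {0: "chair", 1: "table", 2: "lamp"}  # hypothetical labels

    features = np.array([1.2, 4.7, 0.3], dtype=np.float32)   # length-K feature vector
    index = int(np.argmax(features))                          # step 7: argmax index
    label = index_label_table[index]                          # step 8: table lookup
    print(index, label)                                       # -> 1 table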
It is to be understood that the above examples of the present invention are provided by way of illustration only and not by way of limitation of the embodiments of the present invention. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art; it is neither necessary nor possible to exhaustively list all embodiments here. Any modification, equivalent replacement, improvement, etc. which comes within the spirit and principles of the invention is desired to be protected by the following claims.

Claims (5)

1. The three-dimensional object recognition method based on the DSP chip and the quantization model is characterized by comprising a three-dimensional data collector, a three-dimensional data feature extractor and a feature decoder, wherein the three-dimensional data feature extractor comprises a quantization parameter model storage module and a DSP parallel computing acceleration module; the three-dimensional data collector is an RGB-D camera, the depth information of the object in the picture is obtained after shooting and finally synthesized into point cloud data; the point cloud data are input into the three-dimensional data feature extractor, in which the quantization parameter model storage module stores the parameters of the quantization model, and the acceleration of the DSP parallel computing acceleration module rapidly completes the convolution, pooling and residual operations of the deep neural network, finally obtaining the features of the input data; the feature decoder decodes the features by reversing the encoding applied during model training to obtain the required feature format;
when the quantization parameter model storage module is used for storing parameters, a quantization model needs to be acquired, and the acquisition method of the quantization model comprises quantization during training and quantization after training;
the steps of quantization during training comprise: first, during model training, the tf.contrib.quantize.create_training_graph interface of TensorFlow is called and the function of adding a pseudo quantization node after every general operation is implemented; after each operation, the pseudo quantization node stores the maximum and minimum values of the preceding node, and with these two values the mapping formula Q = R/scale + zero_pt converts the parameters from 32-bit floating-point storage to 8-bit integer storage, wherein R represents the 32-bit floating-point number, Q represents the 8-bit integer, scale is the mapping scaling, scale = (Vmax - Vmin)/255, zero_pt is the mapping zero point, zero_pt = -255 × Vmin/(Vmax - Vmin) = -Vmin/scale, Vmax and Vmin are obtained from the pseudo quantization node, and 255 is the maximum value representable by an 8-bit unsigned integer;
then, when the model parameters are to be fixed, the tf.contrib.quantize.create_eval_graph interface is called and the function of adding pseudo quantization nodes after the general operations not supported by the interface is implemented; the model parameters are then fixed, the toco_convert script is called after the parameters are fixed, and the information of each general node is merged with that of its corresponding pseudo quantization node to obtain the quantization model;
the steps of quantization after training comprise: first, a general training method is used in the model training stage;
then, when the model parameters are to be fixed, pseudo quantization nodes are manually added after the weight, activation, matrix multiplication, addition and similar nodes, and part of the training data is fed into the model to obtain the maximum and minimum values of the preceding node that each pseudo quantization node must record; the model parameters are then fixed, the toco_convert script is called after the parameters are fixed, and the information of the general nodes and their corresponding pseudo quantization nodes is merged to obtain the quantization model.
2. The three-dimensional object recognition method based on the DSP chip and the quantization model according to claim 1, wherein the method of DSP parallel acceleration in the DSP parallel computing acceleration module is as follows: when DSP instructions are used, direct memory access (DMA) places the data to be operated on directly into the DSP chip memory, and after the operation finishes, DMA again extracts the data from the DSP memory into the device memory, wherein the ARM-architecture CPU and the DSP chip work simultaneously, that is, while the DSP computes, the CPU extracts the result of the previous DSP operation over DMA and prepares to write the data needed by the next DSP operation into the DSP memory, and as soon as the DSP chip finishes the calculation, the DMA write is executed immediately.
3. The three-dimensional object recognition method based on the DSP chip and the quantization model according to claim 2, wherein the three-dimensional object recognition method based on the DSP chip and the quantization model specifically comprises the following steps:
s1, acquiring training data, wherein the acquisition method is to acquire depth information of an object by using a three-dimensional data acquisition device, and acquire point cloud data of the object after background and noise information are removed;
s2, training a model, namely training the training data by using a deep learning network structure capable of processing point cloud data end to obtain the model; the training data are the data collected in the step S1, and the network structure is a baseline variant of DGCNN;
s3, quantizing the model, wherein a pseudo quantizing node is used in the step S2, the nature of the pseudo quantizing node is floating point operation, namely a non-quantizing model, the information of the pseudo quantizing node is combined to the previous node in the step S3, the model is truly quantized, the representing method of model parameters is changed from 32-bit floating point numbers to 8-bit integers, the size of the model is compressed, and meanwhile the forward propagation calculation efficiency is improved;
s4, establishing an index-label table, wherein the category of the data is identified at the same time when the data is collected in the step S1, and the label is represented by using a natural number index to obtain a table with one-to-one correspondence between indexes and labels;
s5, object point cloud data acquisition is basically consistent with the step S1, but does not have category labels of the data, and single data are acquired one by one;
s6, based on the feature extraction of the quantized deep learning model of the DSP chip, the model is quantized in the step S3, the weight is represented by using an 8-bit integer, the feature value of each layer is represented by using an 8-bit integer in the forward propagation of the model, the mapping relation between the model and the original floating point representation is R=S (Q-Z), R represents 32-bit floating point numbers, Q represents 8-bit integer numbers, S is mapping scaling, and Z is mapping zero offset;
convolution multiplication, matrix multiplication and addition operation are carried out in the calculation of model forward propagation inference, and a vsswmac5 instruction in a DSP instruction set is called to complete matrix multiplication with the parallelism of 32 times; the 1x1 convolution can be directly converted into a matrix multiplication operation, so this instruction is also used to implement the 1x1 convolution; other convolution operations use the vswmac5 instruction, with 8 times parallelism; residual addition and general addition use vadd instruction, have 32 times parallelism; in general, a DSP parallel computing acceleration module is applied to improve the computing parallelism of forward propagation of the model;
inputting the data acquired in the step S5 into a three-dimensional data feature extractor, and performing forward propagation calculation with 8-32 times of parallelism to obtain the features of the data;
s7, decoding the features calculated in the step S6 by using a feature decoder according to a computer vision recognition task method, namely obtaining indexes of elements with maximum feature values;
s8, inquiring the index-label table to obtain the label of the identified object, and inquiring the index value obtained in the step S7 in the index-label table established in the step S4 to obtain the label of the object acquired in the step S5.
4. The three-dimensional object recognition method based on the DSP chip and the quantization model according to claim 3, wherein in the step S2, if the quantization-during-training method is used, the tf.contrib.quantize.create_training_graph interface is called, and the interface automatically adds pseudo quantization nodes after certain specific structures, including convolution, matrix multiplication, activation and residual addition; for other operations, including general addition, multiplication and parameter-free matrix multiplication, the interface does not add pseudo quantization nodes, so the implementation of the interface needs to be modified so that it correctly adds pseudo quantization nodes after all operation nodes; if the quantization-after-training method is used, a general training method without pseudo quantization nodes is used.
5. The three-dimensional object recognition method based on the DSP chip and the quantization model according to claim 3, wherein in the step S3, if the quantization-during-training method is used, the tf.contrib.quantize.create_eval_graph interface is called, and the quantized implementations of general addition and matrix multiplication not supported by the interface need to be custom-modified; the weights are then fixed, and the toco_convert script converts the model into the TFLITE format with parameter type QUANTIZED_UINT8 to obtain the quantized model; if the quantization-after-training method is used, pseudo quantization nodes are manually added after the weight, activation, residual addition, general addition and matrix multiplication nodes, a number of data items are fed into the model to obtain the maximum and minimum values of the activation and residual addition nodes, these values are written into the corresponding pseudo quantization nodes, the weights are then fixed, and the toco_convert script converts the model into the TFLITE format with parameter type QUANTIZED_UINT8 to obtain the quantized model.
CN201911018559.6A 2019-10-24 2019-10-24 Three-dimensional object identification method based on DSP chip and quantization model Active CN110991229B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911018559.6A CN110991229B (en) 2019-10-24 2019-10-24 Three-dimensional object identification method based on DSP chip and quantization model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911018559.6A CN110991229B (en) 2019-10-24 2019-10-24 Three-dimensional object identification method based on DSP chip and quantization model

Publications (2)

Publication Number Publication Date
CN110991229A CN110991229A (en) 2020-04-10
CN110991229B (en) 2023-04-28

Family

ID=70082286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911018559.6A Active CN110991229B (en) 2019-10-24 2019-10-24 Three-dimensional object identification method based on DSP chip and quantization model

Country Status (1)

Country Link
CN (1) CN110991229B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767508B (en) * 2020-07-09 2024-02-23 地平线(上海)人工智能技术有限公司 Method, device, medium and equipment for computing tensor data by computer

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107705328A (en) * 2016-08-09 2018-02-16 康耐视公司 Balanced probe position selection for 3D alignment algorithms
CN108364253A (en) * 2018-03-15 2018-08-03 北京威远图易数字科技有限公司 Car damage identification method, system and electronic equipment
CN109711410A (en) * 2018-11-20 2019-05-03 北方工业大学 Three-dimensional object rapid segmentation and identification method, device and system
CN109829399A (en) * 2019-01-18 2019-05-31 武汉大学 A kind of vehicle mounted road scene point cloud automatic classification method based on deep learning

Also Published As

Publication number Publication date
CN110991229A (en) 2020-04-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant