CN113822212A - Embedded object identification method and device - Google Patents
- Publication number
- CN113822212A (application number CN202111138968.7A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- layer
- combination
- image
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses an embedded object identification method and device in the technical field of embedded artificial intelligence. The method comprises the following steps: collecting and processing a color image of an object to obtain processed image data; training and testing a constructed combination-MobileNet neural network with the processed image data to obtain a trained combination-MobileNet neural network; storing and loading the trained combination-MobileNet neural network on an embedded platform; and inputting image data of an object to be recognized into the embedded platform, which infers the category of the object in real time to obtain a recognition result. The combination-MobileNet neural network constructed by the invention has a simple structure, low computational cost, and high accuracy, and when loaded on an embedded platform it can accurately identify objects in a low-resource, low-cost environment.
Description
Technical Field
The invention relates to the technical field of embedded artificial intelligence, in particular to an embedded object identification method and device.
Background
Embedded artificial intelligence is the technical concept of applying artificial intelligence algorithms on terminal equipment, so that devices can perform environment perception, human-machine interaction, and similar functions without a network connection. Embedded systems are an important carrier platform for artificial intelligence technology: automatic sorting robots and automatic delivery vehicles in logistics; face recognition, fingerprint recognition, and intelligent cameras in the security field; automatic parking, automatic vehicle recognition, and intelligent parking lots in urban traffic; and case diagnosis and intelligent disinfection robots in medical services are all new applications born from combining artificial intelligence with embedded systems. However, deep neural networks place very high demands on the computing power and resources of embedded systems, increasing system power consumption, and processors that support deep neural network acceleration are typically complex SoCs integrating multiple architectures, which are very costly to use. Existing embedded object recognition either adopts chips of extremely high complexity and cost, which hinders learning and use, is difficult to realize on low-resource, low-cost chips, and is ill-suited to a single task, easily wasting resources; or, where it can run on a low-resource chip, it uses a simple model with a single training strategy and a single evaluation strategy, so recognition accuracy is low.
Chinese patent application CN113138789A, published on July 20, 2021, provides an embedded object recognition system comprising a program updating module, a camera module, a display screen module, a three-color lamp module, and a main control chip; the main control chip is connected to each of the other modules. The main control chip updates its program according to the input of the program updating module, receives image data acquired by the camera module, performs image compression, input standardization, and image identification, and displays the result on the display screen module. However, that system can only recognize the digits 0-9, which is a severe limitation; recognizing other, more complex objects would require a costly storage environment and high computational cost, without which accurate recognition cannot be achieved.
Disclosure of Invention
The invention provides an embedded object identification method and device to overcome the defect that existing embedded object identification technology cannot accurately identify objects in a low-resource, low-cost environment; the invention can accurately identify objects in such an environment.
In order to solve the technical problems, the technical scheme of the invention is as follows:
the invention provides an embedded object identification method, which comprises the following steps:
s1: collecting a color image of an object;
s2: processing the color image to obtain processed image data;
s3: training and testing the constructed combination-MobileNet neural network by using the processed image data to obtain a trained combination-MobileNet neural network;
s4: storing and loading the trained combination-MobileNet neural network on an embedded platform;
s5: and inputting the image data of the object to be recognized into the embedded platform, and performing real-time reasoning on the category of the object to be recognized to obtain a recognition result.
Preferably, in step S2, the specific steps of processing the color image are as follows:
s2.1: converting the color image into a gray image;
s2.2: randomly dividing the gray level image into a training image and a test image;
s2.3: performing data enhancement operation on the training image to obtain an enhanced training image;
s2.4: and carrying out downsampling on the enhanced training image to obtain a downsampled training image.
Preferably, in step S2.3, the data enhancement operations performed on the training image include: rotation, cropping, translation, and Gaussian noise.
Each training image undergoes rotation, cropping, translation, and Gaussian noise operations, so that several enhanced training images are produced from one training image; this increases the scale and complexity of the training data and improves the accuracy of the network.
Preferably, in step S2.4, the downsampling operation performed on the enhanced training images comprises performing an average pooling operation and a max pooling operation on the enhanced training images in sequence.
An average pooling operation and a max pooling operation are performed once, in sequence, on each enhanced training image, reducing the size of the training data and the computational cost.
Preferably, in step S3, the method for training and testing the constructed combination-MobileNet neural network by using the processed image data to obtain the trained combination-MobileNet neural network includes:
s3.1: setting a loss function, an optimal loss function value replacement time threshold and a maximum training time of the combination-MobileNet neural network;
s3.2: inputting the downsampling training image into a combination-MobileNet neural network, and calculating a loss function value loss of the downsampling training image by using cross entropy;
S3.3: setting an early-stopping strategy: comparing the loss function value of the downsampled training image with the optimal loss function value; when the loss function value is smaller than the optimal loss function value, replacing the optimal loss function value with it and resetting a no-improvement counter, and otherwise incrementing the counter;
S3.4: comparing the counter with the threshold set in step S3.1; while the counter is below the threshold, performing the next round of training; otherwise, completing the training of the combination-MobileNet neural network;
s3.5: inputting the test image into the trained combination-MobileNet neural network for testing to obtain the trained combination-MobileNet neural network.
Preferably, in step S3, the combination-MobileNet neural network includes a first standard convolutional layer, a second standard convolutional layer, a first depth separable convolutional layer, a second depth separable convolutional layer, a first fully-connected layer, a second fully-connected layer, a feature fusion layer, an average pooling layer, and a third fully-connected layer;
the output end of the first standard convolution layer is connected with the input end of the second standard convolution layer, and the output end of the second standard convolution layer is respectively connected with the input ends of the first depth separable convolution layer and the second depth separable convolution layer;
the output end of the first depth separable convolution layer is connected with the input end of the first full connection layer, and the output end of the first full connection layer is connected with the input end of the characteristic fusion layer; the output end of the second depth separable convolution layer is connected with the input end of the second full-connection layer, and the output end of the second full-connection layer is connected with the input end of the characteristic fusion layer;
the output end of the characteristic fusion layer is connected with the input end of the average pooling layer, and the output end of the average pooling layer is connected with the input end of the third full-connection layer.
The downsampled training image is input into the combination-MobileNet neural network and passes through the 3x3 first standard convolutional layer to obtain a first feature v1; the first feature v1 passes through the 1x1 second standard convolutional layer to obtain a second feature v2; the second feature v2 passes through the 3x3 first depth separable convolutional layer with a step size of 2 to obtain a third feature v3, and at the same time through the 3x3 second depth separable convolutional layer with a step size of 1 to obtain a fourth feature v4; the third feature v3 is input to the feature fusion layer through the first fully-connected layer and the fourth feature v4 through the second fully-connected layer, where they are merged into a fifth feature v5; finally, the fifth feature v5 passes through the 3x3 average pooling layer and the third fully-connected layer to output a sixth feature v6, whose dimension is the number of object classes to be identified.
Preferably, in step S4, the specific method for storing and loading the trained combination-MobileNet neural network on the embedded platform is as follows:
s4.1: saving the trained combination-MobileNet neural network as an H5 file;
S4.2: parsing the H5 file to obtain the matrixed network parameters of the combination-MobileNet neural network;
S4.3: creating two C-language files, model_init.c and model_init.h, and writing the matrixed network parameters into the model_init.h file as a data stream;
S4.4: replacing the corresponding old files in the embedded platform project files with model_init.c and model_init.h.
Image processing and network training are performed on a PC (personal computer), but the embedded platform cannot process the network parameters of the combination-MobileNet neural network directly; the network parameters must be converted into a matrix form the embedded platform can handle. The data in the H5 file is organized as a tree and divided into weights and biases: the convolution kernel element a in row h and column w of the n-th dimension of the layer1 network is expressed as a = layer1(n, h, w), and the k-th bias term b of the layer1 network is expressed as b = layer1(bias, k). After model_init.c and model_init.h replace the corresponding old files in the embedded platform's project files, the object class names defined in the project files are changed to the currently trained object class names.
Preferably, in step S5, real-time inference on the object to be recognized is performed based on an election strategy to obtain the recognition result: each time the category of an object is inferred, image data of the object at different times is acquired for several inferences, and the inference result with the greatest occurrence frequency or probability is selected as the object's identification result.
The election strategy trades time for accuracy: by comprehensively evaluating several inference results, it reduces the influence of human factors and equipment on inference accuracy.
Preferably, the embedded platform is an STM32-based embedded platform.
The present invention also provides an embedded object recognition device, comprising:
the data acquisition module is used for acquiring a color image of an object;
the data processing module is used for processing the color image to obtain processed image data;
the network training test module is used for training and testing the constructed combination-MobileNet neural network by utilizing the processed image data to obtain a trained combination-MobileNet neural network;
the network loading module is used for storing and loading the trained combination-MobileNet neural network to the embedded platform;
and the reasoning identification module is used for inputting the image data of the object to be identified into the embedded platform, and performing real-time reasoning on the category of the object to be identified to obtain an identification result.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
according to the method, the color image is processed, the processed image data is used as training data, the scale and complexity of the data are increased, and the accuracy of the combination-MobileNet neural network is improved during training; the constructed combination-MobileNet neural network has simple structure, low calculation cost and high accuracy; the trained combination-MobileNet neural network is stored and loaded on the embedded platform, and accurate identification of objects can be realized in a low-resource and low-cost environment.
Drawings
Fig. 1 is a flowchart of the embedded object identification method according to embodiment 1.
Fig. 2 is a structural diagram of the combination-MobileNet neural network described in embodiment 1.
Fig. 3 is a structural diagram of the embedded object recognition device according to embodiment 2.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
The invention provides an embedded object identification method, as shown in fig. 1, comprising:
s1: collecting a color image of an object;
the collected color image of the object is a 320x240 color image in RGB565 format: each pixel occupies 16 bits of storage, with red, green, and blue occupying bits 0-4, 5-10, and 11-15 respectively;
s2: processing the color image to obtain processed image data;
The specific steps for processing the color image are as follows:
S2.1: converting the color image into a grayscale image;
The three-channel color image is converted into a single-channel grayscale image according to Y = 0.3R + 0.59G + 0.11B, where Y is the gray value and R, G, and B are the red, green, and blue components; a start marker denotes the beginning of a grayscale picture and an end marker denotes its end;
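As an illustrative sketch (not part of the patent), the grayscale conversion above can be written directly; the function name and the use of numpy are assumptions:

```python
import numpy as np

def rgb_to_gray(image):
    """Convert an HxWx3 color image to a single-channel grayscale image
    using the weighting Y = 0.3R + 0.59G + 0.11B from step S2.1."""
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    return 0.3 * r + 0.59 * g + 0.11 * b

# a pure-white pixel keeps full intensity, since 0.3 + 0.59 + 0.11 = 1.0
white = np.full((1, 1, 3), 255.0)
print(rgb_to_gray(white)[0, 0])
```

Because the weights sum to 1, the gray value stays within the original intensity range.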
s2.2: randomly dividing the gray level image into a training image and a test image;
In this embodiment, the grayscale images are randomly divided into training images and test images at a ratio of 8:2;
s2.3: performing data enhancement operation on the training image to obtain an enhanced training image;
the data enhancement operations include rotation, cropping, translation, and Gaussian noise; several enhanced training images are produced from one training image, increasing the scale and complexity of the training data and improving the accuracy of subsequent network training;
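A minimal sketch of the enhancement step, assuming numpy and simplifying each operation (a fixed 90-degree rotation, a central crop, a circular shift for translation, and additive Gaussian noise); all names here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(gray):
    """Generate several enhanced copies of one grayscale training image."""
    h, w = gray.shape
    enhanced = []
    enhanced.append(np.rot90(gray))                      # rotation (90 degrees, for simplicity)
    enhanced.append(gray[2:h - 2, 2:w - 2])              # central crop
    enhanced.append(np.roll(gray, shift=3, axis=1))      # horizontal translation
    enhanced.append(gray + rng.normal(0, 5, gray.shape)) # additive Gaussian noise
    return enhanced

samples = augment(np.zeros((32, 32)))
print(len(samples))  # 4 enhanced images from one input
```

Each input image thus contributes several training samples, matching the stated goal of enlarging the training set.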
s2.4: down-sampling the enhanced training image to obtain a down-sampled training image;
An average pooling operation and a max pooling operation are performed once, in sequence, on each enhanced training image, reducing the size of the training data and the computational cost.
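The two-stage downsampling can be sketched as follows, assuming 2x2 windows with stride 2 (the patent does not state the window size), so each stage halves both spatial dimensions:

```python
import numpy as np

def pool2x2(img, op):
    """Apply a 2x2, stride-2 pooling operation (op) to a 2D image."""
    h, w = img.shape
    blocks = img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return op(blocks, axis=(1, 3))

def downsample(img):
    """Average pooling followed by max pooling, as in step S2.4."""
    return pool2x2(pool2x2(img, np.mean), np.max)

img = np.arange(64, dtype=float).reshape(8, 8)
print(downsample(img).shape)  # (2, 2)
```

An 8x8 input shrinks to 2x2 after the two stages, a 16x reduction in the data each training image contributes.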
S3: training and testing the constructed combination-MobileNet neural network by using the processed image data to obtain a trained combination-MobileNet neural network;
as shown in fig. 2, the combination-MobileNet neural network includes a first standard convolutional layer, a second standard convolutional layer, a first depth separable convolutional layer, a second depth separable convolutional layer, a first fully-connected layer, a second fully-connected layer, a feature fusion layer, an average pooling layer, and a third fully-connected layer;
the output end of the first standard convolution layer is connected with the input end of the second standard convolution layer, and the output end of the second standard convolution layer is respectively connected with the input ends of the first depth separable convolution layer and the second depth separable convolution layer;
the output end of the first depth separable convolution layer is connected with the input end of the first full connection layer, and the output end of the first full connection layer is connected with the input end of the characteristic fusion layer; the output end of the second depth separable convolution layer is connected with the input end of the second full-connection layer, and the output end of the second full-connection layer is connected with the input end of the characteristic fusion layer;
the output end of the characteristic fusion layer is connected with the input end of the average pooling layer, and the output end of the average pooling layer is connected with the input end of the third full-connection layer;
the specific method for obtaining the trained combination-MobileNet neural network comprises the following steps:
s3.1: setting a loss function, an optimal loss function value replacement time threshold and a maximum training time of the combination-MobileNet neural network; in this embodiment, the threshold of the number of times of replacement of the optimal loss function value is 10 times, the maximum number of times of training is 10000 times, and the optimal loss function value is set as required;
s3.2: inputting the downsampling training image into a combination-MobileNet neural network, and calculating a loss function value loss of the downsampling training image by using cross entropy;
S3.3: setting an early-stopping strategy: the loss function value of the downsampled training image is compared with the optimal loss function value; when the loss function value is smaller than the optimal loss function value, it replaces the optimal loss function value and the no-improvement counter is reset; otherwise the counter is incremented;
S3.4: the counter is compared with the threshold set in step S3.1; while the counter is below the threshold, the next round of training is performed; otherwise, training of the combination-MobileNet neural network is complete. In this embodiment, training ends when the loss function value has not decreased for 10 consecutive rounds;
s3.5: inputting the test image into the trained combination-MobileNet neural network for testing to obtain the trained combination-MobileNet neural network;
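The early-stopping logic of steps S3.1-S3.4 can be sketched as a loop over per-round loss values; the function and variable names are assumptions:

```python
def train_with_early_stopping(losses, patience=10, max_epochs=10000):
    """Early-stopping sketch: stop once the loss has not improved on the
    best value for `patience` consecutive rounds. `losses` stands in for
    the per-round cross-entropy loss values of step S3.2."""
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(losses[:max_epochs]):
        if loss < best:       # improvement: keep the new best, reset the counter
            best, stale = loss, 0
        else:                 # no improvement: count the stale round
            stale += 1
        if stale >= patience:
            return epoch + 1, best  # rounds actually run, best loss seen
    return min(len(losses), max_epochs), best

# the loss improves twice, then plateaus for 10 rounds: training stops
losses = [1.0, 0.5] + [0.6] * 10
print(train_with_early_stopping(losses))  # (12, 0.5)
```

With the embodiment's settings (patience 10, at most 10000 rounds), training halts as soon as ten consecutive rounds bring no new best loss.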
A combination-MobileNet model, improved from MobileNet-V2, is built with the PyTorch framework. Depth separable convolutions are constructed, and the features extracted by the depth separable convolutions with step sizes of 1 and 2 in MobileNet-V2 are fused, so that both features are fully utilized and recognition accuracy is improved; at the same time, the 7x7 average pooling layer of MobileNet-V2 is changed to 3x3 average pooling to reduce computational cost.
The downsampled training image is input into the combination-MobileNet neural network and passes through the 3x3 first standard convolutional layer to obtain a first feature v1; the first feature v1 passes through the 1x1 second standard convolutional layer to obtain a second feature v2; the second feature v2 passes through the 3x3 first depth separable convolutional layer with a step size of 2 to obtain a third feature v3, and at the same time through the 3x3 second depth separable convolutional layer with a step size of 1 to obtain a fourth feature v4; the third feature v3 is input to the feature fusion layer through the first fully-connected layer and the fourth feature v4 through the second fully-connected layer, where they are merged into a fifth feature v5; finally, the fifth feature v5 passes through the 3x3 average pooling layer and the third fully-connected layer to output a sixth feature v6, whose dimension is the number of object classes to be identified.
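An illustrative PyTorch sketch of the described architecture. The patent does not specify the input resolution, channel widths, fused feature size, or how a fused fully-connected vector feeds a 3x3 average-pooling layer, so this sketch assumes a 24x24 grayscale input, 8 channels, a 144-dimensional fused feature reshaped to a 12x12 map, and elementwise addition as the fusion; all of those are assumptions:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, ch, stride):
        super().__init__()
        self.depthwise = nn.Conv2d(ch, ch, 3, stride=stride, padding=1, groups=ch)
        self.pointwise = nn.Conv2d(ch, ch, 1)
    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class CombinationMobileNet(nn.Module):
    def __init__(self, num_classes, in_size=24, ch=8, fused=144):
        super().__init__()
        self.conv1 = nn.Conv2d(1, ch, 3, padding=1)     # 3x3 standard conv -> v1
        self.conv2 = nn.Conv2d(ch, ch, 1)               # 1x1 standard conv -> v2
        self.branch_s2 = DepthwiseSeparableConv(ch, 2)  # step size 2 -> v3
        self.branch_s1 = DepthwiseSeparableConv(ch, 1)  # step size 1 -> v4
        self.fc1 = nn.Linear(ch * (in_size // 2) ** 2, fused)
        self.fc2 = nn.Linear(ch * in_size ** 2, fused)
        self.pool = nn.AvgPool2d(3)                     # 3x3 average pooling
        self.fc3 = nn.Linear(fused // 9, num_classes)   # third fully-connected layer
    def forward(self, x):
        v2 = self.conv2(self.conv1(x))
        v3 = self.fc1(torch.flatten(self.branch_s2(v2), 1))
        v4 = self.fc2(torch.flatten(self.branch_s1(v2), 1))
        v5 = v3 + v4                                    # feature fusion (sum assumed)
        side = int(v5.size(1) ** 0.5)                   # view fused vector as a 2D map
        v5 = v5.view(-1, 1, side, side)
        return self.fc3(torch.flatten(self.pool(v5), 1))
```

For a batch of two 24x24 images and five classes, `CombinationMobileNet(5)(torch.zeros(2, 1, 24, 24))` yields a (2, 5) output, one score per class.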
S4: storing and loading the trained combination-MobileNet neural network on an embedded platform;
the loading step comprises the following steps:
s4.1: saving the trained combination-MobileNet neural network as an H5 file;
S4.2: parsing the H5 file to obtain the matrixed network parameters of the combination-MobileNet neural network;
S4.3: creating two C-language files, model_init.c and model_init.h, and writing the matrixed network parameters into the model_init.h file as a data stream;
S4.4: replacing the corresponding old files in the embedded platform project files with model_init.c and model_init.h.
Image processing and network training are performed on a PC (personal computer), but the embedded platform cannot process the network parameters of the combination-MobileNet neural network directly; the network parameters must be converted into a matrix form the embedded platform can handle. The data in the H5 file is organized as a tree and divided into weights and biases; after parsing, the matrixed network parameters are obtained: the convolution kernel element a in row h and column w of the n-th dimension of the layer1 network is expressed as a = layer1(n, h, w), and the k-th bias term b of the layer1 network is expressed as b = layer1(bias, k). After model_init.c and model_init.h replace the corresponding old files in the embedded platform's project files, the object class names defined in the project files are changed to the currently trained object class names.
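A sketch of the parameter-export step. In the full flow the matrices would come from parsing the trained network's H5 file (for example with h5py); here a literal dict stands in for that, and the function and array names are assumptions:

```python
import numpy as np

def emit_c_array(name, arr):
    """Flatten one parameter matrix into a C initializer line suitable
    for a generated model_init.h header."""
    flat = ", ".join(f"{v:.6f}f" for v in np.ravel(arr))
    return f"const float {name}[{arr.size}] = {{{flat}}};"

# stand-in for parameters parsed out of the H5 tree (weights and biases)
params = {"layer1_weight": np.zeros((2, 2)), "layer1_bias": np.zeros(2)}
header = "\n".join(emit_c_array(n, a) for n, a in params.items())
print(header.splitlines()[0])
```

The resulting constant arrays can be compiled straight into the embedded project, which is the point of replacing the old model_init files.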
S5: and inputting the image data of the object to be recognized into the embedded platform, and performing real-time reasoning on the category of the object to be recognized to obtain a recognition result.
The object to be recognized is inferred in real time based on an election strategy to obtain the recognition result. The election strategy trades time for accuracy: by comprehensively evaluating several inference results, it reduces the influence of human factors and equipment on inference accuracy. Specifically, each time the category of an object is inferred, image data of the object at different times is acquired for several inferences, and the inference result with the greatest occurrence frequency or probability is selected as the object's identification result. In this embodiment, image data of the object at three different moments is acquired for three inferences, and the result that occurs most frequently among the three, or has the highest probability, is taken as the final recognition result, making object recognition more accurate.
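The election strategy can be sketched as a majority vote over repeated inferences, falling back to the highest-probability result when every class occurs only once; the names are assumptions:

```python
from collections import Counter

def elect(inferences):
    """Election strategy of step S5: combine several (class, probability)
    inference results for one object captured at different moments."""
    votes = Counter(label for label, _ in inferences)
    top, count = votes.most_common(1)[0]
    if count > 1:
        return top                                  # majority class wins
    return max(inferences, key=lambda r: r[1])[0]   # all distinct: trust confidence

# three inferences of one object as (predicted class, probability)
print(elect([("cup", 0.71), ("cup", 0.64), ("box", 0.80)]))  # cup
```

A single noisy frame (here the 0.80 "box" result) is outvoted by the two consistent results, which is how the strategy buys accuracy with extra inference time.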
In actual operation, Visual Studio 2019 is installed and an image acquisition system is built; AHL-GEC-IDE (4.08) is installed to build the embedded engineering development platform; a TT-USB serial port (CH340) driver is installed to enable communication between the embedded platform and the PC; and JetBrains PyCharm 2019.1.1 x64 is installed to implement the object recognition system functions. First, the project file is imported: the modified project file is imported into the compiler and compiled; the embedded platform is connected to the PC through a port; the compiled project file is loaded onto the embedded platform through the port; and the acquired image data of the object to be identified is inferred, with the recognition result displayed on the embedded platform's screen.
Example 2
The present embodiment provides an embedded object recognition apparatus, as shown in fig. 3, including:
the data acquisition module is used for acquiring a color image of an object;
the data processing module is used for processing the color image to obtain processed image data;
the network training test module is used for training and testing the constructed combination-MobileNet neural network by utilizing the processed image data to obtain a trained combination-MobileNet neural network;
the network loading module is used for storing and loading the trained combination-MobileNet neural network to the embedded platform;
and the reasoning identification module is used for inputting the image data of the object to be identified into the embedded platform, and performing real-time reasoning on the category of the object to be identified to obtain an identification result.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.
Claims (10)
1. An embedded object recognition method, comprising:
S1: collecting a color image of an object;
S2: processing the color image to obtain processed image data;
S3: training and testing the constructed combination-MobileNet neural network with the processed image data to obtain a trained combination-MobileNet neural network;
S4: saving and loading the trained combination-MobileNet neural network onto an embedded platform;
S5: inputting image data of an object to be recognized into the embedded platform and inferring the category of the object to be recognized in real time to obtain a recognition result.
2. The embedded object recognition method of claim 1, wherein in step S2, the specific steps of processing the color image are:
S2.1: converting the color image into a grayscale image;
S2.2: randomly dividing the grayscale image into a training image and a test image;
S2.3: performing data enhancement operations on the training image to obtain an enhanced training image;
S2.4: downsampling the enhanced training image to obtain a downsampled training image.
3. The embedded object recognition method of claim 2, wherein in step S2.3, the data enhancement operation performed on the training image comprises: rotation, clipping, translation, and gaussian noise.
4. The method according to claim 3, wherein the downsampling operation on the enhanced training image in step S2.4 comprises performing an average pooling operation and a max pooling operation on the enhanced training image in sequence.
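The preprocessing chain of claims 2-4 (grayscale conversion, then average pooling followed by max pooling for downsampling) can be sketched as below; the luminance weights and the 2x2 window size are illustrative assumptions, since the claims do not fix them.

```python
import numpy as np

def to_grayscale(rgb):
    # ITU-R BT.601 luminance weights (an assumption; the claims do not fix them)
    return rgb @ np.array([0.299, 0.587, 0.114])

def pool2x2(img, op):
    # Non-overlapping 2x2 windows; odd edges are cropped
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    blocks = img[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return op(blocks, axis=(1, 3))

def downsample(img):
    # Claim 4: average pooling, then max pooling, in sequence
    return pool2x2(pool2x2(img, np.mean), np.max)

rgb = np.random.rand(32, 32, 3)   # stand-in for a collected color image
gray = to_grayscale(rgb)          # (32, 32) grayscale image
small = downsample(gray)          # (8, 8) downsampled training image
```

Each 2x2 pooling stage halves both spatial dimensions, so the two stages together reduce the image area by a factor of 16 before it reaches the network.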
5. The method for recognizing an embedded object according to claim 4, wherein in step S3, the constructed combination-MobileNet neural network is trained and tested by using the processed image data, and the specific method for obtaining the trained combination-MobileNet neural network is as follows:
S3.1: setting the loss function of the combination-MobileNet neural network, a threshold on the number of training rounds without improvement of the optimal loss function value, and a maximum number of training rounds;
S3.2: inputting the downsampled training image into the combination-MobileNet neural network and calculating its loss function value loss using the cross entropy;
S3.3: applying an early-stopping strategy, namely comparing the loss function value of the downsampled training image with the optimal loss function value: when the loss function value is smaller than the optimal loss function value, replacing the optimal loss function value with it and resetting the no-improvement count; otherwise, incrementing the no-improvement count;
S3.4: comparing the no-improvement count with its threshold: when the count is smaller than the threshold, performing the next round of training; otherwise, ending the training of the combination-MobileNet neural network;
S3.5: inputting the test image into the trained combination-MobileNet neural network for testing to obtain the trained combination-MobileNet neural network.
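The training procedure of claim 5 relies on a patience-style early-stopping strategy. A minimal sketch, assuming the standard formulation (replace the optimal loss value on improvement, stop after a fixed number of rounds without improvement); the toy loss sequence stands in for real training epochs of the network:

```python
def train_with_early_stopping(losses_per_epoch, patience, max_epochs):
    """Generic early-stopping loop: stop once the loss has failed to improve
    on the best value for `patience` consecutive rounds."""
    best_loss = float("inf")
    stale = 0  # consecutive rounds without improvement
    for epoch in range(max_epochs):
        loss = losses_per_epoch[epoch]
        if loss < best_loss:
            best_loss = loss    # replace the optimal loss function value
            stale = 0
        else:
            stale += 1          # record one round without improvement
        if stale >= patience:
            break               # early stop
    return epoch + 1, best_loss

# Toy loss curve: improves for three rounds, then plateaus
epochs_run, best = train_with_early_stopping(
    [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75], patience=3, max_epochs=8)
# epochs_run == 6, best == 0.7: training stops 3 rounds after the last improvement
```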
6. The embedded object recognition method of claim 1, wherein in step S3, the combination-MobileNet neural network comprises a first standard convolutional layer, a second standard convolutional layer, a first depth separable convolutional layer, a second depth separable convolutional layer, a first fully-connected layer, a second fully-connected layer, a feature fusion layer, an average pooling layer, and a third fully-connected layer;
the output end of the first standard convolution layer is connected with the input end of the second standard convolution layer, and the output end of the second standard convolution layer is respectively connected with the input ends of the first depth separable convolution layer and the second depth separable convolution layer;
the output end of the first depth separable convolution layer is connected with the input end of the first fully-connected layer, and the output end of the first fully-connected layer is connected with the input end of the feature fusion layer; the output end of the second depth separable convolution layer is connected with the input end of the second fully-connected layer, and the output end of the second fully-connected layer is connected with the input end of the feature fusion layer;
the output end of the feature fusion layer is connected with the input end of the average pooling layer, and the output end of the average pooling layer is connected with the input end of the third fully-connected layer.
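The wiring of claim 6 (two standard convolutions feeding two parallel depthwise-separable branches, whose fully-connected outputs are fused, pooled, and classified) can be traced with trivial numpy stand-ins for each layer. Every channel width, the stacking-based fusion, and the ten output classes are illustrative assumptions, since the claim only specifies the connections:

```python
import numpy as np

# 1x1-conv stand-ins: enough to trace the claim-6 data flow end to end.
def conv(x, out_ch):                      # standard / depthwise-separable conv stand-in
    return x @ np.random.rand(x.shape[-1], out_ch)

def dense(x, out_dim):                    # fully-connected layer stand-in
    return x.reshape(-1) @ np.random.rand(x.size, out_dim)

x = np.random.rand(16, 16, 1)             # grayscale input image
x = conv(conv(x, 8), 16)                  # first, then second standard conv layer
b1 = dense(conv(x, 32), 64)               # branch 1: depthwise-separable conv -> FC 1
b2 = dense(conv(x, 32), 64)               # branch 2: depthwise-separable conv -> FC 2
fused = np.stack([b1, b2])                # feature fusion layer (assumed: stacking)
pooled = fused.mean(axis=0)               # average pooling over the fused features
logits = pooled @ np.random.rand(64, 10)  # third FC layer -> 10 assumed classes
```

The point of the two-branch layout is that both branches see the same second-convolution output, so their fully-connected features describe the same input and can be fused element-wise before classification.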
7. The method for recognizing an embedded object according to claim 1, wherein in step S4, the specific method for saving and loading the trained combination-MobileNet neural network on the embedded platform is as follows:
S4.1: saving the trained combination-MobileNet neural network as an H5 file;
S4.2: parsing the H5 file to obtain the matrixized network parameters of the combination-MobileNet neural network;
S4.3: creating two C-language files, model_init.c and model_init.h, and writing the matrixized network parameters into the model_init.h file as a data stream;
S4.4: replacing the corresponding old files in the embedded platform project with model_init.c and model_init.h.
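A minimal sketch of the idea behind step S4.3: serializing parsed network parameters into a C header as constant arrays. The function name, array naming, and formatting below are assumptions, not the patent's actual model_init.h layout:

```python
import os
import tempfile

def write_weights_header(params, path):
    """Write matrixized network parameters as C float arrays.
    `params` maps hypothetical array names to flat lists of floats."""
    lines = ["/* auto-generated: network parameters written as a data stream */"]
    for name, values in params.items():
        body = ", ".join(f"{v:.6f}f" for v in values)
        lines.append(f"static const float {name}[{len(values)}] = {{{body}}};")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

# Hypothetical parameters for one convolution kernel
path = os.path.join(tempfile.gettempdir(), "model_init.h")
write_weights_header({"conv1_w": [0.1, -0.2, 0.3]}, path)
```

Emitting the weights as `static const float` arrays lets the embedded firmware reference them directly from flash, with no file system or runtime parser on the target.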
8. The embedded object recognition method of claim 1, wherein in step S5, the category of the object to be recognized is inferred in real time based on a selection strategy to obtain the recognition result: when inferring the category of an object, image data of the object captured at different times is acquired for multiple inferences, and the inference result with the highest occurrence frequency or probability is selected as the recognition result of the object.
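The selection strategy of claim 8 amounts to a majority vote over repeated inferences of the same object; a minimal sketch, with hypothetical category labels:

```python
from collections import Counter

def select_category(inferences):
    """Claim-8 selection strategy: run inference on images of the same object
    captured at different times and keep the most frequent result."""
    return Counter(inferences).most_common(1)[0][0]

# Five inferences of one object; the majority label wins
result = select_category(["cat", "dog", "cat", "cat", "dog"])
```

Voting over several captures smooths out single-frame misclassifications caused by motion blur or lighting changes, at the cost of a few extra inference passes.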
9. The embedded object recognition method of claim 1, wherein the embedded platform is an STM32-based embedded platform.
10. An embedded object recognition device, comprising:
the data acquisition module is used for acquiring a color image of an object;
the data processing module is used for processing the color image to obtain processed image data;
the network training test module is used for training and testing the constructed combination-MobileNet neural network by utilizing the processed image data to obtain a trained combination-MobileNet neural network;
the network loading module is used for storing and loading the trained combination-MobileNet neural network to the embedded platform;
and the reasoning identification module is used for inputting the image data of the object to be identified into the embedded platform, and performing real-time reasoning on the category of the object to be identified to obtain an identification result of the object to be identified.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111138968.7A CN113822212B (en) | 2021-09-27 | 2021-09-27 | Embedded object recognition method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113822212A true CN113822212A (en) | 2021-12-21 |
CN113822212B CN113822212B (en) | 2024-01-05 |
Family
ID=78915717
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111138968.7A Active CN113822212B (en) | 2021-09-27 | 2021-09-27 | Embedded object recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113822212B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190147336A1 (en) * | 2017-11-10 | 2019-05-16 | Fujitsu Limited | Method and apparatus of open set recognition and a computer readable storage medium |
CN111695469A (en) * | 2020-06-01 | 2020-09-22 | 西安电子科技大学 | Hyperspectral image classification method of lightweight depth separable convolution feature fusion network |
CN112528899A (en) * | 2020-12-17 | 2021-03-19 | 南开大学 | Image salient object detection method and system based on implicit depth information recovery |
CN112818893A (en) * | 2021-02-10 | 2021-05-18 | 北京工业大学 | Lightweight open-set landmark identification method facing mobile terminal |
Non-Patent Citations (1)
Title |
---|
Zhao Jing; Wang Xian; Wang Ben; Jiang Guoping; Xie Fei; Xu Fengyu: "Multi-category target recognition based on neural networks", Control and Decision, no. 08 *
Also Published As
Publication number | Publication date |
---|---|
CN113822212B (en) | 2024-01-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112651978B (en) | Sublingual microcirculation image segmentation method and device, electronic equipment and storage medium | |
CN110852316B (en) | Image tampering detection and positioning method adopting convolution network with dense structure | |
CN112132156B (en) | Image saliency target detection method and system based on multi-depth feature fusion | |
EP3261017A1 (en) | Image processing system to detect objects of interest | |
CN108197326A (en) | A kind of vehicle retrieval method and device, electronic equipment, storage medium | |
CN109740553B (en) | Image semantic segmentation data screening method and system based on recognition | |
CN113052295B (en) | Training method of neural network, object detection method, device and equipment | |
CN110222604A (en) | Target identification method and device based on shared convolutional neural networks | |
CN113888514A (en) | Method and device for detecting defects of ground wire, edge computing equipment and storage medium | |
CN115909006B (en) | Mammary tissue image classification method and system based on convolution transducer | |
CN113468996A (en) | Camouflage object detection method based on edge refinement | |
CN114463637A (en) | Winter wheat remote sensing identification analysis method and system based on deep learning | |
CN117197763A (en) | Road crack detection method and system based on cross attention guide feature alignment network | |
CN112801236A (en) | Image recognition model migration method, device, equipment and storage medium | |
CN111310837A (en) | Vehicle refitting recognition method, device, system, medium and equipment | |
CN115100469A (en) | Target attribute identification method, training method and device based on segmentation algorithm | |
CN114581789A (en) | Hyperspectral image classification method and system | |
CN117372853A (en) | Underwater target detection algorithm based on image enhancement and attention mechanism | |
CN116580232A (en) | Automatic image labeling method and system and electronic equipment | |
CN113822212B (en) | Embedded object recognition method and device | |
CN114463772B (en) | Deep learning-based traffic sign detection and identification method and system | |
CN116977249A (en) | Defect detection method, model training method and device | |
CN116824419A (en) | Dressing feature recognition method, recognition model training method and device | |
CN113256556A (en) | Image selection method and device | |
CN115641317B (en) | Pathological image-oriented dynamic knowledge backtracking multi-example learning and image classification method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | Inventors after: Zhang Hongliang; Li Guangming; Yu Chenhui; Zhang Hong; Luo Jiaqi. Inventors before: Li Guangming; Zhang Hongliang; Yu Chenhui; Zhang Hong; Luo Jiaqi. |
GR01 | Patent grant | ||