CN113822212B - Embedded object recognition method and device - Google Patents

Embedded object recognition method and device

Info

Publication number
CN113822212B
CN113822212B
Authority
CN
China
Prior art keywords
layer
combine
neural network
full
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111138968.7A
Other languages
Chinese (zh)
Other versions
CN113822212A (en)
Inventor
Zhang Hongliang (张红良)
Li Guangming (李广明)
Yu Chenhui (余晨晖)
Zhang Hong (张红)
Luo Jiaqi (罗嘉琦)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dongguan University of Technology
Original Assignee
Dongguan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dongguan University of Technology filed Critical Dongguan University of Technology
Priority claimed from CN202111138968.7A
Publication of CN113822212A
Application granted
Publication of CN113822212B
Legal status: Active

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an embedded object recognition method and device, relating to the technical field of embedded artificial intelligence. The method comprises the following steps: collecting and processing color images of objects to obtain processed image data; training and testing the constructed Combine-MobileNet neural network with the processed image data to obtain a trained Combine-MobileNet neural network; storing and loading the trained Combine-MobileNet neural network onto an embedded platform; and inputting the image data of an object to be identified into the embedded platform and reasoning about its category in real time to obtain an identification result. The Combine-MobileNet neural network constructed by the invention has a simple structure, low computational cost and high accuracy; loaded on an embedded platform, it can accurately identify objects in low-resource, low-cost environments.

Description

Embedded object recognition method and device
Technical Field
The invention relates to the technical field of embedded artificial intelligence, in particular to an embedded object identification method and device.
Background
Embedded artificial intelligence refers to applying artificial intelligence algorithms on terminal devices so that they can perform environment sensing, human-machine interaction and similar functions without a network connection. Embedded systems are an important carrier platform for artificial intelligence technology and appear in many new applications combining artificial intelligence with embedded hardware: automatic sorting robots and automatic delivery vehicles in logistics services; face recognition, fingerprint recognition and intelligent cameras in the security field; automatic parking, automatic vehicle recognition and intelligent parking lots in urban traffic; and case diagnosis and intelligent disinfection robots in medical services. However, deep neural networks place very high demands on the computing capacity and resources of an embedded system, which increases system power consumption, and processors supporting deep neural network acceleration are typically complex SoCs integrating multiple architectures, whose usage cost is very high. Existing embedded object recognition therefore either adopts chips of extremely high complexity and cost, which hinders learning and use, is difficult to realize on low-resource, low-cost chips, is unsuitable for single tasks and easily wastes resources; or, where it can be realized on a low-resource chip, the model is simple and the training and evaluation strategies are single, so the recognition accuracy is low.
Chinese patent application CN113138789A, published on July 20, 2021, provides an embedded object recognition system comprising: a program update module, a camera module, a display screen module, a tri-color lamp module and a main control chip. The main control chip is connected to the program update module, the camera module, the display screen module and the tri-color lamp module respectively; it updates its program according to the input of the program update module, receives the image data acquired by the camera module, performs image compression, input standardization and image recognition, and displays the result on the display screen module. However, that system can only recognize the digits 0-9, which is a major limitation; recognizing other, more complex objects would require a high-storage environment and high computational cost, and accurate recognition cannot otherwise be achieved.
Disclosure of Invention
The invention provides an embedded object recognition method and device to overcome the defect that existing embedded object recognition technology cannot accurately recognize objects in low-resource, low-cost environments; the method and device can accurately recognize objects in such environments.
In order to solve the technical problems, the technical scheme of the invention is as follows:
the invention provides an embedded object identification method, which comprises the following steps:
s1: collecting a color image of an object;
s2: processing the color image to obtain processed image data;
s3: training and testing the constructed Combine-MobileNet neural network by using the processed image data to obtain a trained Combine-MobileNet neural network;
s4: storing and loading the trained Combine-MobileNet neural network on an embedded platform;
s5: inputting the image data of the object to be identified into the embedded platform, and carrying out real-time reasoning on the category of the object to be identified to obtain an identification result.
Preferably, in the step S2, the specific steps of processing the color image are:
s2.1: converting the color image into a gray scale image;
s2.2: randomly dividing the gray level image into a training image and a test image;
s2.3: performing data enhancement operation on the training image to obtain an enhanced training image;
s2.4: downsampling the enhanced training image to obtain a downsampled training image.
Preferably, in the step S2.3, the data enhancement operation performed on the training image includes: rotation, clipping, translation, and gaussian noise.
Rotation, clipping, translation and Gaussian-noise operations are performed on each training image, so that one training image is augmented into several enhanced training images; this increases the scale and complexity of the training data and improves the accuracy of the network.
Preferably, in step S2.4, the downsampling of the enhanced training image comprises sequentially performing one average pooling operation and one max pooling operation on the enhanced training image.
Performing one average pooling operation and one max pooling operation on each enhanced training image in sequence reduces the size of the training data and thus the computational cost.
Preferably, in the step S3, training and testing the constructed Combine-MobileNet neural network by using the processed image data, and the specific method for obtaining the trained Combine-MobileNet neural network is as follows:
s3.1: setting a loss function, an optimal loss function value replacement frequency threshold and a maximum training frequency of a Combine-MobileNet neural network;
s3.2: inputting the downsampled training image into a Combine-MobileNet neural network, and calculating a loss function value loss of the downsampled training image by using cross entropy;
s3.3: setting an early-stop strategy, namely comparing the loss function value of the downsampled training image with the optimal loss function value, replacing the optimal loss function value with the loss function value when the loss function value is larger than the optimal loss function value, and recording the replacement times;
s3.4: comparing the replacement times with the optimal loss function value replacement times threshold, and performing the next training when the replacement times are smaller than the optimal loss function value replacement times threshold; otherwise, completing the training of the Combine-MobileNet neural network;
s3.5: and inputting the test image into the trained Combine-MobileNet neural network for testing, and obtaining the trained Combine-MobileNet neural network.
Preferably, in the step S3, the Combine-MobileNet neural network includes a first standard convolution layer, a second standard convolution layer, a first depth separable convolution layer, a second depth separable convolution layer, a first full-connection layer, a second full-connection layer, a feature fusion layer, an average pooling layer, and a third full-connection layer;
the output end of the first standard convolution layer is connected with the input end of the second standard convolution layer, and the output end of the second standard convolution layer is respectively connected with the input ends of the first depth separable convolution layer and the second depth separable convolution layer;
the output end of the first depth separable convolution layer is connected with the input end of the first full-connection layer, and the output end of the first full-connection layer is connected with the input end of the feature fusion layer; the output end of the second depth separable convolution layer is connected with the input end of the second full-connection layer, and the output end of the second full-connection layer is connected with the input end of the feature fusion layer;
the output end of the characteristic fusion layer is connected with the input end of the average pooling layer, and the output end of the average pooling layer is connected with the input end of the third full-connection layer.
Inputting the downsampled training image into the Combine-MobileNet neural network, a first feature v1 is obtained through the 3x3 first standard convolution layer; the first feature v1 passes through the 1x1 second standard convolution layer to obtain a second feature v2; the second feature v2 passes through the 3x3 first depth separable convolution layer with a step size of 2 to obtain a third feature v3, and at the same time passes through the 3x3 second depth separable convolution layer with a step size of 1 to obtain a fourth feature v4; the third feature v3 is input to the feature fusion layer through the first full-connection layer and the fourth feature v4 is input to the feature fusion layer through the second full-connection layer, where they are merged into a fifth feature v5; finally, the fifth feature v5 passes through the 3x3 average pooling layer and the third full-connection layer to output a sixth feature v6, whose dimension is the number of categories of objects to be recognized.
Preferably, in the step S4, the specific method for storing and loading the trained Combine-MobileNet neural network onto the embedded platform is as follows:
s4.1: saving the trained Combine-MobileNet neural network as an H5 file;
s4.2: analyzing the H5 file to obtain matrixing network parameters of the Combine-MobileNet neural network;
s4.3: creating two c-language files of a model_init.c and a model_init.h, and writing matrixing network parameters into the model_init.h file according to a data stream form;
s4.4: the corresponding old file in the embedded platform engineering file is replaced with "model_init.c" and "model_init.h".
Image processing and network training are carried out on the PC side, but the embedded platform cannot process the network parameters of the Combine-MobileNet neural network directly, so the parameters must be converted into a matrix form that the embedded platform can process. The data in the H5 file are organized as a tree structure and divided into weights and biases; the convolution kernel element a in row h and column w of the n-th dimension of the layer1 network is expressed as a = layer1(n, h, w), and the k-th bias term b of the layer1 network is expressed as b = layer1(bias, k). After "model_init.c" and "model_init.h" replace the corresponding old files in the embedded platform project files, the object category names defined in the project files are changed to the category names of the currently trained objects.
Preferably, in the step S5, real-time reasoning is performed on the object to be identified based on a selection strategy to obtain the identification result; each time the category of an object is inferred, image data of the object at different times are acquired for multiple inferences, and the inference result with the largest number of occurrences or the highest probability is selected as the object identification result.
The selection strategy trades time for accuracy: by comprehensively evaluating the results of several inferences, the influence of human factors and equipment on inference accuracy is reduced.
Preferably, the embedded platform is an STM 32-based embedded platform.
The invention also provides an embedded object recognition device, which comprises:
the data acquisition module is used for acquiring color images of the object;
the data processing module is used for processing the color image to obtain processed image data;
the network training test module is used for training and testing the constructed Combine-MobileNet neural network by using the processed image data to obtain a trained Combine-MobileNet neural network;
the network loading module is used for storing and loading the trained Combine-MobileNet neural network onto the embedded platform;
the reasoning and identifying module is used for inputting the image data of the object to be identified into the embedded platform, and carrying out real-time reasoning on the category of the object to be identified to obtain an identification result.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention processes the color image, uses the processed image data as training data, increases the scale and complexity of the data, and is helpful for improving the accuracy of the Combine-MobileNet neural network during training; the constructed Combine-MobileNet neural network has the advantages of simple structure, low calculation cost and high accuracy; the trained Combine-MobileNet neural network is stored and loaded on the embedded platform, so that the object can be accurately identified in a low-resource and low-cost environment.
Drawings
Fig. 1 is a flowchart of an embedded object recognition method according to embodiment 1.
FIG. 2 is a block diagram of a Combine-MobileNet neural network as described in example 1.
Fig. 3 is a structural diagram of an embedded object recognition device according to embodiment 2.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the actual product dimensions;
it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
Example 1
The invention provides an embedded object identification method, as shown in figure 1, comprising the following steps:
s1: collecting a color image of an object;
the collected color image of the object is a 320×240 color image in RGB565 format; each pixel occupies 16 bits of storage, with red, green and blue occupying bits 0-4, 5-10 and 11-15 respectively;
s2: processing the color image to obtain processed image data;
the specific steps of processing the color image are as follows
S2.1: converting the color image into a gray scale image;
the three-channel color image is converted into a single-channel gray image according to Y = 0.3R + 0.59G + 0.11B, where Y denotes the gray value and R, G and B denote red, green and blue; "start" marks the beginning of a gray picture and "end" marks its end;
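As an illustrative sketch of this conversion step (not part of the patent), the RGB565 decoding and weighted gray mapping could be done on the PC side roughly as follows; the function name, the use of NumPy and the rescaling of each channel to 0-255 are assumptions.

```python
import numpy as np

def rgb565_to_gray(frame_u16: np.ndarray) -> np.ndarray:
    """Convert a 320x240 array of 16-bit pixels to an 8-bit gray image.

    Bit layout follows the text above: red in bits 0-4, green in bits 5-10,
    blue in bits 11-15 of each 16-bit pixel value.
    """
    r = (frame_u16 & 0x001F).astype(np.float32) * (255.0 / 31.0)
    g = ((frame_u16 >> 5) & 0x003F).astype(np.float32) * (255.0 / 63.0)
    b = ((frame_u16 >> 11) & 0x001F).astype(np.float32) * (255.0 / 31.0)
    gray = 0.3 * r + 0.59 * g + 0.11 * b   # Y = 0.3R + 0.59G + 0.11B
    return gray.astype(np.uint8)
```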
s2.2: randomly dividing the gray level image into a training image and a test image;
in this embodiment, the gray images are randomly divided into training images and test images at a ratio of 8:2;
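A minimal sketch of such a random 8:2 split (the helper name and the fixed seed are assumptions for illustration):

```python
import random

def split_dataset(images, labels, train_ratio=0.8, seed=0):
    """Randomly split the gray images and their labels into training and test sets."""
    indices = list(range(len(images)))
    random.Random(seed).shuffle(indices)
    cut = int(train_ratio * len(indices))
    train_idx, test_idx = indices[:cut], indices[cut:]
    return ([images[i] for i in train_idx], [labels[i] for i in train_idx],
            [images[i] for i in test_idx], [labels[i] for i in test_idx])
```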
s2.3: performing data enhancement operation on the training image to obtain an enhanced training image;
the data enhancement operations include rotation, clipping, translation and Gaussian noise; several enhanced training images are generated from each training image, which increases the scale and complexity of the training data and improves the accuracy of the subsequently trained network;
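One possible realisation of these augmentations with torchvision is sketched below; the rotation angle, crop size, translation range and noise level are illustrative assumptions, and each training image can be passed through the pipeline several times to produce several enhanced copies.

```python
import torch
import torchvision.transforms as T

def make_augmentation(out_size: int = 224) -> T.Compose:
    """One augmentation pass over a PIL grayscale image: rotation, translation,
    clipping/cropping, then additive Gaussian noise on the resulting tensor."""
    add_gaussian_noise = T.Lambda(
        lambda x: torch.clamp(x + 0.05 * torch.randn_like(x), 0.0, 1.0))
    return T.Compose([
        T.RandomRotation(degrees=15),                     # rotation
        T.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # translation
        T.RandomCrop(out_size, pad_if_needed=True),       # clipping/cropping
        T.ToTensor(),
        add_gaussian_noise,                               # Gaussian noise
    ])
```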
s2.4: downsampling the enhanced training image to obtain a downsampled training image;
and carrying out one tie pooling operation and one maximum pooling operation on the enhanced training image in sequence, so that the size of training data is reduced, and the calculation cost is reduced.
S3: training and testing the constructed Combine-MobileNet neural network by using the processed image data to obtain a trained Combine-MobileNet neural network;
as shown in fig. 2, the Combine-MobileNet neural network includes a first standard convolution layer, a second standard convolution layer, a first depth separable convolution layer, a second depth separable convolution layer, a first full-connection layer, a second full-connection layer, a feature fusion layer, an average pooling layer, and a third full-connection layer;
the output end of the first standard convolution layer is connected with the input end of the second standard convolution layer, and the output end of the second standard convolution layer is respectively connected with the input ends of the first depth separable convolution layer and the second depth separable convolution layer;
the output end of the first depth separable convolution layer is connected with the input end of the first full-connection layer, and the output end of the first full-connection layer is connected with the input end of the feature fusion layer; the output end of the second depth separable convolution layer is connected with the input end of the second full-connection layer, and the output end of the second full-connection layer is connected with the input end of the feature fusion layer;
the output end of the characteristic fusion layer is connected with the input end of the average pooling layer, and the output end of the average pooling layer is connected with the input end of the third full-connection layer;
the specific method for obtaining the trained Combine-MobileNet neural network comprises the following steps:
s3.1: setting a loss function, an optimal loss function value replacement-count threshold and a maximum number of training iterations for the Combine-MobileNet neural network; in this embodiment, the replacement-count threshold is 10, the maximum number of training iterations is 10000, and the optimal loss function value is set as needed;
s3.2: inputting the downsampled training image into a Combine-MobileNet neural network, and calculating a loss function value loss of the downsampled training image by using cross entropy;
s3.3: setting an early-stop strategy, namely comparing the loss function value of the downsampled training image with the optimal loss function value, replacing the optimal loss function value with the loss function value when the loss function value is larger than the optimal loss function value, and recording the replacement times;
s3.4: comparing the replacement times with the optimal loss function value replacement times threshold, and performing the next training when the replacement times are smaller than the optimal loss function value replacement times threshold; otherwise, completing the training of the Combine-MobileNet neural network; in this embodiment, when the loss function value does not decrease further after 10 times, training is completed;
s3.5: inputting the test image into the trained Combine-MobileNet neural network for testing to obtain a trained Combine-MobileNet neural network;
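As a rough sketch (not part of the patent) of the training procedure in S3.1-S3.5: cross-entropy loss, a record of the best loss value, and a counter that ends training once it reaches the threshold (10 in this embodiment). The counter below tracks epochs in which the loss does not decrease further, matching the note above that training stops after 10 such rounds; the Adam optimiser, the learning rate and the use of a `CombineMobileNet` module like the one sketched after the architecture description below are assumptions.

```python
import torch
import torch.nn as nn

def train_with_early_stopping(model, train_loader, max_epochs=10000, patience=10, lr=1e-3):
    """Train with cross-entropy loss; stop once the loss has not improved for
    `patience` consecutive epochs or `max_epochs` is reached."""
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    best_loss, stale_rounds = float("inf"), 0
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for images, labels in train_loader:
            optimiser.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimiser.step()
            epoch_loss += loss.item()
        epoch_loss /= len(train_loader)
        if epoch_loss < best_loss:      # loss improved: replace the recorded best value
            best_loss, stale_rounds = epoch_loss, 0
        else:                           # no further decrease this epoch
            stale_rounds += 1
        if stale_rounds >= patience:    # threshold reached: training is complete
            break
    return model
```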
The Combine-MobileNet model is an improvement on MobileNet-V2 and is built with the PyTorch framework. The model is constructed from depth separable convolutions; the features extracted by the depth separable convolutions with step size 1 and step size 2 in MobileNet-V2 are fused so that both features are fully utilized, which improves recognition accuracy, and the 7x7 average pooling layer of MobileNet-V2 is replaced by 3x3 average pooling to reduce the computational cost.
Inputting the downsampled training image into the Combine-MobileNet neural network, a first feature v1 is obtained through the 3x3 first standard convolution layer; the first feature v1 passes through the 1x1 second standard convolution layer to obtain a second feature v2; the second feature v2 passes through the 3x3 first depth separable convolution layer with a step size of 2 to obtain a third feature v3, and at the same time passes through the 3x3 second depth separable convolution layer with a step size of 1 to obtain a fourth feature v4; the third feature v3 is input to the feature fusion layer through the first full-connection layer and the fourth feature v4 is input to the feature fusion layer through the second full-connection layer, where they are merged into a fifth feature v5; finally, the fifth feature v5 passes through the 3x3 average pooling layer and the third full-connection layer to output a sixth feature v6, whose dimension is the number of categories of objects to be recognized.
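The following PyTorch sketch illustrates one way to realise the two-branch structure described above. The channel counts, fully-connected widths, activation functions, fusion by concatenation and the rendering of the pooling over the fused vector as a 1-D average pool are assumptions; the patent only fixes the layer types, kernel sizes, step sizes and connections.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CombineMobileNet(nn.Module):
    """Minimal sketch of the Combine-MobileNet structure described above."""

    def __init__(self, num_classes: int, in_channels: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 16, kernel_size=3, padding=1)  # 3x3 first standard conv -> v1
        self.conv2 = nn.Conv2d(16, 32, kernel_size=1)                      # 1x1 second standard conv -> v2
        # depth separable convolution = depthwise 3x3 followed by pointwise 1x1
        self.dw_stride2 = nn.Sequential(
            nn.Conv2d(32, 32, 3, stride=2, padding=1, groups=32),
            nn.Conv2d(32, 64, 1))                                          # step-size-2 branch -> v3
        self.dw_stride1 = nn.Sequential(
            nn.Conv2d(32, 32, 3, stride=1, padding=1, groups=32),
            nn.Conv2d(32, 64, 1))                                          # step-size-1 branch -> v4
        self.fc1 = nn.LazyLinear(96)            # first full-connection layer (width assumed)
        self.fc2 = nn.LazyLinear(96)            # second full-connection layer (width assumed)
        self.fc3 = nn.Linear(64, num_classes)   # third full-connection layer -> v6

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v1 = F.relu(self.conv1(x))
        v2 = F.relu(self.conv2(v1))
        v3 = F.relu(self.dw_stride2(v2)).flatten(1)
        v4 = F.relu(self.dw_stride1(v2)).flatten(1)
        v5 = torch.cat([self.fc1(v3), self.fc2(v4)], dim=1)   # feature fusion layer (concatenation assumed)
        # stand-in for the 3x3 average pooling layer, applied here to the fused vector
        v5 = F.avg_pool1d(v5.unsqueeze(1), kernel_size=3, stride=3).squeeze(1)
        return self.fc3(v5)                                   # one score per object category
```

For instance, `CombineMobileNet(num_classes=5)` applied to a batch of downsampled grayscale images produces a five-way score vector per image.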
S4: storing and loading the trained Combine-MobileNet neural network on an embedded platform;
the loading step comprises the following steps:
s4.1: saving the trained Combine-MobileNet neural network as an H5 file;
s4.2: analyzing the H5 file to obtain matrixing network parameters of the Combine-MobileNet neural network;
s4.3: creating two c-language files of a model_init.c and a model_init.h, and writing matrixing network parameters into the model_init.h file according to a data stream form;
s4.4: the corresponding old file in the embedded platform engineering file is replaced with "model_init.c" and "model_init.h".
Image processing and network training are carried out on the PC side, but the embedded platform cannot process the network parameters of the Combine-MobileNet neural network directly, so the parameters must be converted into a matrix form that the embedded platform can process. The data in the H5 file are organized as a tree structure and divided into weights and biases, and the matrixed network parameters are obtained after parsing: the convolution kernel element a in row h and column w of the n-th dimension of the layer1 network is expressed as a = layer1(n, h, w), and the k-th bias term b of the layer1 network is expressed as b = layer1(bias, k). After "model_init.c" and "model_init.h" replace the corresponding old files in the embedded platform project files, the object category names defined in the project files are changed to the category names of the currently trained objects.
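A rough sketch of how the saved H5 weights could be flattened into C arrays for the embedded project is given below. The dataset layout inside the H5 file, the generated identifier names and the decision to emit the arrays into the header are assumptions; only the file names "model_init.c" and "model_init.h" are taken from the text above.

```python
import h5py

def export_to_c(h5_path: str, c_path: str = "model_init.c", h_path: str = "model_init.h") -> None:
    """Read every weight/bias dataset from the H5 file and write it out as a
    flattened C float array so the embedded project can rebuild each layer."""
    arrays = {}
    with h5py.File(h5_path, "r") as f:
        def collect(name, obj):
            if isinstance(obj, h5py.Dataset):
                arrays[name.replace("/", "_")] = obj[()]   # numpy array of parameters
        f.visititems(collect)

    with open(h_path, "w") as header:
        header.write("/* auto-generated Combine-MobileNet parameters */\n")
        for name, arr in arrays.items():
            flat = arr.flatten()
            header.write(f"static const float {name}[{flat.size}] = {{\n    ")
            header.write(", ".join(f"{v:.6f}f" for v in flat))
            header.write("\n};\n")

    with open(c_path, "w") as source:                       # model_init.c references the arrays
        source.write('#include "model_init.h"\n')
```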
S5: inputting the image data of the object to be identified into the embedded platform, and carrying out real-time reasoning on the category of the object to be identified to obtain an identification result.
Real-time reasoning is performed on the object to be identified based on the selection strategy to obtain the identification result. The selection strategy trades time for accuracy: by comprehensively evaluating the results of several inferences, the influence of human factors and equipment on inference accuracy is reduced. Specifically, each time the category of an object is inferred, image data of the object at different times are acquired and several inferences are made, and the inference result with the largest number of occurrences or the highest probability is selected as the recognition result of the object. In this embodiment, image data of the object at three different times are acquired for inference, and the result that occurs most often or has the highest probability among the three inferences is taken as the final recognition result, which makes the object recognition more accurate.
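A minimal sketch of this majority-vote selection strategy (the helper names are assumptions; on the real device the loop would run on the embedded platform rather than in Python):

```python
from collections import Counter

def classify_with_selection(infer_fn, frames):
    """Run the network on several frames of the same object captured at different
    times (three in this embodiment) and return the most frequent predicted class;
    `infer_fn` maps one frame to a predicted class index."""
    predictions = [infer_fn(frame) for frame in frames]
    return Counter(predictions).most_common(1)[0][0]
```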
In actual operation, Visual Studio 2019 is installed and the image acquisition system is built; AHL-GEC-IDE (4.08) is installed to construct the embedded engineering development platform; a TT-USB serial port (CH340) driver is installed to enable communication between the embedded platform and the PC; and JetBrains PyCharm 2019.1.1 x64 is installed to implement the object recognition system functions. First, the modified project file is imported into the compiler; the project file is compiled; the embedded platform is connected to the PC through a port; the compiled project file is loaded onto the embedded platform through the port; and inference is performed on the acquired image data of the object to be recognized, with the recognition result displayed on the display screen of the embedded platform.
Example 2
The present embodiment provides an embedded object recognition apparatus, as shown in fig. 3, including:
the data acquisition module is used for acquiring color images of the object;
the data processing module is used for processing the color image to obtain processed image data;
the network training test module is used for training and testing the constructed Combine-MobileNet neural network by using the processed image data to obtain a trained Combine-MobileNet neural network;
the network loading module is used for storing and loading the trained Combine-MobileNet neural network onto the embedded platform;
the reasoning and identifying module is used for inputting the image data of the object to be identified into the embedded platform, and carrying out real-time reasoning on the category of the object to be identified to obtain an identification result.
It should be understood that the above examples of the present invention are provided by way of illustration only and are not intended to limit the embodiments of the present invention. Other variations or modifications may be made by those of ordinary skill in the art on the basis of the above description; it is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the protection scope of the claims of the invention.

Claims (9)

1. An embedded object recognition method, comprising:
s1: collecting a color image of an object;
s2: processing the color image to obtain processed image data;
s3: training and testing the constructed Combine-MobileNet neural network by using the processed image data to obtain a trained Combine-MobileNet neural network;
the Combine-MobileNet neural network comprises a first standard convolution layer, a second standard convolution layer, a first depth separable convolution layer, a second depth separable convolution layer, a first full-connection layer, a second full-connection layer, a feature fusion layer, an average pooling layer and a third full-connection layer;
the output end of the first standard convolution layer is connected with the input end of the second standard convolution layer, and the output end of the second standard convolution layer is respectively connected with the input ends of the first depth separable convolution layer and the second depth separable convolution layer;
the output end of the first depth separable convolution layer is connected with the input end of the first full-connection layer, and the output end of the first full-connection layer is connected with the input end of the feature fusion layer; the output end of the second depth separable convolution layer is connected with the input end of the second full-connection layer, and the output end of the second full-connection layer is connected with the input end of the feature fusion layer;
the output end of the characteristic fusion layer is connected with the input end of the average pooling layer, and the output end of the average pooling layer is connected with the input end of the third full-connection layer;
s4: storing and loading the trained Combine-MobileNet neural network on an embedded platform;
s5: inputting the image data of the object to be identified into the embedded platform, and carrying out real-time reasoning on the category of the object to be identified to obtain an identification result.
2. The embedded object recognition method according to claim 1, wherein in the step S2, the specific steps of processing the color image are:
s2.1: converting the color image into a gray scale image;
s2.2: randomly dividing the gray level image into a training image and a test image;
s2.3: performing data enhancement operation on the training image to obtain an enhanced training image;
s2.4: downsampling the enhanced training image to obtain a downsampled training image.
3. The embedded object recognition method according to claim 2, wherein in the step S2.3, the data enhancement operation performed on the training image includes: rotation, clipping, translation, and gaussian noise.
4. The embedded object recognition method according to claim 3, wherein in the step S2.4, downsampling the enhanced training image comprises sequentially performing an average pooling operation and a max pooling operation on the enhanced training image.
5. The embedded object recognition method according to claim 4, wherein in the step S3, the method for training and testing the constructed Combine-MobileNet neural network by using the processed image data comprises the following steps:
s3.1: setting a loss function, an optimal loss function value replacement frequency threshold and a maximum training frequency of a Combine-MobileNet neural network;
s3.2: inputting the downsampled training image into a Combine-MobileNet neural network, and calculating a loss function value loss of the downsampled training image by using cross entropy;
s3.3: setting an early-stop strategy, namely comparing the loss function value of the downsampled training image with the optimal loss function value, replacing the optimal loss function value with the loss function value when the loss function value is larger than the optimal loss function value, and recording the replacement times;
s3.4: comparing the replacement times with the optimal loss function value replacement times threshold, and performing the next training when the replacement times are smaller than the optimal loss function value replacement times threshold; otherwise, completing the training of the Combine-MobileNet neural network;
s3.5: and inputting the test image into the trained Combine-MobileNet neural network for testing, and obtaining the trained Combine-MobileNet neural network.
6. The embedded object recognition method according to claim 1, wherein in the step S4, the specific method for saving and loading the trained Combine-MobileNet neural network onto the embedded platform is as follows:
s4.1: saving the trained Combine-MobileNet neural network as an H5 file;
s4.2: analyzing the H5 file to obtain matrixing network parameters of the Combine-MobileNet neural network;
s4.3: creating two c-language files of a model_init.c and a model_init.h, and writing matrixing network parameters into the model_init.h file according to a data stream form;
s4.4: the corresponding old file in the embedded platform engineering file is replaced with "model_init.c" and "model_init.h".
7. The embedded object recognition method according to claim 1, wherein in the step S5, real-time reasoning is performed on the category of the object to be recognized based on the selection strategy, so as to obtain a recognition result; when the category of an object is inferred, image data of the object at different times are acquired for multiple inferences, and an inference result with the largest occurrence number or probability is selected as an object identification result.
8. The embedded object recognition method of claim 1, wherein the embedded platform is an STM 32-based embedded platform.
9. An embedded object recognition device, comprising:
the data acquisition module is used for acquiring color images of the object;
the data processing module is used for processing the color image to obtain processed image data;
the network training test module is used for training and testing the constructed Combine-MobileNet neural network by using the processed image data to obtain a trained Combine-MobileNet neural network;
the Combine-MobileNet neural network comprises a first standard convolution layer, a second standard convolution layer, a first depth separable convolution layer, a second depth separable convolution layer, a first full-connection layer, a second full-connection layer, a feature fusion layer, an average pooling layer and a third full-connection layer;
the output end of the first standard convolution layer is connected with the input end of the second standard convolution layer, and the output end of the second standard convolution layer is respectively connected with the input ends of the first depth separable convolution layer and the second depth separable convolution layer;
the output end of the first depth separable convolution layer is connected with the input end of the first full-connection layer, and the output end of the first full-connection layer is connected with the input end of the feature fusion layer; the output end of the second depth separable convolution layer is connected with the input end of the second full-connection layer, and the output end of the second full-connection layer is connected with the input end of the feature fusion layer;
the output end of the characteristic fusion layer is connected with the input end of the average pooling layer, and the output end of the average pooling layer is connected with the input end of the third full-connection layer;
the network loading module is used for storing and loading the trained Combine-MobileNet neural network onto the embedded platform;
the reasoning and identifying module is used for inputting the image data of the object to be identified into the embedded platform, and carrying out real-time reasoning on the category of the object to be identified to obtain the identification result of the object to be identified.
CN202111138968.7A 2021-09-27 2021-09-27 Embedded object recognition method and device Active CN113822212B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111138968.7A CN113822212B (en) 2021-09-27 2021-09-27 Embedded object recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111138968.7A CN113822212B (en) 2021-09-27 2021-09-27 Embedded object recognition method and device

Publications (2)

Publication Number Publication Date
CN113822212A CN113822212A (en) 2021-12-21
CN113822212B 2024-01-05

Family

ID=78915717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111138968.7A Active CN113822212B (en) 2021-09-27 2021-09-27 Embedded object recognition method and device

Country Status (1)

Country Link
CN (1) CN113822212B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695469A (en) * 2020-06-01 2020-09-22 Xidian University Hyperspectral image classification method of lightweight depth separable convolution feature fusion network
CN112528899A (en) * 2020-12-17 2021-03-19 Nankai University Image salient object detection method and system based on implicit depth information recovery
CN112818893A (en) * 2021-02-10 2021-05-18 Beijing University of Technology Lightweight open-set landmark identification method facing mobile terminal

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784325A (en) * 2017-11-10 2019-05-21 Fujitsu Limited Open-set recognition method and apparatus and computer readable storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695469A (en) * 2020-06-01 2020-09-22 Xidian University Hyperspectral image classification method of lightweight depth separable convolution feature fusion network
CN112528899A (en) * 2020-12-17 2021-03-19 Nankai University Image salient object detection method and system based on implicit depth information recovery
CN112818893A (en) * 2021-02-10 2021-05-18 Beijing University of Technology Lightweight open-set landmark identification method facing mobile terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Multi-class target recognition based on neural networks; Zhao Jing; Wang Xian; Wang Ben; Jiang Guoping; Xie Fei; Xu Fengyu; Control and Decision (Issue 08); full text *

Also Published As

Publication number Publication date
CN113822212A (en) 2021-12-21

Similar Documents

Publication Publication Date Title
CN112465748B (en) Crack identification method, device, equipment and storage medium based on neural network
CN112132156B (en) Image saliency target detection method and system based on multi-depth feature fusion
CN108197326A (en) A kind of vehicle retrieval method and device, electronic equipment, storage medium
CN111275660B (en) Flat panel display defect detection method and device
CN113971660B (en) Computer vision method for bridge health diagnosis and intelligent camera system
CN110222604A (en) Target identification method and device based on shared convolutional neural networks
CN115909006B (en) Mammary tissue image classification method and system based on convolution transducer
CN113052295B (en) Training method of neural network, object detection method, device and equipment
CN113888514A (en) Method and device for detecting defects of ground wire, edge computing equipment and storage medium
CN109740553B (en) Image semantic segmentation data screening method and system based on recognition
CN110909657A (en) Method for identifying apparent tunnel disease image
CN114463637A (en) Winter wheat remote sensing identification analysis method and system based on deep learning
CN113205107A (en) Vehicle type recognition method based on improved high-efficiency network
CN115620190A (en) Joint identification platform based on data analysis
CN113362277A (en) Workpiece surface defect detection and segmentation method based on deep learning
CN114445356A (en) Multi-resolution-based full-field pathological section image tumor rapid positioning method
CN117197763A (en) Road crack detection method and system based on cross attention guide feature alignment network
CN114495060B (en) Road traffic marking recognition method and device
CN111310837A (en) Vehicle refitting recognition method, device, system, medium and equipment
CN111210398A (en) White blood cell recognition system based on multi-scale pooling
CN113822212B (en) Embedded object recognition method and device
CN116580232A (en) Automatic image labeling method and system and electronic equipment
CN114612669B (en) Method and device for calculating ratio of inflammation to necrosis of medical image
CN116259021A (en) Lane line detection method, storage medium and electronic equipment
CN116977249A (en) Defect detection method, model training method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Zhang Hongliang

Inventor after: Li Guangming

Inventor after: Yu Chenhui

Inventor after: Zhang Hong

Inventor after: Luo Jiaqi

Inventor before: Li Guangming

Inventor before: Zhang Hongliang

Inventor before: Yu Chenhui

Inventor before: Zhang Hong

Inventor before: Luo Jiaqi

CB03 Change of inventor or designer information
GR01 Patent grant
GR01 Patent grant