CN114332666A

CN114332666A - A method and system for image target detection based on lightweight neural network model

Info

Publication number: CN114332666A
Application number: CN202210234758.6A
Authority: CN
Inventors: 刘海英; 孙凤乾; 邓立霞; 郑太恒; 王超平
Original assignee: Qilu University of Technology
Current assignee: Qilu University of Technology
Priority date: 2022-03-11
Filing date: 2022-03-11
Publication date: 2022-04-12

Abstract

The invention proposes an image target detection method and system based on a lightweight neural network model. It belongs to the technical field of image target detection and addresses the problem of inaccurate image recognition. The method includes: inputting the path of the picture or video to be detected; and using the lightweight neural network model to calculate the confidence of every category in the received image, selecting the highest confidence to obtain the final recognition box, and drawing that box on the original image to complete the detection process. While maintaining model accuracy, the invention greatly improves the running speed of the model, so that the model can be deployed and run smoothly on small devices and mobile terminals and meets the real-time and accuracy requirements of smoking detection in everyday scenarios.

Description

A method and system for image target detection based on lightweight neural network model

Technical Field

The invention belongs to the technical field of image target detection, and in particular relates to an image target detection method and system based on a lightweight neural network model.

Background

The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.

The task of target detection (object detection) is to find all objects of interest in an image and determine their categories and positions; it is one of the core problems in machine vision. Because objects differ in appearance, shape, and pose, and because imaging is disturbed by factors such as illumination and occlusion, target detection has long been one of the most challenging problems in the field.

Existing studies have shown that reliable target detection algorithms are the basis for automatic analysis and understanding of complex scenes. Image target detection is therefore a basic task in computer vision, and its performance directly affects the performance of subsequent mid- and high-level tasks such as target tracking, action recognition, and behavior understanding. This in turn determines the performance of downstream AI (artificial intelligence) applications such as face detection, behavior description, object recognition in traffic scenes, and content-based Internet image retrieval. As these AI applications have spread into every aspect of production and daily life, target detection technology has reduced people's workload to a certain extent and changed the way people live.

In recent years, as GPUs (graphics processing units) with high-performance computing capability have continued to be updated and iterated, target detection algorithms based on deep learning have developed extremely rapidly. Single-stage detection algorithms represented by yolo, backed by neural networks and deep learning, have acquired strong image feature extraction and fusion capabilities. Compared with traditional sliding-window detection algorithms, they offer better performance, stability, and generalization.

However, high-performance GPUs are relatively expensive and not portable. They are suitable for model training, but not for model deployment or for applications in real production and daily life.

Summary of the Invention

To overcome the above shortcomings of the prior art, the present invention provides an image target detection method based on a lightweight neural network model. While maintaining model accuracy, it greatly improves the running speed of the model, so that the model can be deployed and run smoothly on small devices and mobile terminals and meets the real-time and accuracy requirements of detection in everyday scenarios.

To achieve the above object, one or more embodiments of the present invention provide the following technical solutions:

In a first aspect, an image target detection method based on a lightweight neural network model is disclosed, including:

inputting the path of the picture or video to be detected;

using the lightweight neural network model to calculate the confidence of every category in the received picture to be detected, selecting the highest confidence to obtain the final recognition box, and drawing that box on the picture to complete the detection process.
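The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Detection` record, the `select_best` helper, the `min_conf` threshold, and the sample values are all hypothetical and are not taken from the patent.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Detection(NamedTuple):
    """One candidate box produced by a detector (hypothetical record layout)."""
    label: str
    confidence: float
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def select_best(detections: Sequence[Detection],
                min_conf: float = 0.0) -> Optional[Detection]:
    """Return the candidate with the highest confidence, or None if none qualifies."""
    candidates = [d for d in detections if d.confidence >= min_conf]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d.confidence)


# Illustrative candidates for a single image (made-up numbers).
candidates = [
    Detection("smoking", 0.41, (30, 40, 80, 90)),
    Detection("smoking", 0.87, (32, 38, 82, 92)),
    Detection("smoking", 0.15, (200, 10, 230, 50)),
]
best = select_best(candidates, min_conf=0.25)
```

The box in `best` is the one that would then be drawn on the picture; the drawing itself is left out so that the sketch has no third-party dependencies.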

In a further technical solution, the lightweight neural network model includes a backbone network and a feature fusion network;

the backbone network applies convolution to the image to be detected to generate real feature layers, and then applies linear transformations to the real feature layers to obtain ghost (phantom) feature layers;

the feature fusion network further processes and fuses the image information processed in the ghost feature layers to generate a feature pyramid, which enhances the detection of objects at different scales so that the same object can be recognized at different sizes and scales.

In a further technical solution, the lightweight neural network model is trained as follows:

obtaining a dataset containing the target behavior;

where the dataset containing the target behavior comes from open-source datasets downloaded from the Internet, and data that has been obtained without annotations is annotated with an annotation tool: a box is drawn around each subject that should be recognized after learning and training, and the position and the length and width of the subject relative to the image size are saved in the corresponding XML file, whose file name matches that of the image file;

randomly splitting the annotated dataset into a training dataset and a test dataset, the training dataset being used to train the lightweight neural network model and the test dataset being used to test it.

In a further technical solution, the pictures of the training dataset are packed with a fixed size and format and passed into the constructed backbone network and feature fusion network to obtain the prediction of the lightweight neural network model;

the loss between the model output and the ground truth is calculated, the gradient of the loss is calculated, and the parameters of the lightweight neural network model are then updated with a gradient descent algorithm, so that the model parameters are adjusted by searching for the optimum of the loss function.

In a further technical solution, the annotated test dataset is processed by converting the XML files into txt files that can be read correctly and adjusting the corresponding file names and directories in the test dataset.

In a further technical solution, the backbone network and the feature fusion network are used to run the yolov5 target detection algorithm.

In a further technical solution, the lightweight neural network model is applied to real-time detection of smoking scenes in daily life.

In a further technical solution, each stage of the feature fusion network pathway takes the feature maps of the previous stage as input and processes them with a convolutional layer; the output is added, through lateral connections, to the feature maps of the same stage of the top-down pathway, and these feature maps provide information for the next stage. The feature maps of the different layers are fused after an adaptive pooling operation and then passed to detection, which generates the confidence and the predicted bounding-box position for each predicted category in the image to be detected.

In a further technical solution, the backbone network is composed mainly of Conv modules and CSPNet modules, and the number of Conv modules and CSPNet modules in the whole neural network is adjusted by controlling the width and depth of the network structure.

In a second aspect, an image target detection system based on a lightweight neural network model is disclosed, including:

a data input module, configured to input the path of the picture or video to be detected;

a target detection module, configured to use the lightweight neural network model to calculate the confidence of every category in the received picture to be detected, select the highest confidence to obtain the final recognition box, and draw that box on the picture to complete the detection process.

One or more of the above technical solutions have the following beneficial effects:

The invention applies the yolov5 target detection algorithm to real-time detection of smoking scenes in daily life. The width (width_multiple) and depth (depth_multiple) of the default yolov5 network structure are adjusted to reduce the overall model size for smoking detection and increase the running speed. In addition, two network modules, Ghost-Conv and C3-Ghost, are incorporated to improve the default model; after the improvement, the model size is reduced by 46% and the accuracy is increased by 2%. While maintaining model accuracy, this improvement greatly increases the running speed of the model, so that the model can be deployed and run smoothly on small devices and mobile terminals and meets the real-time and accuracy requirements of smoking detection in everyday scenarios.

Advantages of additional aspects of the invention will be set forth in part in the description that follows, and in part will become apparent from that description or may be learned by practice of the invention.

Description of the Drawings

The accompanying drawings, which form part of the present invention, are provided for further understanding of the invention. The exemplary embodiments of the invention and their descriptions are used to explain the invention and do not unduly limit it.

Figure 1 is a partial network structure diagram of the yolov5 backbone network (backbone);

Figure 2 is a partial network structure diagram of the modified yolov5 backbone network (backbone) according to an embodiment of the present invention.

Detailed Description

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It should also be noted that the terminology used herein is for describing specific embodiments only and is not intended to limit the exemplary embodiments according to the present invention.

The embodiments of the invention and the features of the embodiments may be combined with each other where they do not conflict.

The current mainstream research direction for target detection algorithms based on deep learning is to keep making neural network models more lightweight and to keep improving their ability to run on small devices and mobile terminals, so that the models can be better applied in real production and daily life and create more social and economic benefit.

Embodiment 1

This embodiment discloses an image target detection method based on a lightweight neural network model. Smoking behavior detection is used as the example, but the method can also be applied to image detection of other target behaviors.

The method includes the following steps:

S1: Build a dataset for deep learning training and testing, and split and preprocess the whole dataset;

S2: Configure the python and pytorch programming environment for training and testing the neural network model;

S3: Construct the backbone network and the feature fusion network required by the yolov5 target detection algorithm, where the backbone network extracts useful features from the image to be detected, and the feature fusion network strengthens the features extracted by the backbone network and outputs the final feature map of the image to be detected;

S4: Define the loss function of the yolov5 target detection algorithm (classification loss and localization loss);

S5: Adjust the width (width_multiple) and depth (depth_multiple) of the default yolov5 network structure to reduce the overall model size for smoking detection and increase the running speed. In addition, two network modules, Ghost-Conv and C3-Ghost, are incorporated to improve the default model; after the improvement, the model size is reduced by 46% and the accuracy is increased by 2%. While maintaining model accuracy, this improvement greatly increases the running speed of the model, so that the model can be deployed and run smoothly on small devices and mobile terminals and meets the real-time requirement of target detection;

S6: Train the neural network model on the prepared dataset; when the loss function has not converged, evaluate the overall performance of the model at the current number of training epochs, and after multiple training runs keep the best model file;

S7: Test the best lightweight model to confirm that the results are genuine and valid.

Step S1 proceeds as follows:

S11: The dataset used for training consists of 4860 images of smoking behavior, mainly downloaded from the Internet or taken from open-source datasets already annotated by others. Only the cigarettes in the dataset are annotated: when smoking behavior is present, the cigarette itself is labeled smoking; otherwise nothing is annotated;

S12: The annotation format used by the present invention is XML. Data that has been obtained without annotations is annotated with the annotation tool labeling. The main procedure is: draw a box around each subject that the algorithm should recognize after learning and training, then save the position and the length and width of the subject relative to the image size in the corresponding XML file, whose file name matches that of the image file;

S13: The 4860 annotated images are randomly split: 3791 images are used for training the neural network and 1069 for testing. Training produces the final neural network model, and testing checks the reliability and accuracy of the model;
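As a minimal sketch of the random split in S13, the following Python snippet shuffles a list of image names and cuts it into 3791 training items and 1069 test items. The file names, the `split_dataset` helper, and the fixed seed are assumptions for illustration; the patent does not say how the shuffle is performed.

```python
import random


def split_dataset(items, n_train, seed=0):
    """Shuffle a copy of `items` reproducibly and split it into train/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical file names standing in for the 4860 annotated images.
image_names = [f"img_{i:04d}.jpg" for i in range(4860)]
train_set, test_set = split_dataset(image_names, n_train=3791)
```

A fixed seed makes the split repeatable across runs, which helps when comparing models trained at different times.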

S14: For the annotated smoking detection dataset, convert the XML files into txt files that pytorch can read correctly, and adjust the corresponding file names and directories as the program requires.
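The conversion in S14 might look like the sketch below. It assumes Pascal VOC-style XML (pixel coordinates in `xmin`, `ymin`, `xmax`, `ymax` under `bndbox`) and the usual yolov5 label format of one line per box, `class x_center y_center width height`, normalized to the image size. The patent does not spell out either layout, so both are assumptions, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def voc_xml_to_yolo_lines(xml_text, class_names):
    """Convert one VOC-style annotation into yolo-format label lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)

    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in class_names:
            continue  # skip classes that are not being trained
        box = obj.find("bndbox")
        xmin = float(box.find("xmin").text)
        ymin = float(box.find("ymin").text)
        xmax = float(box.find("xmax").text)
        ymax = float(box.find("ymax").text)
        # Normalized box center and size.
        x_c = (xmin + xmax) / 2 / img_w
        y_c = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        cls_id = class_names.index(name)
        lines.append(f"{cls_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    return lines


# A made-up annotation for a 640x480 image with one labeled cigarette.
sample = """<annotation>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object><name>smoking</name>
    <bndbox><xmin>160</xmin><ymin>120</ymin><xmax>480</xmax><ymax>360</ymax></bndbox>
  </object>
</annotation>"""
lines = voc_xml_to_yolo_lines(sample, ["smoking"])
```

Each XML file would yield one txt file with the same base name as the image, which matches the file-name correspondence described in S12.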

Step S2 proceeds as follows:

S21: The present invention builds its virtual environment mainly with pycharm, a programming tool developed for python that offers strong code editing and management capabilities and supports a rich set of plug-ins, extensions, and third-party packages. pycharm also has a built-in function for setting up virtual programming environments: choosing to create a new virtual environment when creating a new project yields a programming environment fully isolated from the local one, which benefits subsequent code writing and debugging;

S22: The present invention uses the pytorch deep learning framework for training and development. After the virtual environment has been created as described above, the local python interpreter is selected. The main pytorch framework is then installed, together with the other required third-party packages for scientific computing and auxiliary functions. The present invention uses python 3.9 and pytorch 1.9+cu111;

S23: Training a deep learning neural network model involves a large amount of parallel computation, so a GPU with high-performance parallel computing capability is needed to accelerate training. The present invention uses an Nvidia RTX2060 GPU to train and develop the target detection algorithm. To ensure device compatibility and that the GPU can take part in the computation normally, the cuda and cudnn drivers must be installed; the version number of the cuda and cudnn drivers chosen by the present invention is 11.1;

Step S3 proceeds as follows:

S31: The network structure of yolov5 is divided mainly into a backbone network (backbone) and a feature fusion network (head). The backbone network mainly uses the CSPNet (Cross Stage Partial Networks) module, which extracts useful features from the image to be detected and lets the network achieve better performance with less computation. CSPNet addresses the problems of duplicated gradient information and vanishing gradients during network optimization in the backbone layers of other neural network models, and reduces the number of model parameters and the FLOPS value (a model complexity index), which preserves inference speed and accuracy while reducing model size.

CSPNet consists of several partial dense layers and one partial transition layer. The processing of the image to be detected by the cross-stage partial network represented by CSPNet can be expressed by the following formulas:

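$$X_k = W_k * [X_0'', X_1, \ldots, X_{k-1}] \tag{1}$$

$$X_T = W_T * [X_0'', X_1, \ldots, X_k] \tag{2}$$

$$X_U = W_U * [X_0', X_T] \tag{3}$$

$$W_k' = f(W_k, g_0'', g_1, \ldots, g_{k-1}) \tag{4}$$

$$W_T' = f(W_T, g_0'', g_1, \ldots, g_k) \tag{5}$$

$$W_U' = f(W_U, g_0', g_T) \tag{6}$$

Here X_0' and X_0'' denote the two parts into which the input feature map X_0 is split along the channel dimension (X_0' bypasses the partial dense layers and X_0'' passes through them), square brackets denote concatenation, * denotes convolution, f denotes the weight-update function, and a primed weight denotes its updated value.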

In the formulas, k denotes the number of partial dense layers, X_k denotes the feature map output by the k-th partial dense layer, and W_k and g_k denote the network weights and the gradient of the k-th partial dense layer, respectively. X_T, W_T, and g_T denote the feature map output by the partial transition layer, its network weights, and its gradient, respectively; X_U and W_U denote the final output feature map of the CSPNet and its network weights, respectively.

The steps involved in CSPNet are as follows:

First, the feature map of the image to be detected that is input to the CSPNet is split along the channel dimension into two parts. One part is connected directly to the end of the CSPNet, while the other passes through several partial dense layers.

The final output of the partial dense layers then passes through a partial transition layer, and the output X_T of the transition layer is concatenated with the part of the input that bypassed the dense layers to produce the final feature map X_U.

Formulas (1)-(3) and formulas (4)-(6) above are the forward-propagation and back-propagation formulas of the CSPNet, respectively.

It follows from formulas (1)-(6) that the feature map coming from the partial dense layers and the feature map that did not pass through them are handled separately before being concatenated into the final feature map. Neither branch contains duplicate gradient information belonging to the other for updating its weights, so the model avoids excessive duplicate gradient information and its complexity is reduced.
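The data flow of one cross-stage partial block can be illustrated with a toy Python sketch. Everything here is a deliberate simplification and an assumption: feature maps are plain lists of per-channel numbers, each "dense layer" emits one new channel as a weighted mean of all channels seen so far, and the transition and output weights are scalars. It shows only the split, dense growth, transition, and concatenation order described above, not a real convolutional network.

```python
def dense_layer(inputs, weight):
    """Toy dense layer: one new channel from all channels seen so far."""
    return weight * sum(inputs) / len(inputs)


def csp_stage(x0, dense_weights, w_t=1.0, w_u=1.0):
    """Toy cross-stage partial block on a list of channel values."""
    half = len(x0) // 2
    x0_skip, x0_dense = x0[:half], x0[half:]  # split along the channel axis

    # The second part goes through the dense layers; each layer sees
    # the concatenation of that part and all earlier layer outputs.
    feats = list(x0_dense)
    for w in dense_weights:
        feats.append(dense_layer(feats, w))

    # Transition layer applied to the dense-branch output.
    x_t = [w_t * v for v in feats]

    # The bypassed part is concatenated with the transition output.
    return [w_u * v for v in x0_skip + x_t]


out = csp_stage([1.0, 2.0, 3.0, 4.0], dense_weights=[1.0, 1.0])
```

Because the first half of the channels never enters the dense layers, no gradient for those layers would flow through it, which is the separation of gradient information that the text attributes to CSPNet.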

S32: The feature fusion network of yolov5 uses a path aggregation network (PANet), whose role is to further process and fuse the image information processed in the backbone layers and generate a feature pyramid. The feature pyramid strengthens the model's detection of objects at different scales, so that the same object can be recognized at different sizes and scales.

The feature extraction network of PANet uses an improved FPN (Feature Pyramid Networks) structure that improves the propagation of low-level features in the neural network. Each stage of the pathway takes the feature maps of the previous stage as input and processes them with a 3*3 convolutional layer; the output is added, through lateral connections, to the feature maps of the same stage of the top-down pathway, and these feature maps provide information for the next stage. The feature maps of the different layers are fused after an adaptive pooling operation and passed to the detector module at the very end of the neural network model, which generates the confidence and the predicted bounding-box position for each predicted category in the image to be detected.

Step S4 proceeds as follows:

S41: The main flow of training a neural network model for target detection is as follows: the pictures of the training set are packed with a fixed size (batchsize) and format and passed into the constructed neural network to obtain the model's prediction; the loss between the model output and the ground truth is calculated; the gradient of the loss is calculated; and the model parameters are finally updated with a gradient descent algorithm. The purpose of introducing a loss function is to turn the abstract network training process into a mathematical optimization problem, in which the model parameters are adjusted by searching for the optimum of the loss function so that the model performs as well as possible.
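The loop in S41 (predict, compute the loss, compute its gradient, update the parameters) can be shown on a deliberately tiny problem. The sketch below fits a single weight `w` in the model `y = w * x` with a mean-squared-error loss. The model, data, learning rate, and epoch count are illustrative assumptions and have nothing to do with the patent's detection network.

```python
def train_linear(xs, ys, lr=0.05, epochs=200):
    """Fit y = w * x by plain gradient descent on the mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        # Forward pass and gradient of mean((w*x - y)^2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # Gradient descent update: step against the gradient.
        w -= lr * grad
    return w


# Data generated by y = 2x, so the optimum of the loss is w = 2.
w = train_linear([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

A real detector follows the same cycle, except that the prediction comes from the backbone and feature fusion networks and the gradient is obtained by backpropagation.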

S42: The loss function of the target detection algorithm mainly comprises a localization loss function and a classification loss function, expressed mathematically as follows:

In the localization loss, the symbols denote the top-left corner coordinates and the length and width of the ground-truth box and of the predicted box, respectively.

In the classification loss, N denotes the total number of categories, and the remaining symbols denote, in order, the predicted value for the current category, the probability of the current category obtained after the activation function, the ground-truth value of the current category (0 or 1), and the classification loss.
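The quantities listed for the classification loss (a raw predicted value, a probability after an activation function, and a 0/1 ground truth over N categories) match a binary cross-entropy over sigmoid outputs, which is also what the public yolov5 code uses for classification. The sketch below assumes that form. Averaging over the N categories rather than summing is a choice made here, not something the text specifies.

```python
import math


def sigmoid(x):
    """Map a raw predicted value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bce_class_loss(logits, targets, eps=1e-12):
    """Mean binary cross-entropy over N categories.

    logits  : raw predicted values, one per category
    targets : ground-truth values, each 0 or 1
    """
    total = 0.0
    for x, t in zip(logits, targets):
        p = sigmoid(x)  # probability after the activation function
        total += -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
    return total / len(logits)


uncertain = bce_class_loss([0.0], [1])   # probability 0.5 for the true class
confident = bce_class_loss([4.0], [1])   # probability close to 1
```

The loss falls as the predicted probability of the true category approaches 1, which is the behavior the optimizer exploits during training.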

S43: In the yolov5 algorithm used by the present invention, the localization loss differs from that of other common target detection algorithms: the present invention uses CIoU as the localization loss function, as shown in the formula:

The parameters in the formula have the following meanings:

IoU: the intersection over union of the predicted box and the ground-truth box.

V: a parameter measuring the consistency of the aspect ratio, which can also be defined as:

Compared with traditional localization loss functions, the CIoU loss used by the present invention better improves accuracy when the predicted box and the ground-truth box overlap, when one contains the other, or when boxes have abnormal sizes, and is therefore more favorable for training the neural network model.
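For reference, the CIoU loss as it is commonly defined in the literature can be computed as in the sketch below. It is assumed, not confirmed by the text, that the patent's formula has this form: loss = 1 - IoU + rho^2 / c^2 + alpha * v, with v = (4 / pi^2) * (arctan(w_gt / h_gt) - arctan(w / h))^2 and alpha = v / ((1 - IoU) + v). The box layout `(x1, y1, x2, y2)` and the function name are choices made here.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss between two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter + eps
    iou = inter / union

    # rho^2: squared distance between the two box centers.
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4.0

    # c^2: squared diagonal of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # v measures aspect-ratio consistency; alpha is its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


same = ciou_loss((0, 0, 10, 10), (0, 0, 10, 10))
apart = ciou_loss((0, 0, 10, 10), (20, 20, 30, 30))
```

Unlike a plain IoU loss, the center-distance term still gives a useful signal when the two boxes do not overlap at all, as in the `apart` example.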

Step S5 proceeds as follows:

S51: Single-stage detection algorithms represented by the yolo series offer very high detection speed and excellent detection accuracy, and are therefore widely used in industry and in everyday scenarios that emphasize real-time detection. Among the more mature earlier yolo algorithms, yolov3 is representative: it strikes a good balance between performance and speed and is very stable. However, running the yolov3 target detection algorithm still places considerable demands on CPU and GPU performance, which makes production deployment difficult and everyday applications costly. The latest yolov5 algorithm uses data augmentation and preprocessing at the input, model improvements and feature fusion, an improved output layer, an improved loss function, and an improved NMS process. These changes further raise the detection accuracy of the yolo algorithm, greatly reduce the model size, and again greatly increase the running speed, making the algorithm better suited to mobile terminals and small devices.

S52: The backbone network (backbone) of yolov5 is composed mainly of Conv modules and CSPNet modules, which extract features from the image at different levels. By controlling the width (width_multiple) and depth (depth_multiple) of the network structure, the number of Conv modules and CSPNet modules in the whole neural network can be adjusted, so that the size and complexity of the whole model can be controlled flexibly according to the specific needs of production and daily life and the computing power of the available hardware, yielding a more suitable detection accuracy and detection speed.

Detecting smoking scenes in daily life involves low feature complexity, high real-time requirements, and a very strong need for deployment on mobile terminals and small devices. The network width and depth are therefore reduced appropriately, which greatly reduces the model size and complexity and greatly increases the running speed.
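How the two multipliers act on a layer can be sketched as follows. The rounding rules mirror those used in the public yolov5 model parser (repeat counts are scaled by depth_multiple and kept at least 1; channel counts are scaled by width_multiple and rounded up to a multiple of 8). It is an assumption that the patent relies on exactly this behavior, and the multiplier values in the example are illustrative, since the patent does not state the values it chose.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_layer(repeats, channels, depth_multiple, width_multiple):
    """Return (scaled repeat count, scaled output channels) for one layer."""
    # Only layers repeated more than once are affected by the depth multiplier.
    n = max(round(repeats * depth_multiple), 1) if repeats > 1 else repeats
    c = make_divisible(channels * width_multiple, 8)
    return n, c


# Illustrative multipliers, not the values used in the patent.
small = scale_layer(repeats=9, channels=512, depth_multiple=0.33, width_multiple=0.50)
smaller = scale_layer(repeats=3, channels=1024, depth_multiple=0.33, width_multiple=0.25)
```

Lowering depth_multiple removes repeated blocks and lowering width_multiple thins every layer, so both reduce the parameter count and the computation per image.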

S53：yolov5的主干网络（backbone）中的Conv模块和CSPNet模块，其内部主要构造为传统的卷积模块，原理是用深度卷积处理每一个特征通道上的空间信息，然后用点卷积进行通道间的特征融合。对设备计算力的需求高，存在一定程度的计算资源浪费，不利于计算力低的小设备的应用，也不适用于本发明所侧重的实时性和轻量化。S53: Conv module and CSPNet module in the backbone network of yolov5, which are mainly constructed as traditional convolution modules. The principle is to use depth convolution to process the spatial information on each feature channel, and then use point convolution to perform Feature fusion between channels. The demand for the computing power of the device is high, and there is a certain degree of waste of computing resources, which is not conducive to the application of small devices with low computing power, nor is it suitable for the real-time performance and light weight that the present invention focuses on.

The present invention therefore improves the Conv and CSPNet modules in the yolov5 backbone network by introducing the Ghost idea. Its core is to generate a portion of real feature maps with ordinary convolution, and then apply linear transformations (cheap operations) to these real feature maps to obtain ghost feature maps; the real and ghost feature maps together form the complete feature layer. The resulting network structures are Ghost-Conv and C3-Ghost. While maintaining model accuracy, they further reduce model complexity and make the model more lightweight and better suited to deployment on small devices, while also running fast on large devices.
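
The saving from the Ghost idea can be estimated by counting convolution weights. The sketch below assumes that half of the output channels are real feature maps produced by an ordinary convolution and that the cheap operation is a 5x5 depthwise convolution applied to those real maps. Both choices are assumptions borrowed from common Ghost implementations; the patent does not specify them. Bias and normalization parameters are ignored.

```python
def conv_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weight count of a k x k convolution (no bias)."""
    return (c_in // groups) * c_out * k * k


def ghost_conv_params(c_in: int, c_out: int, k: int = 3, cheap_k: int = 5) -> int:
    """Weight count of a Ghost-style layer producing c_out channels.

    Assumption: c_out // 2 real channels from an ordinary convolution,
    and the same number of ghost channels from a depthwise convolution.
    """
    c_real = c_out // 2
    primary = conv_params(c_in, c_real, k)                    # real feature maps
    cheap = conv_params(c_real, c_real, cheap_k, groups=c_real)  # ghost feature maps
    return primary + cheap


if __name__ == "__main__":
    standard = conv_params(128, 256, 3)
    ghost = ghost_conv_params(128, 256, 3)
    print(f"standard: {standard}, ghost: {ghost}, ratio: {ghost / standard:.3f}")
```

Under these assumptions the Ghost-style layer needs roughly half the weights of the ordinary convolution, because the depthwise step adds only a few thousand parameters.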

A partial network structure diagram of the original yolov5 backbone network is shown in FIG. 1, and a partial network structure diagram of the modified yolov5 backbone network according to the embodiment of the present invention is shown in FIG. 2.

The processing procedure of step S6 is as follows:

S61: Import the file paths of the data set processed and divided in S1 into the yolov5 .yaml file and run the program. It reads the information of the training images and the positions of the annotated ground-truth boxes, and generates the corresponding train.cache and val.cache files in the data set.
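
For illustration, the data set .yaml file referred to in S61 might be generated as follows. The keys train, val, nc, and names follow the layout commonly used by yolov5 data files; the directory paths and the class name are hypothetical placeholders, not values taken from the patent.

```python
from typing import List


def render_dataset_yaml(train_dir: str, val_dir: str, class_names: List[str]) -> str:
    """Return the text of a minimal yolov5-style data set configuration."""
    lines = [
        f"train: {train_dir}",        # folder with training images
        f"val: {val_dir}",            # folder with validation images
        f"nc: {len(class_names)}",    # number of classes
        f"names: {class_names!r}",    # class names as a YAML flow sequence
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical paths and class name.
    print(render_dataset_yaml("dataset/images/train", "dataset/images/val", ["smoking"]))
```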

S62: Set the parameters for training the neural network model. In the present invention, the number of training epochs is set to 300, batch-size to 24, and image-size to 640, and an RTX 2060 graphics card is used for training;

S63: During training, the present invention uses TensorBoard for visualization, monitoring in real time the convergence of the loss function and the improvement of overall model performance. Training ends and the model file is saved when no significant improvement is detected over the most recent 100 epochs, or when the number of training epochs reaches 300.
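
The stopping rule in S63 can be sketched as a patience check on a per-epoch performance score. The function name, the min_delta threshold, and the use of a single scalar score per epoch are illustrative assumptions; the patent only states that training stops when the last 100 epochs bring no significant improvement or when 300 epochs are reached.

```python
from typing import Sequence


def should_stop(scores: Sequence[float],
                max_epochs: int = 300,
                patience: int = 100,
                min_delta: float = 0.0) -> bool:
    """Decide whether training should end after the epochs recorded in scores.

    scores holds one performance value per finished epoch (higher is better).
    """
    epochs_done = len(scores)
    if epochs_done >= max_epochs:
        return True                       # epoch budget exhausted
    if epochs_done <= patience:
        return False                      # not enough history yet
    best_before = max(scores[:-patience])
    best_recent = max(scores[-patience:])
    # Stop when the most recent `patience` epochs did not beat the earlier best.
    return best_recent <= best_before + min_delta


if __name__ == "__main__":
    plateau = [i / 100 for i in range(50)] + [0.49] * 100
    print(should_stop(plateau))  # no gain over the last 100 epochs
```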

The processing procedure of step S7 is as follows:

S71: Test the model using the test data set divided in S1;

S72: Import the path of the test files into the .yaml file of the yolov5 main program and execute val.py, which generates the performance metrics of the model;

S73: Import the trained model file into detect.py and input the path of the picture or video to be detected. The model calculates the confidence of every class in the current image, obtains the final recognition box by selecting the highest confidence, and draws it on the original image, completing the detection process.
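
The selection described in S73 can be sketched as follows: each candidate box carries a confidence per class, the best class is taken for each box, and the box with the highest confidence is kept. The Detection type, the function names, and the 0.25 threshold are illustrative assumptions. Practical detectors, including yolov5, usually apply non-maximum suppression and may keep several boxes per image; this sketch shows only the highest-confidence selection stated in the text, and drawing is left out.

```python
from typing import Dict, NamedTuple, Optional, Sequence, Tuple


class Detection(NamedTuple):
    box: Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels
    class_scores: Dict[str, float]           # confidence per class


def best_class(det: Detection) -> Tuple[str, float]:
    """Return the class with the highest confidence for one box."""
    name = max(det.class_scores, key=det.class_scores.get)
    return name, det.class_scores[name]


def select_final(detections: Sequence[Detection],
                 conf_threshold: float = 0.25
                 ) -> Optional[Tuple[Tuple[float, float, float, float], str, float]]:
    """Pick the single box with the highest confidence, or None if all are weak."""
    best = None
    for det in detections:
        name, conf = best_class(det)
        if conf < conf_threshold:
            continue                          # discard low-confidence candidates
        if best is None or conf > best[2]:
            best = (det.box, name, conf)
    return best


if __name__ == "__main__":
    candidates = [
        Detection((0, 0, 10, 10), {"smoking": 0.40}),
        Detection((5, 5, 20, 20), {"smoking": 0.90}),
    ]
    print(select_final(candidates))
```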

Taken as a whole, the above steps of the present application comprise the following in a specific implementation:

division and import of the data set;

configuration of the programming environment and program implementation;

training of the neural network;

final effect test.

Division of the data set means splitting the annotated data set into a training set and a test set, importing them into the yolov5 .yaml file, and running the program to generate the corresponding train.cache and val.cache files for the subsequent network training process.

Configuration of the programming environment and program implementation are carried out under Windows 10, using PyCharm and Python 3.9 to set up the programming environment, installing PyTorch 1.9.1 and other auxiliary packages, and using an RTX 2060 graphics card with CUDA 11.1 and cuDNN installed to accelerate computation.

Training of the neural network is completed by importing the data set, constructing the network model framework, setting the number of training epochs to 300, batch-size to 24, and image-size to 640, setting the other hyperparameters, selecting the GPU device, and executing train.py.

For the final effect test, after training of the neural network is complete, the path of the test files is imported into the .yaml file of the yolov5 main program and val.py is executed to obtain the performance results of the model. Finally, the trained model file is imported into detect.py and the detection results are inspected.

The platform used for training and testing in this experiment is a Lenovo Legion R7000P, with an NVIDIA GeForce RTX 2060 (6 GB) and an AMD Ryzen 7 4800H with Radeon Graphics.

Most of the data sets used in this experiment were obtained from the Internet. Such data sets are generally in XML format, so code is needed to convert the XML format to txt format. The data set must also be divided into a training set and a validation set. This step uses open-source code available online to convert the data set format and to divide the training and validation sets. In the data set, the Annotations folder holds the label files in XML format and the JPEGImages folder holds the image files. In the data division code, classes must be filled in with the classes annotated in the XML files, and TRAIN_RATIO sets the ratio between the training and validation sets; for example, a value of 80 assigns 80% of the data to the training set and 20% to the validation set. Running the code creates images and labels folders in the current directory, each containing a train folder and a val folder.
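
The conversion and split described above might look like the following sketch. It is not the open-source script the experiment used. It assumes Pascal VOC style XML (a size element with width and height, and object elements with name and bndbox) and the usual YOLO txt line format of class index followed by the normalized box center, width, and height. The function names are hypothetical; TRAIN_RATIO is represented by the train_ratio argument.

```python
import random
import xml.etree.ElementTree as ET
from typing import List, Sequence, Tuple


def voc_to_yolo_lines(xml_text: str, classes: Sequence[str]) -> List[str]:
    """Convert one VOC-style annotation into YOLO txt lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in classes:
            continue                           # skip classes that are not trained
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # Normalize center and size by the image dimensions.
        x_c = (xmin + xmax) / 2.0 / img_w
        y_c = (ymin + ymax) / 2.0 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{classes.index(name)} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    return lines


def split_names(names: Sequence[str], train_ratio: int = 80,
                seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly split file names; train_ratio is the training percentage."""
    shuffled = list(names)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_ratio // 100
    return shuffled[:n_train], shuffled[n_train:]
```

Each image then gets a txt file with the same base name under labels/train or labels/val, matching the folder layout described above.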

This experiment uses Python as the programming language and the PyTorch deep learning framework. The overall structure of the project code mainly comprises data, models, utils, weights, detect.py, train.py, test.py, and requirements.txt. Here, data stores configuration files (yaml files), models stores the models, utils stores the yolov5 utility functions, weights stores the trained weight files, and requirements.txt lists the dependency packages to install for running the program.

After installing the dependency packages, first open the corresponding yaml file in the data directory and change the training set and validation set paths to those of the prepared data set. Next, modify the model configuration file in the models directory: find the model configuration file that matches the pre-trained weights, open it, and set the number of classes to 1, since this example detects only smoking. Training can then begin: open train.py, change the weights, cfg, data, and other parameters, set the number of training epochs to 300, batch-size to 24, and num_workers to 0, and start training on the GPU.
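
Instead of editing defaults inside train.py, the same settings can usually be passed on the command line. The sketch below only builds the argument list. The flag names follow the public yolov5 train.py as commonly documented and should be checked against the version in use; the three file names in the usage example are hypothetical placeholders.

```python
import sys
from typing import List


def build_train_command(data_yaml: str, model_cfg: str, weights: str,
                        epochs: int = 300, batch_size: int = 24,
                        img_size: int = 640, device: str = "0",
                        workers: int = 0) -> List[str]:
    """Assemble a train.py invocation with the settings from the text."""
    return [
        sys.executable, "train.py",
        "--data", data_yaml,            # data set configuration (yaml)
        "--cfg", model_cfg,             # model configuration (yaml)
        "--weights", weights,           # pre-trained weights
        "--epochs", str(epochs),
        "--batch-size", str(batch_size),
        "--img", str(img_size),
        "--device", device,             # GPU index
        "--workers", str(workers),      # data loader workers (num_workers)
    ]


if __name__ == "__main__":
    # Hypothetical file names; pass the list to subprocess.run to start training.
    cmd = build_train_command("data/smoking.yaml", "models/ghost.yaml", "weights/pretrained.pt")
    print(" ".join(cmd))
```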

After training, take best.pt from the weights folder of the corresponding training run under the run folder in the root directory as the weight file. The trained model weight file is applied in the mAP generation executable and the test executable to obtain the experimental results, which are compared with the results of training the network before the improvement. The model size is 13.7 MB before the improvement and 7.39 MB after it (a reduction of about 46%), and the model accuracy is 0.841 before the improvement and 0.861 after it, which verifies the effectiveness of the technical solution of the present application.

Embodiment 2

The purpose of this embodiment is to provide an image target detection system based on a lightweight neural network model, comprising:

a target detection module configured to: use the lightweight neural network model to calculate the confidence of every class in the received current image, obtain the final recognition box by selecting the highest confidence, and draw it on the original picture, completing the detection process.

The present invention relates to neural networks, deep learning, machine vision, and target detection technology. It mainly uses yolov5, currently the latest single-stage target detection algorithm, adjusts the width (width_multiple) and depth (depth_multiple) of the network structure, and improves the Conv and CSPNet modules in the backbone network. While maintaining model accuracy, this effectively reduces the complexity and size of the model. The model not only runs very fast but, being lightweight, can also run smoothly on a variety of mobile terminals and small devices, meeting a range of practical needs.

Those skilled in the art should understand that the modules or steps of the present invention described above can be implemented by a general-purpose computing device. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they can each be fabricated as individual integrated circuit modules; or several of the modules or steps can be fabricated as a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.

Although specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications or variations that can be made on the basis of the technical solutions of the present invention without creative effort still fall within the scope of protection of the present invention.

Claims

1. An image target detection method based on a lightweight neural network model is characterized by comprising the following steps:

inputting a path of a picture or a video to be detected;

and calculating the related confidence degrees of all the classifications in the received picture to be detected by using the lightweight neural network model, selecting the highest confidence degree to obtain a final recognition frame, drawing the recognition frame in the picture to be detected, and finishing the detection process.

2. The method for detecting the image target based on the light-weight neural network model as claimed in claim 1, wherein the light-weight neural network model comprises a backbone network and a feature fusion network;

the backbone network processes the picture to be detected by utilizing convolution to generate a real characteristic layer, and then linear transformation is carried out on the real characteristic layer to obtain a phantom characteristic layer;

the feature fusion network further processes and fuses the image information to be detected processed in the phantom feature layer to generate a feature pyramid, and the feature pyramid is used for carrying out enhanced detection on objects with different scaling sizes and identifying the same object with different sizes and scales.

3. The method for detecting the image target based on the light weight neural network model as claimed in claim 1, wherein the training process of the light weight neural network model is as follows:

acquiring a data set containing target behaviors;

wherein, the data set containing the target behavior is derived from an open source data set downloaded from a network, and for the obtained unmarked data set, a marking tool is used for marking: selecting a main body which can be identified after learning and training, and then storing the position and length and width data of the main body relative to the image size in a corresponding XML file, wherein the file name corresponds to the image file;

and randomly dividing the labeled data set into a training data set and a testing data set, wherein the training data set is used for training the lightweight neural network model, and the testing data set is used for testing the lightweight neural network model.

4. The method for detecting the image target based on the light-weight neural network model as claimed in claim 3, wherein the pictures of the training data set are encapsulated by a fixed size and format and then transmitted into the constructed backbone network and the feature fusion network to obtain the prediction result of the light-weight neural network model;

calculating the loss of the output and the true value of the lightweight neural network model, calculating the gradient of the loss value, updating the lightweight neural network model parameters by using a gradient descent algorithm, and adjusting the model parameters by searching the optimal solution of the loss function.

5. The method as claimed in claim 3, wherein the labeled test data set is processed to convert an XML-formatted file into a txt-formatted file that can be read correctly, and the corresponding file name and directory in the test data set are adjusted.

6. The method for detecting the image target based on the light weight neural network model as claimed in claim 2, wherein the backbone network and the feature fusion network are used to run the yolov5 target detection algorithm.

7. The method for detecting the image target based on the light weight neural network model as claimed in claim 1, wherein the light weight neural network model is applied to real-time detection of smoking scenes in daily life.

8. The method for detecting the image target based on the light weight neural network model as claimed in claim 2, wherein each stage of the feature fusion network path takes the feature map of the previous stage as input, and the feature map is processed by a convolution layer, and the output is added to the feature map of the same stage of the top-down path through lateral connection, and the feature maps provide information for the processing of the next stage; and fusing the feature maps of different layers together after self-adaptive pooling operation, and then detecting to generate the related confidence coefficient and the position information of a prediction frame of each prediction category in the image to be detected.

9. The method as claimed in claim 2, wherein the backbone network mainly comprises a Conv module and a CSPNet module, and the number of the Conv module and the CSPNet module in the whole neural network is adjusted by controlling the width and depth of the network structure.

10. An image target detection system based on a lightweight neural network model is characterized by comprising:

a data input module configured to: inputting a path of a picture or a video to be detected;

an object detection module configured to: calculating the related confidence degrees of all the classifications in the received current image by using the lightweight neural network model, obtaining a final recognition frame by selecting the highest confidence degree, and drawing the recognition frame in the original image to finish the detection process.