CN114332666A - Image target detection method and system based on lightweight neural network model - Google Patents

Publication number
CN114332666A
Authority
CN
China
Prior art keywords
neural network
network model
model
image
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210234758.6A
Other languages
Chinese (zh)
Inventor
刘海英
孙凤乾
邓立霞
郑太恒
王超平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Original Assignee
Qilu University of Technology

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides an image target detection method and system based on a lightweight neural network model. It belongs to the technical field of image target detection and addresses the problem of inaccurate image recognition. The method comprises: inputting the path of a picture or video to be detected; using the lightweight neural network model to calculate the confidence of every class in the received image, selecting the highest confidence to obtain the final recognition frame, and drawing that frame on the original image to complete detection. While preserving model accuracy, the method greatly increases the running speed of the model, so that the model can be deployed and run smoothly on small devices and mobile terminals, meeting the real-time and accuracy requirements of smoking detection in everyday scenes.

Description

Image target detection method and system based on lightweight neural network model
Technical Field
The invention belongs to the technical field of image target detection, and particularly relates to an image target detection method and system based on a lightweight neural network model.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
The task of object detection is to find all objects of interest in an image and determine their category and location; it is one of the core problems in the field of machine vision. Because objects differ in appearance, shape and posture, and imaging is further affected by factors such as illumination and occlusion, target detection has long been among the most challenging problems in the field of machine vision.
The existing research shows that a reliable target detection algorithm is the basis for realizing automatic analysis and understanding of complex scenes. Therefore, image target detection is a basic task in the field of computer vision, and the performance of the image target detection directly affects the performance of middle and high-level tasks such as subsequent target tracking, action recognition and behavior understanding, and further determines the performance of subsequent AI (artificial intelligence) applications such as face detection, behavior description, traffic scene object recognition, content-based internet image retrieval and the like. As these AI applications have penetrated aspects of people's production and life, the target detection techniques have reduced the burden on people to some extent, changing the lifestyle of people.
In recent years, with the continuous iteration of GPUs (graphics processing units) with high-performance computing capability, target detection algorithms based on deep learning have developed very rapidly. Single-stage target detection algorithms represented by yolo gain strong image feature extraction and fusion capability from neural networks and deep learning, and offer better performance, stability and generalization than traditional sliding-window target detection algorithms.
However, high-performance GPUs are relatively expensive and not portable, so they are suitable only for training models, not for deploying and applying models in actual production and life.
Disclosure of Invention
To overcome the deficiencies of the prior art, the invention provides an image target detection method based on a lightweight neural network model, which greatly increases the running speed of the model while preserving its accuracy, so that the model can be deployed and run smoothly on small devices and mobile terminals, meeting the real-time and accuracy requirements of detection in everyday scenes.
In order to achieve the above object, one or more embodiments of the present invention provide the following technical solutions:
in a first aspect, an image target detection method based on a lightweight neural network model is disclosed, which includes:
inputting a path of a picture or a video to be detected;
and calculating the confidence of every class in the received picture to be detected by using the lightweight neural network model, selecting the highest confidence to obtain a final recognition frame, drawing the recognition frame in the picture to be detected, and finishing the detection process.
According to a further technical scheme, the lightweight neural network model comprises a backbone network and a feature fusion network;
the backbone network processes the picture to be detected with convolution to generate a real feature layer, and then applies a linear transformation to the real feature layer to obtain a phantom feature layer;
the feature fusion network further processes and fuses the information of the image to be detected that was processed in the phantom feature layer to generate a feature pyramid, and the feature pyramid strengthens detection of objects at different scales so that the same object can be identified at different sizes and scales.
According to a further technical scheme, the training process of the lightweight neural network model comprises the following steps:
acquiring a data set containing target behaviors;
wherein, the data set containing the target behavior is derived from an open source data set downloaded from a network, and for the obtained unmarked data set, a marking tool is used for marking: selecting a main body which can be identified after learning and training, and then storing the position and length and width data of the main body relative to the image size in a corresponding XML file, wherein the file name corresponds to the image file;
and randomly dividing the labeled data set into a training data set and a testing data set, wherein the training data set is used for training the lightweight neural network model, and the testing data set is used for testing the lightweight neural network model.
According to the further technical scheme, pictures of the training data set are packaged by fixed size and format and then transmitted into a constructed backbone network and a constructed feature fusion network, and a prediction result of the lightweight neural network model is obtained;
calculating the loss of the output and the true value of the lightweight neural network model, calculating the gradient of the loss value, updating the lightweight neural network model parameters by using a gradient descent algorithm, and adjusting the model parameters by searching the optimal solution of the loss function.
According to the further technical scheme, the marked test data set is processed, the XML-format file is converted into a txt-format file which can be read correctly, and the corresponding file name and directory in the test data set are adjusted.
In a further technical scheme, the backbone network and the feature fusion network are used for operating yolov5 target detection algorithm.
According to the further technical scheme, the lightweight neural network model is applied to real-time detection of daily smoking scenes.
According to the further technical scheme, each stage of the feature fusion network path takes the feature map of the previous stage as input and processes it with a convolution layer; the output is added through a lateral connection to the feature map of the same stage of the top-down path, and the resulting feature map provides information for the next stage; the feature maps of different layers are fused together after an adaptive pooling operation, and detection is then performed to generate the confidence of each predicted class and the position information of the prediction frame in the image to be detected.
In a further technical scheme, the backbone network mainly comprises a Conv module and a CSPNet module, and the number of the Conv module and the CSPNet module in the whole neural network is adjusted by controlling the width and the depth of a network structure.
In a second aspect, an image target detection system based on a lightweight neural network model is disclosed, which includes:
a data input module configured to: inputting a path of a picture or a video to be detected;
an object detection module configured to: calculating the confidence of every class in the received picture to be detected by using the lightweight neural network model, selecting the highest confidence to obtain a final recognition frame, drawing the recognition frame in the picture to be detected, and finishing the detection process.
The above one or more technical solutions have the following beneficial effects:
the yolov5 target detection algorithm is applied to real-time detection of daily smoking scenes. The width (width_multiple) and depth (depth_multiple) of the default yolov5 network structure are adjusted to reduce the size of the whole model for smoking detection and to increase its running speed. In addition, two network modules, Ghost-Conv and C3-Ghost, are integrated to improve the default model; the size of the improved model is reduced by 46% and its precision is improved by 2%. While preserving model accuracy, this improvement greatly increases the running speed of the model, so that the model can be deployed and run smoothly on small devices and mobile terminals, meeting the real-time and accuracy requirements of smoking detection in everyday scenes.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and together with the description serve to explain the invention and not to limit it.
FIG. 1 is a partial network structure diagram of the yolov5 backbone network (backbone);
FIG. 2 is a partial network structure diagram of the yolov5 backbone network (backbone) as improved according to the embodiment of the present invention.
Detailed Description
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
At present, the mainstream research direction for deep-learning-based target detection algorithms is to keep making the neural network model more lightweight and to keep improving the model's ability to run on small devices and mobile terminals, so that the model is better applied to actual production and life and creates more social and economic benefits.
Example one
The embodiment describes an image target detection method based on a lightweight neural network model, and takes smoking behavior detection as an example, but the embodiment can also be applied to image detection of other target behaviors.
The method comprises the following steps:
S1: making a data set for deep learning training and testing, and dividing and processing the whole data set;
S2: configuring the python and pytorch programming environments for neural network model training and testing;
S3: constructing the backbone network and feature fusion network required to implement the yolov5 target detection algorithm, wherein the backbone network extracts useful features from the image to be detected, and the feature fusion network strengthens the useful features extracted by the backbone network and outputs the final feature map of the image to be detected;
S4: defining the loss functions (classification loss and positioning loss) of the yolov5 target detection algorithm;
S5: adjusting the width (width_multiple) and depth (depth_multiple) of the default yolov5 network structure to reduce the size of the whole model for smoking detection and increase its running speed. In addition, two network modules, Ghost-Conv and C3-Ghost, are integrated to improve the default model; the size of the improved model is reduced by 46% and its precision is improved by 2%. While preserving model accuracy, this improvement greatly increases the running speed of the model, so that the model can be deployed and run smoothly on small devices and mobile terminals, meeting the real-time requirement of target detection;
S6: training the neural network model with the prepared data set, evaluating the overall performance of the model at the current training round while the loss function has not yet converged, and keeping the optimal model file after multiple rounds of training;
S7: testing the lightweight optimal model to confirm that its effect is real and effective.
The processing procedure of step S1 is as follows:
S11: the data set used to train the algorithm consists of 4860 smoking behavior images, which come mainly from network downloads and other labeled open data sets. Only the cigarettes in the data set are labeled: when smoking behavior is detected, the cigarette body is labeled as smoking; otherwise no label is added;
S12: labeling the acquired unlabeled data with a labeling tool, that is, annotating it in XML format. The main process is: selecting the subject that the algorithm is expected to identify after learning and training, and then storing the position and the length and width of the subject relative to the image size in a corresponding XML file whose file name corresponds to the image file;
S13: the 4860 labeled images are randomly divided, with 3791 images used for training the neural network and 1069 images used for testing. The training process produces the final neural network model, and the testing process tests the reliability and accuracy of the model;
S14: for the labeled smoking detection data set, converting the XML-format files into txt-format files that pytorch can read correctly, and adjusting the corresponding file names and directories as the program requires.
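The XML-to-txt conversion in S14 can be sketched as follows. This is a minimal illustration, assuming the XML files follow the Pascal VOC layout (`size/width`, `size/height`, `object/name`, `object/bndbox`) and that the txt target is the normalized `class x_center y_center width height` line format; the source does not spell out either layout. The function name `voc_xml_to_yolo_lines` and the sample annotation are hypothetical.

```python
import xml.etree.ElementTree as ET

CLASSES = ["smoking"]  # single class assumed for this example


def voc_xml_to_yolo_lines(xml_text, classes=CLASSES):
    """Convert one VOC-style XML annotation into normalized txt lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text.strip()
        if name not in classes:
            continue  # skip labels that are not trained on
        box = obj.find("bndbox")
        xmin = float(box.find("xmin").text)
        ymin = float(box.find("ymin").text)
        xmax = float(box.find("xmax").text)
        ymax = float(box.find("ymax").text)
        # Normalize the box centre and size by the image dimensions.
        xc = (xmin + xmax) / 2.0 / img_w
        yc = (ymin + ymax) / 2.0 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{classes.index(name)} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines


# Hypothetical annotation: one box centred in a 640 x 480 image.
SAMPLE_XML = """<annotation>
  <size><width>640</width><height>480</height></size>
  <object>
    <name>smoking</name>
    <bndbox><xmin>160</xmin><ymin>120</ymin><xmax>480</xmax><ymax>360</ymax></bndbox>
  </object>
</annotation>"""

print(voc_xml_to_yolo_lines(SAMPLE_XML))
```

Each returned line would be written to a txt file that shares its base name with the image, which is the file-name correspondence described in S12.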
The processing procedure of step S2 is as follows:
S21: the virtual environment is built mainly with pycharm, a programming tool developed for python that offers strong code writing and management capabilities and supports a rich set of third-party plug-ins, extensions and packages. pycharm also has a built-in virtual environment setup function: when a new project is created, choosing to create a new virtual environment at the same time builds a programming environment completely isolated from the local one, which facilitates subsequent code writing and debugging;
S22: the pytorch deep learning framework is used for training and development. After the virtual environment is created, a local python interpreter is selected, and then the main pytorch framework and the other third-party packages needed for scientific computing and auxiliary functions are installed. The invention uses python 3.9 and pytorch 1.9+cu111;
S23: training a deep learning neural network model involves a large amount of parallel computation, so a GPU device with high-performance parallel computing capability is required to accelerate training. The invention uses an Nvidia RTX2060 GPU to train and develop the target detection algorithm. To ensure device compatibility and allow the GPU to take part in the computation normally, cuda and cudnn need to be installed; the version number of the cuda and cudnn selected by the invention is 11.1.
The processing procedure of step S3 is as follows:
S31: the network structure of yolov5 is mainly divided into a backbone network (backbone) and a feature fusion network (head). The backbone network mainly uses the CSPNet (Cross Stage Partial Network) module, which extracts useful features from the image to be detected and achieves better performance while reducing the amount of computation. CSPNet solves the problems of duplicated gradient information and vanishing gradients during network optimization in the backbone layers of other neural network models, reduces the parameter count and FLOPs (a model complexity index) of the model, preserves inference speed and accuracy, and reduces the model size.
CSPNet is composed of several partial dense layers and a partial transition layer. The processing of the image to be detected by the cross-stage partial network represented by CSPNet can be expressed by the following formulas:
[Formulas (1) to (6): equation images not reproduced in this text]
In the formulas, k denotes the number of partial dense layers; X_k denotes the feature map output by the k-th partial dense layer; W_k and g_k denote the network weights and gradient of the k-th partial dense layer, respectively. X_T, W_T and g_T denote the feature map, network weights and gradient of the partial transition layer output, respectively, and X_U and W_U denote the feature map and network weights of the final output of the CSPNet, respectively.
The steps involved in CSPNet are as follows:
First, the input to the CSPNet for the image to be detected is split along the channel dimension into two parts. One part is connected directly to the end of the CSPNet, and the other part passes through several partial dense layers.
The final output of the partial dense layers passes through a partial transition layer, and the output of the partial transition layer, X_T, is then concatenated with the directly connected part to produce the final feature map X_U.
The above equations (1) to (3) and (4) to (6) represent the forward propagation equation and the backward propagation equation of CSPNet, respectively.
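The formula images for (1) to (6) are not present in this text. Assuming they follow the forward-pass and weight-update equations of the published CSPNet formulation, which the symbol definitions above match, they could be sketched as below. The split symbols X_0' (the part connected directly to the end of the stage) and X_0'' (the part fed through the partial dense layers), the weight-update functions f, and the primes on W (denoting updated weights) are notation assumed here, not taken from the source.

```latex
\begin{align}
X_k &= W_k * [X_0'', X_1, \dots, X_{k-1}] \tag{1}\\
X_T &= W_T * [X_0'', X_1, \dots, X_k] \tag{2}\\
X_U &= W_U * [X_0', X_T] \tag{3}\\
W_k' &= f_k\bigl(W_k, \{g_0'', g_1, \dots, g_{k-1}\}\bigr) \tag{4}\\
W_T' &= f_T\bigl(W_T, \{g_0'', g_1, \dots, g_k\}\bigr) \tag{5}\\
W_U' &= f_U\bigl(W_U, \{g_0', g_T\}\bigr) \tag{6}
\end{align}
```

Under this reading, the gradient sets in (4) and (5) contain only terms from the dense path, while (6) is the only update that sees the directly connected part.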
From formulas (1) to (6), the feature map obtained through the CSPNet layers and the feature map of the image to be detected that does not pass through them do not contain duplicate gradient information belonging to the other side for updating weights, so the model avoids excessive duplicate gradient information and its complexity is reduced.
S32: the yolov5 feature fusion network uses a path aggregation network (PANet), whose function is to further process and fuse the information of the image to be detected that was processed in the backbone layers and to generate a feature pyramid. The feature pyramid strengthens the model's detection of objects at different scales, so that the same object can be identified at different sizes and scales.
The feature extraction network of PANet adopts an improved FPN (Feature Pyramid Network) structure, which improves the propagation of low-level features in the neural network. Each stage of the path takes the feature maps of the previous stage as input and processes them with 3 x 3 convolutional layers; the output is added through lateral connections to the feature map of the same stage of the top-down path, and the resulting feature map provides information for the next stage. The feature maps of different layers are fused together after an adaptive pooling operation and passed to the detector module at the end of the neural network model, which generates the confidence of each predicted class and the prediction frame position information for the image to be detected.
The processing procedure of step S4 is as follows:
s41: the main process of neural network model training for target detection is that the pictures of the training set are transmitted into the constructed neural network after being packaged by fixed size and format to obtain the prediction result of the model, the loss of the output and the true value of the model is calculated, the gradient of the loss value is calculated, and finally the model parameters are updated by a gradient descent algorithm. The significance of introducing the loss function is that an abstract network training process is converted into a mathematical optimization problem, and model parameters are adjusted by finding an optimal solution of the loss function, so that the performance of the model is strongest.
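The loop described in S41 (forward pass, loss against the true value, gradient of the loss, gradient-descent update) can be illustrated on a toy problem. The sketch below fits a two-parameter linear model with mean squared error in plain Python; it is a stand-in chosen for brevity, not the yolov5 training code, and the function name `train_linear` and the sample data are hypothetical.

```python
def train_linear(xs, ys, lr=0.1, epochs=500):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    loss = float("inf")
    for _ in range(epochs):
        # Forward pass: model predictions for the current parameters.
        preds = [w * x + b for x in xs]
        # Loss between the model output and the true values.
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
        # Gradient of the loss with respect to each parameter.
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        # Gradient-descent update.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, loss


# Hypothetical data generated from y = 2x + 1.
w, b, loss = train_linear([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(w, b, loss)
```

In the actual training the parameters are the network weights and the gradients come from backpropagation, but the update rule has the same shape.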
S42: the loss function of the target detection algorithm mainly comprises a positioning loss function and a classification loss function, expressed respectively by the following formulas:
[Positioning loss formula: equation image not reproduced in this text]
where the symbols (images not reproduced) denote the coordinates of the upper left corners of the real frame and the prediction frame and the length and width values of the frames.
[Classification loss formulas: equation images not reproduced in this text]
In the above formulas, N denotes the total number of categories; the remaining symbols (images not reproduced) denote, in order, the predicted value for the current category, the probability of the current category obtained after the activation function, the true value (0 or 1) for the current category, and the classification loss.
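The classification loss described above (a raw predicted value per category, a probability after the activation function, a 0/1 true value, summed over N categories) matches the shape of a binary cross-entropy loss. The sketch below assumes a sigmoid activation and a plain sum over categories; the original formula image is not available, so both choices are assumptions, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Activation mapping a raw predicted value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def bce_class_loss(logits, targets):
    """Sum of binary cross-entropy terms over the N categories."""
    loss = 0.0
    for x, t in zip(logits, targets):
        p = sigmoid(x)  # probability of the current category
        loss -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return loss


# Hypothetical two-category example: the first category is the true one.
print(bce_class_loss([2.0, -1.0], [1, 0]))
```

The loss shrinks toward zero as the predicted value for the true category grows and the values for the other categories fall.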
S43: in the yolov5 algorithm adopted by the invention, the positioning loss function differs from that of other common target detection algorithms: the invention uses CIOU as the positioning loss function, as shown in the formula:
[CIOU loss formula: equation image not reproduced in this text]
The parameters in the formula have the following meanings:
IOU: the intersection over union of the prediction frame and the real frame.
V: a parameter measuring the consistency of the aspect ratio, which can also be defined as:
[Formula for V: equation image not reproduced in this text]
Compared with traditional positioning loss functions, the CIOU loss function adopted by the invention better improves accuracy when the prediction frame and the real frame overlap, when one contains the other, or when their sizes are abnormal, and is more beneficial to training the neural network model.
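For reference, the commonly published form of the CIoU loss is 1 - IoU + rho^2 / c^2 + alpha * v, where rho is the distance between box centres, c is the diagonal of the smallest enclosing box, v is the aspect-ratio consistency term and alpha is its weight. The sketch below implements that published form in plain Python; whether the invention's formula image matches it term for term cannot be confirmed from this text. The corner-based box format `(x1, y1, x2, y2)` and the function name are assumptions made here.

```python
import math


def ciou_loss(box_p, box_g, eps=1e-9):
    """CIoU loss for a predicted and a real box given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1
    # Intersection over union of the two boxes.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter + eps
    iou = inter / union
    # Squared centre distance and squared diagonal of the enclosing box.
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4.0
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4.0 / math.pi ** 2) * (
        math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))
    ) ** 2
    alpha = v / (1.0 - iou + v + eps)
    return 1.0 - iou + rho2 / c2 + alpha * v


print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes
```

Unlike a plain IoU loss, the centre-distance term still gives a useful gradient when the boxes do not overlap at all, which is the case the second call exercises.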
The processing procedure of step S5 is as follows:
S51: single-stage detection algorithms represented by the yolo series are widely used in industry and in other everyday scenarios that emphasize real-time detection, owing to their very high detection speed and excellent detection precision. Among the earlier, relatively mature yolo algorithms, yolov3 is representative: it strikes a good balance between performance and speed and is extremely stable. However, the yolov3 target detection algorithm still places high demands on CPU and GPU performance, which makes production deployment difficult and everyday application costly. The latest yolov5 algorithm, through data enhancement and preprocessing at the input end, model improvement and feature fusion, output layer improvement, loss function improvement, NMS process improvement and similar methods, not only further improves the detection precision of the yolo algorithm but also greatly reduces the model size and again greatly increases the running speed, making it better suited to various mobile terminals and small devices.
S52: the backbone network (backbone) of yolov5 is mainly composed of Conv modules and CSPNet modules, which perform feature extraction on images to different degrees. By controlling the width (width_multiple) and depth (depth_multiple) of the network structure, the number of Conv modules and CSPNet modules in the whole neural network can be adjusted, so that the size and complexity of the whole model can be flexibly controlled according to the specific requirements of production and life and the computing capacity of the current hardware, yielding a more suitable detection precision and detection speed.
Detection of smoking scenes in everyday life involves low feature complexity, demands high real-time performance, and has a strong need for deployment on mobile terminals and small devices. The network width and depth are therefore reduced appropriately, which greatly reduces the size and complexity of the model and greatly increases the running speed.
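How the two multipliers change a network can be sketched in a few lines. The rounding rules below (channel counts rounded up to a multiple of 8, repeat counts rounded with a floor of 1) are assumed to mirror the public yolov5 code base; the base stage of 256 channels repeated 9 times and the multiplier pairs are hypothetical, since the source does not state the values the invention uses.

```python
import math


def make_divisible(x, divisor=8):
    """Round a channel count up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_layer(channels, repeats, width_multiple, depth_multiple):
    """Return the scaled (channels, repeats) for one backbone stage."""
    scaled_channels = make_divisible(channels * width_multiple, 8)
    if repeats > 1:
        scaled_repeats = max(round(repeats * depth_multiple), 1)
    else:
        scaled_repeats = repeats  # single modules are never removed
    return scaled_channels, scaled_repeats


# Hypothetical base stage: 256 output channels, module repeated 9 times.
for wm, dm in [(1.0, 1.0), (0.5, 0.33), (0.25, 0.33)]:
    print(wm, dm, scale_layer(256, 9, wm, dm))
```

Width scaling shrinks every convolution's channel count, while depth scaling removes repeated modules, so the two multipliers reduce parameters and computation in complementary ways.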
S53: the Conv module and CSPNet module in the backbone network (backbone) of yolov5 are built internally mainly from traditional convolution modules, whose principle is to process the spatial information on each feature channel with depthwise convolution and then fuse features across channels with pointwise convolution. This places high demands on the computing power of the device, wastes computing resources to some degree, is unfavorable for small devices with low computing power, and does not fit the real-time performance and light weight emphasized by the invention.
The invention improves the Conv module and CSPNet module in the yolov5 backbone network by introducing the Ghost idea, whose core is to use ordinary convolution to generate part of the feature maps as real feature maps, then obtain the phantom feature layer (ghost feature maps) from these real feature maps through linear transformations (cheap operations), and finally combine the real feature layer and the phantom feature layer into the complete feature layer. The resulting network structures are named Ghost-Conv and C3-Ghost; they further reduce model complexity while preserving model precision, make the model lighter and better suited to deployment on small devices, and the running speed on large devices is also remarkable.
Fig. 1 shows a partial network structure diagram of yolov5 backbone network (backbone), and fig. 2 shows a partial network structure diagram of yolov5 backbone network (backbone) according to a modified embodiment of the present invention.
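A rough weight count shows why generating phantom features with cheap operations reduces model size. The sketch below compares an ordinary convolution with a Ghost-style one, assuming that half of the output channels are real, that the cheap operation is a 5 x 5 depthwise convolution, and that biases and normalization layers are ignored. These settings and the layer sizes are assumptions for illustration; the source does not give the configuration of Ghost-Conv.

```python
def conv_params(c_in, c_out, k):
    """Weights in an ordinary k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_conv_params(c_in, c_out, k, ratio=2, cheap_k=5):
    """Weights in a Ghost-style convolution.

    A primary convolution produces c_out // ratio real channels; a cheap
    depthwise cheap_k x cheap_k operation derives the remaining phantom
    channels from them, one filter per phantom channel.
    """
    real = c_out // ratio
    phantom = c_out - real
    primary = conv_params(c_in, real, k)
    cheap = phantom * cheap_k * cheap_k
    return primary + cheap


# Hypothetical layer: 128 input channels, 256 output channels, 3 x 3 kernel.
ordinary = conv_params(128, 256, 3)
ghost = ghost_conv_params(128, 256, 3)
print(ordinary, ghost, round(ghost / ordinary, 3))
```

With these assumed settings the Ghost-style layer needs roughly half the weights of the ordinary one, because the depthwise cheap operation costs far less than a full convolution over all input channels.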
The processing procedure of step S6 is as follows:
s61: and importing the file path of the data set processed and divided in the S1 process into a yaml file of yolov5, running a program, reading the information of the picture for training and the position information of the marked real frame, and generating a related train cache file and a val cache file in the data set.
S62: setting the parameters for training the neural network model: the number of training rounds for deep learning is set to 300, the batch-size to 24 and the image-size to 640, and training is performed on an RTX2060 graphics card;
S63: tensorboard is used for visualization during training to monitor in real time the convergence of the loss function and the improvement of the overall performance of the model; training ends and the model file is saved when no obvious improvement is detected over the most recent 100 rounds or the number of training rounds reaches 300.
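The stopping rule in S63 can be expressed as a small check run after each round. The sketch below assumes that a single scalar "fitness" value summarizes overall model performance for each round and that "no obvious improvement" means the best value is at least 100 rounds old; both are interpretations made here, and the function name `should_stop` is hypothetical.

```python
def should_stop(fitness_history, patience=100, max_epochs=300):
    """Return True when training should end after the latest round.

    fitness_history holds one overall-performance value per finished round.
    """
    epochs_done = len(fitness_history)
    if epochs_done == 0:
        return False
    if epochs_done >= max_epochs:
        return True  # round limit reached
    # Index of the first round that achieved the best value so far.
    best_epoch = max(range(epochs_done), key=fitness_history.__getitem__)
    # Stop when the best value is at least `patience` rounds old.
    return (epochs_done - 1 - best_epoch) >= patience


# Hypothetical history: a peak in round 1 followed by 100 flat rounds.
print(should_stop([0.5] + [0.4] * 100))
```

The model file saved at the end would be the one from the best round rather than the last one, which is what keeping the optimal model file in S6 implies.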
The processing procedure of step S7 is as follows:
s71: performing model test by using the data set for test divided in the S1;
s72: importing the path of the test file into a yaml file in a main program of yolov5, executing a val.
S73: importing the trained model file into a detect.py program, inputting a path of a picture or a video to be detected, calculating the relevant confidence degrees of all classifications in the current image by the model, obtaining a final recognition frame by selecting the highest confidence degree, drawing the final recognition frame in the original image, and finishing the detection process;
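The selection step in S73 (keep the prediction with the highest confidence as the final recognition frame) can be sketched as follows. The tuple layout `(x1, y1, x2, y2, confidence, class_name)`, the confidence threshold of 0.25 and the function name are assumptions for illustration; drawing the frame on the image is omitted.

```python
def pick_best_detection(detections, conf_threshold=0.25):
    """Return the detection with the highest confidence, or None.

    Each detection is (x1, y1, x2, y2, confidence, class_name).
    """
    candidates = [d for d in detections if d[4] >= conf_threshold]
    if not candidates:
        return None  # nothing confident enough to draw
    return max(candidates, key=lambda d: d[4])


# Hypothetical predictions for one image.
detections = [
    (10, 10, 50, 50, 0.42, "smoking"),
    (12, 8, 48, 52, 0.91, "smoking"),
    (0, 0, 5, 5, 0.10, "smoking"),
]
print(pick_best_detection(detections))
```

The returned coordinates are what a drawing routine would use to render the recognition frame on the original image.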
the present application, as a whole, specifically implements the steps comprising:
dividing and importing a data set;
configuration and program implementation of a programming environment;
training a neural network;
and testing the final effect.
The data set division refers to the steps of respectively setting the labeled data sets as a training set and a test set, then importing the labeled data sets into yaml files of yolov5, and generating corresponding train.
The configuration and program implementation of the programming environment means that, under the Windows 10 system, the programming environment is configured with pycharm + python 3.9, pytorch 1.9.1 and other auxiliary packages are installed, and program computation is accelerated with an RTX2060 graphics card and cuda and cudnn version 11.1.
Training the neural network means importing the data set, constructing the network model framework, setting the number of training rounds to 300, the batch-size to 24 and the image-size to 640, setting the other hyperparameters, setting the GPU device, and executing a train.
And the final effect test is to introduce the path of the test file into the yaml file in the main program of yolov5 after the training process of the neural network is completed, and execute a val. And finally, importing the trained model file into a detect.
The platform used for training and testing in the experiment is a Lenovo Legion R7000P, with the following hardware configuration: NVIDIA GeForce RTX 2060 (6 GB) and AMD Ryzen 7 4800H with Radeon Graphics.
Most of the data sets used in the experiment were acquired from the network. Such data sets are usually in xml format, which needs to be converted into txt format by code. The data set also needs to be divided into a training set and a validation set; this step uses open-source code available on the network to convert the format of the experimental data set and to divide it into the training set and the validation set. Label files in xml format are stored in the data set, and the image files are stored in JPEGImages. In the data division code, classes must be filled in with the classes already labeled in the xml files, and TRAIN_RATIO determines the proportion used when dividing the training set and the validation set; for example, a value of 80 means that 80% of the data is assigned to the training set and 20% to the validation set. Running the code generates images and labels folders under the current directory, each containing a train folder and a val folder.
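The conversion and division described above can be sketched as follows. This is not the open-source script used in the experiment; it is a minimal illustration that assumes the xml labels give corner coordinates in pixels (xmin, ymin, xmax, ymax) and that each txt line holds a class index followed by the normalized center, width and height of the frame. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def voc_to_yolo(box: Tuple[float, float, float, float],
                img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Convert pixel corners (xmin, ymin, xmax, ymax) to normalized (x_c, y_c, w, h)."""
    xmin, ymin, xmax, ymax = box
    x_c = (xmin + xmax) / 2.0 / img_w
    y_c = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return x_c, y_c, w, h


def yolo_label_line(class_name: str, classes: Sequence[str],
                    box: Tuple[float, float, float, float],
                    img_w: int, img_h: int) -> str:
    """Build one txt label line; `classes` plays the role of the classes list in the text."""
    class_id = classes.index(class_name)
    x_c, y_c, w, h = voc_to_yolo(box, img_w, img_h)
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


def split_dataset(names: Sequence[str], train_ratio: int = 80,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly divide file names; train_ratio is a percentage, like TRAIN_RATIO."""
    shuffled = list(names)
    random.Random(seed).shuffle(shuffled)  # fixed seed (an assumption) for repeatability
    n_train = len(shuffled) * train_ratio // 100
    return shuffled[:n_train], shuffled[n_train:]
```

For a 640 x 480 image with a frame from (100, 200) to (300, 400) and `classes = ["smoking"]`, the label line is `0 0.312500 0.625000 0.312500 0.416667`.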
In the experiment, python is used as the programming language and pytorch as the deep learning framework. The overall structure of the project code mainly comprises data, models, utils, weights, detect. Data is used for storing configuration files (yaml files), models for storing the models, utils for storing the tool functions of yolov5, and weights for storing the trained weight files, and requirements.
After the dependency packages are downloaded, the corresponding yaml file in the data directory is opened first, and the training set and validation set addresses in it are changed to the addresses of the prepared data set. Next, the model configuration file is modified in the models directory: the model configuration file corresponding to the pre-training weights is found and opened, and the number of categories is changed to 1, since only smoking is detected in this example. After these modifications are finished, training is started: train is opened, parameters such as weights, cfg and data are changed, the number of training rounds is set to 300, the batch-size to 24 and num_workers to 0, and training is run on the GPU.
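As an illustration of these settings, the sketch below assembles a command line for training. It assumes that the training script is the train.py of the public yolov5 repository and that its flags are named as in that repository; flag names have varied between releases, so they should be checked against the version in use. The function name and all file paths in the usage note are hypothetical.

```python
import sys
from typing import List


def build_train_command(weights: str, cfg: str, data: str,
                        epochs: int = 300,     # training rounds, from the text
                        batch_size: int = 24,  # batch-size, from the text
                        img_size: int = 640,   # image-size, from the text
                        workers: int = 0,      # data-loading workers, from the text
                        device: str = "0") -> List[str]:
    """Return the argument list for a training run without executing it."""
    return [
        sys.executable, "train.py",  # assumed script name (public yolov5 repository)
        "--weights", weights,        # pre-training weight file
        "--cfg", cfg,                # model configuration yaml
        "--data", data,              # data set configuration yaml
        "--epochs", str(epochs),
        "--batch-size", str(batch_size),
        "--img", str(img_size),
        "--workers", str(workers),
        "--device", device,          # "0" selects the first GPU
    ]
```

A call such as `build_train_command("weights/pretrained.pt", "models/model.yaml", "data/dataset.yaml")` returns a list that can be passed to `subprocess.run` from the project root; the three paths are placeholders.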
After training, best.pt in the weights folder of the corresponding training run, under the run folder in the root directory, is selected as the weight file. The trained model weight file is applied to the mAP generation executable file and the test executable file to obtain the experimental results, which are compared with the training results of the network before improvement: the model size is 13.7 MB before improvement and 7.39 MB after improvement, and the model accuracy is 0.841 before improvement and 0.861 after improvement, which verifies the validity of the technical scheme of the application.
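The comparison can be restated as relative changes. The short calculation below uses only the four figures reported above; the helper name is hypothetical.

```python
def relative_change(before: float, after: float) -> float:
    """Change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Figures reported in the experiment above.
size_change = relative_change(13.7, 7.39)        # model size in MB
accuracy_change = relative_change(0.841, 0.861)  # model accuracy

print(f"model size: {size_change:+.1%}")
print(f"accuracy:   {accuracy_change:+.1%} relative, {0.861 - 0.841:+.3f} absolute")
```

The model file shrinks by roughly 46%, while the accuracy rises by 0.020 in absolute terms, or roughly 2.4% relative to the value before improvement.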
Example two
The object of the embodiment is to provide an image target detection system based on a lightweight neural network model, which includes:
a data input module configured to: inputting a path of a picture or a video to be detected;
an object detection module configured to: and calculating the related confidence degrees of all the classifications in the received current image by using the lightweight neural network model, obtaining a final recognition frame by selecting the highest confidence degree, and drawing in the original image to finish the detection process.
The invention relates to neural network, deep learning, machine vision and target detection technologies. It mainly uses the latest single-stage target detection algorithm, yolov5, and effectively reduces the complexity of the model by adjusting the width (width_multiple) and the depth (depth_multiple) of the network structure and by improving Conv and CSPNet in the backbone network (backbone). While the precision of the model is maintained, a very fast running speed is obtained and the lightweight model can run smoothly on various mobile terminals and small devices, meeting the practical requirements of production and daily life.
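How the two multipliers shrink a network can be sketched as follows. The rounding rules are modeled on the public yolov5 model parser and are an assumption here, as are the function names and the multiplier values in the usage note; the application does not state the values it uses.

```python
import math


def make_divisible(x: float, divisor: int = 8) -> int:
    """Round x up to the nearest multiple of `divisor`."""
    return math.ceil(x / divisor) * divisor


def scale_channels(channels: int, width_multiple: float) -> int:
    """Scale the output channels of a layer by the width multiplier."""
    return make_divisible(channels * width_multiple, 8)


def scale_repeats(repeats: int, depth_multiple: float) -> int:
    """Scale how many times a module is stacked; single modules are left unchanged."""
    return max(round(repeats * depth_multiple), 1) if repeats > 1 else repeats
```

With the illustrative values width_multiple = 0.50 and depth_multiple = 0.33, a layer with 1024 channels is reduced to 512 channels and a module repeated 9 times is reduced to 3 repetitions, which lowers both the parameter count and the computation per image.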
Those skilled in the art will appreciate that the modules or steps of the present invention described above can be implemented using general purpose computer means, or alternatively, they can be implemented using program code that is executable by computing means, such that they are stored in memory means for execution by the computing means, or they are separately fabricated into individual integrated circuit modules, or multiple modules or steps of them are fabricated into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they are not intended to limit the scope of the present invention; various modifications and variations can be made by those skilled in the art, without inventive effort, on the basis of the technical solution of the present invention.

Claims (10)

1. An image target detection method based on a lightweight neural network model is characterized by comprising the following steps:
inputting a path of a picture or a video to be detected;
and calculating the related confidence degrees of all the classifications in the received picture to be detected by using the lightweight neural network model, selecting the highest confidence degree to obtain a final recognition frame, drawing the recognition frame in the picture to be detected, and finishing the detection process.
2. The method for detecting the image target based on the light-weight neural network model as claimed in claim 1, wherein the light-weight neural network model comprises a backbone network and a feature fusion network;
the backbone network processes the picture to be detected by utilizing convolution to generate a real characteristic layer, and then linear transformation is carried out on the real characteristic layer to obtain a phantom characteristic layer;
the feature fusion network further processes and fuses the image information to be detected processed in the phantom feature layer to generate a feature pyramid, and the feature pyramid is used for carrying out enhanced detection on objects with different scaling sizes and identifying the same object with different sizes and scales.
3. The method for detecting the image target based on the light weight neural network model as claimed in claim 1, wherein the training process of the light weight neural network model is as follows:
acquiring a data set containing target behaviors;
wherein, the data set containing the target behavior is derived from an open source data set downloaded from a network, and for the obtained unmarked data set, a marking tool is used for marking: selecting a main body which can be identified after learning and training, and then storing the position and length and width data of the main body relative to the image size in a corresponding XML file, wherein the file name corresponds to the image file;
and randomly dividing the labeled data set into a training data set and a testing data set, wherein the training data set is used for training the lightweight neural network model, and the testing data set is used for testing the lightweight neural network model.
4. The method for detecting the image target based on the light-weight neural network model as claimed in claim 3, wherein the pictures of the training data set are encapsulated by a fixed size and format and then transmitted into the constructed backbone network and the feature fusion network to obtain the prediction result of the light-weight neural network model;
calculating the loss of the output and the true value of the lightweight neural network model, calculating the gradient of the loss value, updating the lightweight neural network model parameters by using a gradient descent algorithm, and adjusting the model parameters by searching the optimal solution of the loss function.
5. The method as claimed in claim 3, wherein the labeled test data set is processed to convert an XML-formatted file into a txt-formatted file that can be read correctly, and the corresponding file name and directory in the test data set are adjusted.
6. The method for detecting the image target based on the light weight neural network model as claimed in claim 2, wherein the backbone network and the feature fusion network are used for operating yolov5 target detection algorithm.
7. The method for detecting the image target based on the light weight neural network model as claimed in claim 1, wherein the light weight neural network model is applied to real-time detection of smoking scenes in daily life.
8. The method for detecting the image target based on the light weight neural network model as claimed in claim 2, wherein each stage of the feature fusion network path takes the feature map of the previous stage as input, and the feature map is processed by a convolution layer, and the output is added to the feature map of the same stage of the top-down path through lateral connection, and the feature maps provide information for the processing of the next stage; and fusing the feature maps of different layers together after self-adaptive pooling operation, and then detecting to generate the related confidence coefficient and the position information of a prediction frame of each prediction category in the image to be detected.
9. The method as claimed in claim 2, wherein the backbone network mainly comprises a Conv module and a CSPNet module, and the number of the Conv module and the CSPNet module in the whole neural network is adjusted by controlling the width and depth of the network structure.
10. An image target detection system based on a lightweight neural network model is characterized by comprising:
a data input module configured to: inputting a path of a picture or a video to be detected;
an object detection module configured to: and calculating the related confidence degrees of all the classifications in the received current image by using the lightweight neural network model, obtaining a final recognition frame by selecting the highest confidence degree, and drawing in the original image to finish the detection process.
CN202210234758.6A 2022-03-11 2022-03-11 Image target detection method and system based on lightweight neural network model Pending CN114332666A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210234758.6A CN114332666A (en) 2022-03-11 2022-03-11 Image target detection method and system based on lightweight neural network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210234758.6A CN114332666A (en) 2022-03-11 2022-03-11 Image target detection method and system based on lightweight neural network model

Publications (1)

Publication Number Publication Date
CN114332666A true CN114332666A (en) 2022-04-12

Family

ID=81033190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210234758.6A Pending CN114332666A (en) 2022-03-11 2022-03-11 Image target detection method and system based on lightweight neural network model

Country Status (1)

Country Link
CN (1) CN114332666A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114783069A (en) * 2022-06-21 2022-07-22 中山大学深圳研究院 Method, device, terminal equipment and storage medium for identifying object based on gait
CN114842315A (en) * 2022-05-07 2022-08-02 无锡雪浪数制科技有限公司 Anti-loosening identification method and device for lightweight high-speed rail hub gasket
CN115100495A (en) * 2022-07-08 2022-09-23 福州大学 Lightweight safety helmet detection method based on sub-feature fusion
CN115309301A (en) * 2022-05-17 2022-11-08 西北工业大学 Android mobile phone end-side AR interaction system based on deep learning
CN115439684A (en) * 2022-08-25 2022-12-06 艾迪恩(山东)科技有限公司 Household garbage classification method based on lightweight YOLOv5 and APP
CN115457297A (en) * 2022-08-23 2022-12-09 中国航空油料集团有限公司 Method and device for detecting oil leakage of aviation oil depot and aviation oil safety operation and maintenance system
CN116187398A (en) * 2022-05-16 2023-05-30 山东巍然智能科技有限公司 Method and equipment for constructing lightweight neural network for unmanned aerial vehicle ocean image detection

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113158956A (en) * 2021-04-30 2021-07-23 杭州电子科技大学 Garbage detection and identification method based on improved yolov5 network
CN113221838A (en) * 2021-06-02 2021-08-06 郑州大学 Deep learning-based civilized elevator taking detection system and method
CN114067211A (en) * 2021-11-22 2022-02-18 齐鲁工业大学 Lightweight safety helmet detection method and system for mobile terminal

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
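Li Qingqing et al.: "Research on a forklift energy-efficiency optimization method based on improved YOLOv5", Modern Computer *
Wang Fengjiao: "Design and implementation of a shipborne intelligent terminal data management system", Wanfang Dissertation Database *
Hu Maonan et al.: "Image classification method based on an improved convolutional neural network", Communications Technology *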

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114842315A (en) * 2022-05-07 2022-08-02 无锡雪浪数制科技有限公司 Anti-loosening identification method and device for lightweight high-speed rail hub gasket
CN114842315B (en) * 2022-05-07 2024-02-02 无锡雪浪数制科技有限公司 Looseness-prevention identification method and device for lightweight high-speed railway hub gasket
CN116187398A (en) * 2022-05-16 2023-05-30 山东巍然智能科技有限公司 Method and equipment for constructing lightweight neural network for unmanned aerial vehicle ocean image detection
CN116187398B (en) * 2022-05-16 2023-08-25 山东巍然智能科技有限公司 Method and equipment for constructing lightweight neural network for unmanned aerial vehicle ocean image detection
CN115309301A (en) * 2022-05-17 2022-11-08 西北工业大学 Android mobile phone end-side AR interaction system based on deep learning
CN114783069A (en) * 2022-06-21 2022-07-22 中山大学深圳研究院 Method, device, terminal equipment and storage medium for identifying object based on gait
CN115100495A (en) * 2022-07-08 2022-09-23 福州大学 Lightweight safety helmet detection method based on sub-feature fusion
CN115457297A (en) * 2022-08-23 2022-12-09 中国航空油料集团有限公司 Method and device for detecting oil leakage of aviation oil depot and aviation oil safety operation and maintenance system
CN115457297B (en) * 2022-08-23 2023-09-26 中国航空油料集团有限公司 Oil leakage detection method and device for aviation oil depot and aviation oil safety operation and maintenance system
CN115439684A (en) * 2022-08-25 2022-12-06 艾迪恩(山东)科技有限公司 Household garbage classification method based on lightweight YOLOv5 and APP
CN115439684B (en) * 2022-08-25 2024-02-02 艾迪恩(山东)科技有限公司 Household garbage classification method and APP based on lightweight YOLOv5

Similar Documents

Publication Publication Date Title
CN114332666A (en) Image target detection method and system based on lightweight neural network model
CN112084331B (en) Text processing and model training method and device, computer equipment and storage medium
CN111598190B (en) Training method of image target recognition model, image recognition method and device
CN113011282A (en) Graph data processing method and device, electronic equipment and computer storage medium
US20190228273A1 (en) Identifying parameter image adjustments using image variation and sequential processing
CN112633459A (en) Method for training neural network, data processing method and related device
CN114820871B (en) Font generation method, model training method, device, equipment and medium
CN110796199A (en) Image processing method and device and electronic medical equipment
CN117033609B (en) Text visual question-answering method, device, computer equipment and storage medium
CN111368545A (en) Named entity identification method and device based on multi-task learning
CN115222950A (en) Lightweight target detection method for embedded platform
CN115956247A (en) Neural network model optimization method and device
WO2021169366A1 (en) Data enhancement method and apparatus
CN111126358A (en) Face detection method, face detection device, storage medium and equipment
CN114580510A (en) Bone marrow cell fine-grained classification method, system, computer device and storage medium
CN113870863A (en) Voiceprint recognition method and device, storage medium and electronic equipment
CN115525263A (en) Training method of code completion model, code completion method and device
CN112966815A (en) Target detection method, system and equipment based on impulse neural network
CN117708698A (en) Class determination method, device, equipment and storage medium
US11561326B1 (en) System and method for generating accurate hyperlocal nowcasts
CN112861601A (en) Method for generating confrontation sample and related equipment
US20230244932A1 (en) Image occlusion method, model training method, device, and storage medium
CN115546907A (en) In-vivo detection method and system for multi-scale feature aggregation
CN116362301A (en) Model quantization method and related equipment
CN117034133A (en) Data processing method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: 250353 University Road, Changqing District, Ji'nan, Shandong Province, No. 3501

Applicant after: Qilu University of Technology (Shandong Academy of Sciences)

Address before: 250353 University Road, Changqing District, Ji'nan, Shandong Province, No. 3501

Applicant before: Qilu University of Technology

Country or region before: China