CN114332666A - Image target detection method and system based on lightweight neural network model - Google Patents
- Publication number: CN114332666A
- Application number: CN202210234758.6A
- Authority: CN (China)
- Legal status: Pending (as listed by Google Patents; an assumption, not a legal conclusion)
Abstract
The invention provides an image target detection method and system based on a lightweight neural network model. It belongs to the technical field of image target detection and addresses the problem that current image recognition is inaccurate. The method comprises the following steps: inputting the path of a picture or video to be detected; using the lightweight neural network model to calculate the confidence of every class in the received current image; selecting the highest confidence to obtain the final recognition box; and drawing that box in the original image to complete the detection process. While maintaining model precision, the method greatly increases the running speed of the model, so that the model can be deployed smoothly on small devices and mobile terminals and meets the real-time and accuracy requirements of smoking detection in everyday scenes.
Description
Technical Field
The invention belongs to the technical field of image target detection, and particularly relates to an image target detection method and system based on a lightweight neural network model.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
The task of object detection is to find all objects of interest in an image and determine their categories and locations; it is one of the core problems in machine vision. Because objects vary in appearance, shape, and posture, and imaging is further disturbed by factors such as illumination and occlusion, target detection has long been one of the most challenging problems in machine vision.
Existing research shows that a reliable target detection algorithm is the basis for automatically analyzing and understanding complex scenes. Image target detection is therefore a fundamental task in computer vision: its performance directly affects mid- and high-level tasks such as target tracking, action recognition, and behavior understanding, and in turn determines the performance of downstream AI (artificial intelligence) applications such as face detection, behavior description, object recognition in traffic scenes, and content-based Internet image retrieval. As these AI applications have permeated many aspects of production and daily life, target detection techniques have reduced people's workload to some extent and changed the way people live.
In recent years, with the continuous iteration of GPUs (graphics processing units) with high-performance computing capability, target detection algorithms based on deep learning have developed very rapidly. One-stage target detection algorithms, represented by yolo, gain very strong image feature extraction and fusion capability from neural networks and deep learning, and offer better performance, stability, and generalization than traditional sliding-window target detection algorithms.
However, high-performance GPUs are expensive and not portable, so they are suitable only for training models, not for deploying and applying models in actual production and daily life.
Disclosure of Invention
To overcome these shortcomings of the prior art, the invention provides an image target detection method based on a lightweight neural network model that greatly increases the running speed of the model while maintaining its precision, so that the model can be deployed smoothly on small devices and mobile terminals and meets the real-time and accuracy requirements of detection in everyday scenes.
In order to achieve the above object, one or more embodiments of the present invention provide the following technical solutions:
in a first aspect, an image target detection method based on a lightweight neural network model is disclosed, which includes:
inputting a path of a picture or a video to be detected;
and calculating, with the lightweight neural network model, the confidence of every class in the received picture to be detected, selecting the highest confidence to obtain the final recognition box, drawing the recognition box in the picture to be detected, and thereby completing the detection process.
According to a further technical scheme, the lightweight neural network model comprises a backbone network and a feature fusion network;
the backbone network processes the picture to be detected with ordinary convolution to generate real feature layers, and then applies linear transformations to the real feature layers to obtain phantom (ghost) feature layers;
the feature fusion network further processes and fuses the information of the picture to be detected carried by the phantom feature layers to generate a feature pyramid, which strengthens the detection of objects at different scales so that the same object can be identified at different sizes.
According to a further technical scheme, the training process of the lightweight neural network model comprises the following steps:
acquiring a data set containing target behaviors;
wherein the data set containing the target behavior is derived from open-source data sets downloaded from the network, and any unlabeled data obtained are labeled with a labeling tool: the subject that the model should learn to identify is selected, and its position, length, and width relative to the image size are stored in a corresponding XML file whose file name matches the image file;
and randomly dividing the labeled data set into a training data set and a testing data set, wherein the training data set is used for training the lightweight neural network model, and the testing data set is used for testing the lightweight neural network model.
According to a further technical scheme, the pictures of the training data set are packaged into a fixed size and format and fed into the constructed backbone network and feature fusion network to obtain the prediction results of the lightweight neural network model;
the loss between the output of the lightweight neural network model and the true value is calculated, the gradient of the loss is computed, and the model parameters are updated with a gradient descent algorithm, i.e., the model parameters are adjusted by searching for the optimal solution of the loss function.
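As a toy illustration of the training loop described above (prediction, loss against the true value, gradient, gradient-descent update), the sketch below fits a single hypothetical parameter `w` in `y = w * x` by minimizing a mean squared error. The model, data, and learning rate are assumptions chosen for illustration; the actual method trains the full yolov5 network with its own loss.

```python
def train_scalar_model(xs, ys, lr=0.1, steps=200):
    """Fit y ~ w * x with plain gradient descent on the mean squared error.

    Hypothetical toy model: one parameter w, standing in for the network weights.
    """
    w = 0.0  # assumed initial parameter value
    n = len(xs)
    for _ in range(steps):
        # forward pass: prediction of the model
        preds = [w * x for x in xs]
        # loss between model output and true value (mean squared error)
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
        # gradient of the loss with respect to w
        grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        # gradient-descent update of the parameter
        w -= lr * grad
    return w, loss


w, loss = train_scalar_model([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(round(w, 4))
```

The data follow `y = 2x` exactly, so the parameter converges to 2; in the real network the same update is applied to every weight, with gradients obtained by backpropagation.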
According to a further technical scheme, the labeled test data set is processed: the XML-format files are converted into txt-format files that can be read correctly, and the corresponding file names and directories in the test data set are adjusted.
In a further technical scheme, the backbone network and the feature fusion network are used to implement the yolov5 target detection algorithm.
According to the further technical scheme, the lightweight neural network model is applied to real-time detection of daily smoking scenes.
According to a further technical scheme, each stage of the feature fusion network path takes the feature maps of the previous stage as input and processes them with convolutional layers; the output is added, through lateral connections, to the feature maps of the same stage in the top-down path, and the result provides information for the next stage. The feature maps of different levels are fused after an adaptive pooling operation and then passed to detection, which generates the confidence of each predicted class and the position of each prediction box in the image to be detected.
In a further technical scheme, the backbone network consists mainly of Conv modules and CSPNet modules, and the number of these modules in the whole neural network is adjusted by controlling the width and depth of the network structure.
In a second aspect, an image target detection system based on a lightweight neural network model is disclosed, which includes:
a data input module configured to: input the path of a picture or video to be detected;
an object detection module configured to: calculate, with the lightweight neural network model, the confidence of every class in the received picture to be detected, select the highest confidence to obtain the final recognition box, draw the recognition box in the picture to be detected, and thereby complete the detection process.
The above one or more technical solutions have the following beneficial effects:
the yolov5 target detection algorithm is applied to real-time detection of smoking in everyday scenes. The width (width_multiple) and depth (depth_multiple) of the default yolov5 network structure are adjusted to reduce the overall model size for smoking detection and increase its running speed. In addition, two network modules, Ghost-Conv and C3-Ghost, are integrated to improve the default model: the improved model is 46% smaller and its precision is improved by 2%. While maintaining model precision, these improvements greatly increase the running speed of the model, so that it can be deployed smoothly on small devices and mobile terminals and meets the real-time and accuracy requirements of smoking detection in everyday scenes.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention rather than to limit it.
FIG. 1 is a partial network structure diagram of yolov5 backbone network (backbone);
FIG. 2 is a partial network structure diagram of the yolov5 backbone network (backbone) as modified according to the embodiment of the present invention.
Detailed Description
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
At present, the mainstream research direction for deep-learning-based target detection algorithms is to make neural network models ever more lightweight and to keep improving their ability to run on small devices and mobile terminals, so that they can be better applied in actual production and daily life and create greater social and economic benefits.
Example one
The embodiment describes an image target detection method based on a lightweight neural network model, and takes smoking behavior detection as an example, but the embodiment can also be applied to image detection of other target behaviors.
The method comprises the following steps:
s1: making a data set for deep learning training and testing, and dividing and processing the whole data set;
s2: configuring python and pytorch programming environments for neural network model training and testing;
s3: constructing a backbone network and a feature fusion network required by realizing a yolov5 target detection algorithm, wherein the backbone network is used for extracting useful features in an image to be detected, the feature fusion network is used for reinforcing the useful features extracted by the backbone network, and outputting a final feature map of the image to be detected;
s4: defining the loss functions (classification loss and positioning loss) of yolov5 target detection algorithm;
s5: adjusting the width (width_multiple) and depth (depth_multiple) of the default yolov5 network structure to reduce the overall model size for smoking detection and increase its running speed. In addition, two network modules, Ghost-Conv and C3-Ghost, are integrated to improve the default model: the improved model is 46% smaller and its precision is improved by 2%. While maintaining model accuracy, these improvements greatly increase the running speed of the model, so that it can be deployed smoothly on small devices and mobile terminals and meets the real-time requirement of target detection;
s6: training the neural network model on the prepared data set; while the loss function has not yet converged, evaluating the overall performance of the model at the current training epoch, and keeping the best model file after repeated training;
s7: testing the lightweight optimal model to verify that its results are real and valid.
The processing procedure of step S1 is as follows:
s11: the data set used to train the algorithm contains 4860 images of smoking behavior, drawn mainly from network downloads and other labeled open-source data sets. Only cigarettes are labeled: when smoking behavior is present, the cigarette is labeled as smocking; otherwise, nothing is labeled;
S12: the acquired unlabeled data are labeled in XML format with a labeling tool. The main process is: selecting the subject that the algorithm is expected to identify after learning and training, and then storing its position, length, and width relative to the image size in a corresponding XML file whose file name matches the image file;
s13: the 4860 labeled images are randomly partitioned into 3791 images for the training process of the neural network and 1069 images for the testing process. The training process produces the final neural network model, and the testing process checks the reliability and accuracy of the model;
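The random partition in s13 can be sketched as follows. The file names, the helper name `split_dataset`, and the fixed seed are hypothetical; the only figures taken from the text are the 4860 images and the 3791/1069 split.

```python
import random


def split_dataset(image_names, n_train, seed=0):
    """Randomly partition image names into a training list and a test list."""
    names = list(image_names)
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    rng.shuffle(names)
    return names[:n_train], names[n_train:]


# Hypothetical file names standing in for the 4860 labeled images.
images = [f"img_{i:05d}.jpg" for i in range(4860)]
train, test = split_dataset(images, n_train=3791)
print(len(train), len(test))  # 3791 1069
```

Fixing the seed keeps the training and test lists stable across runs, so test results remain comparable between training sessions.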
s14: for the labeled smoking detection data set, the XML-format files are converted into txt-format files that the pytorch program can read correctly, and the corresponding file names and directories are adjusted to match the program.
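A minimal sketch of the XML-to-txt conversion, assuming the labeling tool writes Pascal VOC-style XML (`size/width`, `size/height`, and `object/bndbox` with `xmin`, `ymin`, `xmax`, `ymax`) and that the target txt format is the yolov5 label layout: one `class x_center y_center width height` line per object, normalized to [0, 1]. The function name `voc_xml_to_yolo_lines` and the single-entry class list are assumptions.

```python
import xml.etree.ElementTree as ET

CLASSES = ["smocking"]  # assumed class list: only the label named in s11


def voc_xml_to_yolo_lines(xml_text, classes=CLASSES):
    """Convert one VOC-style XML annotation into yolov5 txt label lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in classes:
            continue  # skip labels outside the class list
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (float(box.find(t).text) for t in ("xmin", "ymin", "xmax", "ymax"))
        # normalized box centre and size, as expected by yolov5 label files
        xc = (xmin + xmax) / 2 / img_w
        yc = (ymin + ymax) / 2 / img_h
        bw = (xmax - xmin) / img_w
        bh = (ymax - ymin) / img_h
        lines.append(f"{classes.index(name)} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")
    return lines


sample = """<annotation>
  <size><width>640</width><height>480</height></size>
  <object><name>smocking</name>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>180</xmax><ymax>160</ymax></bndbox>
  </object>
</annotation>"""
print(voc_xml_to_yolo_lines(sample))
```

Each returned line would be written to a txt file with the same base name as its image, which is why s14 also adjusts file names and directories.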
The processing procedure of step S2 is as follows:
s21: the virtual environment is built mainly with pycharm, an IDE developed for python that offers strong code editing and project management and supports a rich set of third-party plug-ins, extensions, and packages. pycharm also has a built-in virtual environment feature: when a new project is created, choosing to create a new virtual environment at the same time yields a programming environment fully isolated from the local one, which simplifies subsequent coding and debugging;
s22: the method uses the pytorch deep learning framework for training and development. After the virtual environment is created, a local python interpreter is selected, and then the pytorch framework and the other packages needed for third-party scientific computing and auxiliary functions are installed. The invention uses python 3.9 and pytorch 1.9 (the cu111 build for CUDA 11.1);
s23: training a deep neural network model involves a large amount of parallel computation, so a GPU with high-performance parallel computing capability is needed to accelerate training. The invention trains and develops the target detection algorithm on an Nvidia RTX2060 GPU; to ensure device compatibility and allow the GPU to take part in the computation, the cuda and cudnn drivers must be installed, and the version selected by the invention is 11.1;
The processing procedure of step S3 is as follows:
s31: the network structure of yolov5 is divided mainly into a backbone network (backbone) and a feature fusion network (head). The backbone mainly uses CSPNet (Cross Stage Partial Network) modules to extract useful features from the image to be detected, which gives the network better performance while reducing computation. CSPNet addresses the duplicated gradient information and vanishing gradients that arise in network optimization in the backbone of other neural network models; it reduces the number of parameters and the FLOPs (a model complexity measure) of the model, preserves inference speed and accuracy, and reduces model size.
CSPNet is composed of several partial dense layers and a partial transition layer. The processing of the image to be detected by a cross stage partial network such as CSPNet can be represented by the following formulas:
$$X_k = W_k * [X_0'', X_1, \ldots, X_{k-1}] \qquad (1)$$
$$X_T = W_T * [X_0'', X_1, \ldots, X_k] \qquad (2)$$
$$X_U = W_U * [X_0', X_T] \qquad (3)$$
$$W_k' = f_k\left(W_k, \{g_0'', g_1, \ldots, g_{k-1}\}\right) \qquad (4)$$
$$W_T' = f_T\left(W_T, \{g_0'', g_1, \ldots, g_k\}\right) \qquad (5)$$
$$W_U' = f_U\left(W_U, \{g_0', g_T\}\right) \qquad (6)$$
where $k$ is the number of partial dense layers, $X_k$ is the feature map output by the $k$-th partial dense layer, and $W_k$ and $g_k$ are the network weights and gradients of the $k$-th partial dense layer, respectively. $X_T$, $W_T$, and $g_T$ are the feature map, network weights, and gradient of the partial transition layer output, and $X_U$ and $W_U$ are the final output feature map and network weights of CSPNet. $*$ denotes convolution, $[\cdot]$ denotes channel-wise concatenation, $g_0'$ and $g_0''$ are the gradients propagated to $X_0'$ and $X_0''$, and $f_k$, $f_T$, $f_U$ are the weight-update functions.
All steps involved in CSPNet are as follows:
first, the feature map $X_0$ of the image to be detected that enters CSPNet is split along the channel dimension into two parts, $X_0 = [X_0', X_0'']$. $X_0'$ is connected directly to the end of the CSPNet stage, while $X_0''$ passes through several partial dense layers.
The output $[X_0'', X_1, \ldots, X_k]$ of the partial dense layers then passes through a partial transition layer, and the transition output $X_T$ is concatenated with $X_0'$ to produce the final feature map $X_U$.
Equations (1) to (3) are the forward propagation equations of CSPNet, and equations (4) to (6) are its backward propagation (weight update) equations.
From equations (1) to (6), the gradients coming from the partial dense layers of CSPNet and the feature map $X_0'$ that bypasses them are integrated separately, so neither side contains duplicate gradient information belonging to the other side for updating the weights. The model therefore avoids excessive duplicated gradient information, and its complexity is reduced.
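To make the split-and-merge structure concrete, this sketch tracks channel counts through a CSPDenseNet-style stage under assumed hyperparameters (64 input channels split evenly, growth rate 32, four partial dense layers, 64 transition channels). The growth-rate bookkeeping follows DenseNet; the function names and numbers are illustrative and are not taken from yolov5 or from the invention.

```python
def dense_layer_input_widths(c_start, growth, k):
    """Channel width seen by each of k dense layers that start from c_start channels.

    Every dense layer receives the concatenation of all earlier maps and adds
    `growth` new channels, so the input width grows by `growth` per layer.
    """
    return [c_start + j * growth for j in range(k)]


def csp_stage(c_in, growth, k, c_transition):
    """Channel bookkeeping for one CSPDenseNet-style stage (illustrative sketch)."""
    c_bypass = c_in // 2        # X0': routed straight to the end of the stage
    c_dense = c_in - c_bypass   # X0'': passes through the partial dense layers
    return {
        # inputs to the k partial dense layers: [X0''], [X0'', X1], ...
        "dense_layer_inputs": dense_layer_input_widths(c_dense, growth, k),
        # the partial transition layer sees [X0'', X1, ..., Xk]
        "transition_input": c_dense + k * growth,
        # X_U is computed from the concatenation [X0', X_T]
        "final_concat": c_bypass + c_transition,
    }


plain = dense_layer_input_widths(64, 32, 4)  # ordinary dense block on all 64 channels
csp = csp_stage(64, 32, 4, c_transition=64)
print(sum(plain), sum(csp["dense_layer_inputs"]))  # 448 320
```

Summing the dense-layer input widths is only a rough proxy for computation, but it shows why routing half of the channels around the dense layers lowers the cost of the stage.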
S32: the yolov5 feature fusion network uses a Path Aggregation Network (PANet), which further processes and fuses the information of the image to be detected produced by the backbone to generate a feature pyramid. The feature pyramid strengthens the model's detection of objects at different scales, so that the same object can be identified at different sizes.
The feature extraction network of PANet adopts an improved FPN (Feature Pyramid Network) structure that improves the propagation of low-level features through the network. Each stage of the bottom-up path takes the feature map of the previous stage as input and processes it with a 3 x 3 convolutional layer; the output is added, through lateral connections, to the feature map of the same stage in the top-down path, and the result provides information for the next stage. The feature maps of different levels are fused after an adaptive pooling operation and passed to the detector module at the end of the network, which generates the confidence of each predicted class and the position of each prediction box in the image to be detected.
The processing procedure of step S4 is as follows:
s41: neural network training for target detection proceeds as follows: the training pictures are packaged into a fixed size and format and fed into the constructed network to obtain the model's predictions; the loss between the model output and the true values is calculated, the gradient of the loss is computed, and the model parameters are updated by a gradient descent algorithm. Introducing a loss function turns the abstract training process into a mathematical optimization problem: the model parameters are adjusted by searching for the optimal solution of the loss function so that the model performs as well as possible.
S42: the loss function of the target detection algorithm mainly comprises a positioning loss and a classification loss.
For the positioning loss, $(x, y, w, h)$ and $(\hat{x}, \hat{y}, \hat{w}, \hat{h})$ denote, respectively, the top-left corner coordinates and the width and height of the ground-truth box and of the predicted box.
The classification loss, in the binary cross-entropy form with a sigmoid activation $\sigma$ as used by yolov5, is
$$L_{cls} = -\sum_{i=1}^{N}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right], \qquad p_i = \sigma(z_i) = \frac{1}{1 + e^{-z_i}}$$
where $N$ is the total number of categories, $z_i$ is the predicted value for the current category, $p_i$ is the probability of the current category obtained after the activation function, $y_i$ is the true value (0 or 1) for the current category, and $L_{cls}$ is the classification loss.
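A minimal pure-Python sketch of the classification loss described in S42, assuming a sigmoid activation and the binary cross-entropy form (yolov5 applies BCE-with-logits to its class scores). The function names are hypothetical, and the small `eps` guard is an implementation detail added here to avoid `log(0)`.

```python
import math


def sigmoid(z):
    """Activation that maps a raw prediction to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def classification_loss(raw_preds, targets):
    """Binary cross-entropy summed over N categories.

    raw_preds -- raw prediction value for each category
    targets   -- true value (0 or 1) for each category
    """
    eps = 1e-12  # guards against log(0)
    loss = 0.0
    for z, y in zip(raw_preds, targets):
        p = sigmoid(z)
        loss -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return loss


print(round(classification_loss([0.0], [1]), 4))  # log 2, printed as 0.6931
```

Because each category is scored independently with a sigmoid, one box can in principle carry several labels, which a softmax over categories would not allow.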
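S43: in the yolov5 algorithm adopted by the invention, the positioning loss differs from that of other common target detection algorithms: CIoU is used as the positioning loss, as shown in the formula:
$$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$$
The meanings of the parameters in the formula are:
IoU: the intersection over union of the prediction box and the ground-truth box.
$\rho(b, b^{gt})$: the Euclidean distance between the center points $b$ and $b^{gt}$ of the prediction box and the ground-truth box.
$c$: the diagonal length of the smallest box enclosing both boxes.
$\alpha$: a positive trade-off weight, $\alpha = \dfrac{v}{(1 - IoU) + v}$.
v: a parameter measuring the consistency of the aspect ratio, defined as:
$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$$
where $w^{gt}, h^{gt}$ and $w, h$ are the widths and heights of the ground-truth box and the prediction box.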
Compared with traditional positioning loss functions, the CIoU loss adopted by the invention improves accuracy when the prediction box and the ground-truth box overlap, when one contains the other, or when their sizes differ greatly, which benefits the training of the neural network model.
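A pure-Python sketch of the CIoU loss for a single pair of boxes, following the standard CIoU definition (IoU, centre distance, enclosing-box diagonal, aspect-ratio term `v`, and weight `alpha`). Boxes are given as corner coordinates `(x1, y1, x2, y2)`, an assumed convention; this illustrates the formula and is not the yolov5 source code.

```python
import math


def ciou_loss(box_p, box_g):
    """CIoU loss between a predicted box and a ground-truth box.

    Boxes are (x1, y1, x2, y2) corner coordinates with x2 > x1 and y2 > y1.
    """
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    eps = 1e-9  # avoids division by zero
    # intersection over union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    wp, hp = px2 - px1, py2 - py1
    wg, hg = gx2 - gx1, gy2 - gy1
    union = wp * hp + wg * hg - inter
    iou = inter / (union + eps)
    # squared distance between the two box centres (rho^2)
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4
    # squared diagonal of the smallest enclosing box (c^2)
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps
    # aspect-ratio consistency term v and its trade-off weight alpha
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(wp / hp)) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v


print(round(ciou_loss((0, 0, 1, 1), (2, 0, 3, 1)), 4))
```

Unlike a plain `1 - IoU` loss, the centre-distance term still produces a useful signal when the boxes do not overlap at all (IoU is 0), as in the disjoint pair above.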
The processing procedure of step S5 is as follows:
s51: one-stage target detection algorithms represented by the yolo series are widely used in industry and in other everyday scenarios that emphasize real-time detection, owing to their very high detection speed and excellent detection precision. Among earlier, relatively mature yolo algorithms, yolov3 is representative: it strikes a good balance between performance and speed and is very stable. However, yolov3 still places high demands on CPU and GPU performance, which makes production deployment difficult and everyday application costly. The latest yolov5 algorithm further improves the detection precision of yolo while greatly reducing model size and again greatly increasing running speed, making it better suited to mobile terminals and small devices. It achieves this through data augmentation and preprocessing at the input, model and feature fusion improvements, output layer improvements, loss function improvements, and NMS process improvements.
S52: the backbone network (backbone) of yolov5 consists mainly of Conv modules and CSPNet modules, which extract image features to different degrees. By controlling the width (width_multiple) and depth (depth_multiple) of the network structure, the number of Conv and CSPNet modules in the whole neural network can be adjusted, so that the size and complexity of the whole model can be controlled flexibly according to the specific application requirements and the computing capacity of the available hardware, yielding a more suitable balance of detection precision and detection speed.
Smoking detection in everyday scenes involves relatively simple features, strict real-time requirements, and very high demands for deployment on mobile terminals and small devices. The network width and depth are therefore reduced appropriately, which greatly reduces the size and complexity of the model and greatly increases its running speed.
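The effect of the two multiples on individual layers can be sketched as below. The rounding rule (repeat counts scaled by `depth_multiple` and rounded, never below one; channel counts scaled by `width_multiple` and rounded up to a multiple of 8) is modelled on the public Ultralytics yolov5 model parser as we understand it and may differ between versions. The function names are hypothetical, and the text does not state which multiples the invention uses.

```python
import math


def make_divisible(x, divisor=8):
    """Round a channel count up to the nearest multiple of `divisor`."""
    return math.ceil(x / divisor) * divisor


def scale_layer(channels, repeats, width_multiple, depth_multiple):
    """Apply width/depth multiples to one layer specification (sketch)."""
    # depth_multiple scales how many times a block is repeated, never below 1
    n = max(round(repeats * depth_multiple), 1) if repeats > 1 else repeats
    # width_multiple scales the number of output channels, kept divisible by 8
    c = make_divisible(channels * width_multiple, 8)
    return c, n


print(scale_layer(1024, 9, width_multiple=0.5, depth_multiple=0.33))
print(scale_layer(1024, 9, width_multiple=0.25, depth_multiple=0.33))
```

Halving the width roughly quarters the weights of each convolution (both input and output channels shrink), which is why the width multiple has a large effect on model size.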
S53: the Conv modules and CSPNet modules in the yolov5 backbone (backbone) are built internally from traditional convolution modules, whose principle is to process the spatial information on each feature channel with depthwise convolution and then fuse features across channels with pointwise convolution. This places high demands on device computing power and wastes some computing resources, which hinders use on small, low-power devices and does not suit the real-time performance and light weight emphasized by the invention.
The invention improves the Conv module and the CSPNet module in the yolov5 backbone by introducing the Ghost idea. Its core is to use ordinary convolution to generate part of the feature maps (the real feature maps), then apply cheap linear transformations (cheap operations) to these real feature maps to obtain phantom feature layers (ghost feature maps); the real and phantom feature layers together form the complete feature layer. The resulting network modules, Ghost-Conv and C3-Ghost, further reduce model complexity while maintaining model precision, making the model lighter and better suited to deployment on small devices, while it also runs notably fast on larger devices.
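The saving from the Ghost idea can be checked with simple weight counting. The sketch compares an ordinary k x k convolution with a Ghost module that produces the same number of output channels; the ratio `ratio=2` and the cheap-operation kernel size `d=3` are assumed values, biases and batch normalization are ignored, and the channel numbers are illustrative only.

```python
import math


def standard_conv_params(c_in, c_out, k):
    """Weights of an ordinary k x k convolution (bias and BN ignored)."""
    return c_in * c_out * k * k


def ghost_module_params(c_in, c_out, k, ratio=2, d=3):
    """Weights of a Ghost module producing c_out channels.

    An ordinary k x k convolution produces m = ceil(c_out / ratio) real feature
    maps; cheap d x d depthwise convolutions then generate (ratio - 1) ghost
    maps from each real map, and the two sets are concatenated.
    """
    m = math.ceil(c_out / ratio)
    primary = c_in * m * k * k        # ordinary convolution -> real feature maps
    cheap = m * (ratio - 1) * d * d   # depthwise linear operations -> ghost feature maps
    return primary + cheap


std = standard_conv_params(64, 128, 3)
ghost = ghost_module_params(64, 128, 3)
print(std, ghost, round(std / ghost, 2))  # 73728 37440 1.97
```

Almost all of the remaining cost sits in the primary convolution, so the reduction approaches the factor `ratio` as the channel counts grow.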
FIG. 1 shows a partial network structure diagram of the yolov5 backbone network (backbone), and FIG. 2 shows the corresponding partial network structure diagram as modified according to the embodiment of the present invention.
The processing procedure of step S6 is as follows:
s61: the file paths of the data set processed and divided in step S1 are written into the yolov5 yaml file; the program is then run, reads the training pictures and the positions of the labeled ground-truth boxes, and generates the corresponding train cache file and val cache file in the data set.
S62: the parameters for training the neural network model are set: 300 training epochs, a batch-size of 24, and an image-size of 640; training is performed on an RTX2060 graphics card;
s63: tensorboard is used for visualization during training to monitor, in real time, the convergence of the loss function and the improvement in overall model performance; training ends and the model file is saved when no clear improvement is observed over roughly the last 100 epochs or when the number of training epochs reaches 300.
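The stopping rule in s63 can be sketched as a small check run after each epoch. The function name, the per-epoch "fitness" score, and the reading of "no clear improvement" as "no new best score" are simplifying assumptions; only the 100-epoch window and the 300-epoch limit come from the text.

```python
def should_stop(fitness_history, patience=100, max_epochs=300):
    """Return True when training should end.

    fitness_history -- one overall-performance score per finished epoch (higher is better)
    """
    epochs_done = len(fitness_history)
    if epochs_done >= max_epochs:
        return True  # epoch limit reached
    if epochs_done == 0:
        return False
    # index of the first epoch that reached the best score so far
    best_epoch = max(range(epochs_done), key=lambda i: fitness_history[i])
    # stop when the best score is at least `patience` epochs old
    return epochs_done - 1 - best_epoch >= patience
```

For example, a run whose best score appeared at epoch 0 and then plateaued stops once 100 further epochs have passed without a new best, while a run that keeps improving continues until epoch 300.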
The processing procedure of step S7 is as follows:
S71: Test the model with the test data set divided in S1;
S72: Import the path of the test file into the yaml file in the main program of yolov5, and execute a val.
S73: Import the trained model file into the detect.py program and input the path of the picture or video to be detected. The model calculates the confidence of every class in the current image. The final recognition box is obtained by selecting the highest confidence and is drawn on the original image, which completes the detection process.
As a whole, the present application is implemented through the following steps:
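The selection rule in S73 (keep the class with the highest confidence, then keep the most confident box) can be sketched in plain Python. The sketch makes the following assumptions:

- Each candidate box already carries one confidence value per class.
- A confidence threshold of 0.25 is used.
- The function name `select_final_box` and the result dictionary layout are hypothetical.

The real yolov5 detect.py also applies non-maximum suppression. That step is left out here.

```python
from typing import Dict, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def select_final_box(
    boxes: Sequence[Box],
    class_scores: Sequence[Sequence[float]],
    class_names: Sequence[str],
    conf_threshold: float = 0.25,
) -> Optional[Dict[str, object]]:
    """Return the single most confident detection, or None if none passes."""
    best: Optional[Dict[str, object]] = None
    for box, scores in zip(boxes, class_scores):
        cls = max(range(len(scores)), key=scores.__getitem__)  # argmax over classes
        conf = scores[cls]
        if conf < conf_threshold:
            continue  # too uncertain to report
        if best is None or conf > best["confidence"]:
            best = {"box": box, "label": class_names[cls], "confidence": conf}
    return best
```

The returned box and label are what would be drawn on the original image.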
dividing and importing a data set;
configuring the programming environment and implementing the program;
training a neural network;
and testing the final effect.
Data set division means dividing the labeled data set into a training set and a test set, importing them into the yaml file of yolov5, and generating the corresponding train.
Configuring the programming environment and implementing the program: under Windows 10, the programming environment is configured with PyCharm and Python 3.9. PyTorch 1.9.1 and other auxiliary packages are installed. Computation is accelerated on an RTX2060 graphics card with CUDA 11.1 and the corresponding cuDNN installed.
Training the neural network: the network model framework is built by importing the data set. The number of training epochs is set to 300, the batch size to 24 and the image size to 640, and other hyperparameters are set. The GPU device is then selected and a train.
Final effect test: after training of the neural network is complete, the path of the test file is imported into the yaml file in the main program of yolov5, and a val. is executed. Finally, the trained model file is imported into a detect.
The experiments were trained and tested on a Lenovo Legion R7000P. Its hardware configuration is an NVIDIA GeForce RTX2060 (6 GB) and an AMD Ryzen 7 4800H with Radeon Graphics.
Most of the data sets used in the experiment were obtained from the Internet. Such data sets usually come in xml format, which has to be converted to txt format with a script. The data set also has to be divided into a training set and a validation set. For this step, open-source code from the Internet was used to convert the format of the experimental data set and to split it into training and validation sets. The xml label files are stored in the data set, and the image files are stored in JPEGImages. The classes variable in the data-splitting code must list the classes already labeled in the xml files. TRAIN_RATIO sets the proportion used to split the training and validation sets; for example, a value of 80 means that 80% of the data go to the training set and 20% to the validation set. Running the code generates images and labels folders in the current directory, and each of the two folders contains a train folder and a val folder.
In the experiment, Python is used as the programming language together with the PyTorch deep learning framework. The project code is mainly organized into data, models, utils, weights, detect. The data folder stores the configuration files (yaml files), models stores the models, utils stores the yolov5 utility functions, weights stores the trained weight files, and requirements.
After the dependency packages are downloaded, first open the corresponding yaml file in the data directory and change its training-set and validation-set addresses to those of the prepared data set. Next, in the models directory, find the model configuration file that corresponds to the pre-trained weights, open it, and change the number of classes to 1, because only smoking is detected in this example. Once these changes are made, start training: open train and change parameters such as weights, cfg and data. Set the number of training epochs to 300, the batch size to 24 and num_workers to 0, and start training on the GPU.
After training, select best.pt in the weights folder of the corresponding training run under the run folder in the root directory as the weight file. The trained model weight file is applied to the mAP generation executable and the test executable to obtain the experimental results, which are compared with the training results of the network before improvement. The model size is 13.7 MB before improvement and 7.39 MB after improvement, and the model accuracy is 0.841 before improvement and 0.861 after improvement. These results verify the effectiveness of the technical solution of the present application.
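The conversion and split can be sketched with the Python standard library. The sketch makes the following assumptions:

- The labels follow the common Pascal-VOC xml layout (`size/width`, `size/height`, and `object/name`, `object/bndbox/xmin`...`ymax`).
- The txt target is the usual YOLO line format: `class x_center y_center width height`, with all four values normalized by the image size.
- The helper names are hypothetical. `CLASSES` and `TRAIN_RATIO` mirror the variables described above.

This is not the open-source script used in the experiment.

```python
import random
import xml.etree.ElementTree as ET
from typing import List, Sequence, Tuple

CLASSES = ["smoking"]  # must match the class names used in the xml labels
TRAIN_RATIO = 80       # percent of samples placed in the training set


def voc_xml_to_yolo_lines(xml_text: str,
                          classes: Sequence[str] = CLASSES) -> List[str]:
    """Convert one VOC xml annotation into YOLO txt label lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    w = float(size.find("width").text)
    h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in classes:
            continue  # skip classes that are not being trained
        bb = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (float(bb.find(t).text)
                                  for t in ("xmin", "ymin", "xmax", "ymax"))
        # Box centre and size, normalised by image width/height to [0, 1]
        xc = (xmin + xmax) / 2.0 / w
        yc = (ymin + ymax) / 2.0 / h
        bw = (xmax - xmin) / w
        bh = (ymax - ymin) / h
        lines.append(f"{classes.index(name)} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")
    return lines


def split_train_val(stems: Sequence[str], train_ratio: int = TRAIN_RATIO,
                    seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly split file stems into (train, val) by a percentage."""
    shuffled = list(stems)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio / 100)
    return shuffled[:n_train], shuffled[n_train:]
```

Each list returned by `voc_xml_to_yolo_lines` would be written to `labels/train/<stem>.txt` or `labels/val/<stem>.txt`, depending on which split the stem falls into.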
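The training settings above can be collected into a single command line. The sketch assumes the following:

- The flag names are those we understand the v6-era yolov5 `train.py` to accept (`--weights`, `--cfg`, `--data`, `--epochs`, `--batch-size`, `--imgsz`, `--workers`, `--device`). Check them with `python train.py --help` in your own checkout.
- The helper name is hypothetical.
- The file names in the demo are placeholders.

```python
import shlex
import sys
from typing import List


def build_train_command(weights: str, cfg: str, data: str, epochs: int = 300,
                        batch_size: int = 24, img_size: int = 640,
                        workers: int = 0, device: str = "0") -> List[str]:
    """Assemble the argument list for yolov5's train.py (run from repo root)."""
    return [
        sys.executable, "train.py",
        "--weights", weights,          # pre-trained weights to start from
        "--cfg", cfg,                  # model configuration yaml
        "--data", data,                # dataset yaml
        "--epochs", str(epochs),
        "--batch-size", str(batch_size),
        "--imgsz", str(img_size),
        "--workers", str(workers),     # 0 = load data in the main process
        "--device", device,            # "0" = first CUDA GPU
    ]


if __name__ == "__main__":
    cmd = build_train_command("yolov5s.pt", "models/yolov5s-ghost.yaml",
                              "data/smoking.yaml")
    print(shlex.join(cmd))  # or: subprocess.run(cmd, check=True)
```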
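The reported before/after figures correspond to the following relative changes. The calculation uses only the numbers stated above.

```latex
\Delta_{\text{size}} = \frac{13.7\,\text{MB} - 7.39\,\text{MB}}{13.7\,\text{MB}}
  = \frac{6.31}{13.7} \approx 0.461 \quad (\text{about } 46\% \text{ smaller})

\Delta_{\text{acc}} = 0.861 - 0.841 = 0.020 \ (\text{absolute}), \qquad
\frac{0.020}{0.841} \approx 0.024 \quad (\text{about } 2.4\% \text{ relative})
```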
Example two
The purpose of this embodiment is to provide an image target detection system based on a lightweight neural network model, which includes:
a data input module configured to: input the path of a picture or video to be detected;
an object detection module configured to: use the lightweight neural network model to calculate the confidence of every class in the received current image, obtain the final recognition box by selecting the highest confidence, and draw it on the original image to complete the detection process.
The invention relates to neural network, deep learning, machine vision and target detection technology. It mainly uses yolov5, currently the latest single-stage target detection algorithm. Model complexity is effectively reduced, while model accuracy is preserved, by adjusting the width (width_multiple) and depth (depth_multiple) of the network structure and by improving the Conv and CSPNet modules in the backbone network (backbone). As a result, the model runs extremely fast, and the lightweight model also runs smoothly on a wide range of mobile terminals and small devices, meeting many practical needs in production and daily life.
Those skilled in the art will appreciate that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. Alternatively, they can be implemented as program code executable by a computing device, stored in a storage device and executed by that device. They can also be fabricated as individual integrated circuit modules, or several of the modules or steps can be fabricated as a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
Although the specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications or variations that can be made without inventive effort on the basis of the technical solution of the present invention still fall within the scope of protection of the present invention.
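The effect of width_multiple and depth_multiple can be shown with a small calculation. To our understanding, yolov5 scales each backbone entry in its model-parsing code as follows:

- The repeat count n is multiplied by depth_multiple, rounded, and kept at 1 or more. Entries with a single repeat are left unchanged.
- The output channel count is multiplied by width_multiple and rounded up to a multiple of 8.

The sketch below is a reconstruction of that rule, not the library code. The 0.33/0.50 example values are those of the public yolov5s configuration, not values stated in this document.

```python
import math
from typing import Tuple


def make_divisible(x: float, divisor: int = 8) -> int:
    """Round a channel count up to the nearest multiple of `divisor`."""
    return math.ceil(x / divisor) * divisor


def scale_module(repeats: int, channels: int, depth_multiple: float,
                 width_multiple: float) -> Tuple[int, int]:
    """Return (scaled_repeats, scaled_channels) for one backbone entry."""
    # Depth: fewer stacked blocks; single-block entries are left alone
    n = max(round(repeats * depth_multiple), 1) if repeats > 1 else repeats
    # Width: fewer channels, kept hardware-friendly (multiple of 8)
    c = make_divisible(channels * width_multiple, 8)
    return n, c


if __name__ == "__main__":
    # e.g. a C3 entry with 9 repeats and 512 output channels
    print(scale_module(9, 512, depth_multiple=0.33, width_multiple=0.50))
```

Lowering either multiplier shrinks every Conv and CSPNet entry at once, which is why these two numbers are a simple control for model size.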
Claims (10)
1. An image target detection method based on a lightweight neural network model, characterized by comprising the following steps:
inputting a path of a picture or a video to be detected;
calculating, with the lightweight neural network model, the confidence of every class in the received picture to be detected, selecting the highest confidence to obtain the final recognition box, drawing the recognition box on the picture to be detected, and completing the detection process.
2. The image target detection method based on a lightweight neural network model according to claim 1, wherein the lightweight neural network model comprises a backbone network and a feature fusion network;
the backbone network processes the picture to be detected with convolution to generate a real feature layer, and then applies a linear transformation to the real feature layer to obtain a phantom feature layer;
the feature fusion network further processes and fuses the information of the picture to be detected output by the phantom feature layer to generate a feature pyramid, which is used for enhanced detection of objects at different scales so that the same object can be recognized at different sizes and scales.
3. The image target detection method based on a lightweight neural network model according to claim 1, wherein the lightweight neural network model is trained as follows:
acquiring a data set containing target behaviors;
wherein the data set containing the target behaviors is derived from open-source data sets downloaded from the Internet, and any unlabeled data set obtained is labeled with a labeling tool: a subject that can be recognized after learning and training is selected, and its position, length and width relative to the image size are stored in a corresponding XML file whose file name corresponds to the image file;
randomly dividing the labeled data set into a training data set and a test data set, wherein the training data set is used to train the lightweight neural network model and the test data set is used to test it.
4. The image target detection method based on a lightweight neural network model according to claim 3, wherein the pictures of the training data set are packaged at a fixed size and format and then fed into the constructed backbone network and feature fusion network to obtain the prediction result of the lightweight neural network model;
calculating the loss between the output of the lightweight neural network model and the ground truth, calculating the gradient of the loss value, updating the parameters of the lightweight neural network model with a gradient descent algorithm, and adjusting the model parameters by searching for the optimal solution of the loss function.
5. The image target detection method based on a lightweight neural network model according to claim 3, wherein the labeled test data set is processed to convert XML-format files into txt-format files that can be read correctly, and the corresponding file names and directories in the test data set are adjusted.
6. The image target detection method based on a lightweight neural network model according to claim 2, wherein the backbone network and the feature fusion network are used to run the yolov5 target detection algorithm.
7. The image target detection method based on a lightweight neural network model according to claim 1, wherein the lightweight neural network model is applied to real-time detection of smoking scenes in daily life.
8. The image target detection method based on a lightweight neural network model according to claim 2, wherein each stage of the feature fusion network path takes the feature map of the previous stage as input and processes it with a convolution layer. The output is added, through a lateral connection, to the feature map of the same stage in the top-down path, and these feature maps provide information for the processing of the next stage. The feature maps of different levels are fused after an adaptive pooling operation and then used for detection, generating the confidence of each predicted class and the position information of the prediction boxes in the image to be detected.
9. The image target detection method based on a lightweight neural network model according to claim 2, wherein the backbone network mainly comprises Conv modules and CSPNet modules, and the number of Conv and CSPNet modules in the whole neural network is adjusted by controlling the width and depth of the network structure.
10. An image target detection system based on a lightweight neural network model, characterized by comprising:
a data input module configured to: input the path of a picture or video to be detected;
an object detection module configured to: use the lightweight neural network model to calculate the confidence of every class in the received current image, obtain the final recognition box by selecting the highest confidence, and draw it on the original image to complete the detection process.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210234758.6A CN114332666A (en) | 2022-03-11 | 2022-03-11 | Image target detection method and system based on lightweight neural network model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210234758.6A CN114332666A (en) | 2022-03-11 | 2022-03-11 | Image target detection method and system based on lightweight neural network model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114332666A true CN114332666A (en) | 2022-04-12 |
Family
ID=81033190
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210234758.6A Pending CN114332666A (en) | 2022-03-11 | 2022-03-11 | Image target detection method and system based on lightweight neural network model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114332666A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113158956A (en) * | 2021-04-30 | 2021-07-23 | 杭州电子科技大学 | Garbage detection and identification method based on improved yolov5 network |
CN113221838A (en) * | 2021-06-02 | 2021-08-06 | 郑州大学 | Deep learning-based civilized elevator taking detection system and method |
CN114067211A (en) * | 2021-11-22 | 2022-02-18 | 齐鲁工业大学 | Lightweight safety helmet detection method and system for mobile terminal |
Non-Patent Citations (3)
Title |
---|
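Li Qingqing et al.: "Research on a forklift energy-efficiency optimization method based on improved YOLOv5", Modern Computer * |
Wang Fengjiao: "Design and implementation of a data management system for shipborne intelligent terminals", Wanfang Dissertation Database * |
Hu Maonan et al.: "Image classification method based on an improved convolutional neural network", Communications Technology * |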
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114842315A (en) * | 2022-05-07 | 2022-08-02 | 无锡雪浪数制科技有限公司 | Anti-loosening identification method and device for lightweight high-speed rail hub gasket |
CN114842315B (en) * | 2022-05-07 | 2024-02-02 | 无锡雪浪数制科技有限公司 | Looseness-prevention identification method and device for lightweight high-speed railway hub gasket |
CN116187398A (en) * | 2022-05-16 | 2023-05-30 | 山东巍然智能科技有限公司 | Method and equipment for constructing lightweight neural network for unmanned aerial vehicle ocean image detection |
CN116187398B (en) * | 2022-05-16 | 2023-08-25 | 山东巍然智能科技有限公司 | Method and equipment for constructing lightweight neural network for unmanned aerial vehicle ocean image detection |
CN115309301A (en) * | 2022-05-17 | 2022-11-08 | 西北工业大学 | Android mobile phone end-side AR interaction system based on deep learning |
CN114783069A (en) * | 2022-06-21 | 2022-07-22 | 中山大学深圳研究院 | Method, device, terminal equipment and storage medium for identifying object based on gait |
CN115100495A (en) * | 2022-07-08 | 2022-09-23 | 福州大学 | Lightweight safety helmet detection method based on sub-feature fusion |
CN115457297A (en) * | 2022-08-23 | 2022-12-09 | 中国航空油料集团有限公司 | Method and device for detecting oil leakage of aviation oil depot and aviation oil safety operation and maintenance system |
CN115457297B (en) * | 2022-08-23 | 2023-09-26 | 中国航空油料集团有限公司 | Oil leakage detection method and device for aviation oil depot and aviation oil safety operation and maintenance system |
CN115439684A (en) * | 2022-08-25 | 2022-12-06 | 艾迪恩(山东)科技有限公司 | Household garbage classification method based on lightweight YOLOv5 and APP |
CN115439684B (en) * | 2022-08-25 | 2024-02-02 | 艾迪恩(山东)科技有限公司 | Household garbage classification method and APP based on lightweight YOLOv5 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Country or region after: China; Address after: 250353 University Road, Changqing District, Ji'nan, Shandong Province, No. 3501; Applicant after: Qilu University of Technology (Shandong Academy of Sciences); Address before: 250353 University Road, Changqing District, Ji'nan, Shandong Province, No. 3501; Applicant before: Qilu University of Technology; Country or region before: China |