CN111553406A - Target detection system, method and terminal based on improved YOLO-V3 - Google Patents


Info

Publication number
CN111553406A
CN111553406A (application CN202010333517.8A)
Authority
CN
China
Prior art keywords
image
module
darknet
target detection
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010333517.8A
Other languages
Chinese (zh)
Other versions
CN111553406B (en)
Inventor
田鹏程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Kaike Intelligent Technology Co ltd
Original Assignee
Shanghai Kaike Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Kaike Intelligent Technology Co ltd filed Critical Shanghai Kaike Intelligent Technology Co ltd
Priority to CN202010333517.8A priority Critical patent/CN111553406B/en
Publication of CN111553406A publication Critical patent/CN111553406A/en
Application granted granted Critical
Publication of CN111553406B publication Critical patent/CN111553406B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection system based on improved YOLO-V3, comprising an image acquisition module, an image preprocessing module, a darknet-39 backbone network module, a multi-scale convolutional layer feature combination module, a weighted feature fusion module and a prediction module. The darknet-39 backbone network module extracts image features with a darknet-39 backbone network to obtain feature maps from 5 convolutional layers at different scales; the multi-scale convolutional layer feature combination module optimally combines the 5 feature maps into a combined feature map; the weighted feature fusion module performs weighted feature fusion on the combined feature map; and the prediction module performs regression prediction on the fused feature map with the YOLO-V3 algorithm to obtain the target detection result. The system has a small network model, which speeds up target detection, strengthens the network feature fusion effect and achieves better detection results.

Description

Target detection system, method and terminal based on improved YOLO-V3
Technical Field
The invention relates to the technical field of computer vision, in particular to a target detection system, method and terminal based on improved YOLO-V3.
Background
YOLO (You Only Look Once)-V3 is currently a popular object detection algorithm, both fast and stable, but its backbone adopts the Darknet-53 network structure, whose computation cost is 65.86 BFLOPs (Billion Floating-Point Operations). The large model slows the algorithm considerably on embedded devices, so real-time detection cannot be achieved. With an input size of 416 × 416, the smallest feature map from which YOLO-V3 extracts features is 13 × 13, which is still too large, so YOLO-V3 detects medium- and large-sized objects poorly. YOLO-V3 predicts targets of different sizes from multi-scale feature maps taken from different layers and fuses high- and low-level feature information, which improves detection precision to a certain extent, but it ignores the fact that feature maps from different layers contribute differently, so the feature fusion effect is poor.
Disclosure of Invention
Aiming at the above defects in the prior art, the target detection system, method, terminal and medium based on improved YOLO-V3 provided by the embodiments of the invention detect targets quickly, improve the detection of medium- and large-sized objects, improve the effect of fusing feature maps from different layers in YOLO-V3, and raise the mAP of the detected objects.
In a first aspect, an embodiment of the present invention provides a target detection system based on YOLO-V3, including: an image acquisition module, an image preprocessing module, a darknet-39 backbone network module, a multi-scale convolutional layer feature combination module, a weighted feature fusion module and a prediction module,
the image acquisition module is used for acquiring an image to be identified;
the image preprocessing module is used for preprocessing an image to be identified to obtain a preprocessed image;
the darknet-39 backbone network module obtains a darknet-39 backbone network model by improving the darknet-53 backbone network, and uses the darknet-39 backbone network model to extract image features, obtaining feature maps of convolutional layers at 5 different scales;
the multi-scale convolutional layer feature combination module is used for optimally combining feature maps of 5 convolutional layers with different scales to obtain a combined feature map;
the weighted feature fusion module is used for carrying out weighted feature fusion on the combined feature map;
and the prediction module is used for performing regression prediction on the fused feature map by adopting a YOLO-V3 algorithm to obtain a target detection result.
In a second aspect, an embodiment of the present invention provides a target detection method based on improved YOLO-V3, including:
acquiring an image to be identified;
preprocessing an image to be recognized to obtain a preprocessed image;
extracting image features with the trained darknet-39 backbone network model to obtain feature maps of convolutional layers at 5 different scales;
optimally combining the feature maps of the convolutional layers with different scales to obtain a combined feature map;
performing weighted feature fusion on the combined feature map;
and performing regression prediction on the fused feature map by using a YOLO-V3 algorithm to obtain a target detection result.
In a third aspect, an intelligent terminal provided in an embodiment of the present invention includes a processor, an input device, an output device, and a memory, where the processor, the input device, the output device, and the memory are connected to each other, the memory is used to store a computer program, the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method steps described in the foregoing embodiment.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, in which a computer program is stored, the computer program comprising program instructions, which, when executed by a processor, cause the processor to perform the method steps described in the above embodiments.
The invention has the beneficial effects that:
the embodiment of the invention provides a target detection system, a method, a terminal and a medium based on improved YOLO-V3, which adopt a darknet-39 backbone network to extract features, reduce the size of a model and accelerate the target detection speed, adopt 5 convolutional layers with different scales to extract feature maps, fully fuse shallow layer features and deep layer feature information, improve the detection effect of objects with medium or large sizes, carry out combined weighted feature fusion on the feature maps of different convolutional layers according to different contribution degrees of the feature maps of the different convolutional layers, enhance the network feature fusion effect and realize better detection results.
Drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
FIG. 1 is a block diagram of a target detection system based on improved YOLO-V3 according to a first embodiment of the present invention;
FIG. 2 is a flow chart of a target detection method based on improved YOLO-V3 according to a second embodiment of the present invention;
fig. 3 shows a block diagram of an intelligent terminal according to a third embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which the invention pertains.
As shown in fig. 1, a block diagram of a target detection system based on improved YOLO-V3 according to a first embodiment of the present invention, the system includes: an image acquisition module 101, an image preprocessing module 102, a darknet-39 backbone network module 103, a multi-scale convolutional layer feature combination module 104, a weighted feature fusion module 105 and a prediction module 106. The image acquisition module 101 acquires an image to be identified; the image preprocessing module 102 preprocesses the image to be recognized to obtain a preprocessed image; the darknet-39 backbone network module 103 obtains a darknet-39 backbone network model by improving the darknet-53 backbone network and uses it to extract image features, producing feature maps of convolutional layers at 5 different scales; the multi-scale convolutional layer feature combination module 104 optimally combines the 5 feature maps into a combined feature map, where the optimal combination differs by layer: the front and rear layers are combined in pairs, and the middle layers are combined in threes; the weighted feature fusion module 105 performs weighted feature fusion on the combined feature map; the prediction module 106 performs regression prediction on the fused feature map with the YOLO-V3 algorithm to obtain the target detection result, as sketched below.
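For orientation, the module chain just described can be read as a simple composition of stages. The following is a minimal PyTorch-style sketch; the class name ImprovedYoloV3 and all interfaces are illustrative assumptions, not taken from the patent.

```python
# Illustrative sketch only: module names and interfaces are assumptions.
# It shows how the modules 103-106 could be chained after preprocessing.
import torch
import torch.nn as nn

class ImprovedYoloV3(nn.Module):
    def __init__(self, backbone, combiner, fusion, head):
        super().__init__()
        self.backbone = backbone  # darknet-39: image -> 5 multi-scale feature maps
        self.combiner = combiner  # optimal combination of the 5 feature maps
        self.fusion = fusion      # weighted feature fusion
        self.head = head          # YOLO-V3 regression prediction

    def forward(self, image: torch.Tensor):
        feats = self.backbone(image)     # list of 5 feature maps
        combined = self.combiner(feats)  # combined feature maps
        fused = self.fusion(combined)    # weighted fusion
        return self.head(fused)          # target detection result
```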
The image preprocessing module 102 comprises an image rotation unit and a scaling unit. The image rotation unit randomly flips, rotates and crops the image to be identified; the scaling unit applies a scale transformation to the image to be identified.
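As a concrete illustration of these two units, the sketch below uses torchvision transforms. The rotation angle, crop scale and the helper name build_preprocess are assumptions; for detection training, bounding-box coordinates would also have to be transformed alongside the image.

```python
# A minimal preprocessing sketch, assuming torchvision; parameter values are
# illustrative, not specified by the patent.
from torchvision import transforms

def build_preprocess(input_size: int = 448):
    return transforms.Compose([
        transforms.RandomHorizontalFlip(p=0.5),   # random horizontal flip
        transforms.RandomVerticalFlip(p=0.5),     # random vertical flip
        transforms.RandomRotation(degrees=10),    # random rotation (angle assumed)
        transforms.RandomResizedCrop(input_size, scale=(0.8, 1.0)),  # crop + scale
        transforms.ToTensor(),
    ])
```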
The darknet-39 backbone network module 103 prunes the channels of the darknet-53 network, reducing the number of model parameters while still fully extracting picture features and improving operating efficiency: compared with the original YOLO-V3 algorithm, the improved version reduces computation by 80% and runs 4 times faster. The structure of the darknet-39 backbone network in the darknet-39 backbone network module is shown in Table 1.
[Table 1: structure of the darknet-39 backbone network — the original table image is not reproduced here.]
The darknet-39 backbone network module comprises a darknet-39 backbone network training unit, which adds 2 convolutional layers to the backbone network of the traditional YOLO-V3 algorithm and performs target detection with feature maps from 5 convolutional layers at different scales. It acquires a data set, divides it into a training set, a test set and a verification set, re-clusters the bounding-box coordinates on the training set with the k-means clustering algorithm, and calculates the coordinates of 15 bounding boxes for the feature maps of the convolutional layers at 5 different scales.
The darknet-39 backbone network module reasonably prunes the darknet-53 backbone network, optimizes the network structure and removes some redundant convolution operations to obtain the darknet-39 backbone network. Specifically, the number of channels of the Level 5 layer is halved and the Level 5 layer is taken as a feature output layer with stride 4, which helps improve the detection rate of small target objects. The Level 4, Level 3 and Level 2 layers also have their channel counts halved, which likewise halves their operation counts; their strides are 8, 16 and 32 respectively. Finally, a 3 × 3 convolutional layer with stride 64 is added, strengthening feature extraction while adding almost no parameters. The resulting darknet-39 network cannot directly load the weight parameters of the original darknet-53 and must be retrained. This example performs classification training on the ImageNet LSVRC 2012 data set for 90 epochs, with an initial learning rate of 1e-03 reduced tenfold at steps 170000 and 350000, a batch_size of 128 and a weight decay coefficient of 5e-04.
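Since Table 1 is not reproduced above, the exact layer configuration cannot be restated here, but the basic building block described — darknet-style convolutions with halved channel counts — might look like the following PyTorch sketch. All names, the LeakyReLU slope and the block layout are assumptions for illustration only.

```python
# Sketch of a darknet-style residual block with halved channels (assumption).
import torch.nn as nn

def conv_bn_leaky(in_ch, out_ch, k, s=1):
    """Conv + BatchNorm + LeakyReLU, the standard darknet unit."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, stride=s, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True),
    )

class Residual(nn.Module):
    """1x1 reduce + 3x3 expand, as in darknet-53, but on halved channel counts."""
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(
            conv_bn_leaky(ch, ch // 2, 1),
            conv_bn_leaky(ch // 2, ch, 3),
        )

    def forward(self, x):
        return x + self.block(x)  # residual connection
```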
Taking the COCO data set as an example, the COCO 2017 data set has 118287 training images, 5000 verification images and 40670 test images across 80 categories. In addition, because the picture sizes on the training set differ, a normalization process is applied. In the field of target detection, the similarity between two bounding boxes is measured by the IoU (Intersection over Union): DetectionResult denotes the area of a predicted rectangular box and GroundTruth denotes the area of a real rectangular box,
IOU(DetectionResult, GroundTruth) = area(DetectionResult ∩ GroundTruth) / area(DetectionResult ∪ GroundTruth)
then for target detection, the distance metric formula can be calculated as follows:
d(box,centroid)=1-IOU(box,centroid)
Here centroid refers to the cluster-center bounding box; the larger the IOU value between two bounding boxes, the smaller the distance between them. Before the image to be recognized is input into the darknet-39 backbone network module, the image preprocessing module preprocesses it to transform the picture to a fixed size. This embodiment adopts a multi-scale training method: one size is randomly selected from the set {256,320,384,448,512,576,640,704,768} as the picture input size. Taking an input image size of 448 × 448 as an example, the coordinates of the 15 bounding boxes for the feature maps of the convolutional layers at the 5 scales are calculated as follows:
(4,6),(7,16),(14,9),(22,17),(13,30),(28,37),(46,23),(25,70),(49,58),(86,39),(56,124),(99,83),(114,205),(199,124),(294,275)。
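The anchor re-clustering step lends itself to a short sketch. Below is a hedged numpy implementation of k-means with the d(box, centroid) = 1 - IOU distance defined above; the random initialization, the mean update and the final sort by area are assumptions about details the text does not spell out.

```python
# k-means anchor clustering with the 1 - IoU distance (a sketch, assumptions noted above).
import numpy as np

def iou_wh(boxes, centroids):
    """IoU between (w, h) box shapes and centroid shapes, aligned at the origin."""
    w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = w * h
    union = (boxes[:, 0] * boxes[:, 1])[:, None] \
          + (centroids[:, 0] * centroids[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=15, iters=100, seed=0):
    """k-means using d(box, centroid) = 1 - IOU(box, centroid)."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(1.0 - iou_wh(boxes, centroids), axis=1)
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else centroids[i] for i in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids[np.argsort(centroids[:, 0] * centroids[:, 1])]  # sort by area
```

With boxes holding the (width, height) pairs of the training-set ground-truth boxes scaled to the input size, kmeans_anchors(boxes) would yield 15 anchor shapes, 3 per scale across the 5 feature maps.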
inputting the size of an image to be recognized to 448 × 448, establishing an image pyramid of the image to be recognized and different levels of image golden pointsThe pyramid network feature layer feature size is 7 ×,14 ×,28 ×,56, 112 ×, the feature map size is 1, 2, 3, 4, 5 from small to large, the feature pyramid performs up-sampling operation on the feature pyramid by 2 times of step length, and is fused with the next layer depth residual network to form a rapid detection model for depth fusion, the expression capability of the feature pyramid is enhanced, compared with the conventional YOLOv3 network, the method has a wider range, so that the detection effect of small objects and large objects can be remarkably improved, the detection effect of small objects and large objects can not be increased, the feature maps of different depths are respectively subjected to target detection, the feature maps of the future layers are subjected to up-sampling by the feature map of the current feature source map, the feature maps of the future layers are utilized, the feature maps of the future layers are organically fused with the semantic information of high order of improving the detection accuracy, the feature pyramid network feature map is 7 ×,14, the feature map can be subjected to depth fusion by the method of greatly reducing the convolution of the feature pyramid detection of the same pyramid detection method of the same as a convolution, the method of calculating a characteristic map of the method of greatly reducing a convolution, the method of calculating a depth fusion, the method of calculating a method of the same as a method of the method of reducing1And L2For better feature fusion, the embodiment adopts a weighted feature fusion mode, and the feature after fusion is F1,L1Corresponding weighting coefficient w1,L2Corresponding weighting coefficient is w2And then:
F1 = w1 · L1 + w2 · L2
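A minimal sketch of this fusion step, assuming PyTorch and learnable scalar weights; whether the patent constrains or normalizes w1 and w2 is not stated in this text, so a plain weighted sum is used.

```python
# Weighted feature fusion sketch: F1 = w1*L1 + w2*L2 with learnable w1, w2.
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.ones(2))  # w1, w2, learned during training

    def forward(self, l1: torch.Tensor, l2: torch.Tensor) -> torch.Tensor:
        # l1 and l2 must share the same spatial size and channel count
        return self.w[0] * l1 + self.w[1] * l2
```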
the prediction module performs regression prediction on the weighted and fused feature map by using YOLO-V3, the YOLO-V3 divides the feature map into N × N grids (feature maps with different scales and N with different sizes, in this embodiment, there are 5 scales, N is 7,14, 28,56, and 112, each grid predicts 3 different bounding boxes, the target detection result can be represented as N × [3 × (C + Con + B) ], C represents the number of categories, Con represents the confidence, and B represents the coordinates of the bounding boxes.
So that the detection network converges quickly, the pruned darknet-39 network structure is pre-trained on the ImageNet data set and the resulting weight file is loaded directly into the detection network as initialization weights. The hyper-parameters for pre-training the darknet-39 network are set as follows: 120 training epochs; an initial learning rate of 1e-04 decreased in cosine_decay fashion to a final learning rate of 1e-06; momentum 0.9; batch_size 32; and L2 regularization with a weight decay coefficient of 5e-04.
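Those pre-training hyper-parameters map onto standard tooling. The sketch below uses PyTorch's built-in cosine annealing schedule; the choice of SGD and the stand-in model are assumptions, while the numeric values follow the text.

```python
# A hedged sketch of the darknet-39 pre-training schedule, assuming PyTorch.
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the darknet-39 network (assumption)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-04,
                            momentum=0.9, weight_decay=5e-04)  # weight decay as L2
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=120, eta_min=1e-06)  # cosine decay from 1e-04 to 1e-06

for epoch in range(120):
    # ... one ImageNet training epoch with batch_size = 32 would go here ...
    optimizer.step()   # placeholder step so the scheduler loop runs end-to-end
    scheduler.step()
```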
The target detection system based on improved YOLO-V3 of this embodiment uses the darknet-39 backbone network for feature extraction, which reduces model size and speeds up target detection; extracts feature maps from 5 convolutional layers at different scales, fully fusing shallow and deep feature information and improving the detection of medium- and large-sized objects; and performs combined weighted feature fusion on the feature maps of different convolutional layers according to their different contribution degrees, strengthening the network feature fusion effect and achieving better detection results.
The first embodiment provides a target detection system based on improved YOLO-V3; correspondingly, the application also provides a target detection method based on improved YOLO-V3. Please refer to fig. 2, a flowchart of a target detection method based on improved YOLO-V3 according to a second embodiment of the present invention. Since the method embodiment is basically similar to the system embodiment, it is described only briefly; for relevant points, refer to the partial description of the system embodiment. The method embodiments described below are merely illustrative.
As shown in fig. 2, a flowchart of a target detection method based on improved YOLO-V3 according to a second embodiment of the present invention is shown, and the method includes:
s201, acquiring an image to be identified.
In the present embodiment, the input image to be recognized has a size of 448 × 448.
S202, preprocessing the image to be recognized to obtain a preprocessed image.
Specifically, preprocessing the image to be recognized comprises the following steps:
randomly flipping the image to be recognized horizontally/vertically and cropping it;
and carrying out scale transformation on the image to be identified.
S203, extracting image features with the trained darknet-39 backbone network model to obtain feature maps of convolutional layers at 5 different scales.
Specifically, training the darknet-39 backbone network model comprises the following steps:
2 convolutional layers are added to the backbone network of the traditional YOLO-V3 algorithm, and feature maps from 5 convolutional layers at different scales are used for target detection.
Specifically, the darknet-53 network is reasonably pruned, the network structure is optimized and some redundant convolution operations are removed to obtain the darknet-39 network. The number of channels of the Level 5 layer is halved and the Level 5 layer is also used as a feature output layer with stride 4, which helps improve the detection rate of small target objects. The Level 4, Level 3 and Level 2 layers have their channel counts halved, which likewise halves their operation counts; their strides are 8, 16 and 32 respectively. Finally, a 3 × 3 convolutional layer with stride 64 is added, strengthening feature extraction while adding almost no parameters. The resulting darknet-39 network cannot directly load the weight parameters of the original darknet-53 and must be retrained. This embodiment performs classification training on the ImageNet LSVRC 2012 data set for 120 epochs, with an initial learning rate of 1e-03 reduced tenfold at steps 170000 and 350000, a batch_size of 128 and a weight decay coefficient of 5e-04.
Acquiring a data set, and dividing the data set into a training set, a test set and a verification set;
re-clustering the bounding-box coordinates on the training set with the k-means clustering algorithm, and calculating the coordinates of 15 bounding boxes for the feature maps of the convolutional layers at 5 different scales.
And S204, optimally combining the feature maps of the convolutional layers with different scales to obtain a combined feature map.
And S205, performing weighted feature fusion on the combined feature map.
And S206, performing regression prediction on the fused feature map by adopting a YOLO-V3 algorithm to obtain a target detection result.
The target detection method based on improved YOLO-V3 of this embodiment extracts features with a darknet-39 backbone network, which reduces model size and speeds up target detection; extracts feature maps from 5 convolutional layers at different scales, fully fusing shallow and deep feature information and improving the detection of medium- and large-sized objects; and performs combined weighted feature fusion on the feature maps of different convolutional layers according to their different contribution degrees, strengthening the network feature fusion effect and achieving better detection results.
As shown in fig. 3, a schematic structural diagram of an intelligent terminal according to a third embodiment of the present invention is shown, where the intelligent terminal includes a processor 301, an input device 302, an output device 303, and a memory 304, where the processor 301, the input device 302, the output device 303, and the memory 304 are connected to each other, the memory 304 is used for storing a computer program, the computer program includes program instructions, and the processor 301 is configured to call the program instructions to execute the method described in the second embodiment.
It should be understood that, in the embodiment of the present invention, the processor 301 may be a central processing unit (CPU); the processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The input device 302 may include a touch pad, a fingerprint sensor (for collecting fingerprint information of a user and direction information of the fingerprint), a microphone, etc., and the output device 303 may include a display (LCD, etc.), a speaker, etc.
The memory 304 may include a read-only memory and a random access memory, and provides instructions and data to the processor 301. A portion of the memory 304 may also include non-volatile random access memory. For example, the memory 304 may also store device type information.
In a specific implementation, the processor 301, the input device 302, and the output device 303 described in this embodiment of the present invention may execute the implementation described in the method embodiment provided in this embodiment of the present invention, and may also execute the implementation described in the system embodiment described in this embodiment of the present invention, which is not described herein again.
The invention also provides an embodiment of a computer-readable storage medium, in which a computer program is stored, which computer program comprises program instructions that, when executed by a processor, cause the processor to carry out the method described in the above embodiment.
The computer readable storage medium may be an internal storage unit of the terminal described in the foregoing embodiment, for example, a hard disk or a memory of the terminal. The computer readable storage medium may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the terminal. Further, the computer-readable storage medium may also include both an internal storage unit and an external storage device of the terminal. The computer-readable storage medium is used for storing the computer program and other programs and data required by the terminal. The computer readable storage medium may also be used to temporarily store data that has been output or is to be output.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both; to illustrate the interchangeability of hardware and software clearly, the components and steps of the examples have been described above in general functional terms. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality differently for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present invention.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the terminal and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed terminal and method can be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electric, mechanical or other form of connection.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention and shall be construed as falling within the scope of the claims.

Claims (8)

1. An improved YOLO-V3-based target detection system, comprising: an image acquisition module, an image preprocessing module, a darknet-39 backbone network module, a multi-scale convolutional layer feature combination module, a weighted feature fusion module and a prediction module,
the image acquisition module is used for acquiring an image to be identified;
the image preprocessing module is used for preprocessing an image to be identified to obtain a preprocessed image;
the darknet-39 backbone network module obtains a darknet-39 backbone network model by improving the darknet-53 backbone network, and uses the darknet-39 backbone network model to extract image features, obtaining feature maps of convolutional layers at 5 different scales;
the multi-scale convolutional layer feature combination module is used for optimally combining feature maps of 5 convolutional layers with different scales to obtain a combined feature map;
the weighted feature fusion module is used for carrying out weighted feature fusion on the combined feature map;
and the prediction module is used for performing regression prediction on the fused feature map by adopting a YOLO-V3 algorithm to obtain a target detection result.
2. The improved YOLO-V3-based target detection system according to claim 1, wherein the darknet-39 backbone network module comprises a darknet-39 backbone network training unit, the darknet-39 backbone network training unit adds 2 convolutional layers in the backbone network of the conventional YOLO-V3 algorithm, and performs target detection using 5 convolutional layer feature maps with different scales;
acquiring a data set, dividing the data set into a training set, a testing set and a verification set,
re-clustering the bounding-box coordinates on the training set with a k-means clustering algorithm, and calculating the coordinates of 15 bounding boxes for the feature maps of the convolutional layers at 5 different scales.
3. The improved YOLO-V3-based object detection system according to claim 1, wherein the image preprocessing module comprises an image rotation unit and a scaling unit; the image rotation unit is used for randomly flipping the image to be recognized horizontally/vertically and cropping it; the scaling unit is used for applying a scale transformation to the image to be identified.
4. A target detection method based on improved YOLO-V3 is characterized by comprising the following steps:
acquiring an image to be identified;
preprocessing an image to be recognized to obtain a preprocessed image;
extracting image features with the trained darknet-39 backbone network model to obtain feature maps of convolutional layers at 5 different scales;
optimally combining the feature maps of the convolutional layers with different scales to obtain a combined feature map;
performing weighted feature fusion on the combined feature map;
and performing regression prediction on the fused feature map by using a YOLO-V3 algorithm to obtain a target detection result.
5. The improved YOLO-V3-based target detection method according to claim 4, further comprising a step of training the darknet-39 backbone network model, wherein the method for training the darknet-39 backbone network model comprises:
2 convolutional layers are added to the backbone network of the traditional YOLO-V3 algorithm, and feature maps from 5 convolutional layers at different scales are used for target detection;
acquiring a data set, dividing the data set into a training set, a testing set and a verification set,
re-clustering the bounding-box coordinates on the training set with a k-means clustering algorithm, and calculating the coordinates of 15 bounding boxes for the feature maps of the convolutional layers at 5 different scales.
6. The improved YOLO-V3-based target detection method according to claim 4, wherein the specific method for preprocessing the image to be recognized comprises the following steps:
randomly flipping the image to be recognized horizontally/vertically and cropping it;
and carrying out scale transformation on the image to be identified.
7. An intelligent terminal comprising a processor, an input device, an output device and a memory, the processor, the input device, the output device and the memory being interconnected, the memory being adapted to store a computer program, the computer program comprising program instructions, characterized in that the processor is configured to invoke the program instructions to perform the method according to any of claims 4-6.
8. A computer-readable storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the method according to any one of claims 4-6.
CN202010333517.8A 2020-04-24 2020-04-24 Target detection system, method and terminal based on improved YOLO-V3 Active CN111553406B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010333517.8A CN111553406B (en) 2020-04-24 2020-04-24 Target detection system, method and terminal based on improved YOLO-V3

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010333517.8A CN111553406B (en) 2020-04-24 2020-04-24 Target detection system, method and terminal based on improved YOLO-V3

Publications (2)

Publication Number Publication Date
CN111553406A (en) 2020-08-18
CN111553406B CN111553406B (en) 2023-04-28

Family

ID=72007656

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010333517.8A Active CN111553406B (en) 2020-04-24 2020-04-24 Target detection system, method and terminal based on improved YOLO-V3

Country Status (1)

Country Link
CN (1) CN111553406B (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112132032A (en) * 2020-09-23 2020-12-25 平安国际智慧城市科技股份有限公司 Traffic sign detection method and device, electronic equipment and storage medium
CN112183255A (en) * 2020-09-15 2021-01-05 西北工业大学 Underwater target visual identification and attitude estimation method based on deep learning
CN112200201A (en) * 2020-10-13 2021-01-08 上海商汤智能科技有限公司 Target detection method and device, electronic equipment and storage medium
CN112232258A (en) * 2020-10-27 2021-01-15 腾讯科技(深圳)有限公司 Information processing method and device and computer readable storage medium
CN112307976A (en) * 2020-10-30 2021-02-02 北京百度网讯科技有限公司 Target detection method, target detection device, electronic equipment and storage medium
CN112507896A (en) * 2020-12-14 2021-03-16 大连大学 Method for detecting cherry fruits by adopting improved YOLO-V4 model
CN112633066A (en) * 2020-11-20 2021-04-09 苏州浪潮智能科技有限公司 Aerial small target detection method, device, equipment and storage medium
CN112668560A (en) * 2021-03-16 2021-04-16 中国矿业大学(北京) Pedestrian detection method and system for pedestrian flow dense area
CN112801169A (en) * 2021-01-25 2021-05-14 中国人民解放军陆军工程大学 Camouflage target detection method based on improved YOLO algorithm
CN112949692A (en) * 2021-02-03 2021-06-11 歌尔股份有限公司 Target detection method and device
CN112966565A (en) * 2021-02-05 2021-06-15 深圳市优必选科技股份有限公司 Object detection method and device, terminal equipment and storage medium
CN113435367A (en) * 2021-06-30 2021-09-24 北大方正集团有限公司 Social distance evaluation method and device and storage medium
CN113838021A (en) * 2021-09-18 2021-12-24 长春理工大学 Pulmonary nodule detection system based on improved YOLOv5 network
CN114170421A (en) * 2022-02-10 2022-03-11 青岛海尔工业智能研究院有限公司 Image detection method, device, equipment and storage medium
WO2022083784A1 (en) * 2020-10-23 2022-04-28 西安科锐盛创新科技有限公司 Road detection method based on internet of vehicles
CN117960839A (en) * 2024-03-29 2024-05-03 山西建投临汾建筑产业有限公司 Steel structural member welding deformation correcting device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170147905A1 (en) * 2015-11-25 2017-05-25 Baidu Usa Llc Systems and methods for end-to-end object detection
CN109614985A (en) * 2018-11-06 2019-04-12 华南理工大学 A kind of object detection method based on intensive connection features pyramid network
CN109685152A (en) * 2018-12-29 2019-04-26 北京化工大学 A kind of image object detection method based on DC-SPP-YOLO
CN110443208A (en) * 2019-08-08 2019-11-12 南京工业大学 YOLOv 2-based vehicle target detection method, system and equipment
WO2019232830A1 (en) * 2018-06-06 2019-12-12 平安科技(深圳)有限公司 Method and device for detecting foreign object debris at airport, computer apparatus, and storage medium
CN110766098A (en) * 2019-11-07 2020-02-07 中国石油大学(华东) Traffic scene small target detection method based on improved YOLOv3
CN110991311A (en) * 2019-11-28 2020-04-10 江南大学 Target detection method based on dense connection deep network

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170147905A1 (en) * 2015-11-25 2017-05-25 Baidu Usa Llc Systems and methods for end-to-end object detection
WO2019232830A1 (en) * 2018-06-06 2019-12-12 平安科技(深圳)有限公司 Method and device for detecting foreign object debris at airport, computer apparatus, and storage medium
CN109614985A (en) * 2018-11-06 2019-04-12 华南理工大学 A kind of object detection method based on intensive connection features pyramid network
CN109685152A (en) * 2018-12-29 2019-04-26 北京化工大学 A kind of image object detection method based on DC-SPP-YOLO
CN110443208A (en) * 2019-08-08 2019-11-12 南京工业大学 YOLOv 2-based vehicle target detection method, system and equipment
CN110766098A (en) * 2019-11-07 2020-02-07 中国石油大学(华东) Traffic scene small target detection method based on improved YOLOv3
CN110991311A (en) * 2019-11-28 2020-04-10 江南大学 Target detection method based on dense connection deep network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
戴伟聪; 金龙旭; 李国宁; 郑志强: "Improved YOLOv3 real-time detection algorithm for aircraft in remote sensing images" *
朱鹏; 陈虎; 李科; 程宾洋: "A lightweight multi-scale feature face detection method" *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183255A (en) * 2020-09-15 2021-01-05 西北工业大学 Underwater target visual identification and attitude estimation method based on deep learning
CN112132032A (en) * 2020-09-23 2020-12-25 平安国际智慧城市科技股份有限公司 Traffic sign detection method and device, electronic equipment and storage medium
CN112200201A (en) * 2020-10-13 2021-01-08 上海商汤智能科技有限公司 Target detection method and device, electronic equipment and storage medium
WO2022083784A1 (en) * 2020-10-23 2022-04-28 西安科锐盛创新科技有限公司 Road detection method based on internet of vehicles
US20230154202A1 (en) * 2020-10-23 2023-05-18 Xi'an Creation Keji Co., Ltd. Method of road detection based on internet of vehicles
CN112232258A (en) * 2020-10-27 2021-01-15 腾讯科技(深圳)有限公司 Information processing method and device and computer readable storage medium
CN112307976A (en) * 2020-10-30 2021-02-02 北京百度网讯科技有限公司 Target detection method, target detection device, electronic equipment and storage medium
CN112307976B (en) * 2020-10-30 2024-05-10 北京百度网讯科技有限公司 Target detection method, target detection device, electronic equipment and storage medium
CN112633066A (en) * 2020-11-20 2021-04-09 苏州浪潮智能科技有限公司 Aerial small target detection method, device, equipment and storage medium
CN112507896A (en) * 2020-12-14 2021-03-16 大连大学 Method for detecting cherry fruits by adopting improved YOLO-V4 model
CN112507896B (en) * 2020-12-14 2023-11-07 大连大学 Method for detecting cherry fruits by adopting improved YOLO-V4 model
CN112801169A (en) * 2021-01-25 2021-05-14 中国人民解放军陆军工程大学 Camouflage target detection method based on improved YOLO algorithm
CN112801169B (en) * 2021-01-25 2024-02-06 中国人民解放军陆军工程大学 Camouflage target detection method, system, device and storage medium based on improved YOLO algorithm
CN112949692A (en) * 2021-02-03 2021-06-11 歌尔股份有限公司 Target detection method and device
CN112966565A (en) * 2021-02-05 2021-06-15 深圳市优必选科技股份有限公司 Object detection method and device, terminal equipment and storage medium
CN112668560A (en) * 2021-03-16 2021-04-16 中国矿业大学(北京) Pedestrian detection method and system for pedestrian flow dense area
CN113435367A (en) * 2021-06-30 2021-09-24 北大方正集团有限公司 Social distance evaluation method and device and storage medium
CN113838021A (en) * 2021-09-18 2021-12-24 长春理工大学 Pulmonary nodule detection system based on improved YOLOv5 network
CN114170421A (en) * 2022-02-10 2022-03-11 青岛海尔工业智能研究院有限公司 Image detection method, device, equipment and storage medium
CN117960839A (en) * 2024-03-29 2024-05-03 山西建投临汾建筑产业有限公司 Steel structural member welding deformation correcting device
CN117960839B (en) * 2024-03-29 2024-06-04 山西建投临汾建筑产业有限公司 Steel structural member welding deformation correcting device

Also Published As

Publication number Publication date
CN111553406B (en) 2023-04-28

Similar Documents

Publication Publication Date Title
CN111553406A (en) Target detection system, method and terminal based on improved YOLO-V3
CN110647817B (en) Real-time face detection method based on MobileNet V3
WO2020221298A1 (en) Text detection model training method and apparatus, text region determination method and apparatus, and text content determination method and apparatus
CN112733749B (en) Real-time pedestrian detection method integrating attention mechanism
CN111814794B (en) Text detection method and device, electronic equipment and storage medium
CN109829448B (en) Face recognition method, face recognition device and storage medium
CN110020592A (en) Object detection model training method, device, computer equipment and storage medium
CN113822209B (en) Hyperspectral image recognition method and device, electronic equipment and readable storage medium
CN111652217A (en) Text detection method and device, electronic equipment and computer storage medium
CN106682233A (en) Method for Hash image retrieval based on deep learning and local feature fusion
CN111723786A (en) Method and device for detecting wearing of safety helmet based on single model prediction
EP4047509A1 (en) Facial parsing method and related devices
CN105144239A (en) Image processing device, program, and image processing method
CN107784288A (en) A kind of iteration positioning formula method for detecting human face based on deep neural network
CN111353491B (en) Text direction determining method, device, equipment and storage medium
CN111325237B (en) Image recognition method based on attention interaction mechanism
CN111339935A (en) Optical remote sensing picture classification method based on interpretable CNN image classification model
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN115861462B (en) Training method and device for image generation model, electronic equipment and storage medium
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114972947B (en) Depth scene text detection method and device based on fuzzy semantic modeling
CN111373393B (en) Image retrieval method and device and image library generation method and device
CN113408651B (en) Unsupervised three-dimensional object classification method based on local discriminant enhancement
Xu et al. Multi‐pyramid image spatial structure based on coarse‐to‐fine pyramid and scale space

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant