CN112036214A - Method for identifying small target in low-image-quality video in real time - Google Patents

Info

Publication number
CN112036214A
Authority
CN
China
Prior art keywords
target
standard state
small
algorithm
frame
Prior art date
Legal status
Pending
Application number
CN201910479019.1A
Other languages
Chinese (zh)
Inventor
张昭智
Current Assignee
Shanghai Paidao Intelligent Technology Co ltd
Original Assignee
Shanghai Paidao Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Paidao Intelligent Technology Co ltd filed Critical Shanghai Paidao Intelligent Technology Co ltd
Priority to CN201910479019.1A priority Critical patent/CN112036214A/en
Publication of CN112036214A publication Critical patent/CN112036214A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; scene-specific elements
    • G06V 20/40: Scenes; scene-specific elements in video content
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/24: Aligning, centring, orientation detection or correction of the image
    • G06V 10/245: Aligning, centring, orientation detection or correction of the image by locating a pattern; special marks for positioning
    • G06V 10/255: Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07: Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method for identifying small targets in a low-image-quality video in real time. The small target has a necessary relative position relationship with a large target, and the small target has a standard state and a non-standard state. A certain number of pictures are extracted from the video and labeled as a data set. The data set is labeled with two categories, small-target standard state and small-target non-standard state; each category is annotated with two rectangular boxes, the first box enclosing the large target and the second box enclosing the small target in the corresponding state. The data set is then used as the training reference for a target detection algorithm, and the trained algorithm performs target-state identification on the video. The invention achieves a higher detection accuracy.

Description

Method for identifying small target in low-image-quality video in real time
Technical Field
The invention relates to the field of computer vision, and in particular to a method for identifying small targets in a low-image-quality video in real time.
Background
The task of target detection with computer vision is to extract information that a computer can interpret from an image. In practice, detection must recover not only the category of each target in the picture but also its position. Current deep-learning target detection algorithms fall into two broad categories: classification-based (two-stage) algorithms and regression-based (single-stage) algorithms.
The classification-based target detection algorithm divides detection into two stages: the first stage selects candidate regions, and the second stage classifies those regions and refines their positions; the detection result is obtained after both stages. A representative model is the Faster Region-based Convolutional Neural Network (Faster R-CNN) proposed by Ren S. et al. in 2015. A Region Proposal Network (RPN) splits the detection system into two modules: the first is a deep fully convolutional network that extracts candidate regions, and the second is a Fast R-CNN detector that performs detection on the extracted regions. The whole system is a single, unified target detection network. The Faster R-CNN framework works as follows: the whole picture is taken as input and passed through convolution layers to obtain a feature map; the convolutional features are fed into the RPN to obtain candidate-box features; a classifier then judges whether the features extracted from each candidate box belong to a specific class; finally, a regressor further refines the position of each candidate box assigned to a class. Throughout the process, the network shares the feature information extracted by the convolutional neural network.
For a convolutional feature map of a given size, the RPN generates candidate boxes of several sizes, which creates a mismatch between highly variable target sizes and the network's fixed receptive field. Increasing the number of candidate boxes slows the algorithm down, making it hard to meet the real-time requirements of an actual production environment.
The regression-based target detection algorithm simplifies detection into a single end-to-end regression problem, so the position and category of each target are obtained in one pass over the picture (in contrast to repeated candidate-region selection and classification). Unlike two-stage models based on region extraction, the single-stage approach achieves feature sharing through a single, complete training process. Typical representatives are You Only Look Once (YOLO) and the Single Shot MultiBox Detector (SSD). The following description focuses on the SSD as an example.
In 2016, Liu W. et al. proposed the SSD algorithm, which applies a single deep neural network to image target detection. The SSD localization bounding boxes are defined as a set of spatially discrete default boxes with different aspect ratios and mapping positions. During prediction, the network generates a probability score for each target class in every default box and adjusts the default boxes to better match the target shape. In addition, by combining feature maps from multiple layers, the network makes complete predictions for targets at different scales and thus handles multi-size target detection.
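In the original SSD paper the default-box scales are spaced linearly between a minimum and a maximum scale across the feature maps used for prediction. A minimal sketch of that rule (using the paper's default values of 0.2 and 0.9, not values claimed by the patent) is:

```python
# Default-box scale rule from Liu et al. (2016): with m prediction feature
# maps, scale s_k for map k is s_min + (s_max - s_min) * k / (m - 1).

def default_box_scales(m: int, s_min: float = 0.2, s_max: float = 0.9):
    """Return the scale s_k for each of the m prediction feature maps."""
    if m == 1:
        return [s_min]
    return [s_min + (s_max - s_min) * k / (m - 1) for k in range(m)]

scales = default_box_scales(6)
print([round(s, 2) for s in scales])  # -> [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
```

Small scales on early, high-resolution feature maps catch small objects; large scales on late, coarse maps catch large ones.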
The SSD algorithm has several weaknesses. Without candidate regions, direct box regression is difficult and convergence problems arise easily. Feature maps from different SSD layers are fed to the classification network as independent inputs, so the same object may be detected simultaneously by boxes of different sizes, causing redundant computation. And because small targets correspond to small areas of the feature map that cannot be trained sufficiently, SSD detection of small targets remains unsatisfactory.
When existing computer vision techniques are used to detect and identify small targets in a low-image-quality video, conventional deep-learning methods achieve low detection accuracy precisely because the targets are small.
There is therefore an urgent need in the prior art for real-time detection and identification of small targets in low-image-quality videos.
Disclosure of Invention
The invention aims to detect and identify small targets in a low-image-quality video in real time. It improves on existing computer vision techniques so that small targets in low-image-quality video are detected and identified with high accuracy.
To this end, the invention provides a method for identifying small targets in a low-image-quality video in real time, in which the small target has a necessary relative position relationship with a large target and has a standard state and a non-standard state. A certain number of pictures are extracted from the video and labeled as a data set. The data set is labeled with two categories, small-target standard state and small-target non-standard state; each category is annotated with two rectangular boxes, the first enclosing the large target and the second enclosing the small target in the corresponding state. The data set serves as the training reference for a target detection algorithm, which then performs target-state identification on the video.
In one embodiment, the small target lies inside the large target: the small target is a human head, the large target is a human body, and the standard state of the small target is a head wearing a safety helmet.
Further, the target detection algorithm is an SSD algorithm. Its localization bounding boxes are defined as a set of spatially discrete default boxes with different aspect ratios and mapping positions. During prediction, the network generates a probability score for each target class in every default box and adjusts the default boxes to better match the target shape. By combining feature maps from multiple layers, the network makes complete predictions for targets at different scales and thus handles multi-size target detection.
Within the target detection algorithm, a new correspondence is formed from the large-target/small-target relationship and the state classification, replacing the corresponding feature layers of the algorithm. Specifically, the input picture is first compressed and fed into the target detection algorithm, yielding a first loss value; the output position information is then used to extract the corresponding image region, which is fed into the algorithm again to obtain a second loss value and the detection result. The total loss is a linear combination of the first and second loss values, and the model is trained with this procedure. Further, in the prediction stage the detection result can be output directly by the algorithm that produces the second loss value, which speeds up model inference.
Compared with the prior art, the method exploits the correlation between objects: the target position is located quickly during object detection, and the target region is then classified directly, giving a scheme with both high detection accuracy and high detection speed. The scheme also has a low storage footprint, and for small-target detection in low-image-quality video it reaches a higher accuracy than other deep-learning network models of the same detection speed and storage footprint.
To make the above objects, features, and embodiments of the invention more comprehensible, the structural design and operating procedures of the invention are described in detail below with reference to the accompanying drawings.
Drawings
To illustrate the technical solutions of the embodiments more clearly, the drawings needed for the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the invention and should not be considered limiting; those skilled in the art can derive other related drawings from them without inventive effort.
FIG. 1 is a schematic diagram of an algorithm according to an embodiment of the present invention;
FIG. 2 is a flow chart of one embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the drawings. The described embodiments are only a part of the embodiments of the invention, not all of them, and the components of the embodiments, as generally described and illustrated in the figures, could be arranged and designed in a wide variety of configurations. The following detailed description is therefore not intended to limit the scope of the claimed invention but merely represents selected embodiments. All other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the invention.
Referring first to FIG. 1, a schematic view of the invention, one embodiment monitors whether workers in a work area wear safety helmets. Because a safety helmet is a small target, it is hard to identify clearly in existing low-image-quality surveillance video, especially where automatic identification by computer is required. Exploiting the object correlation that a person's head is attached to the person's body, this embodiment first detects the human body, crops the detected body from the picture, and then performs a secondary detection on the crop to decide whether the head wears a safety helmet. Relative to the original image the human body is much larger than the head, so the detection accuracy is higher.
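The crop-then-redetect idea of this embodiment can be sketched as follows. `detect_body` and `detect_helmet` are hypothetical stand-ins for the two detectors, not functions defined by the patent; here they return fixed results purely for illustration:

```python
import numpy as np

def detect_body(frame: np.ndarray):
    """Pretend stage-1 detector: returns one body box as (x1, y1, x2, y2)."""
    return [(40, 10, 120, 200)]

def detect_helmet(crop: np.ndarray):
    """Pretend stage-2 detector: classifies the crop (helmet / no helmet)."""
    return "helmet" if crop.mean() > 0.5 else "no_helmet"

frame = np.ones((240, 320, 3), dtype=np.float32)  # dummy bright frame
results = []
for (x1, y1, x2, y2) in detect_body(frame):
    crop = frame[y1:y2, x1:x2]           # the body fills most of this crop,
    results.append(detect_helmet(crop))  # so the head is easier to resolve

print(results)  # -> ['helmet']
```

The key point is that the second detector only ever sees the body crop, in which the head occupies a far larger fraction of the pixels than in the full frame.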
In one embodiment of the invention, the network structure, model size, and computation speed of the SSD network model are optimized to establish a target detection and identification algorithm that can effectively identify small targets in low-image-quality video. The description below takes as its example detecting whether workers wear safety helmets in an actual plant environment.
Monitoring data collected by actual video surveillance equipment is used: a certain number of pictures are extracted from surveillance videos spanning different seasons, weather conditions, and times of the year, and manual selection of the data avoids class imbalance. The selected data is then annotated with two categories, wearing a safety helmet and not wearing a safety helmet, each with two rectangular boxes as targets. For the helmet-wearing class, the first box is the whole human body and the second box is the safety helmet; for the no-helmet class, the first box is the whole human body and the second box is the head.
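One possible shape for such a labeling record is sketched below. The field names and file names are illustrative assumptions, not a format specified by the patent; the essential constraint is that every sample carries the category plus exactly two boxes, large target first:

```python
# Illustrative annotation records for the two-class, two-box scheme above.
# "xyxy" boxes are (x1, y1, x2, y2) pixel coordinates.

sample_with_helmet = {
    "image": "factory_cam3_frame_0412.jpg",  # hypothetical file name
    "category": "wearing_helmet",            # small-target standard state
    "boxes": [
        {"role": "body",   "xyxy": [102, 40, 188, 310]},  # box 1: large target
        {"role": "helmet", "xyxy": [128, 40, 162, 74]},   # box 2: small target
    ],
}

sample_without_helmet = {
    "image": "factory_cam3_frame_0977.jpg",
    "category": "no_helmet",                 # small-target non-standard state
    "boxes": [
        {"role": "body", "xyxy": [55, 62, 140, 330]},
        {"role": "head", "xyxy": [78, 62, 112, 96]},
    ],
}

for s in (sample_with_helmet, sample_without_helmet):
    assert len(s["boxes"]) == 2  # each category: exactly two rectangles
```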
The algorithm model adopted by this scheme is based on the existing SSD model; part of its connection structure is shown in Table 1. The average pooling layer and Softmax layer of the original SSD network are removed, and three new feature layers are added after the Conv2d_13_pointwise layer through three groups of depthwise and pointwise convolution kernels. The Conv2d_11_pointwise and Conv2d_13_pointwise layers of the original MobileNet convolutional network, together with the newly added Conv2d_14_pointwise, Conv2d_15_pointwise, Conv2d_16_pointwise, and Conv2d_17_pointwise layers, serve as the feature extraction layers for the SSD anchor boxes. The anchor boxes are configured as follows: the minimum scale factor is 0.2 and the maximum is 0.9; the size factors on the six feature layers are 0.2, 0.34, 0.48, 0.62, 0.86, and 0.9; and each layer's anchors use five aspect ratios plus an additional 1:1 box, so every anchor position on every feature layer has six anchor boxes.
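A back-of-the-envelope count of the anchor boxes implied by this configuration can be sketched as follows. The six spatial sizes are assumed MobileNet-SSD-like values (19x19 down to 1x1), not figures stated in the patent:

```python
# Six feature layers, six anchors per spatial position (five aspect ratios
# plus the extra 1:1 box). Feature-map sizes below are assumptions.

feature_map_sizes = [19, 10, 5, 3, 2, 1]
scale_factors = [0.2, 0.34, 0.48, 0.62, 0.86, 0.9]  # as listed in the text
anchors_per_position = 6

total_anchors = sum(s * s * anchors_per_position for s in feature_map_sizes)
print(total_anchors)  # -> 3000
```

Note that linear spacing from 0.2 to 0.9 over six layers would give 0.76 at the fifth layer; the listed value of 0.86 departs from that spacing and may be a deliberate choice or a transcription artifact, so it is kept here as stated.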
Analysis of the MobileNet network structure shows that it maintains a high image-classification quality while greatly reducing network parameters and computation, and that it still extracts image features well at this much lower computational cost. After the modification described above, the extracted feature maps are smaller than those of the original SSD, and the new network needs only one third as many anchor boxes as the SSD network. Experience and experiments both show that these adjustments markedly improve the algorithm's detection results.
Table 1 network structure added by this scheme compared with the original network model
(Table 1 appears as an image, Figure BDA0002082597830000071, in the original publication.)
The algorithm above is abbreviated MSSD, and the data flow of the invention is shown in FIG. 1. An input picture is first compressed and fed into algorithm MSSD_1, yielding a first loss value Loss_1. The output position information is used to extract the corresponding image region, which is fed into algorithm MSSD_2 to obtain a second loss value Loss_2. The total loss is a linear combination of Loss_1 and Loss_2, and the model is trained with this procedure. In the prediction stage, the detection result is output directly by MSSD_2, which speeds up inference and reduces storage usage.
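The loss combination of the training stage can be sketched with toy stand-ins for MSSD_1 and MSSD_2. The tiny linear layers and the weights a and b below are illustrative assumptions; the patent does not state the combination coefficients or the exact loss functions:

```python
import torch

torch.manual_seed(0)
mssd_1 = torch.nn.Linear(8, 4)  # stand-in for MSSD_1: compressed frame -> body box
mssd_2 = torch.nn.Linear(8, 2)  # stand-in for MSSD_2: crop -> helmet / no helmet

x_full = torch.randn(16, 8)     # stands in for compressed input frames
x_crop = torch.randn(16, 8)     # stands in for crops cut at stage-1 positions
box_target = torch.randn(16, 4)
cls_target = torch.randint(0, 2, (16,))

loss_1 = torch.nn.functional.mse_loss(mssd_1(x_full), box_target)
loss_2 = torch.nn.functional.cross_entropy(mssd_2(x_crop), cls_target)

a, b = 1.0, 1.0                       # assumed combination weights
total_loss = a * loss_1 + b * loss_2  # the "linear combination" in the text
total_loss.backward()                 # one optimizer step would follow here

print(total_loss.item() > 0)
```

Because the combined loss backpropagates into both networks, the two stages are trained jointly, while inference can still run MSSD_2 alone on the crop.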
With the detection algorithm presented here, the trained model detects small targets in low-image-quality video distinctly better than the original SSD algorithm.
Referring next to FIG. 2, a flowchart of one embodiment of the invention: In the first step, a large target and a small target are selected; the small target has a necessary relative position relationship with the large target, and may lie inside the large target or at a determined position outside it. The small target has a standard state and a non-standard state. In the second step, a certain number of pictures are extracted from the video and labeled as a data set with two categories, small-target standard state and small-target non-standard state; each category is annotated with two rectangular boxes, the first enclosing the large target and the second enclosing the small target in the corresponding state. In the third step, a new correspondence is formed in the target detection algorithm from the large-target/small-target relationship and the state classification, replacing the corresponding feature layers of the algorithm. In the fourth step, the input picture is compressed and fed into the algorithm, yielding a first loss value. In the fifth step, the output position information is used to extract the corresponding image region, which is fed into the algorithm to obtain a second loss value and the detection result. In the sixth step, the total loss is obtained as a linear combination of the first and second loss values, and the model is trained with this procedure.
In the model prediction stage, the detection result can be output directly by the algorithm that produces the second loss value, which speeds up model inference.
The localization bounding boxes of the target detection algorithm are defined as a set of spatially discrete default boxes with different aspect ratios and mapping positions. During prediction, the network generates a probability score for each target class in every default box and adjusts the default boxes to better match the target shape. By combining feature maps from multiple layers, the network makes complete predictions for targets at different scales and thus handles multi-size target detection.
The above describes only specific embodiments of the invention, and the protection scope of the invention is not limited to them; any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed here falls within that scope. The protection scope of the invention is therefore subject to the appended claims.

Claims (10)

1. A method for identifying a small target in a low-image-quality video in real time, characterized in that the small target has a necessary relative position relationship with a large target and has a standard state and a non-standard state; a certain number of pictures are extracted from the video and labeled as a data set; the data set is labeled with two categories, small-target standard state and small-target non-standard state, each category being annotated with two rectangular boxes, the first box enclosing the large target and the second box enclosing the small target in the corresponding state; the data set is used as the training reference for a target detection algorithm; and the target detection algorithm performs target-state identification on the video.
2. The method of claim 1, wherein the small object is inside the large object.
3. The method according to claim 1, wherein the small object is a head, the large object is a human body, and the standard state of the small object is a state in which a helmet is worn on the head.
4. The method of claim 1, wherein the object detection algorithm is a Single Shot MultiBox Detector (SSD) algorithm.
5. The method as claimed in claim 1, wherein the location bounding box of the object detection algorithm is defined as a set of spatially discrete default boxes corresponding to different aspect ratios and mapping locations.
6. The method of claim 1, wherein the target detection algorithm generates a probability score for each target class in the default frame during the prediction, and adjusts the default frame to achieve a good match with the target shape.
7. The method as claimed in claim 1, wherein the network of the object detection algorithm further makes complete predictions for objects at different scales by combining multiple feature maps of the objects, thereby performing the detection task for objects of multiple sizes.
8. The method as claimed in claim 1, wherein the object detection algorithm is further characterized by forming a new corresponding relation to replace the corresponding feature layer in the algorithm according to the relation and status classification of the large and small objects.
9. The method as claimed in claim 8, wherein the input image for detection is input into the algorithm by image compression to obtain a first loss value, the corresponding image position is extracted by using the output position information, and then the input image is input into the algorithm to obtain a second loss value and a detection result, and the total loss value is obtained by using the linear combination of the first loss value and the second loss value, and the model training is performed by using the process.
10. The method as claimed in claim 9, wherein the detecting result is directly outputted by an algorithm for obtaining the second loss value in the model predicting stage, thereby speeding up the calculation of the model.
CN201910479019.1A 2019-06-03 2019-06-03 Method for identifying small target in low-image-quality video in real time Pending CN112036214A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910479019.1A CN112036214A (en) 2019-06-03 2019-06-03 Method for identifying small target in low-image-quality video in real time

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910479019.1A CN112036214A (en) 2019-06-03 2019-06-03 Method for identifying small target in low-image-quality video in real time

Publications (1)

Publication Number Publication Date
CN112036214A 2020-12-04

Family

ID=73576508

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910479019.1A Pending CN112036214A (en) 2019-06-03 2019-06-03 Method for identifying small target in low-image-quality video in real time

Country Status (1)

Country Link
CN (1) CN112036214A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115311514A (en) * 2022-07-25 2022-11-08 阿波罗智能技术(北京)有限公司 Sample updating method and device, electronic equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination