CN109726741B - Method and device for detecting multiple target objects - Google Patents

Method and device for detecting multiple target objects

Info

Publication number
CN109726741B
CN109726741B (application CN201811488003.9A)
Authority
CN
China
Prior art keywords
target
image
detection
camera
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811488003.9A
Other languages
Chinese (zh)
Other versions
CN109726741A (en)
Inventor
夏炎
刘镇
吕李娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University of Science and Technology
Original Assignee
Jiangsu University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University of Science and Technology filed Critical Jiangsu University of Science and Technology
Priority to CN201811488003.9A
Publication of CN109726741A
Application granted
Publication of CN109726741B
Legal status: Active

Links

Images

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a method and a device for detecting multiple target objects. The method comprises the following steps: connecting the target object detection device; creating a pre-trained multi-target object detection model using a convolutional neural network; installing deep learning framework software; sequentially reading each frame of image from the camera; reducing the image read from the camera to 448×448 pixels; dividing the reduced image into a 7×7 grid of equal-sized cells; judging whether an object is in a 7×7 grid cell using the coordinate values; sending the grid cells containing objects into the pre-trained network model to obtain frame regression values; outputting the frame regression values of 90 object categories for each grid cell; outputting the position value and confidence of each frame regression object; setting a threshold to filter out low-scoring frames; and performing non-maximum suppression on the retained frames and merging them to obtain the final detection result. The invention solves the prior-art problems of complicated image feature extraction design, low detection speed and poor multi-target concurrency.

Description

Method and device for detecting multiple target objects
Technical Field
The invention belongs to the technical field of computer image processing and machine vision, relates to a method for detecting multiple target objects, and particularly relates to a method and a device for detecting multiple target objects using a two-dimensional video camera.
Background
Conventional target detection generally uses a sliding-window framework with three main steps: (1) using sliding windows of different sizes to frame part of the image as a candidate region; (2) extracting visual features from the candidate region, such as Haar features commonly used for face detection, and HOG features commonly used for pedestrian detection and general target detection; (3) performing recognition with a classifier, such as a conventional SVM model. Among conventional methods, the multi-scale deformable part model (DPM) treats an object as a set of parts (such as the nose and mouth of a face) and describes the object through the relations among these parts, which fits the non-rigid character of many natural objects well. DPM can be regarded as an extension of HOG+SVM; it inherits the advantages of both and achieves good results on tasks such as face detection and pedestrian detection, but DPM is relatively complex and slow to detect, so many improved methods have appeared. Among them, target detection based on deep learning has been a research hotspot in recent years. In its early stage, deep-learning-based target detection struggled to achieve practical breakthroughs; for example, the mAP of OverFeat on the ILSVRC2013 test set reached only 24.3%. Many subsequent improvements combined traditional vision methods, such as Selective Search and image pyramids, with deep learning; these are all based on region proposals. This approach requires substantial computing resources and has difficulty handling multiple targets simultaneously. When detecting multiple targets in a real-time camera video stream, accelerated training on multiple GPU graphics cards is often required, so target object detection equipment is poorly portable and is difficult to apply in end-to-end real-time processing scenarios without a network and with high mobility requirements.
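As an illustration of this conventional pipeline, the following is a minimal sketch of the candidate-region generation in step (1); the window sizes and stride are arbitrary assumptions, not values from any cited method:

```python
def sliding_windows(image_shape, window_sizes=((64, 64), (128, 128)), stride=32):
    """Yield (x, y, w, h) candidate regions by sliding windows of several sizes over the image."""
    H, W = image_shape[:2]
    for wh, ww in window_sizes:
        for y in range(0, H - wh + 1, stride):
            for x in range(0, W - ww + 1, stride):
                # each candidate is then described by Haar/HOG features and classified by an SVM
                yield x, y, ww, wh

for box in list(sliding_windows((256, 256)))[:3]:
    print(box)  # (0, 0, 64, 64), (32, 0, 64, 64), (64, 0, 64, 64)
```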
Disclosure of Invention
The invention aims to overcome the problems and defects of the prior art by providing a method and a device for detecting multiple target objects with a two-dimensional video camera.
The method detects target objects with lower power consumption, lower computing resource consumption and good portability; it is suitable for network-free environments and realizes end-to-end real-time target object detection. The invention can detect up to 90 target categories.
In order to achieve the above purpose, the technical scheme provided by the invention is as follows:
a multi-target object detection method comprising the steps of:
step 1: connecting a target object detection device;
step 2: creating a pretrained multi-target object detection model by using a convolutional neural network;
step 3: installing deep learning framework application software on the target object detection device;
step 4: using camera-reading application software, sequentially reading each frame of image from the camera;
step 5: reducing the image read from the camera to 448×448 pixels;
step 6: dividing the reduced image into a 7×7 grid of equal-sized cells;
step 7: judging whether the object is in a 7×7 grid cell using the coordinate values;
step 8: sending the grid cells judged in step 7 to contain objects into the pre-trained network model to obtain frame regression values;
step 9: outputting the frame regression values of 90 object categories for each grid cell through the 90-category discriminators;
step 10: outputting the position value and confidence of each frame regression object through the 90-category discriminators;
step 11: after the position value and confidence value of each frame are obtained, setting a threshold and filtering out low-scoring frames;
step 12: performing non-maximum suppression on the retained frames and merging them to obtain the final detection result.
Further, the devices in step 1 are connected as follows:
the mobile terminal graphics card chip is connected with the embedded main board, the camera is connected with the embedded main board, the power adapter is connected with the embedded main board, and the hard disk is connected with the embedded main board.
Further, the specific contents and steps of creating the pre-trained multi-target object detection model in step 2 are as follows:
(1) Preparing training sample pictures of the target objects to be detected;
(2) Manually calibrating the position and size frames of the targets in the sample pictures;
(3) Scaling the calibrated sample pictures down to 448×448 pixels;
(4) Performing feature extraction on the reduced samples with a 24-layer convolutional neural network to obtain frame regression coordinates, the confidence that each frame contains an object, and category probabilities;
(5) Performing non-maximum suppression on all frames, and outputting a single frame after screening.
Further, in step 7, whether the object is in a 7×7 grid cell is determined by comparing the coordinates of the object's center point with the coordinate range of the grid cell.
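A minimal sketch of this center-point test, assuming a 448×448 image and a 7×7 grid (the function name and the clamping of border pixels into the last cell are illustrative choices, not specified in the patent):

```python
def grid_cell_of(center_x, center_y, img_size=448, grid=7):
    """Map an object's center point to the (row, col) of the grid cell that must predict it."""
    cell = img_size / grid                      # 64 pixels per cell for a 448x448 image
    col = min(int(center_x // cell), grid - 1)  # clamp the right/bottom border into the last cell
    row = min(int(center_y // cell), grid - 1)
    return row, col

print(grid_cell_of(100, 430))  # (6, 1): the cell in row 6, column 1 predicts this object
```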
Further, the detection method further comprises comparing the confidence of the object with a threshold of the target image to judge whether the video to be detected contains the target image: the confidence value is compared with the threshold of the target image; when the confidence value is greater than or equal to the threshold, the video to be detected is judged to contain the target image; when the confidence value is smaller than the threshold, the video to be detected is judged not to contain the target image.
Further, the detection method further comprises comparing the position value of the object with the position value in the pre-trained multi-target object detection model to judge the accuracy of target detection: the intersection-over-union between the detected position and the manually calibrated target position in the sample is computed; when it is greater than or equal to the intersection-over-union threshold, the detection is judged correct; when it is smaller than the threshold, the detection is judged incorrect.
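A minimal sketch of this intersection-over-union check (the (x1, y1, x2, y2) box format and the 0.5 default threshold are assumptions; the patent only states that an IoU threshold is used):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
             + (box_b[2] - box_b[0]) * (box_b[3] - box_b[1]) - inter)
    return inter / union if union else 0.0

def detection_correct(pred_box, calibrated_box, iou_threshold=0.5):
    """Judge a detection correct when its IoU with the manually calibrated box meets the threshold."""
    return iou(pred_box, calibrated_box) >= iou_threshold
```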
In order to achieve the above purpose, another technical scheme provided by the invention is as follows:
the device for detecting the multi-target object comprises a mobile terminal display card chip, an embedded main board, a camera, a power adapter and a hard disk, wherein the embedded main board is a hardware platform for detecting the whole multi-target object; the mobile terminal display card chip is an embedded image processing module for processing the video stream image; the camera is used for acquiring video images; the power adapter is responsible for supplying power to the embedded motherboard; the hard disk is used for storing data; the embedded main board is respectively connected with the display card chip, the camera, the power adapter and the hard disk.
The method for detecting the multi-target object has the characteristics and beneficial effects that:
1. the method uses a low-power supply, so the power consumption of the device is lower than that of deep learning target detection on computers and servers;
2. the method can be used without a network and does not need to transmit data to a server for real-time computation, so it can be used outdoors and in environments with poor network coverage;
3. the device is small, runs on embedded hardware, and is suitable for terminal applications of multi-target object recognition;
4. the method processes 18-32 frames per second (FPS) with 75.1% accuracy on the VOC 2007 dataset, so high precision is achieved under low power consumption;
5. the method can process 90 categories of multiple targets simultaneously.
Drawings
Fig. 1 is a flowchart of a multi-target object detection method according to the present invention.
Fig. 2 is a device connection diagram according to the present invention.
FIG. 3 is a flow chart of creating a pre-trained multi-target object detection model in accordance with the present invention.
Fig. 4 is a flow chart of installation of device software for multi-target object detection according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
Fig. 1 is a flowchart of a multi-target object detection method provided by the present invention. The invention provides a method for detecting a multi-target object, which comprises the following steps:
s101, connecting a target object detection device and connecting devices required by a multi-target object detection method;
s102, creating a pre-training multi-target object detection model by using a convolutional neural network. A model of multi-objective detection needs to be pre-trained before installation into the device, with the goal of detection without a network;
s103, installing deep learning framework application software on the target object detection equipment device;
s104, calling application software by using a camera, and sequentially reading each frame of image from the video camera;
s105, scaling the image read in the S104 to 448 x 448 pixels by using image scaling application software;
s106, dividing the image scaled in the S105 into grids of 7*7 by using image cutting application software;
s107, whether the broken object is in the grid cell of 7*7 or not is determined by the coordinate values. If the center coordinates of the object's frame fall within this grid, this grid is used to predict the object; if the object's bounding box center coordinates are not in the grid, the grid is not used to predict the object. Each grid predicts multiple frame regressions, each frame regressions needs to be accompanied by a confidence value for prediction besides the position of the frame regressions;
s108, sending the picture of the judging object in the grid of S107 to a pretrained multi-target object detection model of S102;
s109, calculating an output boundary frame position value and a confidence value by utilizing a multi-target object detection model network, wherein the confidence value represents the confidence of the predicted frame containing an object and the predicted multi-accuracy information of the frame, and calculating the confidence value of the position by adopting the following formula:
$\text{confidence} = \Pr(\text{Object}) \times \mathrm{IOU}^{\text{truth}}_{\text{pred}}$
As described in the above equation, the first term is 1 if an object falls in the grid cell and 0 otherwise. The second term is the intersection-over-union between the predicted bounding box regression and the actual manually labeled bounding box. Each frame regression predicts 5 values: the center-point abscissa, the center-point ordinate, the width, the height, and the confidence value; each grid cell also predicts 90 category probabilities. With 7×7 grid cells, each predicting 2 bounding box regressions plus 90 categories, the output is a tensor of 7×7×(5×2+90).
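The output layout and the confidence definition can be summarized in a short sketch (variable names are assumptions; only the tensor shape and the formula come from the description):

```python
import numpy as np

S, B, C = 7, 2, 90                       # grid size, frame regressions per cell, classes
output = np.zeros((S, S, B * 5 + C))     # the 7x7x(5*2+90) = 7x7x100 output tensor

def box_confidence(object_in_cell, iou_pred_truth):
    """confidence = Pr(Object) * IOU(pred, truth); Pr(Object) is 1 if an object falls in the cell, else 0."""
    return (1.0 if object_in_cell else 0.0) * iou_pred_truth

print(output.shape)                # (7, 7, 100)
print(box_confidence(True, 0.8))   # 0.8
```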
S110, filtering the values obtained in S109, using a confidence threshold of 0.5.
In the invention, the step of comparing the position confidence value and the confidence score with the target image threshold to judge whether the video to be detected contains the target image comprises the following steps:
comparing the position confidence value and the confidence score with the threshold of the target image;
if the score is greater than or equal to the threshold of the target image, judging that the video to be detected contains the target image;
if the score is smaller than the threshold of the target image, judging that the video to be detected does not contain the target image.
S111, after the video to be detected is judged in S110 to contain the target image, the position value and the confidence value of each frame are obtained; a threshold is set and low-scoring frames are filtered out;
S112, performing non-maximum suppression on the retained frames and merging them to obtain the final detection result, keeping only the single frame with the highest value as the detection result.
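A minimal sketch of the non-maximum suppression in S112 (greedy highest-score-first merging; the 0.5 overlap threshold is an assumption):

```python
def _iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box and drop overlapping lower-scoring ones (S112)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if _iou(boxes[i], boxes[best]) < iou_threshold]
    return keep  # indices of the retained boxes

print(non_max_suppression([(0, 0, 10, 10), (1, 1, 11, 11)], [0.9, 0.6]))  # [0]
```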
Based on the above object detection methods, the present invention further provides a multi-object detection device, which is used for executing the multi-object detection method.
Fig. 2 is a connection diagram of the devices for multi-target object detection according to the present invention, showing the multi-target object detection device 1200 provided by the invention. It comprises 1 mobile terminal graphics card chip, 1 embedded main board, 1 camera, 1 power adapter and 1 hard disk. The embedded main board is the hardware platform for the whole target object detection. The mobile terminal graphics card chip is an embedded image processing module responsible for processing the video stream images. The camera is used for acquiring video images. The power adapter supplies power to the embedded main board. The hard disk is used for storing data. The mobile terminal graphics card chip 804 is connected with the embedded main board 805, the camera 802 is connected with the embedded main board 805, the power adapter 801 is connected with the embedded main board 805, and the hard disk 803 is connected with the embedded main board 805.
FIG. 3 is a flowchart of creating a pre-trained multi-target object detection model according to the present invention, comprising the following steps:
S110: preparing training sample pictures of the target objects to be detected, with no fewer than 10,000 pictures per target;
S111: manually calibrating the position and size frame of each target in the sample pictures, using image processing software to mark the ground-truth frame positions;
S112: using image processing software to reduce the calibrated samples to 448×448 pixels;
S113: using a 24-layer convolutional neural network followed by 2 fully connected layers and a 1×1×90 convolutional layer to perform feature extraction on the reduced samples, obtaining frame regression coordinate values, confidence values that frames contain objects, and class probability values for 90 targets;
S114: performing non-maximum suppression screening on the feature values obtained in S113 and finally merging them into a single frame.
Fig. 4 is a flowchart of installation of equipment software for multi-target object detection according to the present invention.
S201, first, a 64-bit Ubuntu operating system is installed on a computer, using a long-term support (LTS) version;
S202, after S201 is completed, the host computer with the flashed 64-bit Ubuntu system is connected to the target object detection device with a data cable;
S203, after the system installation of S202 is completed, the CUDA deep learning image acceleration package is installed using the NVIDIA embedded installation package;
S204, after S203 is completed, the Google deep learning framework is installed, so that the target detection problem can be handled with deep learning methods;
S205, after S204 is completed, camera-calling software and image processing software are installed, mainly for operations such as image reading, scaling and cutting;
S206, the object detection framework is installed: the Google target detection application framework is used to integrate proven algorithms and recognition framework algorithms.
The invention provides a method for detecting multiple target objects with a two-dimensional video camera, solving the prior-art problems of complicated image feature extraction design, low detection speed, heavy equipment and high power consumption. Target object detection with this method has lower power consumption, lower computing resource consumption and good portability, and is suitable for end-to-end real-time target object detection in a network-free environment. The device uses a 90 W low-power supply, so its power consumption is lower than that of deep learning target detection on computers and servers. The device can be used without a network and does not need to transmit data to a server for real-time computation, so it can be used outdoors in areas with poor network coverage. The equipment is small, about 40 cm in overall size, and is suitable for terminal applications of multi-target object recognition. The method processes 18-32 frames per second (FPS) with 75.1% accuracy on the VOC 2007 dataset, achieving high precision under low power consumption. The method can process 90 categories of multiple targets simultaneously.

Claims (1)

1. A detection method using a detection device, the detection device comprising a mobile terminal graphics card chip, an embedded main board, a camera, a power adapter and a hard disk, wherein the embedded main board is the hardware platform for the whole multi-target object detection; the mobile terminal graphics card chip is an embedded image processing module responsible for processing video stream images; the camera is used for acquiring video images; the power adapter supplies power to the embedded main board; the hard disk is used for storing data; and the embedded main board is connected with the graphics card chip, the camera, the power adapter and the hard disk respectively; the detection method is characterized by comprising the following steps:
step 1: connecting the target object detection device, specifically: connecting the mobile terminal graphics card chip with the embedded main board, the camera with the embedded main board, the power adapter with the embedded main board, and the hard disk with the embedded main board;
step 2: creating a pretrained multi-target object detection model by using a convolutional neural network, wherein the method comprises the following specific contents and steps:
(1) Preparing a training sample picture of a target object to be detected;
(2) Manually calibrating the position and size frames of the target in the sample picture;
(3) Scaling the calibrated sample pictures down to 448×448 pixels;
(4) Performing feature extraction on the reduced samples with a 24-layer convolutional neural network to obtain frame regression coordinates, the confidence that each frame contains an object, and category probabilities;
(5) Performing non-maximum suppression on all frames, and outputting a single frame after screening;
step 3: installing deep learning framework application software on the target object detection device;
step 4: using camera-reading application software, sequentially reading each frame of image from the camera;
step 5: reducing the image read from the camera to 448×448 pixels;
step 6: dividing the reduced image into a 7×7 grid of equal-sized cells;
step 7: judging whether the object is in a 7×7 grid cell by using the coordinate values, specifically by comparing the coordinates of the object's center point with the coordinate range of the grid cell;
step 8: sending the grid cells judged in step 7 to contain objects into the pre-trained network model to obtain frame regression values;
step 9: outputting the frame regression values of 90 object categories for each grid cell through the 90-category discriminators;
step 10: outputting the position value and confidence of each frame regression object through the 90-category discriminators;
step 11: after the position value and confidence value of each frame are obtained, setting a threshold and filtering out low-scoring frames;
step 12: performing non-maximum suppression on the retained frames and merging them to obtain the final detection result;
the detection method further comprises comparing the confidence of the object with a threshold of the target image to judge whether the video to be detected contains the target image; the confidence score is compared with the threshold of the target image,
when the confidence score is greater than or equal to the threshold of the target image, the video to be detected is judged to contain the target image;
when the confidence score is smaller than the threshold of the target image, the video to be detected is judged not to contain the target image;
the detection method further comprises comparing the position value of the object with the position value in the pre-trained multi-target object detection model to judge the accuracy of target detection;
the intersection-over-union between the position value and the manually calibrated target position in the sample is computed,
when the intersection-over-union is greater than or equal to the threshold, the detection is judged correct;
and when the intersection-over-union is smaller than the threshold, the detection is judged incorrect.
CN201811488003.9A 2018-12-06 2018-12-06 Method and device for detecting multiple target objects Active CN109726741B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811488003.9A CN109726741B (en) 2018-12-06 2018-12-06 Method and device for detecting multiple target objects

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811488003.9A CN109726741B (en) 2018-12-06 2018-12-06 Method and device for detecting multiple target objects

Publications (2)

Publication Number Publication Date
CN109726741A CN109726741A (en) 2019-05-07
CN109726741B true CN109726741B (en) 2023-05-30

Family

ID=66295623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811488003.9A Active CN109726741B (en) 2018-12-06 2018-12-06 Method and device for detecting multiple target objects

Country Status (1)

Country Link
CN (1) CN109726741B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110348312A (en) * 2019-06-14 2019-10-18 武汉大学 A kind of area video human action behavior real-time identification method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682697A (en) * 2016-12-29 2017-05-17 华中科技大学 End-to-end object detection method based on convolutional neural network
CN106803071A (en) * 2016-12-29 2017-06-06 浙江大华技术股份有限公司 Object detecting method and device in a kind of image
CN108647655A (en) * 2018-05-16 2018-10-12 北京工业大学 Low latitude aerial images power line foreign matter detecting method based on light-duty convolutional neural networks
CN108875595A (en) * 2018-05-29 2018-11-23 重庆大学 A kind of Driving Scene object detection method merged based on deep learning and multilayer feature

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107665336A (en) * 2017-09-20 2018-02-06 厦门理工学院 Multi-target detection method based on Faster RCNN in intelligent refrigerator
CN107844770A (en) * 2017-11-03 2018-03-27 东北大学 A kind of electric melting magnesium furnace unusual service condition automatic recognition system based on video

Also Published As

Publication number Publication date
CN109726741A (en) 2019-05-07

Similar Documents

Publication Publication Date Title
US10885365B2 (en) Method and apparatus for detecting object keypoint, and electronic device
CN108230357B (en) Key point detection method and device, storage medium and electronic equipment
CN111080693A (en) Robot autonomous classification grabbing method based on YOLOv3
CN110348522B (en) Image detection and identification method and system, electronic equipment, and image classification network optimization method and system
CN108229673B (en) Convolutional neural network processing method and device and electronic equipment
CN110930296B (en) Image processing method, device, equipment and storage medium
CN112669344A (en) Method and device for positioning moving object, electronic equipment and storage medium
CN112857268B (en) Object area measuring method, device, electronic equipment and storage medium
CN108229494B (en) Network training method, processing method, device, storage medium and electronic equipment
CN113012200B (en) Method and device for positioning moving object, electronic equipment and storage medium
US11694331B2 (en) Capture and storage of magnified images
CN109544516B (en) Image detection method and device
CN115861715B (en) Knowledge representation enhancement-based image target relationship recognition algorithm
CN113344862A (en) Defect detection method, defect detection device, electronic equipment and storage medium
CN109726741B (en) Method and device for detecting multiple target objects
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN113469087B (en) Picture frame detection method, device, equipment and medium in building drawing
CN114757941A (en) Transformer substation equipment defect identification method and device, electronic equipment and storage medium
CN114612971A (en) Face detection method, model training method, electronic device, and program product
CN113936158A (en) Label matching method and device
CN113537026A (en) Primitive detection method, device, equipment and medium in building plan
CN114037865B (en) Image processing method, apparatus, device, storage medium, and program product
CN117523345B (en) Target detection data balancing method and device
CN111753625B (en) Pedestrian detection method, device, equipment and medium
CN114092739B (en) Image processing method, apparatus, device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant