WO2021139049A1 - Detection method, detection device, monitoring apparatus and computer-readable storage medium - Google Patents

Detection method, detection device, monitoring apparatus and computer-readable storage medium

Info

Publication number
WO2021139049A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
target
detection
detected
pixel
Prior art date
Application number
PCT/CN2020/087212
Other languages
English (en)
French (fr)
Inventor
邢军华
欧阳一村
曾志辉
许文龙
贺涛
蒋铮
Original Assignee
深圳中兴网信科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳中兴网信科技有限公司
Publication of WO2021139049A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion
    • G06T7/254: Analysis of motion involving subtraction of images
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion
    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10016: Video; Image sequence
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20081: Training; Learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20084: Artificial neural networks [ANN]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20212: Image combination
    • G06T2207/20224: Image subtraction
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/30: Subject of image; Context of image processing
    • G06T2207/30232: Surveillance

Definitions

  • This application relates to the technical field of video image recognition, for example, to a detection method, a detection device, a monitoring device, and a computer-readable storage medium.
  • The station building is the core area of the entire engineering monitoring system, where a large amount of computation and analysis is deployed. To prevent unauthorized people from entering at will, to assign responsibility for system updates and maintenance, and to ensure the safety of the station systems and their effective management, real-time pedestrian detection must be performed on the hundreds or even thousands of cameras that monitor the station building. Two pedestrian detection algorithms are common: the inter-frame difference method and deep-learning-based target detection.
  • The inter-frame difference method obtains the contour of a moving target by performing a difference operation on two consecutive frames of a video image sequence. The method is simple to implement, fast (about 5 ms per frame), and insensitive to lighting changes.
  • However, holes tend to appear inside the moving object, for example when the target moves quickly, which hinders accurate extraction of the target region. The detection result also depends on the choice of the difference threshold, and any moving object in the foreground is detected, so pedestrians cannot be distinguished from other objects; this leads to target misjudgment and a high false-detection rate.
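As an illustration, the difference operation described above can be sketched with NumPy (a minimal sketch, not the application's code; production systems often use OpenCV's `cv2.absdiff` for the same step). The toy example also reproduces the "hole" effect noted above: pixels where the object overlaps its previous position produce no difference.

```python
import numpy as np

def frame_difference(prev_frame, curr_frame, diff_threshold=25):
    """Binary motion mask: 1 where two grayscale frames differ by more
    than diff_threshold, 0 elsewhere (the cast avoids uint8 wrap-around)."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return (diff > diff_threshold).astype(np.uint8)

# Toy 4x4 frames: a bright 2x2 block shifts one pixel to the right.
prev_frame = np.zeros((4, 4), dtype=np.uint8)
prev_frame[1:3, 0:2] = 200
curr_frame = np.zeros((4, 4), dtype=np.uint8)
curr_frame[1:3, 1:3] = 200

mask = frame_difference(prev_frame, curr_frame)
# Only the leading and trailing edges of the block register motion; the
# overlap column is unchanged, which is exactly the "hole" phenomenon.
```

Only the moving contour survives in `mask`; the interior of a uniformly colored object cancels out, which is why the method yields contours rather than filled regions.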
  • The deep-learning-based target detection algorithm mainly uses strategies such as weight sharing and local connectivity to learn, end to end, features that capture objects automatically, giving the network stronger analytical capability.
  • However, to maximize the input-output ratio of an engineering project, one server must support as many cameras as possible, and the detection speed of the target detection algorithm (about 20 ms per frame) is roughly four times slower than inter-frame differencing (about 5 ms per frame).
  • Using the target detection algorithm alone is therefore too slow to support simultaneous detection on hundreds of cameras, which greatly reduces the input-output ratio of the project.
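The capacity claim above can be checked with back-of-envelope arithmetic (illustrative only: it assumes each camera is sampled at one frame per second by a single sequential worker, and the 90% motion-free share is a hypothetical figure, not from the application).

```python
# Back-of-envelope capacity check using the per-frame latencies quoted
# above. Assumes one frame per camera per second on a single sequential
# worker; the 90% motion-free share is a hypothetical figure.
DIFF_MS = 5    # inter-frame difference, per frame
YOLO_MS = 20   # deep-learning detector, per frame

cameras_diff_only = 1000 // DIFF_MS          # 200 cameras at 1 fps
cameras_yolo_only = 1000 // YOLO_MS          # 50 cameras at 1 fps

# If differencing screens out 90% of frames as motion-free, the hybrid
# scheme pays the detector cost on only 10% of frames:
hybrid_ms = DIFF_MS + 0.1 * YOLO_MS          # 7 ms average per frame
cameras_hybrid = int(1000 // hybrid_ms)      # ~142 cameras at 1 fps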
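The capacity claim above can be checked with back-of-envelope arithmetic (illustrative only: it assumes each camera is sampled at one frame per second by a single sequential worker, and the 90% motion-free share is a hypothetical figure, not from the application).

```python
# Back-of-envelope capacity check using the per-frame latencies quoted
# above. Assumes one frame per camera per second on a single sequential
# worker; the 90% motion-free share is a hypothetical figure.
DIFF_MS = 5    # inter-frame difference, per frame
YOLO_MS = 20   # deep-learning detector, per frame

cameras_diff_only = 1000 // DIFF_MS          # 200 cameras at 1 fps
cameras_yolo_only = 1000 // YOLO_MS          # 50 cameras at 1 fps

# If differencing screens out 90% of frames as motion-free, the hybrid
# scheme pays the detector cost on only 10% of frames:
hybrid_ms = DIFF_MS + 0.1 * YOLO_MS          # 7 ms average per frame
cameras_hybrid = int(1000 // hybrid_ms)      # ~142 cameras at 1 fps
```

This is the economic argument behind the hybrid design: differencing alone scales but misclassifies, the detector alone is accurate but four times slower, and gating the detector on motion recovers most of the throughput.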
  • This application proposes a detection method, including: acquiring image data of at least one camera device; identifying the pixel value of each pixel of a target image in the image data; determining an image to be detected according to a preset pixel value and the sum of all target absolute values corresponding to two adjacent frames of target images, where a target absolute value corresponding to two adjacent frames is the absolute value of the pixel difference between pixels at the same position in the two frames; using the YOLO v3 model to perform a detection operation on the image to be detected, so as to identify the target detection object present in it; and recording the target detection object.
  • This application proposes a detection device that includes a memory, a processor, and a computer program stored in the memory and runnable on the processor; the detection method provided in any of the foregoing embodiments is implemented when the processor executes the computer program.
  • the present application proposes a monitoring device, which includes: at least one camera device configured to collect image data; and the above-mentioned detection device, which is connected to the at least one camera device.
  • This application proposes a computer-readable storage medium that stores a computer program, and when the computer program is executed by a processor, the detection method as provided in any of the foregoing embodiments is implemented.
  • FIG. 1 shows a schematic flowchart of a detection method according to an embodiment of the present application;
  • FIG. 2 shows a schematic flowchart of a detection method according to another embodiment of the present application;
  • FIG. 3 shows a schematic flowchart of a detection method according to another embodiment of the present application;
  • FIG. 4 shows a schematic flowchart of a detection method according to another embodiment of the present application;
  • FIG. 5 shows a schematic flowchart of a detection method according to another embodiment of the present application;
  • FIG. 6 shows a schematic block diagram of a detection device according to an embodiment of the present application.
  • As shown in FIG. 1, according to an embodiment of the first aspect of the present application, a detection method is proposed, which includes:
  • Step 102: Obtain image data of at least one camera device;
  • Step 104: Identify the pixel value of each pixel of the target image in the image data;
  • Step 106: Determine the image to be detected according to the preset pixel value and the sum of all target absolute values corresponding to two adjacent frames of target images, where a target absolute value corresponding to two adjacent frames is the absolute value of the pixel difference between pixels at the same position in the two frames;
  • Step 108: Use the YOLO v3 model to perform a detection operation on the image to be detected, so as to identify the target detection object in the image to be detected;
  • Step 110: Record the target detection object present in the image to be detected.
  • In this embodiment, image data of at least one camera device is acquired, the pixel values of all pixels in the target images collected by the same camera device are identified, and the pixel difference between pixels at the same position in two adjacent frames of target images is calculated. According to the relationship between the sum of the absolute values of all pixel differences and a preset pixel value, it is determined whether a moving object is present, so that inter-frame difference processing is performed on two adjacent frames captured by the same camera. A target image in which a moving object is detected is taken as the image to be detected and sent to the You Only Look Once version 3 (YOLO v3) model for detection; the YOLO v3 model identifies the target detection object in the image to be detected, and the object is recorded so that the user can monitor the monitored area in real time.
  • The detection method of the present application combines the speed and generality of inter-frame differencing with the high precision of deep-learning-based target detection, greatly improving the speed and accuracy of real-time detection when a detection object enters the monitored area. It can support real-time detection for a large number of camera devices simultaneously, greatly improves the input-output ratio of an actual project, can eliminate false detections caused by non-target objects, and addresses the shortcomings of related pedestrian detection methods in accuracy, speed, and economic input-output ratio.
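The gating logic described above might be sketched as follows (a minimal sketch, not the application's code: `run_detector` is a hypothetical stand-in for the real YOLO v3 call, and the threshold value is an illustrative assumption).

```python
import numpy as np

def detect_frame(prev_frame, curr_frame, cached_result, run_detector,
                 pixel_sum_threshold=500):
    """Hybrid gate described above: run the expensive detector only when
    the summed absolute pixel difference between adjacent frames exceeds
    the preset value; otherwise reuse the cached previous result."""
    diff_sum = int(np.abs(curr_frame.astype(np.int32)
                          - prev_frame.astype(np.int32)).sum())
    if diff_sum > pixel_sum_threshold:
        result = run_detector(curr_frame)   # moving object: re-detect
        return result, result               # new result replaces the cache
    return cached_result, cached_result     # static scene: reuse the cache

# Usage with a dummy detector that just counts bright pixels:
calls = []
def run_detector(frame):
    calls.append(1)
    return {"objects": int((frame > 128).sum())}

a = np.zeros((8, 8), dtype=np.uint8)
b = a.copy()
b[2:6, 2:6] = 255                                    # large change appears
r1, cache = detect_frame(a, b, None, run_detector)   # detector runs
r2, cache = detect_frame(b, b, cache, run_detector)  # unchanged: cached
```

In the usage example the detector runs only once even though two frames are processed, which is the source of the speed-up claimed above.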
  • In one embodiment, the target detection object may be a movable object such as a pedestrian, a vehicle, or an animal. The YOLO v3 model for a given target detection object is configured according to that object's characteristic parameters, so that, given the image to be detected and the YOLO v3 model, the target detection object can be identified accurately and quickly among different types of moving objects.
  • In one embodiment, the image data includes the identity information (ID) of the camera device, the target image collected by the camera device, and the collection time of the target image. After a target detection object is detected in a target image, the user can use the ID of the collecting camera device and the collection time to locate where and when the target detection object appeared, thereby realizing real-time monitoring of the monitored area.
  • In one embodiment, as the most representative one-stage target detection model across its three iterations, YOLO v3 achieves a good balance of speed and accuracy. The backbone network of the YOLO v3 model has 53 layers, and the model structure contains no pooling layers or fully connected layers. Compared with the Single Shot MultiBox Detector (SSD), it greatly improves detection accuracy; compared with the Faster R-CNN detection model (Faster Region with CNN feature, Faster_RCNN), it effectively improves detection speed. In one embodiment, for monitoring scenes with lower accuracy requirements, the YOLO v3-tiny (mini YOLO v3) model can be used to further increase detection speed.
  • As shown in FIG. 2, according to another embodiment of the present application, a detection method is proposed, which includes:
  • Step 202: Obtain configuration information of the image data;
  • Step 204: Determine the number of processes required to download the image data according to the configuration information;
  • Step 206: Download the image data in parallel according to the number of processes;
  • Step 208: Identify the pixel value of each pixel of the target image in the image data;
  • Step 210: Determine the image to be detected according to the preset pixel value and the sum of all target absolute values corresponding to two adjacent frames of target images, where a target absolute value corresponding to two adjacent frames is the absolute value of the pixel difference between pixels at the same position in the two frames;
  • Step 212: Use the YOLO v3 model to perform a detection operation on the image to be detected, so as to identify the target detection object in the image to be detected;
  • Step 214: Record the target detection object present in the image to be detected.
  • In this embodiment, the number of processes required to download the image data is determined from the configuration information, and the image data of multiple camera channels is downloaded in parallel by that number of processes. This greatly shortens the download time of the target images, effectively improves detection efficiency, makes full use of server resources, and enhances the user experience.
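The parallel download step could be sketched as follows (an assumption-laden sketch: `fetch_image` is a hypothetical stand-in for the real camera API, and a thread pool replaces the processes described above only to keep the example self-contained; downloads are I/O-bound, so either approach parallelizes them).

```python
from multiprocessing.pool import ThreadPool

def fetch_image(camera_id):
    """Hypothetical stand-in for downloading one camera's latest frame;
    real code would call the camera or streaming API here and return the
    camera ID, the picture, and its collection time."""
    return {"camera_id": camera_id, "image": b"...", "capture_time": 0.0}

def download_all(camera_ids, workers=4):
    """Fetch frames from many cameras in parallel; results come back in
    the same order as camera_ids."""
    with ThreadPool(workers) as pool:
        return pool.map(fetch_image, camera_ids)

frames = download_all(range(8), workers=4)
```

In a real deployment the worker count would come from the configuration information (memory occupancy, data type) mentioned below, rather than a hard-coded default.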
  • In one embodiment, the configuration information includes information such as memory occupancy and data type.
  • In one embodiment, before inter-frame difference processing is performed on the target images in the image data corresponding to each of the at least one camera device, the method further includes: filtering the target image; and applying contrast enhancement to the filtered target image. This removes unwanted parts of the target image, sharpens image features, and facilitates the pixel-value computation, thereby improving the accuracy of moving-object detection.
  • In one embodiment, considering the processing efficiency of inter-frame differencing, the target image may also be cropped to filter out the large amount of background content in the image.
  • As shown in FIG. 3, according to another embodiment of the present application, a detection method is proposed, which includes:
  • Step 302: Obtain image data of at least one camera device;
  • Step 304: Identify the pixel value of each pixel of the target image in the image data corresponding to each camera device;
  • Step 306: According to the correspondence between the pixels of the current frame of the target image and the pixels of the previous frame, calculate the pixel difference between the pixel values of pixels at the same position in the current and previous frames;
  • Step 308: Determine whether the sum of the absolute values of all the pixel differences is greater than the preset pixel value; if it is, go to step 310; if it is less than or equal to the preset pixel value, go to step 316;
  • Step 310: Take the current frame of the target image as the image to be detected;
  • Step 312: Use the YOLO v3 model to perform a detection operation on the image to be detected, so as to identify the target detection object in the image to be detected;
  • Step 314: Record the target detection object present in the image to be detected;
  • Step 316: Record the target detection object identified the last time the YOLO v3 detection operation was performed on an image to be detected.
  • In this embodiment, after the pixel values of all pixels in two adjacent frames of target images collected by the same camera are identified, the pixel difference between each pixel of the current frame and the pixel at the same position in the previous frame is calculated, and the sum of the absolute values of all pixel differences is compared with the preset pixel value. If the sum is greater than the preset pixel value, there is a large difference between the current frame and the previous frame, that is, a moving object has appeared, and the current frame is taken as the image to be detected. If the sum is less than or equal to the preset pixel value, the two frames differ little, and the detection result for the previous image to be detected is recorded directly, without running the YOLO v3 model on the image again. Images containing moving objects are thereby filtered out of a large volume of image data for subsequent target-object identification, which greatly increases the speed of real-time detection when a detection object enters the monitored area and avoids the problem in the related art that batches of images cannot be processed while accuracy is guaranteed.
  • In one embodiment, the preset pixel value can be set reasonably according to the actual scene and the pixel values of the images.
  • In one embodiment, identifying the pixel values of the target image includes: performing gray-scale processing on two consecutive frames to weaken the similar parts of the images and highlight the changed parts; then binarizing the gray-scale images and extracting from them the pixel value of each pixel of the target image.
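The gray-scale and binarization preprocessing can be sketched as follows (a minimal NumPy sketch using the common BT.601 luminance weights; the application does not specify the conversion formula or the threshold, so both are assumptions here).

```python
import numpy as np

def to_grayscale(rgb):
    """Luminance-weighted grayscale (ITU-R BT.601 weights, the same
    convention as OpenCV's RGB-to-gray conversion)."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.rint(rgb.astype(np.float64) @ weights).astype(np.uint8)

def binarize(gray, threshold=127):
    """255 where the pixel exceeds the threshold, 0 elsewhere."""
    return np.where(gray > threshold, 255, 0).astype(np.uint8)

# A 2x2 RGB image: pure red, pure green, pure blue, and white.
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)
gray = to_grayscale(img)    # green is the brightest primary
binary = binarize(gray)     # only green and white survive the threshold
```

Binarizing before differencing makes the per-pixel comparison cheap and suppresses small illumination fluctuations that would otherwise inflate the difference sum.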
  • As shown in FIG. 4, according to another embodiment of the present application, a detection method is proposed, which includes:
  • Step 402: Obtain image data of at least one camera device;
  • Step 404: Identify the pixel value of each pixel of the target image in the image data;
  • Step 406: Determine the image to be detected according to the preset pixel value and the sum of all target absolute values corresponding to two adjacent frames of target images, where a target absolute value corresponding to two adjacent frames is the absolute value of the pixel difference between pixels at the same position in the two frames;
  • Step 408: Use the YOLO v3 model to perform a detection operation on the image to be detected, so as to identify the target detection object in the image to be detected;
  • Step 410: Segment the image to be detected according to a preset size to obtain detection cells;
  • Step 412: Input the detection cells into a convolutional neural network model and determine the bounding boxes of the detection cells;
  • Step 414: Determine the positional confidence and classification confidence of each bounding box according to the bounding box and the preset-category bounding box;
  • Step 416: Use a non-maximum suppression algorithm to process the positional confidences and classification confidences to obtain the category information of the target detection object;
  • Step 418: Generate and upload an event record according to the image to be detected, the category information of the target detection object, the identity information of the camera device, and the collection time of the image to be detected.
  • In this embodiment, the input image to be detected is divided into an S×S grid of detection cells and sent to a convolutional neural network (CNN) to extract features. Each cell predicts multiple bounding boxes and their confidences, where a bounding box's confidence comprises a positional confidence and a classification confidence.
  • The classification confidence is the probability that the target detection object in the bounding box belongs to each of several categories. A non-maximum suppression (NMS) algorithm processes the positional confidences and classification confidences to obtain the category information of the target detection object. This scheme refines the detection system and classifies target detection objects, making it easy for the user to track objects of the same type.
  • For example, in the case of pedestrian detection, the pedestrian category identified may be adult or child.
  • After a target detection object is detected in a target image, the image to be detected is recorded together with the corresponding target-object category information, the identity information of the camera device, and the collection time of the image, and an event record is generated and uploaded, so that users can query at any time when detected objects entered or left the monitored area.
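The non-maximum suppression step referred to above is a standard technique; the application does not give an implementation, but a common greedy variant looks like this (the positional and classification confidences are collapsed into a single score per box for brevity).

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop the
    remaining boxes that overlap it by more than iou_threshold."""
    order = np.argsort(scores)[::-1]          # indices, best score first
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        overlaps = np.array([iou(boxes[best], boxes[i]) for i in rest])
        order = rest[overlaps <= iou_threshold]
    return keep

# Two overlapping detections of the same pedestrian plus one distant box:
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)   # the weaker duplicate (index 1) is suppressed
```

The surviving indices reference one box per physical object, which is what allows a single category label to be attached to each detected pedestrian.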
  • In one embodiment, the size and position of a bounding box are represented by (x, y, w, h), where (x, y) is the center coordinate of the bounding box, and w and h are its width and height, respectively.
  • In one embodiment, the YOLO v3 model divides the input image into an S×S grid of detection cells. The CNN model is responsible for detecting targets whose center points fall within a given detection cell; that is, each cell predicts B bounding boxes and their confidences.
  • Each cell also predicts the class probabilities over a total of C categories. The confidence carries two meanings: first, the probability that the bounding box contains a target; second, the accuracy of the bounding box.
  • The accuracy of a bounding box can be characterized by the intersection over union (IOU) between the predicted box (the bounding box) and the ground-truth box (the preset-category bounding box).
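The grid scheme above implies a fixed-size prediction tensor per image. As a small worked example (using the classic single-scale, YOLO-v1-style layout for intuition only; the figures are illustrative, and YOLO v3 itself predicts at three scales with per-box class scores):

```python
# Size of the per-image prediction tensor implied by the grid scheme:
# S*S cells, each predicting B boxes with 5 values apiece
# (x, y, w, h, confidence), plus C class probabilities per cell.
S, B, C = 7, 2, 20                      # illustrative figures only
values_per_cell = B * 5 + C             # 30 values per cell
predictions = S * S * values_per_cell   # 1470 values per image
```

The fixed output size is what makes the model a one-stage detector: a single forward pass yields every box and class score at once, with no separate region-proposal step.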
  • As shown in FIG. 5, according to yet another embodiment of the present application, a detection method is proposed, taking a station building equipped with multiple cameras as the monitored area and pedestrians as the target detection objects. The detection method includes:
  • Step 502: The station building cameras collect picture data in real time;
  • Step 504: Multiple processes download the data of multiple cameras in parallel;
  • Step 506: Perform inter-frame differencing on two adjacent frames from the same camera;
  • Step 508: Determine whether the sum of the absolute values of all pixel differences is less than the preset threshold; if it is, go to step 510; if it is greater than or equal to the preset threshold, go to step 512;
  • Step 510: Return the stored result of the previous detection;
  • Step 512: Multiple processes call the YOLO v3 model to perform detection, and the detection result and camera ID are saved, replacing and updating the previous record;
  • Step 514: Return the pedestrian detection result to the intelligent recognition system to form an event record.
  • In this embodiment, to minimize picture download latency and support as many camera channels as possible, multiple processes download the data of multiple cameras (camera ID, picture, and collection time) in parallel, and inter-frame differencing is then performed on the pictures. For pictures whose sum of absolute pixel differences is greater than or equal to the preset threshold (the preset pixel value), multiple processes call the YOLO v3 model for detection, and the detection result and camera ID are saved and updated; for pictures below the threshold, the stored previous detection result is returned directly. The YOLO v3 model is chosen for the detection processing because of its high accuracy and speed.
  • The detection method provided in this embodiment combines the speed of inter-frame differencing with the high precision of the deep-learning YOLO v3 detection algorithm, together with multi-process picture downloading and multi-process detection. It greatly improves the speed and accuracy of the real-time system that detects pedestrians entering the station building, can support real-time detection on hundreds of cameras simultaneously, greatly improves the input-output ratio of an actual project, and addresses the shortcomings of related pedestrian detection methods in accuracy, speed, and economic input-output ratio.
  • In one embodiment, before the system is deployed, a large number of collected images are used for iterative training and optimization to obtain the YOLO v3 model.
  • As shown in FIG. 6, according to an embodiment of the second aspect of the present application, a detection device 600 is proposed, which includes a memory 602, a processor 604, and a computer program stored in the memory 602 and runnable on the processor 604. When the processor 604 executes the computer program, the detection method of any of the foregoing embodiments is implemented.
  • According to an embodiment of the third aspect of the present application, a monitoring apparatus is proposed, including: at least one camera device configured to collect image data; and the above detection device, connected to the at least one camera device. The detection device is configured so that the following steps can be implemented when the computer program is executed: acquiring image data of the at least one camera device; identifying the pixel value of each pixel of a target image in the image data; determining the image to be detected according to the preset pixel value and the sum of all target absolute values corresponding to two adjacent frames of target images, where a target absolute value corresponding to two adjacent frames is the absolute value of the pixel difference between pixels at the same position in the two frames; using the YOLO v3 model to perform a detection operation on the image to be detected, so as to identify the target detection object in it; and recording the target detection object.
  • The monitoring apparatus provided in this embodiment can acquire image data of at least one camera device, identify the pixel values of all pixels in target images collected by the same camera, and calculate the pixel differences between pixels at the same position in two adjacent frames of target images. According to the relationship between the sum of the absolute values of all pixel differences and the preset pixel value, it determines whether a moving object is present, thereby performing inter-frame difference processing on two adjacent frames captured by the same camera. The target image in which a moving object is detected is taken as the image to be detected and sent to the YOLO v3 model for detection; the YOLO v3 model identifies the target detection object in the image to be detected, and the object is recorded for the user to monitor the monitored area in real time.
  • The monitoring apparatus combines the speed and generality of inter-frame differencing with the high precision of deep-learning-based target detection, greatly improving the speed and accuracy of real-time detection when a detected object enters the monitored area. It can support real-time detection for a large number of camera devices simultaneously, greatly improves the input-output ratio of an actual project, can eliminate false detections caused by non-target objects, and addresses the shortcomings of related pedestrian detection methods in accuracy, speed, and economic input-output ratio.
  • According to an embodiment of the fourth aspect of the present application, a computer-readable storage medium is proposed, which stores a computer program; when the computer program is executed by a processor, the steps of the detection method of any of the foregoing embodiments are implemented.
  • In this application, a connection can be a fixed connection, a detachable connection, or an integral connection; it can be a direct connection or an indirect connection through an intermediate medium.
  • In this application, the description of the terms "one embodiment", "some embodiments", "specific embodiments", etc. means that the features, structures, materials, or characteristics described in conjunction with the embodiment or example are included in at least one embodiment or example of the application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Moreover, the described features, structures, materials, or characteristics can be combined in a suitable manner in any one or more embodiments or examples.

Abstract

A detection method, a detection device, a monitoring apparatus, and a computer-readable storage medium. The detection method includes: acquiring image data of at least one camera device; identifying the pixel value of each pixel of a target image in the image data; determining an image to be detected according to a preset pixel value and the sum of all target absolute values corresponding to two adjacent frames of target images, where a target absolute value corresponding to two adjacent frames is the absolute value of the pixel difference between pixels at the same position in the two frames; using the YOLO v3 model to perform a detection operation on the image to be detected, so as to identify the target detection object present in it; and recording the target detection object.

Description

Detection method, detection device, monitoring apparatus and computer-readable storage medium
This application claims priority to Chinese patent application No. 202010027424.2, filed with the Chinese Patent Office on January 10, 2020, the entire contents of which are incorporated herein by reference.
技术领域
本申请涉及视频图像识别技术领域,例如,涉及一种检测方法、检测装置、监控设备和计算机可读存储介质。
背景技术
站房是整个工程监控系统的核心区域,大量的计算分析部署在这里,为防止闲杂人等随意进入及系统更新维护责任到人,保证站房系统安全及系统的有效管理,需要对监控站房的上百路甚至上千路摄像头进行行人的实时检测。常见的行人检测算法有两种:帧间差分法和基于深度学习的目标检测算法。
帧间差分法是一种通过对视频图像序列的连续两帧图像做差分运算获取运动目标轮廓的方法。该方法实现简单,运算速度快(5ms左右),对光线的变化不敏感。但是,在运动体内易产生空洞,例如在目标运动速度较快的情况下,影响目标区域的准确提取,且检测效果取决于差分阈值的设定,对前景中的任何运动物体都会进行检测,无法区分行人和物体,存在目标误判、误检率高的问题。
基于深度学习的目标检测算法主要通过权值共享、局部连接等策略来实现端到端的自动学习捕捉物体的特征,使网络具有更强的解析能力。但为了工程项目的投入产出比最大化,则需要一台服务器支持尽可能多路的摄像头,而目标检测算法检测速度(20ms左右)相比帧间差分检测速度(5ms左右)慢四倍左右,仅仅用目标检测算法,运行速度过慢,难以支持上百路摄像头的同时检测,使得项目的投入产出比大大降低。
发明内容
本申请至少解决相关技术中存在的上述技术问题。
本申请提出了一种检测方法,包括:获取至少一个摄像装置的图像数据; 识别图像数据中目标图像的每一个像素点的像素值;根据预设像素值以及相邻两帧目标图像对应的所有的目标绝对值之和,确定待检测图像,其中,所述相邻两帧目标图像对应的目标绝对值为所述相邻两帧目标图像中同一位置的像素点之间的像素差值的绝对值;采用YOLO v3模型对待检测图像进行检测运算,以识别待检测图像中存在的目标检测对象;记录目标检测对象。
本申请提出了一种检测装置,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,处理器执行计算机程序时实现上述任一实施例提供的检测方法。
本申请提出了一种监控设备,包括:至少一个摄像装置,所述摄像装置设置为采集图像数据;以及上述检测装置,所述检测装置与所述至少一个摄像装置连接。
本申请提出了一种计算机可读存储介质,存储有计算机程序,计算机程序被处理器执行时实现如上述任一实施例提供的检测方法。
附图说明
图1示出了本申请一个实施例的检测方法流程示意图;
图2示出了本申请又一个实施例的检测方法流程示意图;
图3示出了本申请又一个实施例的检测方法流程示意图;
图4示出了本申请又一个实施例的检测方法流程示意图;
图5示出了本申请又一个实施例的检测方法流程示意图;
图6示出了本申请一个实施例的检测装置示意框图。
具体实施方式
下面结合附图和具体实施方式对本申请进行描述。
在下面的描述中阐述了很多细节以便于充分理解本申请,但是,本申请还可以采用其他不同于在此描述的其他方式来实施,因此,本申请的保护范围并不限于下面公开的具体实施例的限制。
下面参照图1至图6描述根据本申请实施例的检测方法、检测装置600、监控设备及计算机可读存储介质。
实施例一
如图1所示,根据本申请第一方面的实施例,提出了一种检测方法,该方 法包括:
步骤102,获取至少一个摄像装置的图像数据;
步骤104,识别图像数据中目标图像的每一个像素点的像素值;
步骤106,根据预设像素值以及相邻两帧目标图像对应的所有的目标绝对值之和,确定待检测图像,其中,相邻两帧目标图像对应的目标绝对值为相邻两帧目标图像中同一位置的像素点之间的像素差值的绝对值;
步骤108,采用YOLO v3模型对待检测图像进行检测运算,以识别待检测图像中存在的目标检测对象;
步骤110,记录待检测图像中存在的目标检测对象。
在该实施例中,获取至少一个摄像装置的图像数据,识别同一路摄像装置采集目标图像中全部像素点的像素值,并计算相邻两帧目标图像中同一位置的像素点的像素值之间的像素差值,根据所有像素点的像素差值的绝对值之和与预设像素值之间的大小关系,判断是否存在运动物体,从而对采集于同一路摄像装置的相邻两帧目标图像进行帧间差分处理,同时将检测到运动物体的目标图像作为待检测图像,并送入你只看一次的第三个版本(You Only Look Once,YOLO v3)模型进行检测运算,利用YOLO v3模型识别出待检测图像中存在的目标检测对象,并进行记录,以供用户对监测区域进行实时监控。本申请的检测方法利用帧间差分的快速性、广泛性及基于深度学习的目标检测算法的高精度,大大提高了检测对象进入监测区域时实时检测的检测速度和精度,而且可同时支持大量摄像装置的实时检测,极大地提高了实际工程的投入产出比,并且能够排除非检测对象引起的误检影响,解决相关技术中的行人检测方法在准确率、速度、经济效益投入产出比等方面存在不足的问题。
在一实施例中,目标检测对象可以是行人、车辆、动物等能够运动的物体,根据目标检测对象的特征参数设置针对该目标检测对象的YOLO v3模型,从而根据待检测图像和YOLO v3模型,能够准确、快速地在不同类型运动物体中识别出目标检测对象。
在一实施例中,图像数据包括摄像装置的身份信息(Identity Information,ID)、摄像装置采集的目标图像以及目标图像的采集时间,在目标图像中检 测到目标检测对象后,根据采集该图像的摄像装置的身份信息和时间,用户能够及时定位目标检测对象出现的位置和时间,从而实现监测区域的实时监控。
在一实施例中,YOLO迭代三个版本作为最具代表性的one-stage(单阶段)目标检测模型,YOLO v3模型能够达到速度和精度的和谐统一。YOLO v3模型的骨干网络为53层,而且模型结构中没有池化层和全连接层,相比于单一深层神经网络检测模型(Single Shot MultiBox Detector,SSD)大大提高了目标检测的精度,相比于快速的卷积网络检测模型(Faster Region with CNN feature,Faster_RCNN)有效提升了检测速度,在一实施例中,对于精度要求较低的监测场景,为了进一步提升检测速度,可采用YOLO v3-tiny(微型YOLO v3)模型。
实施例二
如图2所示,根据本申请的又一个实施例,提出了一种检测方法,该方法包括:
步骤202,获取图像数据的配置信息;
步骤204,根据配置信息确定下载图像数据所需的进程数量;
步骤206,根据进程数量并行下载图像数据;
步骤208,识别图像数据中目标图像的每一个像素点的像素值;
步骤210,根据预设像素值以及相邻两帧目标图像对应的所有的目标绝对值之和,确定待检测图像,其中,相邻两帧目标图像对应的目标绝对值为相邻两帧目标图像中同一位置的像素点之间的像素差值的绝对值;
步骤212,采用YOLO v3模型对待检测图像进行检测运算,以识别待检测图像中存在的目标检测对象;
步骤214,记录待检测图像中存在的目标检测对象。
在该实施例中,根据图像数据的配置信息,确定下载图像数据所需的进程数量,根据进程数量以多进程并行的方式下载多路摄像装置的图像数据,大大缩短了目标图像的下载时间,有效提升了检测效率,并能够充分利用服务器资源,提升用户的使用体验。
在一实施例中,配置信息为内存占用、数据类别等信息。
在一实施例中,对至少一个摄像装置中的每一个摄像装置对应的图像数据中的目标图像进行帧间差分处理之前,还包括:对目标图像进行滤波处理;以及对经滤波处理后的目标图像进行对比度增强处理,从而消除目标图像中不需要的部分,提高图像特征的清晰度,有利于进行像素值之间的计算,从而提高运动物体检测的准确性。
在一实施例中,考虑到帧间差分的处理效率,还可以对目标图像进行剪裁,以过滤图像中大量的背景内容。
实施例三
如图3所示,根据本申请的又一个实施例,提出了一种检测方法,该方法包括:
步骤302,获取至少一个摄像装置的图像数据;
步骤304,识别每一个摄像装置对应的图像数据中的目标图像中每一个像素点的像素值;
步骤306,根据当前帧目标图像的像素点与前一帧目标图像的像素点之间的对应关系,计算当前帧目标图像与前一帧目标图像中同一位置的像素点的像素值之间的像素差值;
步骤308,所有像素差值的绝对值之和是否大于预设像素值,若所有像素差值的绝对值之和大于预设像素值,进入步骤310,若所有像素差值的绝对值之和小于或等于预设像素值,进入步骤316;
步骤310,将当前帧目标图像作为待检测图像;
步骤312,采用YOLO v3模型对待检测图像进行检测运算,以识别待检测图像中存在的目标检测对象;
步骤314,记录待检测图像中存在的目标检测对象;
步骤316,记录上一次采用YOLO v3模型对待检测图像进行检测运算识别的待检测图像中存在的目标检测对象。
在该实施例中,识别同一路摄像装置采集的相邻两帧目标图像中全部像素点的像素值之后,计算当前帧目标图像中每个像素点的像素值与前一帧目标图像中同一位置的像素点的像素值之间的像素差值,对比所有像素点的像素差值的绝对值之和与预设像素值,若所有像素差值的绝对值之和大于预设 像素值,说明当前帧目标图像和前一帧目标图像之间存在较大差异,即出现运动物体,则将当前帧目标图像作为待检测图像,若所有像素差值的绝对值之和小于或等于预设像素值,说明当前帧目标图像和前一帧目标图像之间差异较小,此时直接记录上一次待检测图像中目标检测对象的检测结果,无需再次通过YOLO v3模型对图像进行运算,从而在大量的图像数据中筛选出包含运动物体的图像,便于后续对该图像进行目标检测对象的识别,大大提高了检测对象进入监测区域时实时检测的检测速度,避免了相关技术中无法在保证精度的情况下,对批量图像进行检测的问题。
在一实施例中,预设像素值可以根据实际场景和图像像素值进行合理设置。
在一实施例中,识别目标图像的像素值包括:对连续两帧图像进行灰度化处理,以削弱图像的相似部分,突出显示图像的变化部分;二值化该灰度图像,并从中提取目标图像中每一个像素点的像素值。
实施例四
如图4所示,根据本申请的又一个实施例,提出了一种检测方法,该方法包括:
步骤402,获取至少一个摄像装置的图像数据;
步骤404,识别图像数据中目标图像的每一个像素点的像素值;
步骤406,根据预设像素值以及相邻两帧目标图像对应的所有的目标绝对值之和,确定待检测图像,其中,相邻两帧目标图像对应的目标绝对值为相邻两帧目标图像中同一位置的像素点之间的像素差值的绝对值;
步骤408,采用YOLO v3模型对待检测图像进行检测运算,以识别待检测图像中存在的目标检测对象;
步骤410,根据预设尺寸分割待检测图像,得到检测单元格;
步骤412,将检测单元格输入卷积神经网络模型,确定检测单元格的边界框;
步骤414,根据边界框和预设类别边界框,确定边界框的定位置信度和分类置信度;
步骤416,采用非极大值抑制算法对定位置信度和分类置信度进行处理, 得到目标检测对象的类别信息;
步骤418,根据待检测图像、目标检测对象类别信息、摄像装置身份信息及待检测图像采集时间生成并上传事件记录。
在该实施例中,将输入的待检测图像分割成S×S网格状检测单元格,并送入卷积神经网络(Convolutional Neural Networks,CNN)提取特征,每个单元格会预测出多个边界框以及边界框的置信度,其中,边界框的置信度包括定位置信度和分类置信度,分类置信度即边界框中目标检测对象属于多个类别的概率,采用非极大值抑制算法(Non-maximum suppression,NMS)对定位置信度和分类置信度进行处理,得到目标检测对象的类别信息,通过上述方案,优化了检测系统,对目标检测对象进行归类,以便于用户对同类型目标检测对象进行追踪,例如,在进行行人检测的情况下,识别出行人类别为成人或儿童。在目标图像中检测到目标检测对象后,将待检测图像和与待检测图像对应的目标检测对象类别信息、摄像装置身份信息及待检测图像采集时间进行记录,生成并上传事件记录,以便于用户随时查询监测区域内检测对象的进出情况。
In one embodiment, the size and position of a bounding box are characterized by (x, y, w, h), where (x, y) is the center coordinate of the bounding box and w and h are its width and height, respectively.
In one embodiment, the YOLO v3 model divides the input picture into S×S grid-shaped detection cells, and the CNN model is responsible for detecting targets whose center points fall within a detection cell; that is, each cell predicts B bounding boxes together with their confidences, and each cell also predicts probabilities over the total number of categories, C in all. The confidence carries two meanings: first, the likelihood that the bounding box contains a target; second, the accuracy of the bounding box, which can be characterized by the intersection over union (IOU) between the predicted box (the bounding box) and the ground-truth box (the preset category bounding box).
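The IOU measure and the greedy non-maximum suppression described here can be sketched as follows, using the centre-based (x, y, w, h) box representation from the text; the 0.5 suppression threshold is a common default, not a value fixed by the text:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes,
    where (x, y) is the box centre."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop remaining boxes that overlap it above the IOU threshold, repeat.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

In a full detector, NMS runs per category so that overlapping boxes of different classes can coexist; the per-class loop is omitted here for brevity.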
Embodiment 5
As shown in FIG. 5, according to yet another embodiment of the present application, a detection method is provided, in which a station house serves as the monitored area, the station house is equipped with multiple cameras, and pedestrians are the target detection objects. The detection method includes:
Step 502: the station-house cameras collect picture data in real time;
Step 504: downloading the data of multiple camera channels in parallel with multiple processes;
Step 506: performing inter-frame differencing on two adjacent frames of images from the same camera channel;
Step 508: determining whether the sum of the absolute values of all the pixel differences is less than a preset threshold; if so, proceeding to step 510; if the sum is greater than or equal to the preset threshold, proceeding to step 512;
Step 510: returning the stored result of the last detection;
Step 512: invoking the YOLO v3 model with multiple processes to perform detection, and saving the detection result together with the camera ID, replacing and updating the stored result;
Step 514: returning the results in which pedestrians are detected to the intelligent recognition system to form event records.
In this embodiment, to minimize the picture download latency and support as many camera channels as possible, the data of multiple camera channels (camera ID, picture, and acquisition time) are downloaded in parallel by multiple processes, and inter-frame differencing is then performed on the pictures: it is determined whether the sum of the absolute values of all the pixel differences between two consecutive frames in the image sequence of the same camera channel is greater than or equal to a preset threshold (the preset pixel value). For pictures whose sum of absolute pixel differences is greater than the preset threshold, the YOLO v3 model is invoked by multiple processes for detection, and the detection result and the camera ID are saved, replacing and updating the stored result; for pictures whose sum of absolute pixel differences is less than the preset threshold, the stored result of the last detection is returned directly, for use by subsequent pictures processed by inter-frame differencing. The YOLO v3 model, which offers both high accuracy and high speed, is selected for the detection processing.
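The caching behaviour of steps 508 to 512 — run the detector only on changed frames, otherwise replay the stored result keyed by camera ID — can be sketched as below. A thread pool stands in for the multi-process pools described in the text (to keep the sketch self-contained), and `run_detector` is a hypothetical stand-in for the YOLO v3 invocation:

```python
from concurrent.futures import ThreadPoolExecutor

# Per-camera cache of the last detection result, keyed by camera ID.
last_result = {}

def process_frame(camera_id, frame_changed, run_detector):
    """Run the detector only when inter-frame differencing reported a
    change; otherwise return the stored result of the last detection."""
    if frame_changed:
        last_result[camera_id] = run_detector()
    return last_result.get(camera_id)

def process_all(updates, run_detector):
    """Handle many camera channels concurrently. `updates` is a list of
    (camera_id, frame_changed) pairs produced by the differencing stage."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = {
            cam: pool.submit(process_frame, cam, changed, run_detector)
            for cam, changed in updates
        }
        return {cam: f.result() for cam, f in futures.items()}
```

The real system would use `multiprocessing` (or `ProcessPoolExecutor`) for the download and detection pools, with the cache held in a shared store rather than a module-level dict.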
The detection method provided in this embodiment exploits the speed of inter-frame differencing and the high accuracy of the deep-learning-based YOLO v3 object detection algorithm, combined with multi-process picture downloading and multi-process detection, greatly improving the detection speed and accuracy of the real-time detection system for pedestrians entering the station house. It can support real-time detection on over a hundred camera channels simultaneously, greatly improves the input-output ratio in practical engineering, and solves the problem that related-art pedestrian detection methods fall short in accuracy, speed, and economic input-output ratio.
In one embodiment, before the system is deployed, a large number of collected pictures are used for iterative training and optimization to obtain the YOLO v3 model.
Embodiment 6
As shown in FIG. 6, according to an embodiment of the second aspect of the present application, a detection apparatus 600 is provided, including a memory 602, a processor 604, and a computer program stored in the memory 602 and executable on the processor 604, where the processor 604 implements the detection method of any one of the above embodiments when executing the computer program.
Embodiment 7
According to an embodiment of the third aspect of the present application, a monitoring device is provided, including: at least one camera device, the camera device being configured to collect image data; and the above detection apparatus, which is configured to be connected to the at least one camera device and, when executing the computer program, can implement the following steps: acquiring image data of the at least one camera device; identifying the pixel value of each pixel of the target image in the image data; determining the image to be detected according to a preset pixel value and the sum of all target absolute values corresponding to two adjacent frames of target images, where the target absolute values corresponding to the two adjacent frames of target images are the absolute values of the pixel differences between pixels at the same position in the two adjacent frames of target images; performing a detection operation on the image to be detected using a YOLO v3 model, to identify the target detection objects present in the image to be detected; and recording the target detection objects.
The monitoring device provided in this embodiment can acquire image data of at least one camera device, identify the pixel values of all pixels in the target images collected by the same camera channel, calculate the pixel differences between the pixel values of pixels at the same position in two adjacent frames of target images, and determine whether a moving object is present according to the relationship between the sum of the absolute values of all the pixel differences and the preset pixel value, thereby performing inter-frame difference processing on the two adjacent frames of target images collected by the same camera channel. The target image in which a moving object is detected is taken as the image to be detected and fed into the YOLO v3 model for detection; the target detection objects present in the image to be detected are identified by the YOLO v3 model and recorded, so that users can monitor the monitored area in real time. The monitoring device exploits the speed and broad applicability of inter-frame differencing and the high accuracy of deep-learning-based object detection, greatly improving the speed and accuracy of real-time detection when a detection object enters the monitored area. It can also support real-time detection on a large number of camera devices simultaneously, greatly improves the input-output ratio in practical engineering, can rule out false detections caused by non-detection objects, and solves the problem that related-art pedestrian detection methods fall short in accuracy, speed, and economic input-output ratio.
Embodiment 8
According to an embodiment of the fourth aspect of the present application, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, implements the steps of the detection method of any one of the above embodiments.
In the description of this specification, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance, unless otherwise expressly specified and defined; the terms "connected", "mounted", "fixed", and the like shall be understood broadly. For example, "connected" may be a fixed connection, a detachable connection, or an integral connection; it may be a direct connection or an indirect connection through an intermediary. For those of ordinary skill in the art, the meanings of the above terms in this application can be understood according to the specific circumstances.
In the description of this specification, the terms "one embodiment", "some embodiments", "a specific embodiment", and the like mean that a feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of this application. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the described features, structures, materials, or characteristics may be combined in a suitable manner in any one or more embodiments or examples.

Claims (10)

  1. A detection method, comprising:
    acquiring image data of at least one camera device;
    identifying the pixel value of each pixel of a target image in the image data;
    determining an image to be detected according to a preset pixel value and the sum of all target absolute values corresponding to two adjacent frames of target images, wherein the target absolute values corresponding to the two adjacent frames of target images are the absolute values of the pixel differences between pixels at the same position in the two adjacent frames of target images;
    performing a detection operation on the image to be detected using a YOLO v3 model, to identify target detection objects present in the image to be detected;
    recording the target detection objects.
  2. The detection method according to claim 1, wherein the acquiring image data of at least one camera device comprises:
    acquiring configuration information of the image data;
    determining, according to the configuration information, the number of processes required to download the image data;
    downloading the image data in parallel according to the number of processes.
  3. The detection method according to claim 1, wherein the determining an image to be detected according to a preset pixel value and the sum of all target absolute values corresponding to two adjacent frames of target images comprises:
    calculating, according to the correspondence between the pixels of a current-frame target image and the pixels of a previous-frame target image, the pixel differences between the pixel values of pixels at the same position in the current-frame target image and the previous-frame target image;
    comparing the sum of the absolute values of all the pixel differences with the preset pixel value;
    taking the current-frame target image as the image to be detected based on a determination that the sum of the absolute values of all the pixel differences is greater than the preset pixel value.
  4. The detection method according to claim 3, further comprising:
    recording, based on a determination that the sum of the absolute values of all the pixel differences is less than or equal to the preset pixel value, the target detection objects present in the image to be detected that were identified the last time a detection operation was performed on an image to be detected using the YOLO v3 model.
  5. The detection method according to any one of claims 1 to 4, wherein
    the image data comprises identity information of the camera device, the target image collected by the camera device, and the acquisition time of the target image.
  6. The detection method according to claim 5, wherein after the performing a detection operation on the image to be detected using a YOLO v3 model, the method further comprises:
    dividing the image to be detected according to a preset size to obtain detection cells;
    feeding the detection cells into a convolutional neural network model to determine bounding boxes of the detection cells;
    determining localization confidences and classification confidences of the bounding boxes according to the bounding boxes and preset category bounding boxes;
    processing the localization confidences and the classification confidences with a non-maximum suppression algorithm to obtain category information of the target detection objects.
  7. The detection method according to claim 6, wherein the recording the target detection objects comprises:
    generating and uploading an event record according to the image to be detected, the category information of the target detection objects, the identity information of the camera device, and the acquisition time of the target image.
  8. A detection apparatus, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor is configured to perform, when executing the computer program, the detection method according to any one of claims 1 to 7.
  9. A monitoring device, comprising:
    at least one camera device, the camera device being configured to collect image data; and the detection apparatus according to claim 8, the detection apparatus being connected to the at least one camera device.
  10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the detection method according to any one of claims 1 to 7.
PCT/CN2020/087212 2020-01-10 2020-04-27 Detection method, detection apparatus, monitoring device, and computer-readable storage medium WO2021139049A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010027424.2A CN111223129A (zh) 2020-01-10 2020-01-10 Detection method, detection apparatus, monitoring device, and computer-readable storage medium
CN202010027424.2 2020-01-10

Publications (1)

Publication Number Publication Date
WO2021139049A1 true WO2021139049A1 (zh) 2021-07-15

Family

ID=70831383

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/087212 WO2021139049A1 (zh) 2020-01-10 2020-04-27 Detection method, detection apparatus, monitoring device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN111223129A (zh)
WO (1) WO2021139049A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112131966A * 2020-09-01 2020-12-25 深圳中兴网信科技有限公司 Muck truck monitoring method, system, and storage medium
CN112183397A * 2020-09-30 2021-01-05 四川弘和通讯有限公司 Method for recognizing the behavior of sitting on a guardrail, based on a dilated convolutional neural network
CN112380962A * 2020-11-11 2021-02-19 成都摘果子科技有限公司 Deep-learning-based animal image recognition method and system
CN113949830B * 2021-09-30 2023-11-24 国家能源集团广西电力有限公司 Image processing method
CN114897762B * 2022-02-18 2023-04-07 众信方智(苏州)智能技术有限公司 Automatic positioning method and device for a shearer on a coal-mine working face
CN114898044B * 2022-05-19 2024-01-23 同方威视技术股份有限公司 Detection-object imaging method, apparatus, device, and medium
CN116824514B * 2023-08-30 2023-12-08 四川弘和数智集团有限公司 Target recognition method and apparatus, electronic device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101094413A * 2007-07-06 2007-12-26 浙江大学 Real-time motion detection method for video surveillance
CN106937090A * 2017-04-01 2017-07-07 广东浪潮大数据研究有限公司 Video storage method and device
CN109117794A * 2018-08-16 2019-01-01 广东工业大学 Moving-target behavior tracking method, apparatus, device, and readable storage medium
CN110084173A * 2019-04-23 2019-08-02 精伦电子股份有限公司 Human-head detection method and device
CN110490910A * 2019-08-13 2019-11-22 顺丰科技有限公司 Target detection method and apparatus, electronic device, and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110580428A * 2018-06-08 2019-12-17 Oppo广东移动通信有限公司 Image processing method and apparatus, computer-readable storage medium, and electronic device
CN108985170A * 2018-06-15 2018-12-11 山东信通电子股份有限公司 Method for identifying objects suspended on power transmission lines based on three-frame differencing and deep learning
CN109584264B * 2018-11-19 2023-10-31 南京航空航天大学 Deep-learning-based vision-guided aerial refueling method for unmanned aerial vehicles
CN109725310B * 2018-11-30 2022-11-15 中船(浙江)海洋科技有限公司 Ship positioning and supervision system based on the YOLO algorithm and a shore-based radar system
CN110321853B * 2019-07-05 2021-05-11 杭州巨骐信息科技股份有限公司 Distributed cable anti-external-damage system based on intelligent video detection


Also Published As

Publication number Publication date
CN111223129A (zh) 2020-06-02


Legal Events

121 Ep: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 20911873; country of ref document: EP; kind code of ref document: A1)
NENP: non-entry into the national phase (ref country code: DE)
122 Ep: PCT application non-entry in European phase (ref document number: 20911873; country of ref document: EP; kind code of ref document: A1)