WO2020258978A1 - Object detection method and device - Google Patents

Object detection method and device

Info

Publication number
WO2020258978A1
Authority
WO
WIPO (PCT)
Prior art keywords
video image
detected
objects
current video
rectangular frame
Prior art date
Application number
PCT/CN2020/083515
Other languages
French (fr)
Chinese (zh)
Inventor
陈奕名
苏睿
张为明
Original Assignee
北京海益同展信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京海益同展信息科技有限公司 filed Critical 北京海益同展信息科技有限公司
Publication of WO2020258978A1 publication Critical patent/WO2020258978A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Definitions

  • the present invention relates to the field of computer vision technology, in particular to an object detection method and device.
  • Image classification, target detection, and image segmentation are three major tasks in the field of computer vision.
  • the image classification model assigns the image to a single category, which usually corresponds to the most prominent object in the image.
  • many pictures in the real world usually contain more than one object.
  • Using an image classification model to assign a single label to an image is actually very crude and inaccurate.
  • the target detection model can be used to identify multiple objects in a picture and locate the identified different objects.
  • Target detection is a current research hotspot in the field of computer vision. Over the past decade or so, image target detection algorithms can be roughly divided into a period based on traditional handcrafted features and a period of deep-learning-based target detection. After Girshick et al. proposed the region-based convolutional network target detection framework (Regions with CNN features, R-CNN), the field of target detection began to develop at an unprecedented speed.
  • Target detection is applied in many scenarios, such as unmanned driving and security systems, but there is no technical solution for using target detection to detect farmed objects in videos in intelligent farming scenarios.
  • the object of the present invention is to provide an object detection method and device, which can detect all video images in the video of the surveillance area, so as to accurately identify and count the objects to be detected in the surveillance area.
  • the present invention provides the following technical solutions:
  • an object detection method provided by an embodiment of the present invention includes:
  • an object detection device provided by an embodiment of the present invention includes:
  • the acquiring unit is used to acquire the video image of the monitoring area
  • the recognition unit is used to identify and determine all the objects to be detected in the current video image in combination with the recognition result of the acquired previous video image;
  • the comparison unit is used to compare all objects to be detected in the current video image and the previous video image, and determine the number of objects to be detected in the current video image that do not belong to the previous video image;
  • the statistical unit is configured to increase the number of objects to be detected in the monitoring area currently counted based on the number of objects to be detected that do not belong to the previous video image in the current video image.
  • an embodiment of the present invention provides an electronic device, including: at least one processor, and a memory connected to the at least one processor through a bus; the memory stores one or more computer programs executable by the at least one processor; when the at least one processor executes the one or more computer programs, the steps in the above object detection method are implemented.
  • an embodiment of the present invention provides a computer-readable storage medium that stores one or more computer programs, and when the one or more computer programs are executed by a processor, the foregoing object detection method is implemented.
  • an embodiment of the present invention provides a chip that executes instructions.
  • the chip includes a memory and a processor.
  • the memory stores code and data.
  • the memory is coupled with the processor.
  • the code in the memory enables the chip to execute the steps of the object detection method described above.
  • an embodiment of the present invention provides a program product containing instructions, when the program product runs on a computer, the computer executes the steps of the above object detection method.
  • an embodiment of the present invention provides a computer program, when the computer program is executed by a processor, it is used to execute the steps of the above object detection method.
  • the object detection method and device identify the objects to be detected in the currently acquired video image in combination with the recognition result of the previously acquired video image; by comparing the objects to be detected identified in two consecutive frames of video images, the number of newly appearing objects to be detected in the later frame can be determined, and the currently counted number of objects to be detected in the monitoring area is increased accordingly. It can be seen that, in the present invention, by calculating the increment of objects to be detected between two consecutive frames of video images, the number of objects to be detected in the entire monitoring area can be accurately identified and counted.
  • FIG. 1 is a flowchart of an object detection method provided by an embodiment of the present invention
  • FIG. 2 is a schematic diagram of the detection result of a video image in Embodiment 1 of the present invention;
  • FIG. 3 is a schematic diagram of a detection result of a video image in Embodiment 2 of the present invention.
  • FIG. 4 is a schematic structural diagram of an object detection device provided by an embodiment of the present invention.
  • Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
  • the use of machine vision to count the number of breeding objects can minimize the expenditure of human resources.
  • the technical scheme provided by the present invention can be used to count the number of farming objects in the intelligent farming scene to reduce labor costs.
  • the present invention mainly analyzes the surveillance video of the surveillance area, identifies all objects to be detected in the surveillance area, and counts the total number of objects to be detected. The following description is given in conjunction with FIG. 1.
  • Fig. 1 is a flowchart of an object detection method provided by an embodiment of the present invention. As shown in Fig. 1, the method mainly includes the following steps:
  • Step 101 Obtain a video image of a monitored area.
  • a movable camera is used to shoot a video of the entire monitoring area, and all the objects to be detected in the monitoring area are determined by detecting and tracking the video frames in the video frame by frame.
  • each frame of video image in the video is acquired, and combined with the recognition result of the previous frame of video image, detection and target tracking are performed on the currently acquired frame of video image.
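The frame-by-frame detect, compare, and accumulate loop described above can be sketched as follows. Note that `detect_objects` and `match_objects` are hypothetical placeholders for the detection and comparison steps detailed later, not functions defined in the original disclosure:

```python
def count_objects(frames, detect_objects, match_objects):
    """Detect objects frame by frame, carrying the previous frame's
    recognition result forward, and accumulate the count of new objects."""
    total = 0
    prev_result = []  # recognition result of the previous frame (empty at start)
    for frame in frames:
        # Step 102: detect using the previous frame's result as a prior
        current_result = detect_objects(frame, prev_result)
        # Steps 103-104: count objects not matched in the previous frame
        total += match_objects(current_result, prev_result)
        prev_result = current_result
    return total
```

Because the first frame has no predecessor, every object detected in it is counted as new, which matches the accumulation described later in the text.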
  • Step 102 Detect and determine all objects to be detected in the current video image in combination with the recognition result of the acquired previous video image.
  • the shooting time interval between two consecutive video images is very small; therefore, the change in position of the same object to be detected between them is also very small.
  • the recognition result of the previous frame of video image can be superimposed on the current video image to ensure a relatively high recall rate for the objects to be detected in two adjacent frames of video images.
  • in step S01, the determination of the rectangular frame surrounding each object to be detected in the video image can be implemented using various methods from image processing technology.
  • the Rotational Region Convolutional Neural Network (R2CNN) can be used for this detection.
  • training samples containing multiple objects to be detected can be prepared in advance and used to train the R2CNN detection model, and the R2CNN detection model can then be used in the object detection process of the present invention.
  • the R2CNN detection model is used to determine the rectangular frame surrounding each object to be detected in the video image, namely:
  • the video image is input to the R2CNN detection model, and the R2CNN detection model performs image detection on the input video image, and then the rectangular frame surrounding each object to be detected in the video image can be output.
  • the recognition result of the previous frame of video image is also superimposed on the current video image.
  • a preferred implementation method of the above step S01 is as follows:
  • S011: use the region proposal network (RPN) algorithm to determine a horizontal rectangular frame surrounding each object to be detected in the current video image;
  • S012: superimpose the inclined rectangular frames identified from the previous video image and surrounding each object to be detected onto the current video image;
  • S013: for each rectangular frame in the current video image, perform image information detection through the ROI Pooling algorithm, perform regression analysis on the resulting image features, and adjust the horizontal rectangular frames to inclined rectangular frames according to the regression analysis results.
  • Figures 2 and 3 respectively show the detection results of a frame of video image using the above three steps.
  • step S01 can also be implemented in other ways. For example, the recognition result of the previous frame of video image can first be superimposed into the current video image, and the pre-trained R2CNN detection model can then be used to perform image detection on the current video image, i.e., following the order of steps S012, S011, S013. As another example, the pre-trained R2CNN detection model can first be used to perform image detection on the current video image, and the recognition result of the previous frame of video image can then be superimposed into the recognition result of the current video image, i.e., following the order of steps S011, S013, S012.
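One possible composition of the per-frame recognition sub-steps can be sketched as follows. Here `rpn` and `roi_refine` are hypothetical stand-ins for the trained R2CNN stages (horizontal-box proposal and ROI-Pooling-based refinement), not an API from the original disclosure:

```python
def recognize_frame(frame, prev_inclined_boxes, rpn, roi_refine):
    """One possible ordering (S011 -> S012 -> S013) of the per-frame
    recognition: propose horizontal boxes, superimpose the previous
    frame's inclined boxes, then refine every candidate box."""
    horizontal = rpn(frame)                        # S011: horizontal boxes
    candidates = horizontal + prev_inclined_boxes  # S012: superimpose prior
    inclined = [roi_refine(frame, box) for box in candidates]  # S013: refine
    return inclined
```

The alternative orderings mentioned in the text (S012 first, or S013 before S012) amount to reordering these three lines while keeping the same inputs and outputs.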
  • the region proposal network (RPN) algorithm is used to determine the horizontal rectangular frame surrounding each object to be detected in the current video image; a convolution algorithm is mainly used to extract image features at different scales, including both low-level edge texture features and high-level semantic features. By fusing these two kinds of features together, it is possible to generate complete information surrounding each object to be detected and a rectangular frame parallel to the boundary of the current video image (called a horizontal rectangular frame).
  • in order to fully identify the information of the object to be detected, in step S013 above, for the rectangular frame surrounding each object to be detected in the current video image, image information detection can be performed through the ROI Pooling algorithm to generate the image features of the rectangular frame, and regression analysis is then performed on the image features generated by the ROI Pooling algorithm.
  • the regression analysis results include the translation and rotation angle information corresponding to the rectangular frame. This translation and rotation angle information indicates the direction adjustment that the rectangular frame requires, and serves as the basis for adjusting the horizontal rectangular frame to a directional inclined rectangular frame.
  • the probability that each rectangular frame in the current video image hits the object to be detected surrounded by it can also be adjusted, which specifically includes:
  • S0121: the probability that an inclined rectangular frame superimposed in the current video image hits the object to be detected surrounded by it is set to 1;
  • S0122: the probability that a horizontal rectangular frame surrounding each object to be detected in the current video image hits the object to be detected is reduced by a preset probability threshold.
  • the above steps S0121 and S0122 are in no particular order.
  • the setting of the above probabilities can affect the execution result of the above step S013; this belongs to R2CNN technology and will not be described in detail.
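The probability adjustment described above (trust carried-over inclined frames fully, lower the scores of freshly detected horizontal frames) can be sketched as follows. The box representation as `(score, geometry)` pairs and the default threshold value are illustrative assumptions, not part of the patent's specification:

```python
def adjust_scores(horizontal_boxes, superimposed_boxes, prob_threshold=0.2):
    """Score adjustment preceding the later suppression step (S0121/S0122).
    Each box is a (hit_probability, geometry) pair."""
    # S0122: lower each freshly detected horizontal box's hit probability
    lowered = [(max(0.0, s - prob_threshold), g) for s, g in horizontal_boxes]
    # S0121: boxes carried over from the previous frame are trusted fully
    carried = [(1.0, g) for _, g in superimposed_boxes]
    return lowered + carried
```

Biasing the scores this way makes a carried-over box win over an overlapping fresh detection of the same object, which is consistent with superimposing the previous frame's result to stabilize recall.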
  • Step 103 Compare all objects to be detected in the current video image and the previous video image, and determine the number of objects to be detected in the current video image that do not belong to the previous video image.
  • the Euclidean distance between two objects to be detected, belonging respectively to the two frames of video images, can be calculated based on the coordinates of their center positions. If the Euclidean distance between the two objects to be detected is small, they can be considered to be the same object to be detected.
  • for each object to be detected in the current video image, the Euclidean distances between it and all objects to be detected in the previous frame of video image can be calculated. If the Euclidean distance between the object to be detected and every object to be detected in the previous frame of video image is greater than the preset distance threshold, the object can be determined to be a new object to be detected in the current video image: it did not appear in the previous frame of video image and therefore does not belong to it. Otherwise, it can be determined that the object to be detected has appeared in the previous frame of video image and belongs to the previous video image.
  • comparing all objects to be detected in the current video image and the previous video image to determine the number of objects to be detected in the current video image that do not belong to the previous video image specifically includes: for each object to be detected in the current video image, calculating the Euclidean distances between it and all objects to be detected in the previous video image; if the minimum of these Euclidean distances is greater than the preset distance threshold, the number of objects to be detected in the current video image that do not belong to the previous video image is increased by one.
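A minimal sketch of this distance-threshold comparison, assuming each object is represented by its center coordinates (consistent with the description above; the threshold value is supplied by the caller):

```python
import math

def count_new_objects(current_centers, previous_centers, dist_threshold):
    """Count objects in the current frame whose minimum Euclidean distance
    to every object in the previous frame exceeds the threshold (step 103)."""
    new_count = 0
    for (x1, y1) in current_centers:
        dists = [math.hypot(x1 - x2, y1 - y2) for (x2, y2) in previous_centers]
        # No previous objects, or even the nearest one is too far: a new object
        if not dists or min(dists) > dist_threshold:
            new_count += 1
    return new_count
```

When the previous frame is empty (the first frame of the video), every current object counts as new, which matches the first-frame behavior described in the counting example below.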
  • Step 104 Based on the number of objects to be detected in the current video image that do not belong to the previous video image, increase the number of objects to be detected in the currently counted monitoring area.
  • the above steps 101 to 104 are performed for each frame of video image captured in the surveillance area to determine the number of new objects to be detected in each frame of video image compared to the previous frame of video image. This number is accumulated, so that the number of all objects to be detected in the entire monitoring area can be obtained.
  • for example, suppose the video includes 10 frames of video images in total.
  • each frame of video image is compared with the previous frame of video image, and the number of newly appearing objects to be detected is determined.
  • for the first frame this number is 10 (because there is no 0th frame of video image, every object to be detected in the first frame of video image counts as new); the numbers determined for all frames are then accumulated to obtain the total.
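With illustrative per-frame increments (the concrete values below are hypothetical, since the original text only specifies the first frame's count), the accumulation works out as:

```python
# Hypothetical per-frame counts of newly appearing objects in a 10-frame
# video; the first entry is 10 because every object in frame 1 is new.
new_per_frame = [10, 2, 0, 1, 3, 0, 0, 2, 1, 0]  # illustrative values only
total = sum(new_per_frame)  # total objects to be detected in the whole area
```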
  • the object counting method in the embodiment of the present invention has been described in detail above.
  • the present invention also provides an object counting device, which will be described in detail below with reference to FIG. 4.
  • FIG. 4 is a schematic structural diagram of an object detection device according to an embodiment of the present invention. As shown in FIG. 4, the device includes:
  • the acquiring unit 401 is configured to acquire a video image of the monitoring area
  • the recognition unit 402 is configured to identify and determine all objects to be detected in the current video image in combination with the recognition result of the acquired previous video image;
  • the comparing unit 403 is configured to compare all objects to be detected in the current video image and the previous video image, and determine the number of objects to be detected in the current video image that do not belong to the previous video image;
  • the statistical unit 404 is configured to increase the number of objects to be detected in the monitoring area currently counted based on the number of objects to be detected that do not belong to the previous video image in the current video image.
  • the identification unit 402 includes a detection subunit 4021 and a suppression subunit 4022;
  • the detection sub-unit 4021 is configured to combine the recognition result of the acquired previous video image and use the pre-trained R2CNN detection model to detect and determine the rectangular frame surrounding each object to be detected in the current video image;
  • the suppression subunit 4022 is configured to perform non-maximum suppression NMS on each rectangular frame surrounding each object to be detected in the current video image to obtain the recognition result of the current video image.
  • the detection subunit 4021 is used to combine the recognition result of the acquired previous video image and use the pre-trained R2CNN detection model to detect and determine the rectangular frame surrounding each object to be detected in the current video image, specifically:
  • the detection subunit 4021 is specifically configured to:
  • the region of interest pooling (ROI Pooling) algorithm is used to generate the image features of each rectangular frame in the current video image; regression analysis is performed on the image features, and the horizontal rectangular frames are adjusted to inclined rectangular frames according to the regression analysis results; the regression analysis results include the translation and rotation angle information corresponding to the horizontal rectangular frames.
  • the detection sub-unit 4021, used for superimposing the inclined rectangular frames identified from the previous video image and surrounding each object to be detected onto the current video image, is further used for:
  • the probability that the horizontal rectangular frame surrounding each object to be detected in the current video image hits the object to be detected is reduced by a preset probability threshold.
  • the comparison unit 403 is configured to compare all objects to be detected in the current video image and the previous video image, and determine the number of objects to be detected in the current video image that do not belong to the previous video image, specifically:
  • the comparison unit 403 is specifically configured to calculate, for each object to be detected in the current video image, the Euclidean distances between the object to be detected and all objects to be detected in the previous video image; if the minimum Euclidean distance between the object to be detected and the objects to be detected in the previous video image is greater than the preset distance threshold, the number of objects to be detected in the current video image that do not belong to the previous video image is increased by one.
  • the comparison unit 403 is specifically configured to calculate the Euclidean distance of the two objects to be detected based on the coordinates of the center positions of the two objects to be detected in the respective video images.
  • the electronic device 500 includes: at least one processor 501, and a memory 502 connected to the at least one processor 501 through a bus; the memory 502 stores one or more computer programs executable by the at least one processor 501; when the at least one processor 501 executes the one or more computer programs, the steps in the object detection method shown in FIG. 1 are implemented.
  • An embodiment of the present invention also provides a computer-readable storage medium that stores one or more computer programs; when the one or more computer programs are executed by a processor, the object detection method shown in FIG. 1 is implemented.
  • An embodiment of the present invention also provides a chip for executing instructions.
  • the chip includes a memory and a processor.
  • the memory stores code and data.
  • the memory is coupled to the processor.
  • the processor runs the code in the memory, enabling the chip to execute the steps of the object detection method shown in FIG. 1 above.
  • the embodiment of the present invention also provides a program product containing instructions, when the program product runs on a computer, the computer executes the steps of the object detection method shown in FIG. 1.
  • the embodiment of the present invention also provides a computer program, when the computer program is executed by a processor, it is used to execute the steps of the object detection method shown in FIG. 1.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides an object detection method and device. Said method comprises: acquiring a video image of a monitoring area; in view of a recognition result of an acquired previous video image, recognizing and determining all objects to be detected in a current video image; comparing the previous video image with all said objects in the current video image, and determining the number of said objects, which do not belong to the previous video image, in the current video image; on the basis of the number of said objects, which do not belong to the previous video image, in the current video image, increasing the number of said objects in the currently counted monitoring area. The present invention can detect all video images in the video of the monitoring area, thereby accurately recognizing and counting the number of said objects in the monitoring area.

Description

Object detection method and device
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on June 28, 2019, with application number 201910572201.1 and titled "An object detection method and device", the entire content of which is incorporated into this application by reference.
Technical field
The present invention relates to the field of computer vision technology, and in particular to an object detection method and device.
Background art
Image classification, target detection, and image segmentation are the three major tasks in the field of computer vision. An image classification model assigns an image to a single category, which usually corresponds to the most prominent object in the image. However, many real-world pictures contain more than one object, so using an image classification model to assign a single label to an image is very crude and inaccurate. In such cases, a target detection model can be used to identify multiple objects in a picture and locate each identified object.
Target detection is a current research hotspot in the field of computer vision. Over the past decade or so, image target detection algorithms can be roughly divided into a period based on traditional handcrafted features and a period of deep-learning-based target detection. After Girshick et al. proposed the region-based convolutional network target detection framework (Regions with CNN features, R-CNN), the field of target detection began to develop at an unprecedented speed.
Target detection is applied in many scenarios, such as autonomous driving and security systems, but there is as yet no technical solution that uses target detection to detect farmed objects in videos in intelligent farming scenarios.
Summary of the invention
In view of this, the object of the present invention is to provide an object detection method and device that can detect all video images in the video of a surveillance area, so as to accurately identify and count the objects to be detected in the surveillance area.
In order to achieve the above objective, the present invention provides the following technical solutions:
In a first aspect, an object detection method provided by an embodiment of the present invention includes:
acquiring a video image of the surveillance area;
identifying and determining all objects to be detected in the current video image in combination with the recognition result of the acquired previous video image;
comparing all objects to be detected in the current video image and the previous video image, and determining the number of objects to be detected in the current video image that do not belong to the previous video image;
based on the number of objects to be detected in the current video image that do not belong to the previous video image, increasing the currently counted number of objects to be detected in the monitoring area.
In a second aspect, an object detection device provided by an embodiment of the present invention includes:
an acquiring unit, used to acquire a video image of the monitoring area;
a recognition unit, used to identify and determine all objects to be detected in the current video image in combination with the recognition result of the acquired previous video image;
a comparison unit, used to compare all objects to be detected in the current video image and the previous video image, and to determine the number of objects to be detected in the current video image that do not belong to the previous video image;
a statistical unit, configured to increase the currently counted number of objects to be detected in the monitoring area based on the number of objects to be detected in the current video image that do not belong to the previous video image.
In a third aspect, an embodiment of the present invention provides an electronic device, including: at least one processor, and a memory connected to the at least one processor through a bus; the memory stores one or more computer programs executable by the at least one processor; when the at least one processor executes the one or more computer programs, the steps in the above object detection method are implemented.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium that stores one or more computer programs; when the one or more computer programs are executed by a processor, the above object detection method is implemented.
In a fifth aspect, an embodiment of the present invention provides a chip for executing instructions. The chip includes a memory and a processor; the memory stores code and data and is coupled with the processor; running the code in the memory enables the chip to execute the steps of the object detection method described above.
In a sixth aspect, an embodiment of the present invention provides a program product containing instructions; when the program product runs on a computer, the computer executes the steps of the above object detection method.
In a seventh aspect, an embodiment of the present invention provides a computer program; when the computer program is executed by a processor, it is used to execute the steps of the above object detection method.
It can be seen from the above technical solutions that the object detection method and device provided by the embodiments of the present invention identify the objects to be detected in the currently acquired video image in combination with the recognition result of the previously acquired video image, and compare the objects to be detected identified in two consecutive frames of video images to determine the number of newly appearing objects to be detected in the later frame, thereby correspondingly increasing the currently counted number of objects to be detected in the monitoring area. It can be seen that, in the present invention, by calculating the increment of objects to be detected between two consecutive frames of video images, the number of objects to be detected in the entire monitoring area can be accurately identified and counted.
附图说明Description of the drawings
图1是本发明实施例提供的对象检测方法的流程图;FIG. 1 is a flowchart of an object detection method provided by an embodiment of the present invention;
图2是本发明实施例一对视频图像的检测结果示意图;FIG. 2 is a schematic diagram of a detection result of a pair of video images according to an embodiment of the present invention;
图3是本发明实施例二对视频图像的检测结果示意图;FIG. 3 is a schematic diagram of a detection result of a video image in Embodiment 2 of the present invention;
图4是本发明实施例提供的对象检测装置的结构示意图;FIG. 4 is a schematic structural diagram of an object detection device provided by an embodiment of the present invention;
图5是本发明实施例电子设备的结构示意图。Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
具体实施方式Detailed ways
为了使本发明的目的、技术方案及优点更加清楚明白，下面结合附图并依据实施例，对本发明的技术方案进行详细说明。In order to make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions of the present invention will be described in detail below with reference to the drawings and embodiments.
在智能养殖场景中,使用机器视觉统计养殖对象的数量,能够最大限度上减少人力资源的支出。本发明提供的技术方案可用于对智能养殖场景的养殖对象进行数量统计,以降低人工成本。In the intelligent breeding scenario, the use of machine vision to count the number of breeding objects can minimize the expenditure of human resources. The technical scheme provided by the present invention can be used to count the number of farming objects in the intelligent farming scene to reduce labor costs.
本发明主要通过对监控区域的监控视频进行分析，识别出监控区域中所有待检测对象并统计待检测对象的总数量。以下结合图1进行说明。The present invention mainly analyzes the surveillance video of the monitored area, identifies all objects to be detected in the monitored area, and counts the total number of objects to be detected. The following description is given in conjunction with FIG. 1.
参见图1,图1是本发明实施例提供的对象检测方法的流程图,如图1所示,该方法主要包括以下步骤:Referring to Fig. 1, Fig. 1 is a flowchart of an object detection method provided by an embodiment of the present invention. As shown in Fig. 1, the method mainly includes the following steps:
步骤101、获取监控区域的视频图像。Step 101: Obtain a video image of a monitored area.
本发明中，使用可以移动的摄像头拍摄整个监控区域的视频，通过对视频中的视频图像逐帧进行检测和目标追踪，确定监控区域中的所有待检测对象。In the present invention, a movable camera is used to capture a video of the entire monitored area, and all objects to be detected in the monitored area are determined by performing detection and target tracking on the video images in the video frame by frame.
本步骤中,每次获取视频中的一帧视频图像,并结合对前一帧视频图像的识别结果,对当前获取的一帧视频图像进行检测和目标追踪。In this step, each frame of video image in the video is acquired, and combined with the recognition result of the previous frame of video image, detection and target tracking are performed on the currently acquired frame of video image.
步骤102、结合对获取的前一视频图像的识别结果,检测确定当前视频图像中的所有待检测对象。Step 102: Detect and determine all objects to be detected in the current video image in combination with the recognition result of the acquired previous video image.
在实际应用中，前后两帧视频图像之间的拍摄时间间隔很小，因此，其中包括的同一待检测对象的位置变化也很小。为了在当前视频图像中找出新出现的待检测对象，检测确定当前视频图像中的所有待检测对象时，可以将前一帧视频图像的识别结果叠加到当前视频图像中，从而保证相邻两帧视频图像中的待检测对象有较为准确的召回率。In practical applications, the shooting time interval between two consecutive frames of video images is very small, and therefore the position change of the same object to be detected included in them is also very small. In order to find newly appearing objects to be detected in the current video image, when detecting and determining all objects to be detected in the current video image, the recognition result of the previous frame of video image can be superimposed on the current video image, thereby ensuring a relatively accurate recall rate for the objects to be detected in two adjacent frames of video images.
结合对获取的前一帧视频图像的识别结果,检测确定当前视频图像中的所有待检测对象,具体可以采用以下两个步骤实现:Combining the recognition result of the acquired previous frame of video image, the detection and determination of all objects to be detected in the current video image can be implemented in the following two steps:
S01、结合对获取的前一视频图像的识别结果,利用预先训练的R2CNN检测模型检测确定当前视频图像中包围每一待检测对象的矩形框;S01: Combining the recognition result of the acquired previous video image, use the pre-trained R2CNN detection model to detect and determine the rectangular frame surrounding each object to be detected in the current video image;
S02、对当前视频图像中包围每一待检测对象的各矩形框进行非极大值抑制NMS,得到对当前视频图像的识别结果。S02. Perform non-maximum suppression NMS on each rectangular frame surrounding each object to be detected in the current video image to obtain a recognition result of the current video image.
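As a concrete illustration of the non-maximum suppression in step S02, the following is a minimal sketch over axis-aligned boxes in (x1, y1, x2, y2) form with associated scores. It is an assumption for illustration only: the scheme described here applies NMS to inclined rectangular frames, which would additionally require a rotated-box IoU computation.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, drop the remaining boxes that
    overlap it by more than the IoU threshold, then repeat on the rest.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

Two heavily overlapping frames around the same object thus collapse to the single higher-confidence frame, which is what yields one frame per object in the recognition result.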
在实际应用中,上述步骤S01中,确定视频图像中包围每个待检测对象的矩形框,可以使用图像处理技术中的多种方法实现。In practical applications, in the above step S01, the determination of the rectangular frame surrounding each object to be detected in the video image can be implemented by using various methods in image processing technology.
本发明实施例中，利用旋转区域的卷积神经网络（Rotational Region Convolutional Neural Networks，R2CNN）技术确定视频图像中包围每个待检测对象的矩形框，具体地，可以预先使用多个待检测对象的训练样本进行训练得到R2CNN检测模型，之后就可以将该R2CNN检测模型用于本发明的对象检测过程中，具体是利用R2CNN检测模型确定视频图像中包围每个待检测对象的矩形框，即：将视频图像输入到R2CNN检测模型，R2CNN检测模型对输入的视频图像进行图像检测，即可输出视频图像中包围每个待检测对象的矩形框。In the embodiment of the present invention, the Rotational Region Convolutional Neural Network (R2CNN) technique is used to determine the rectangular frame surrounding each object to be detected in the video image. Specifically, an R2CNN detection model can be obtained in advance by training with training samples of multiple objects to be detected, after which the R2CNN detection model can be used in the object detection process of the present invention, namely, to determine the rectangular frame surrounding each object to be detected in the video image: the video image is input to the R2CNN detection model, the R2CNN detection model performs image detection on the input video image, and the rectangular frame surrounding each object to be detected in the video image is output.
另外，为了保证相邻两帧视频图像中待检测对象的召回率，在确定视频图像中包围每个待检测对象的矩形框的过程中，还将对前一帧视频图像的识别结果叠加到当前视频图像中。In addition, in order to ensure the recall rate of the objects to be detected in two adjacent frames of video images, in the process of determining the rectangular frame surrounding each object to be detected in the video image, the recognition result of the previous frame of video image is also superimposed on the current video image.
因此,上述步骤S01的一种较佳实施方法如下:Therefore, a preferred implementation method of the above step S01 is as follows:
S011、利用候选区域网络RPN算法确定当前视频图像中包围每个待检测对象的水平矩形框;S011: Use the candidate area network RPN algorithm to determine a horizontal rectangular frame surrounding each object to be detected in the current video image;
S012、将从前一视频图像识别出的包围每个待检测对象的倾斜矩形框,叠加到当前视频图像中;S012: Superimpose the oblique rectangular frame identified from the previous video image and surrounding each object to be detected on the current video image;
S013、利用感兴趣区域池化ROI Pooling算法生成当前视频图像中每个矩形框的图像特征，对该图像特征进行回归分析，根据回归分析结果将该水平矩形框调整为倾斜矩形框;所述回归分析结果包括该水平矩形框对应的平移和旋转角度信息。S013. Use the region of interest pooling (ROI Pooling) algorithm to generate the image feature of each rectangular frame in the current video image, perform regression analysis on the image feature, and adjust the horizontal rectangular frame to an inclined rectangular frame according to the regression analysis result; the regression analysis result includes the translation and rotation angle information corresponding to the horizontal rectangular frame.
图2和图3分别示出了采用上述三个步骤对一帧视频图像的检测结果。Figures 2 and 3 respectively show the detection results of a frame of video image using the above three steps.
实际应用，上述步骤S01也可以采用其它方法实现，例如，先将对前一帧视频图像的识别结果叠加到当前视频图像中，再利用预先训练的R2CNN检测模型对当前视频图像进行图像检测，即按照步骤S012、S011、S013的顺序执行。又如，先利用预先训练的R2CNN检测模型对当前视频图像进行图像检测，再将对前一帧视频图像的识别结果叠加到对当前视频图像的识别结果中，即按照步骤S011、S013、S012的顺序执行。In practical applications, the above step S01 can also be implemented by other methods. For example, the recognition result of the previous frame of video image may first be superimposed on the current video image, and then the pre-trained R2CNN detection model may be used to perform image detection on the current video image; that is, the steps are performed in the order S012, S011, S013. For another example, the pre-trained R2CNN detection model may first be used to perform image detection on the current video image, and then the recognition result of the previous frame of video image may be superimposed on the recognition result of the current video image; that is, the steps are performed in the order S011, S013, S012.
上述步骤S011中，利用候选区域网络（RPN）算法确定当前视频图像中包围每个待检测对象的水平矩形框，主要是通过卷积算法提取不同尺度下的图像特征，其中既包括低级的边缘纹理特征，也包括高级的语义特征，通过将这两种特征融合起来，可以生成包围每个待检测对象的完整信息以及和当前视频图像边界平行的矩形框（称为水平矩形框）。In the above step S011, the candidate area network (RPN) algorithm is used to determine the horizontal rectangular frame surrounding each object to be detected in the current video image. Image features at different scales are extracted mainly through convolution, including both low-level edge and texture features and high-level semantic features. By fusing these two kinds of features, complete information surrounding each object to be detected, together with a rectangular frame parallel to the boundary of the current video image (called a horizontal rectangular frame), can be generated.
现有大多数对活体检测的方法所检测的结果没有显示方向性，仅有水平或垂直方向的检测结果，而人工对养殖对象进行数数时，通常是俯视的视角，因此，在智能养殖这种实际生产场景中的活体检测，不同于普通目标的检测任务，除了框出养殖对象信息之外，还应添加面向任意方向场景的活体检测。Most existing live-object detection methods produce results without directionality, giving only horizontal or vertical detection results, whereas manual counting of farmed objects is usually performed from a bird's-eye view. Therefore, live-object detection in an actual production scene such as intelligent farming differs from an ordinary object detection task: in addition to framing the farmed-object information, live-object detection for scenes oriented in arbitrary directions should also be added.
为了充分识别待检测对象信息，上述步骤S013中，针对当前视频图像中的包围每个待检测对象的矩形框，可以通过感兴趣区域池化（ROI Pooling）算法进行图片信息检测，生成该矩形框的图像特征，然后对利用ROI Pooling算法生成的图像特征进行回归分析，得到的回归分析结果包括该矩形框对应的平移和旋转角度信息，此平移和旋转角度信息表明了需要对该矩形框进行的方向调整，是将水平矩形框调整为具有方向性的倾斜矩形框的依据。In order to fully identify the information of the objects to be detected, in the above step S013, for the rectangular frame surrounding each object to be detected in the current video image, image information detection can be performed through the Region of Interest Pooling (ROI Pooling) algorithm to generate the image features of the rectangular frame, and regression analysis is then performed on the image features generated by the ROI Pooling algorithm. The obtained regression analysis result includes the translation and rotation angle information corresponding to the rectangular frame; this translation and rotation angle information indicates the direction adjustment required for the rectangular frame and is the basis for adjusting the horizontal rectangular frame into a directional inclined rectangular frame.
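The adjustment of a horizontal rectangular frame into an inclined one using the regressed translation and rotation angle can be sketched as follows. The (cx, cy, w, h) box representation, the function name, and the corner ordering are illustrative assumptions rather than details from this disclosure.

```python
import math

def adjust_to_inclined(cx, cy, w, h, dx, dy, angle_rad):
    """Apply a regressed translation (dx, dy) and rotation angle to a
    horizontal box given by center (cx, cy), width w, and height h,
    returning the four corner points of the inclined rectangular frame."""
    cx, cy = cx + dx, cy + dy
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    corners = []
    for ox, oy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate each corner offset about the translated center.
        corners.append((cx + ox * cos_a - oy * sin_a, cy + ox * sin_a + oy * cos_a))
    return corners
```

With a zero translation and zero angle the inclined frame coincides with the horizontal frame, so the adjustment reduces to the identity in the axis-aligned case.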
另外,在执行上述步骤S012时,还可以对当前视频图像中的各个矩形框命中该矩形框包围的待统计对象的概率进行调整和设置,具体包括:In addition, when performing the above step S012, the probability that each rectangular frame in the current video image hits the object to be counted surrounded by the rectangular frame can also be adjusted and set, which specifically includes:
S0121、将叠加到当前视频图像中的倾斜矩形框命中该倾斜矩形框包围的待检测对象的概率设置为1;S0121, the probability that the inclined rectangular frame superimposed in the current video image hits the object to be detected surrounded by the inclined rectangular frame is set to 1;
S0122、将当前视频图像中包围每个待检测对象的水平矩形框命中该待检测对象的概率减少预设概率阈值。S0122. Reduce the probability that the horizontal rectangular frame surrounding each object to be detected in the current video image hits the object to be detected by a preset probability threshold.
上述步骤S0121和S0122不分先后顺序,上述概率的设置可以影响上述步骤S013的执行结果,这属于R2CNN技术,不再详述。The above steps S0121 and S0122 are in no particular order. The setting of the above probability can affect the execution result of the above step S013, which belongs to the R2CNN technology and will not be described in detail.
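The confidence adjustment of steps S0121 and S0122 can be sketched as below, before the frames are passed on for the subsequent adjustment and suppression. The list-of-dicts detection format, the function name, and the default threshold value are assumptions for illustration.

```python
def adjust_scores(superimposed_dets, current_dets, prob_threshold=0.1):
    """Set the hit probability of frames superimposed from the previous
    frame's recognition result to 1 (step S0121), and lower each newly
    detected horizontal frame's probability by the preset threshold,
    floored at 0 (step S0122)."""
    for det in superimposed_dets:
        det["score"] = 1.0
    for det in current_dets:
        det["score"] = max(0.0, det["score"] - prob_threshold)
    return superimposed_dets + current_dets
```

This biases the pipeline toward keeping the already-recognized frames, which is consistent with the stated goal of preserving the recall rate across adjacent frames.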
步骤103、比对当前视频图像和前一视频图像中的所有待检测对象,确定当前视频图像中不属于前一视频图像的待检测对象的个数。Step 103: Compare all objects to be detected in the current video image and the previous video image, and determine the number of objects to be detected in the current video image that do not belong to the previous video image.
由于视频中的相邻两帧视频图像之间的拍摄时间间隔很小，其中包括的同一待检测对象的位置变化也很小。因此，可以基于分别属于前后两帧视频图像的两个待检测对象在各自所属视频图像的中心位置坐标，计算该两个待检测对象的欧式距离，如果该两个待检测对象之间的欧式距离较大，超过一定阈值，则可认为是不同的待检测对象，如果该两个待检测对象之间的欧式距离较小，则可认为该两个待检测对象是同一待检测对象。Since the shooting time interval between two adjacent frames of video images in the video is very small, the position change of the same object to be detected included in them is also very small. Therefore, the Euclidean distance between two objects to be detected belonging respectively to the two consecutive frames of video images can be calculated based on the center position coordinates of the two objects in their respective video images. If the Euclidean distance between the two objects to be detected is large, exceeding a certain threshold, they can be considered different objects to be detected; if the Euclidean distance between them is small, they can be considered the same object to be detected.
基于上述判断分属于前后两帧视频图像的两个待检测对象是否为同一待检测对象的原理，本发明实施例中，对于当前视频图像中的每个待检测对象，可以计算该待检测对象与前一帧视频图像中的所有待检测对象的欧式距离，如果该待检测对象与前一帧视频图像中的任一待检测对象的欧式距离都大于预设距离阈值，则可以确定该待检测对象是当前视频图像中新出现的待检测对象，未在前一帧视频图像中出现，因此不属于前一帧视频图像，否则，可以确定该待检测对象在前一帧视频图像中已经出现，属于前一帧视频图像。Based on the above principle for judging whether two objects to be detected belonging to two consecutive frames of video images are the same object, in the embodiment of the present invention, for each object to be detected in the current video image, the Euclidean distances between that object and all objects to be detected in the previous frame of video image can be calculated. If the Euclidean distance between the object to be detected and every object to be detected in the previous frame of video image is greater than a preset distance threshold, it can be determined that the object is a newly appearing object to be detected in the current video image, which did not appear in the previous frame and therefore does not belong to the previous frame of video image; otherwise, it can be determined that the object to be detected already appeared in the previous frame and belongs to the previous frame of video image.
为此，本步骤中，比对当前视频图像和前一视频图像中的所有待检测对象，确定当前视频图像中不属于前一视频图像的待检测对象的个数，具体包括：针对当前视频图像中每个待检测对象，计算该待检测对象与前一视频图像中所有待检测对象的欧式距离，如果该待检测对象与前一视频图像中各待检测对象中的最小欧式距离大于预设距离阈值，则将当前视频图像中不属于前一视频图像的待检测对象的个数增加1。For this reason, in this step, comparing all objects to be detected in the current video image and the previous video image to determine the number of objects to be detected in the current video image that do not belong to the previous video image specifically includes: for each object to be detected in the current video image, calculating the Euclidean distances between the object to be detected and all objects to be detected in the previous video image, and, if the minimum of the Euclidean distances between the object to be detected and the objects to be detected in the previous video image is greater than a preset distance threshold, increasing by one the number of objects to be detected in the current video image that do not belong to the previous video image.
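The comparison step above can be sketched as follows, assuming each detected object is represented by the center coordinates of its frame; the function name and the threshold value are illustrative assumptions.

```python
import math

def count_new_objects(current_centers, previous_centers, dist_threshold):
    """Count objects in the current frame whose minimum Euclidean distance
    to every object in the previous frame exceeds the threshold, i.e.
    objects that do not belong to the previous frame."""
    new_count = 0
    for cx, cy in current_centers:
        distances = [math.hypot(cx - px, cy - py) for px, py in previous_centers]
        # An object with no close counterpart in the previous frame is new.
        if not distances or min(distances) > dist_threshold:
            new_count += 1
    return new_count
```

Note that when there is no previous frame, every object in the current frame counts as new, which matches the handling of the first frame in the counting example later in the text.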
步骤104、基于当前视频图像中不属于前一视频图像的待检测对象的个数,增加当前统计的监控区域中的待检测对象的数量。Step 104: Based on the number of objects to be detected in the current video image that do not belong to the previous video image, increase the number of objects to be detected in the currently counted monitoring area.
基于当前视频图像中不属于前一视频图像的待检测对象的个数,即当前视频图像与前一帧视频图像相比,新出现的待检测对象的个数。Based on the number of objects to be detected in the current video image that do not belong to the previous video image, that is, the number of objects to be detected newly appearing in the current video image compared with the previous frame of video image.
本发明中，对监控区域拍摄的视频中每一帧视频图像均执行以上步骤101至步骤104，以确定每一帧视频图像比前一帧视频图像新出现的待检测对象的个数，通过对此个数进行累计，从而可以得到整个监控区域中所有待检测对象的数量。例如视频中总包括10帧视频图像，假设对第1-10帧视频图像均执行上述步骤101-步骤104得到后一帧视频图像与前一帧视频图像相比，新出现的待检测对象的个数分别为：10（由于不存在第0帧视频图像，因此，第1帧视频图像中的待检测对象的个数，即为第1帧视频图像比第0帧视频图像新出现的待检测对象的个数）、1、0、2、1、3、0、1、2、1，则通过累计计算，可以最终得到监控区域中的所有待检测对象的数量为10+1+0+2+1+3+0+1+2+1=21个。In the present invention, the above steps 101 to 104 are performed for each frame of video image in the video captured of the monitored area, so as to determine the number of objects to be detected newly appearing in each frame of video image compared with the previous frame; by accumulating these numbers, the number of all objects to be detected in the entire monitored area can be obtained. For example, suppose the video includes 10 frames of video images in total, and performing the above steps 101 to 104 on the 1st to 10th frames yields, for each frame compared with the previous frame, the following numbers of newly appearing objects to be detected: 10 (since there is no 0th frame of video image, the number of objects to be detected in the 1st frame is itself the number of objects newly appearing in the 1st frame compared with the 0th frame), 1, 0, 2, 1, 3, 0, 1, 2, 1. Then, through cumulative calculation, the number of all objects to be detected in the monitored area is finally obtained as 10+1+0+2+1+3+0+1+2+1=21.
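The accumulation described above can be sketched in a few lines, reproducing the worked example from the text; the function name is assumed for illustration.

```python
def total_objects(per_frame_new_counts):
    """Accumulate the per-frame counts of newly appearing objects to
    obtain the total number of objects in the monitored area."""
    return sum(per_frame_new_counts)

# Per-frame increments from the example: frame 1 contributes all 10 of
# its objects (there is no frame 0); later frames contribute only new ones.
increments = [10, 1, 0, 2, 1, 3, 0, 1, 2, 1]
total = total_objects(increments)  # 21
```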
以上对本发明实施例对象统计方法进行了详细说明,本发明还提供了一种对象统计装置,以下结合图4进行详细说明。The object counting method in the embodiment of the present invention has been described in detail above. The present invention also provides an object counting device, which will be described in detail below with reference to FIG. 4.
参见图4,图4是本发明实施例对象检测装置的结构示意图,如图4所示,该装置包括:Refer to FIG. 4, which is a schematic structural diagram of an object detection device according to an embodiment of the present invention. As shown in FIG. 4, the device includes:
获取单元401,用于获取监控区域的视频图像;The acquiring unit 401 is configured to acquire a video image of the monitoring area;
识别单元402,用于结合对获取的前一视频图像的识别结果,识别确定当前视频图像中的所有待检测对象;The recognition unit 402 is configured to identify and determine all objects to be detected in the current video image in combination with the recognition result of the acquired previous video image;
比对单元403,用于比对当前视频图像和前一视频图像中的所有待检测对象,确定当前视频图像中不属于前一视频图像的待检测对象的个数;The comparing unit 403 is configured to compare all objects to be detected in the current video image and the previous video image, and determine the number of objects to be detected in the current video image that do not belong to the previous video image;
统计单元404,用于基于当前视频图像中不属于前一视频图像的待检测对象的个数,增加当前统计的监控区域中的待检测对象的数量。The statistical unit 404 is configured to increase the number of objects to be detected in the monitoring area currently counted based on the number of objects to be detected that do not belong to the previous video image in the current video image.
图4所示装置中,In the device shown in Figure 4,
所述识别单元402,包括检测子单元4021和抑制子单元4022;The identification unit 402 includes a detection subunit 4021 and a suppression subunit 4022;
所述检测子单元4021,用于结合对获取的前一视频图像的识别结果,利用预先训练的R2CNN检测模型检测确定当前视频图像中包围每一待检测对象的矩形框;The detection sub-unit 4021 is configured to combine the recognition result of the acquired previous video image and use the pre-trained R2CNN detection model to detect and determine the rectangular frame surrounding each object to be detected in the current video image;
所述抑制子单元4022,用于对当前视频图像中包围每一待检测对象的各矩形框进行非极大值抑制NMS,得到对当前视频图像的识别结果。The suppression subunit 4022 is configured to perform non-maximum suppression NMS on each rectangular frame surrounding each object to be detected in the current video image to obtain the recognition result of the current video image.
图4所示装置中,In the device shown in Figure 4,
所述检测子单元4021,用于结合对获取的前一视频图像的识别结果,利用预先训练的R2CNN检测模型检测确定当前视频图像中包围每一待检测对象的矩形框,具体为:The detection subunit 4021 is used to combine the recognition result of the acquired previous video image and use the pre-trained R2CNN detection model to detect and determine the rectangular frame surrounding each object to be detected in the current video image, specifically:
所述检测子单元4021,具体用于:The detection subunit 4021 is specifically configured to:
    利用候选区域网络RPN算法确定当前视频图像中包围每个待检测对象的水平矩形框;Use the candidate area network RPN algorithm to determine the horizontal rectangular frame surrounding each object to be detected in the current video image;
将从前一视频图像识别出的包围每个待检测对象的倾斜矩形框,叠加到当前视频图像中;Superimpose the oblique rectangular frame identified from the previous video image and surrounding each object to be detected on the current video image;
    利用感兴趣区域池化ROI Pooling算法生成当前视频图像中每个矩形框的图像特征，对该图像特征进行回归分析，根据回归分析结果将该水平矩形框调整为倾斜矩形框;所述回归分析结果包括该水平矩形框对应的平移和旋转角度信息。Use the region of interest pooling (ROI Pooling) algorithm to generate the image features of each rectangular frame in the current video image, perform regression analysis on the image features, and adjust the horizontal rectangular frame to an inclined rectangular frame according to the regression analysis result; the regression analysis result includes the translation and rotation angle information corresponding to the horizontal rectangular frame.
图4所示装置中,In the device shown in Figure 4,
所述检测子单元4021,用于将从前一视频图像识别出的包围每个待检测对象的倾斜矩形框,叠加到当前视频图像中时,进一步用于:The detection sub-unit 4021 is used for superimposing the oblique rectangular frame identified from the previous video image and surrounding each object to be detected on the current video image, further used for:
将叠加到当前视频图像中的倾斜矩形框命中该倾斜矩形框包围的待检测对象的概率设置为1;Set the probability that the inclined rectangular frame superimposed on the current video image hits the object to be detected surrounded by the inclined rectangular frame to 1;
将当前视频图像中包围每个待检测对象的水平矩形框命中该待检测对象的概率减少预设概率阈值。The probability that the horizontal rectangular frame surrounding each object to be detected in the current video image hits the object to be detected is reduced by a preset probability threshold.
图4所示装置中,In the device shown in Figure 4,
所述比对单元403,用于比对当前视频图像和前一视频图像中的所有待检测对象,确定当前视频图像中不属于前一视频图像的待检测对象的个数,具体为:The comparison unit 403 is configured to compare all objects to be detected in the current video image and the previous video image, and determine the number of objects to be detected in the current video image that do not belong to the previous video image, specifically:
    所述比对单元403，具体用于针对当前视频图像中每个待检测对象，计算该待检测对象与前一视频图像中所有待检测对象的欧式距离，如果该待检测对象与前一视频图像中各待检测对象中的最小欧式距离大于预设距离阈值，则将当前视频图像中不属于前一视频图像的待检测对象的个数增加1。The comparison unit 403 is specifically configured to: for each object to be detected in the current video image, calculate the Euclidean distances between the object to be detected and all objects to be detected in the previous video image, and, if the minimum of the Euclidean distances between the object to be detected and the objects to be detected in the previous video image is greater than a preset distance threshold, increase by one the number of objects to be detected in the current video image that do not belong to the previous video image.
图4所示装置中,In the device shown in Figure 4,
所述比对单元403,具体用于基于两个待检测对象在各自所属视频图像的中心位置坐标,计算该两个待检测对象的欧式距离。The comparison unit 403 is specifically configured to calculate the Euclidean distance of the two objects to be detected based on the coordinates of the center positions of the two objects to be detected in the respective video images.
本发明实施例还提供了一种电子设备，如图5所示，该电子设备500包括：至少一个处理器501，以及与所述至少一个处理器501通过总线相连的存储器502；所述存储器502存储有可被所述至少一个处理器501执行的一个或多个计算机程序；所述至少一个处理器501执行所述一个或多个计算机程序时实现如上述图1所示对象检测方法中的步骤。An embodiment of the present invention also provides an electronic device. As shown in FIG. 5, the electronic device 500 includes: at least one processor 501, and a memory 502 connected to the at least one processor 501 through a bus; the memory 502 stores one or more computer programs executable by the at least one processor 501; when the at least one processor 501 executes the one or more computer programs, the steps of the object detection method shown in FIG. 1 are implemented.
本发明实施例还提供一种计算机可读存储介质，所述计算机可读存储介质存储一个或多个计算机程序，所述一个或多个计算机程序被处理器执行时实现如上述图1所示对象检测方法。An embodiment of the present invention also provides a computer-readable storage medium that stores one or more computer programs, and when the one or more computer programs are executed by a processor, the object detection method shown in FIG. 1 is implemented.
本发明实施例还提供一种运行指令的芯片，所述芯片包括存储器、处理器，所述存储器中存储代码和数据，所述存储器与所述处理器耦合，所述处理器运行所述存储器中的代码使得所述芯片用于执行上述图1所示对象检测方法的步骤。An embodiment of the present invention also provides a chip for executing instructions. The chip includes a memory and a processor; the memory stores code and data, and the memory is coupled to the processor. The processor runs the code in the memory so that the chip executes the steps of the object detection method shown in FIG. 1.
本发明实施例还提供一种包含指令的程序产品,当所述程序产品在计算机上运行时,使得所述计算机执行上述图1所示对象检测方法的步骤。The embodiment of the present invention also provides a program product containing instructions, when the program product runs on a computer, the computer executes the steps of the object detection method shown in FIG. 1.
本发明实施例还提供了一种计算机程序,当所述计算机程序被处理器执行时,用于执行上述图1所示对象检测方法的步骤。The embodiment of the present invention also provides a computer program, when the computer program is executed by a processor, it is used to execute the steps of the object detection method shown in FIG. 1.
以上所述仅为本发明的较佳实施例而已，并不用以限制本发明，凡在本发明的精神和原则之内，所做的任何修改、等同替换、改进等，均应包含在本发明保护的范围之内。The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (17)

  1. 一种对象检测方法,其特征在于,该方法包括:An object detection method, characterized in that the method includes:
    获取监控区域的视频图像;Obtain video images of the surveillance area;
    结合对获取的前一视频图像的识别结果,识别确定当前视频图像中的所有待检测对象;Combining the recognition result of the acquired previous video image, identify and determine all the objects to be detected in the current video image;
    比对当前视频图像和前一视频图像中的所有待检测对象,确定当前视频图像中不属于前一视频图像的待检测对象的个数;Compare all objects to be detected in the current video image and the previous video image, and determine the number of objects to be detected in the current video image that do not belong to the previous video image;
    基于当前视频图像中不属于前一视频图像的待检测对象的个数,增加当前统计的监控区域中的待检测对象的数量。Based on the number of objects to be detected in the current video image that do not belong to the previous video image, increase the number of objects to be detected in the currently counted monitoring area.
  2. 根据权利要求1所述的方法,其特征在于,所述结合对获取的前一视频图像的识别结果,识别并确定当前视频图像中的所有待检测对象,包括:The method according to claim 1, wherein the identifying and determining all objects to be detected in the current video image in combination with the recognition result of the acquired previous video image includes:
    结合对获取的前一视频图像的识别结果,利用预先训练的R2CNN检测模型检测确定当前视频图像中包围每一待检测对象的矩形框;Combining the recognition result of the acquired previous video image, use the pre-trained R2CNN detection model to detect and determine the rectangular frame surrounding each object to be detected in the current video image;
    对当前视频图像中包围每一待检测对象的各矩形框进行非极大值抑制NMS,得到对当前视频图像的识别结果。Non-maximum suppression NMS is performed on each rectangular frame surrounding each object to be detected in the current video image to obtain the recognition result of the current video image.
  3. 根据权利要求2所述的方法，其特征在于，所述结合对获取的前一视频图像的识别结果，利用预先训练的R2CNN检测模型检测确定当前视频图像中包围每一待检测对象的矩形框，包括：The method according to claim 2, wherein the detecting and determining, in combination with the recognition result of the acquired previous video image and using the pre-trained R2CNN detection model, the rectangular frame surrounding each object to be detected in the current video image includes:
    利用候选区域网络RPN算法确定当前视频图像中包围每个待检测对象的水平矩形框;Use the candidate area network RPN algorithm to determine the horizontal rectangular frame surrounding each object to be detected in the current video image;
    将从前一视频图像识别出的包围每个待检测对象的倾斜矩形框,叠加到当前视频图像中;Superimpose the oblique rectangular frame identified from the previous video image and surrounding each object to be detected on the current video image;
    利用感兴趣区域池化ROI Pooling算法生成当前视频图像中每个矩形框的图像特征，对该图像特征进行回归分析，根据回归分析结果将该水平矩形框调整为倾斜矩形框;所述回归分析结果包括该水平矩形框对应的平移和旋转角度信息。Use the region of interest pooling (ROI Pooling) algorithm to generate the image characteristics of each rectangular frame in the current video image, perform regression analysis on the image characteristics, and adjust the horizontal rectangular frame to an inclined rectangular frame according to the regression analysis result; the regression analysis result includes the translation and rotation angle information corresponding to the horizontal rectangular frame.
  4. 根据权利要求3所述的方法,其特征在于,The method according to claim 3, wherein:
    将从前一视频图像识别出的包围每个待检测对象的倾斜矩形框,叠加到当前视频图像中时,进一步包括:When superimposing the oblique rectangular frame identified from the previous video image and surrounding each object to be detected on the current video image, it further includes:
    将叠加到当前视频图像中的倾斜矩形框命中该倾斜矩形框包围的待检测对象的概率设置为1;Set the probability that the inclined rectangular frame superimposed on the current video image hits the object to be detected surrounded by the inclined rectangular frame to 1;
    将当前视频图像中包围每个待检测对象的水平矩形框命中该待检测对象的概率减少预设概率阈值。The probability that the horizontal rectangular frame surrounding each object to be detected in the current video image hits the object to be detected is reduced by a preset probability threshold.
  5. 根据权利要求1-4任一项所述的方法，其特征在于，所述比对当前视频图像和前一视频图像中的所有待检测对象，确定当前视频图像中不属于前一视频图像的待检测对象的个数，包括：The method according to any one of claims 1-4, wherein the comparing all objects to be detected in the current video image and the previous video image to determine the number of objects to be detected in the current video image that do not belong to the previous video image includes:
    针对当前视频图像中每个待检测对象，计算该待检测对象与前一视频图像中所有待检测对象的欧式距离，如果该待检测对象与前一视频图像中各待检测对象中的最小欧式距离大于预设距离阈值，则将当前视频图像中不属于前一视频图像的待检测对象的个数增加1。For each object to be detected in the current video image, calculating the Euclidean distances between the object to be detected and all objects to be detected in the previous video image, and, if the minimum of the Euclidean distances between the object to be detected and the objects to be detected in the previous video image is greater than a preset distance threshold, increasing by one the number of objects to be detected in the current video image that do not belong to the previous video image.
  6. 根据权利要求5所述的方法,其特征在于,所述针对当前视频图像中每个待检测对象,计算该待检测对象与前一视频图像中所有待检测对象的欧式距离,包括:The method according to claim 5, wherein, for each object to be detected in the current video image, calculating the Euclidean distance between the object to be detected and all objects to be detected in the previous video image comprises:
    基于两个待检测对象在各自所属视频图像的中心位置坐标,计算该两个待检测对象的欧式距离。The Euclidean distance of the two objects to be detected is calculated based on the coordinates of the center positions of the two objects to be detected in the respective video images.
  7. 一种对象检测装置,其特征在于,该装置包括:An object detection device, characterized in that the device includes:
    获取单元,用于获取监控区域的视频图像;The acquiring unit is used to acquire the video image of the monitoring area;
    识别单元,用于结合对获取的前一视频图像的识别结果,识别确定当前视频图像中的所有待检测对象;The recognition unit is used to identify and determine all the objects to be detected in the current video image in combination with the recognition result of the acquired previous video image;
    比对单元,用于比对当前视频图像和前一视频图像中的所有待检测对象,确定当前视频图像中不属于前一视频图像的待检测对象的个数;The comparison unit is used to compare all objects to be detected in the current video image and the previous video image, and determine the number of objects to be detected in the current video image that do not belong to the previous video image;
    统计单元,用于基于当前视频图像中不属于前一视频图像的待检测对象的个数,增加当前统计的监控区域中的待检测对象的数量。The statistical unit is configured to increase the number of objects to be detected in the monitoring area currently counted based on the number of objects to be detected that do not belong to the previous video image in the current video image.
  8. The device according to claim 7, wherein:
    the recognition unit comprises a detection subunit and a suppression subunit;
    the detection subunit is configured to detect and determine, using a pre-trained R2CNN detection model in combination with the recognition result of the previously acquired video image, a rectangular frame surrounding each object to be detected in the current video image;
    the suppression subunit is configured to perform non-maximum suppression (NMS) on the rectangular frames surrounding each object to be detected in the current video image, to obtain the recognition result for the current video image.
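A minimal sketch of the non-maximum suppression step named in claim 8, shown here over axis-aligned `(x1, y1, x2, y2)` boxes for simplicity (the claims apply it to the frames around each detected object, including inclined ones); the `IOU_THRESHOLD` value and box format are illustrative assumptions:

```python
IOU_THRESHOLD = 0.5  # overlap threshold (illustrative value)

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores):
    """Greedy NMS: keep the highest-scoring box, drop lower-scoring boxes
    that overlap a kept box by more than the threshold; return kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= IOU_THRESHOLD for j in keep):
            keep.append(i)
    return keep
```

With R2CNN's inclined frames, the same greedy loop applies but with a rotated-box IoU in place of the axis-aligned one above.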
  9. The device according to claim 8, wherein the detection subunit, being configured to detect and determine, using the pre-trained R2CNN detection model in combination with the recognition result of the previously acquired video image, the rectangular frame surrounding each object to be detected in the current video image, is specifically configured to:
    determine, using the region proposal network (RPN) algorithm, a horizontal rectangular frame surrounding each object to be detected in the current video image;
    superimpose onto the current video image the inclined rectangular frames, identified from the previous video image, that surround each object to be detected;
    generate image features for each rectangular frame in the current video image using the region-of-interest pooling (ROI Pooling) algorithm, perform regression analysis on the image features, and adjust each horizontal rectangular frame to an inclined rectangular frame according to the regression analysis result, the regression analysis result comprising the translation and rotation angle information corresponding to the horizontal rectangular frame.
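The final adjustment in claim 9, turning an RPN horizontal frame into an inclined frame using the regressed translation and rotation angle, could be sketched as follows; representing a box as `(cx, cy, w, h)` with an added angle, and the specific shape of the regression output, are illustrative assumptions rather than details stated in the claims:

```python
import math

def adjust_to_inclined(horizontal_box, translation, angle_deg):
    """Apply the regressed translation (dx, dy) and rotation angle to a
    horizontal box (cx, cy, w, h), yielding an inclined box (cx, cy, w, h, angle)."""
    cx, cy, w, h = horizontal_box
    dx, dy = translation
    return (cx + dx, cy + dy, w, h, angle_deg)

def corners(inclined_box):
    """Corner coordinates of an inclined (cx, cy, w, h, angle) box,
    rotating the four half-extent offsets about the box center."""
    cx, cy, w, h, angle = inclined_box
    rad = math.radians(angle)
    cos_a, sin_a = math.cos(rad), math.sin(rad)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * cos_a - y * sin_a, cy + x * sin_a + y * cos_a)
            for x, y in offsets]
```

In a full R2CNN pipeline the translation and angle would come from the regression head applied to the ROI-pooled features; here they are passed in directly to isolate the geometric step.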
  10. The device according to claim 9, wherein, when superimposing onto the current video image the inclined rectangular frames identified from the previous video image that surround each object to be detected, the detection subunit is further configured to:
    set to 1 the probability that an inclined rectangular frame superimposed onto the current video image hits the object to be detected that it surrounds;
    reduce, by a preset probability threshold, the probability that a horizontal rectangular frame surrounding each object to be detected in the current video image hits that object.
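Claim 10's score handling before NMS, fully trusting the inclined frames carried over from the previous frame while penalizing the current frame's fresh horizontal frames, might look like the sketch below; the dictionary representation and the `PROB_PENALTY` value are assumptions for illustration:

```python
PROB_PENALTY = 0.1  # preset probability threshold (illustrative value)

def merge_detections(current_boxes, prior_boxes):
    """Combine the current frame's horizontal boxes with inclined boxes
    carried over from the previous frame, adjusting hit probabilities
    as in claim 10 before NMS is applied."""
    merged = []
    for det in prior_boxes:
        merged.append({**det, "score": 1.0})           # carried-over frames: probability set to 1
    for det in current_boxes:
        score = max(0.0, det["score"] - PROB_PENALTY)  # fresh frames: probability reduced
        merged.append({**det, "score": score})
    return merged
```

Because NMS keeps the highest-scoring frame among overlapping candidates, this biases the result toward the previously confirmed inclined frames wherever they still cover an object.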
  11. The device according to any one of claims 7-10, wherein the comparison unit, being configured to compare all objects to be detected in the current video image with those in the previous video image and determine the number of objects to be detected in the current video image that do not belong to the previous video image, is specifically configured to:
    for each object to be detected in the current video image, calculate the Euclidean distances between that object and all objects to be detected in the previous video image, and, if the minimum of those Euclidean distances is greater than a preset distance threshold, increase by one the number of objects to be detected in the current video image that do not belong to the previous video image.
  12. The device according to claim 11, wherein the comparison unit is specifically configured to calculate the Euclidean distance between two objects to be detected based on the coordinates of their center positions in their respective video images.
  13. An electronic device, comprising: at least one processor, and a memory connected to the at least one processor through a bus, the memory storing one or more computer programs executable by the at least one processor, wherein the at least one processor, when executing the one or more computer programs, implements the method steps of any one of claims 1-6.
  14. A computer-readable storage medium, wherein the computer-readable storage medium stores one or more computer programs which, when executed by a processor, implement the method of any one of claims 1-6.
  15. A chip for running instructions, wherein the chip comprises a memory and a processor, the memory storing code and data and being coupled to the processor, and the processor running the code in the memory so that the chip executes the method of any one of claims 1-6.
  16. A program product containing instructions, wherein, when the program product is run on a computer, the computer is caused to execute the method of any one of claims 1-6.
  17. A computer program, wherein, when executed by a processor, the computer program executes the method of any one of claims 1-6.
PCT/CN2020/083515 2019-06-28 2020-04-07 Object detection method and device WO2020258978A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910572201.1 2019-06-28
CN201910572201.1A CN110287907B (en) 2019-06-28 2019-06-28 Object detection method and device

Publications (1)

Publication Number Publication Date
WO2020258978A1 2020-12-30

Family

ID=68019378

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/083515 WO2020258978A1 (en) 2019-06-28 2020-04-07 Object detection method and device

Country Status (2)

Country Link
CN (1) CN110287907B (en)
WO (1) WO2020258978A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113936256A (en) * 2021-10-15 2022-01-14 北京百度网讯科技有限公司 Image target detection method, device, equipment and storage medium
CN115115825A (en) * 2022-05-27 2022-09-27 腾讯科技(深圳)有限公司 Method and device for detecting object in image, computer equipment and storage medium

Families Citing this family (5)

Publication number Priority date Publication date Assignee Title
CN110287907B (en) * 2019-06-28 2020-11-03 北京海益同展信息科技有限公司 Object detection method and device
CN110838134B (en) * 2019-10-10 2020-09-29 北京海益同展信息科技有限公司 Target object statistical method and device, computer equipment and storage medium
CN111080697B (en) * 2019-10-29 2024-04-09 京东科技信息技术有限公司 Method, apparatus, computer device and storage medium for detecting direction of target object
CN111753766B (en) * 2020-06-28 2024-08-27 平安科技(深圳)有限公司 Image processing method, device, equipment and medium
CN113627403B (en) * 2021-10-12 2022-03-08 深圳市安软慧视科技有限公司 Method, system and related equipment for selecting and pushing picture

Citations (4)

Publication number Priority date Publication date Assignee Title
CN107316462A * 2017-08-30 2017-11-03 济南浪潮高新科技投资发展有限公司 Flow statistics method and device
CN108932496A * 2018-07-03 2018-12-04 北京佳格天地科技有限公司 Method and device for counting objects in a region
US20190050667A1 * 2017-03-10 2019-02-14 TuSimple System and method for occluding contour detection
CN110287907A * 2019-06-28 2019-09-27 北京海益同展信息科技有限公司 Object detection method and device

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN104281837B * 2014-09-26 2018-05-25 哈尔滨工业大学深圳研究生院 Pedestrian tracking method combining Kalman filtering and adjacent inter-frame ROI expansion
US10460470B2 (en) * 2017-07-06 2019-10-29 Futurewei Technologies, Inc. Recognition and reconstruction of objects with partial appearance
CN108062548B (en) * 2017-11-03 2020-11-03 中国科学院计算技术研究所 Braille square self-adaptive positioning method and system

Non-Patent Citations (2)

Title
CHEN, YU: "The Hierarchical Coordination Control Strategy of AC-DC Micro-Network Hybrid Energy Storage System", CHINA DOCTORAL DISSERTATIONS FULL-TEXT DATABASE, no. 1, 15 January 2010 (2010-01-15) *
YINGYING JIANG ET AL.: "R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection", arXiv:1706.09579v2, 30 June 2017 (2017-06-30), XP055608197 *


Also Published As

Publication number Publication date
CN110287907B (en) 2020-11-03
CN110287907A (en) 2019-09-27


Legal Events

Date Code Title Description

121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20831388; Country of ref document: EP; Kind code of ref document: A1)

NENP Non-entry into the national phase (Ref country code: DE)

32PN Ep: public notification in the EP bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 18.02.2022))

122 Ep: PCT application non-entry in European phase (Ref document number: 20831388; Country of ref document: EP; Kind code of ref document: A1)