WO2016061724A1 - All-weather video monitoring method based on deep learning - Google Patents

All-weather video monitoring method based on deep learning

Info

Publication number
WO2016061724A1 (application PCT/CN2014/088901)
Authority
WO
WIPO (PCT)
Prior art keywords
model, sampling, map, velocity, statistical
Application number
PCT/CN2014/088901
Other languages
French (fr)
Chinese (zh)
Inventor
黄凯奇
康运锋
曹黎俊
张旭
Original Assignee
中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-10-20
Filing date
2014-10-20
Application filed by 中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences)
Priority to PCT/CN2014/088901 (WO2016061724A1)
Publication of WO2016061724A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis


Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An all-weather video monitoring method based on deep learning. The method comprises the following steps: collecting a video stream in real time, and obtaining multiple original sampling map samples and velocity sampling map samples by line sampling of the obtained video stream; performing spatio-temporal correction on the obtained velocity sampling map samples; training a deep learning model offline based on the original sampling maps and velocity sampling maps, the deep learning model comprising a classification model and a statistical model; and performing crowd state analysis on the real-time video stream with the obtained deep learning model. The method adapts well to different environments, illumination intensities, weather conditions and camera angles; it maintains high accuracy in crowded conditions such as a surge of high-volume crowds; and its computational cost is low enough to meet real-time video processing requirements, so it can be widely applied to the monitoring and management of public places with dense crowds, such as buses, subways and squares.

Description

[Title of invention established by the ISA under Rule 37.2] All-weather video monitoring method based on deep learning

Technical Field
The invention belongs to the field of pattern recognition, and in particular relates to an all-weather video monitoring method based on deep learning, which is especially suitable for analyzing the state of high-volume crowds.
Background Art
At present, the level of urbanization in China exceeds 50%. The influx of a large floating population has steadily increased urban population density, large-scale crowd activities have become more frequent, and major accidents caused by crowding and stampedes are not uncommon. How to monitor and manage crowds, actively identifying mass incidents at an early stage and issuing timely warnings, has therefore become a research focus in video surveillance worldwide. To better identify and warn of abnormal group events and thus reduce disasters, grasping changes in crowd size in real time is a key factor. Crowd analysis based on intelligent video surveillance analyzes the behavior of moving objects in a specific monitored scene and describes their behavioral patterns, enabling automatic detection of abnormal events by machine intelligence; the learned behavior models can also serve as references for public space design, intelligent environments, and so on. However, because of differences in monitoring scenes and camera installation angles, and changes in weather and sunlight intensity, intelligent monitoring systems have so far played only a minor role in all-weather monitoring.
A convolutional neural network, as a deep learning method, is a multilayer perceptron specially designed for two-dimensional image processing. It has advantages that traditional techniques lack: good fault tolerance, parallel processing and self-learning capability; it can cope with problems where environmental information is complex, background knowledge is unclear and inference rules are not explicit; it tolerates large defects and distortions; and it runs fast, adapts well and discriminates finely. A convolutional neural network can therefore address the problems of all-weather monitoring and ensure a high and stable accuracy of the intelligent monitoring system under various conditions.
Summary of the Invention
The object of the present invention is to provide an all-weather video monitoring method based on deep learning, which can analyze the state of crowds in video around the clock, in particular the number of people.
To achieve the above object, the deep-learning-based all-weather video monitoring method proposed by the present invention comprises the following steps:

Step 1: collect a video stream in real time, and obtain a plurality of original sampling map samples and velocity sampling map samples by line sampling of the obtained video stream;

Step 2: perform spatio-temporal correction on the obtained velocity sampling map samples;

Step 3: train a deep learning model offline based on the original sampling maps and velocity sampling maps, the deep learning model comprising a classification model and a statistical model;

Step 4: perform crowd state analysis on a real-time video stream using the deep learning model obtained in Step 3.
Compared with the latest methods at home and abroad, the invention has several clear advantages: 1) good adaptability to different environments, illumination intensities, weather conditions and camera angle settings; 2) high accuracy even in crowded conditions such as a surge of high-volume crowds; 3) low computational cost, meeting the requirements of real-time video processing.
Brief Description of the Drawings
Fig. 1 is a flowchart of the deep-learning-based all-weather video monitoring method of the present invention;

Fig. 2 is a schematic diagram of the geometric correction of the present invention.
Detailed Description
To make the object, technical solution and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
The key ideas of the present invention are: 1) the behavior of people passing through a door (or virtual door) can be converted from dynamic behavior into static pictures by sampling at a fixed position, which facilitates crowd analysis; 2) perspective correction and speed correction ensure high accuracy under different camera angle settings; 3) the deep learning model helps to automatically discover the most effective features, and concatenating multiple features keeps the accuracy of crowd state analysis stable across different scenes. The technical details involved in the present invention are explained below.
The flowchart of the deep-learning-based all-weather video monitoring method of the present invention is shown in Fig. 1. As shown in Fig. 1, the method comprises the following steps:

Step 1: collect a video stream in real time, and obtain a plurality of original sampling map samples and velocity sampling map samples by line sampling of the obtained video stream;
In an embodiment of the present invention, for convenience of counting, first, for each image frame of the video stream, a calibration line l_n with a fixed width of n pixels (n=3 in one embodiment) and a length covering the entire door is set at the position where pedestrians pass through the door, serving as the virtual door boundary. The position of the calibration line depends on where people are to be counted in the video scene; it may lie at any angle, preferably perpendicular to the direction of passage through the door. For example, if the door faces the camera, the calibration line may be placed horizontally; if the door is perpendicular to the camera's viewing direction, the calibration line may be placed vertically. Then, the pixels covered by the calibration line are extracted from the image F of every f-th frame of the video stream (f=2 in one embodiment). Since the calibration line is n pixels wide, each sampling yields n rows of pixel data; over a fixed time interval t (t=300 frames in one embodiment), all sampled pixels are accumulated into an original sampling image I, so that multiple original sampling map samples are obtained from the video stream. In one embodiment, the sampled rows of pixel data are stacked from top to bottom in sampling order to form the original sampling image I.
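For illustration, the line-sampling accumulation described above can be sketched in Python with OpenCV. This is a minimal sketch under stated assumptions, not the patent's implementation: the video path, the horizontal calibration line at row y0, and the parameter defaults are hypothetical placeholders.

```python
import cv2
import numpy as np

def line_sample(video_path, y0, n=3, f=2, t=300):
    """Accumulate the n-pixel-wide calibration line of every f-th frame
    into original sampling images, one per window of t frames (a sketch
    of Step 1; y0 marks a horizontal calibration line at the virtual door)."""
    cap = cv2.VideoCapture(video_path)
    rows, samples, frame_idx = [], [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % f == 0:
            rows.append(frame[y0:y0 + n, :, :].copy())  # n rows under the line
        frame_idx += 1
        if frame_idx % t == 0 and rows:
            samples.append(np.vstack(rows))  # stack rows top-to-bottom in time order
            rows = []
    cap.release()
    return samples  # list of original sampling images I
```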
The velocity sampling map is a map of pedestrian motion directions. In the present invention, a pedestrian can move in two possible directions, namely toward either side of the calibration line, perpendicular to it. In the velocity sampling map, the invention therefore uses different RGB channels to represent the different motion directions: the R and G channels represent pixels moving in the two directions, and the B channel represents pixels with no motion. Specifically, while the video stream is sampled to obtain the original sampling image, the optical flow method is used to compute, for each pixel covered by the calibration line, its speed Speed(F_t(l_n)) and motion direction Orient(F_t(l_n)); accumulating the computed direction values over the same fixed time interval t yields the velocity sampling map I_s.
From the above, the crowd information in a period of the video stream can be obtained from the original sampling map and the velocity sampling map, namely:

I(n*t%3/3) = F_t(l_n),
I_s(n*t%3/3) = Orient(F_t(l_n)),

where F_t(l_n) denotes the pixels covered by the calibration line l_n in image frame F at time t, Orient(F_t(l_n)) denotes the motion directions of those pixels, and % denotes the remainder operation. The piecewise definition of Orient is rendered only as an embedded image in the source (PCTCN2014088901-appb-000001); per the channel convention above, it assigns the R, G or B channel according to whether a pixel moves in one direction, in the other direction, or not at all.
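A corresponding sketch of the velocity-sampling step follows; Farnebäck dense optical flow (cv2.calcOpticalFlowFarneback) stands in for the unspecified optical flow method, and the motion threshold, the horizontal calibration line, and the choice of the vertical flow component as the crossing direction are all assumptions.

```python
import cv2
import numpy as np

def velocity_rows(prev_gray, gray, y0, n=3, thresh=0.5):
    """Encode per-pixel motion on the calibration line as RGB rows:
    R = one crossing direction, G = the other, B = no motion (a sketch;
    rows are in RGB order, and Farneback flow replaces the unspecified
    optical flow method)."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    fy = flow[y0:y0 + n, :, 1]           # vertical component crosses the line
    speed = np.abs(fy)
    rows = np.zeros((n, fy.shape[1], 3), dtype=np.uint8)
    rows[fy > thresh] = (255, 0, 0)      # moving one way -> R channel
    rows[fy < -thresh] = (0, 255, 0)     # moving the other way -> G channel
    rows[speed <= thresh] = (0, 0, 255)  # no motion -> B channel
    return rows, speed                   # stack rows over t frames to build I_s
```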
Step 2: perform spatio-temporal correction on the obtained velocity sampling map samples, to ensure high accuracy of the final crowd state analysis;
(1) Spatial correction of the velocity sampling map samples.

Because camera installation angles differ, the projection of the scene onto the image plane exhibits significant perspective distortion: the same object looks large near the camera and small far from it, so the contributions of different pixels on the image plane must be weighted. In the present invention the ground is assumed to be a plane and people are assumed to stand perpendicular to the ground.
Fig. 2 is a schematic diagram of the geometric correction of the present invention. In Fig. 2, XOY is the image coordinate system and p_1 p_2 p_3 p_4 are four point coordinates in the world coordinate system. Suppose a 3D object of the same size stands at p_1p_2 and at p_3p_4; y and y_r are the reference lines of the two objects, y_v is the vanishing-point reference line, ΔW and ΔH are the width and height of the object at p_3p_4, and ΔW_r and ΔH_r are the width and height of the object at p_1p_2. As shown in Fig. 2, let the vanishing point P_v have coordinates (x_v, y_v) and let the reference line be y = y_r = H/2, where H is the height of the 3D object. The geometric contribution factor S_C(x, y) of any pixel I(x, y) on the image plane is then given by a formula rendered only as an embedded image in the source (PCTCN2014088901-appb-000002); intuitively, under the assumed pinhole geometry the image size of an object at row y scales with its distance y - y_v from the vanishing-point line, and S_C(x, y) compensates for this scaling.
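Because the exact contribution-factor formula survives only as an image in the source, the following sketch assumes the simple similar-triangles weight S_C(x, y) = (y_r - y_v)/(y - y_v) suggested by the geometry above; treat it as an illustrative assumption rather than the patented formula.

```python
import numpy as np

def geometric_weights(height, width, y_v, y_r):
    """Per-pixel geometric contribution factors S_C(x, y) (a sketch under
    an assumed formula: weight 1 on the reference row y_r, growing for
    rows closer to the vanishing-point row y_v)."""
    y = np.arange(height, dtype=np.float64).reshape(-1, 1)
    denom = np.where(np.abs(y - y_v) < 1e-6, 1e-6, y - y_v)  # avoid div by 0
    s_c = (y_r - y_v) / denom
    return np.broadcast_to(s_c, (height, width)).copy()
```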
(2) Temporal correction of the velocity sampling map samples.

Because people move at different speeds, pedestrians appear taller or shorter, fatter or thinner in the velocity sampling map, which affects the accuracy of the crowd analysis; the velocity sampling map therefore needs temporal correction.
In an embodiment of the present invention, the velocity sampling map is corrected in time using the pixel speeds along the calibration line computed by the optical flow method. The correction coefficient is expressed as:

S(F_t(l_n)) = Speed(F_t(l_n)) / N_s,

where N_s is the standard speed value, taken as 1 pixel/frame in an embodiment of the invention, and Speed(F_t(l_n)) denotes the speed of the pixels covered by the calibration line l_n in image frame F at time t.
The velocity sampling map I'_s after the above spatial and temporal correction is expressed as:

I'_s = I_s * S_C(x, y) * S(F_t(l_n)).
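Putting the two corrections together, a minimal sketch (assuming I_s is an H*W*3 velocity sampling map, S_C an H*W weight grid, and speed an H*W array of per-pixel speeds from the optical flow step):

```python
import numpy as np

def correct_velocity_map(I_s, S_C, speed, N_s=1.0):
    """Apply spatial and temporal correction, I'_s = I_s * S_C * S with
    S = speed / N_s (a sketch; N_s = 1 pixel/frame follows the embodiment
    described in the text)."""
    S = speed / N_s  # temporal correction coefficient
    return I_s.astype(np.float64) * S_C[..., None] * S[..., None]
```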
Step 3: train a deep learning model offline based on the original sampling maps and velocity sampling maps, the deep learning model comprising a classification model and a statistical model;
In the crowd state analysis model there are two kinds of deep learning models. One is the classification model, which is trained from velocity sampling map samples; for example, according to the walking directions of the people in a velocity sampling map sample, velocity sampling maps can be divided into four categories: only entering people, only leaving people, both entering and leaving people, and nobody entering or leaving, which makes it convenient to count the crowds passing through the virtual door. The other is the statistical model, which is trained from original sampling map samples and from velocity sampling map samples of the both-entering-and-leaving kind, and yields the total number of people in an original sampling map and the proportion of entering people in a velocity sampling map. The statistical model is further divided into two kinds: one counts the total number of people in an original sampling map and is called the crowd counting model; the other estimates the proportion of entering people in a both-entering-and-leaving velocity sampling map and is called the in/out ratio model. In an embodiment of the present invention the two statistical models use the same convolutional neural network and the same training procedure. Once the classification model and the statistical model have been obtained, combining the outputs of the two kinds of models yields the cumulative numbers of people entering and leaving within a given period.
(1) Training of the statistical model
The convolutional neural network of the statistical model constructed in an embodiment of the present invention adopts a 9-layer network structure, comprising an input layer, five convolutional layers C1 to C5, two fully connected layers F6 and F7, and an output layer O8. At the beginning of training, the network structure is built and the network weights are initialized with distinct small random numbers, generally in the range [-1, 1]; the biases are initialized to 0.
A) Forward propagation stage

The target images I at the input layer vary in size. Two images are fed to the first convolutional layer: the size-normalized version of the target image and its left-right mirrored copy; in an embodiment of the invention the normalized size is 224*224. A convolutional layer comprises a convolution operation and a downsampling operation, wherein:
The convolution operation performs two-dimensional convolution of the input images with multiple convolution kernels, adds a bias, and applies a nonlinear activation function to obtain the convolution result. The formula is rendered only as embedded images in the source (PCTCN2014088901-appb-000003/000004); from the surrounding definitions it has the usual form

x_j^n = f( sum_i x_i^(n-1) * w_ij + φ_j ),

where n denotes the layer index, S denotes the number of neurons of the n-th layer (the index i runs over the input maps), w_ij denotes the convolution kernel connecting the i-th input image and the j-th output image, and φ_j is the threshold (bias) of the j-th output image. The kernel size is 11*11 in layer C1, 5*5 in layer C2, and 3*3 in layers C3, C4 and C5. f(*) is the ReLU function: f(x) = max(x, 0);
The downsampling operation adopts the stochastic pooling method. Its formula is rendered only as embedded images in the source (PCTCN2014088901-appb-000005/000006); from the surrounding definitions it takes the standard stochastic-pooling form, in which each element of a sampling window is selected with probability proportional to its value:

p_j = I_j / sum_{i in R_t} I_i, with the output drawn from the window according to these probabilities,

where t denotes the t-th output image, R_t is the sampling window of the downsampling layer (in an embodiment of the invention the window size is set to 2*2 throughout), and I_j is an element value in the sampling window.
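A NumPy sketch of stochastic pooling over non-overlapping 2*2 windows, following the reconstructed form above (the input is assumed non-negative, e.g. post-ReLU):

```python
import numpy as np

def stochastic_pool(x, k=2, rng=None):
    """Stochastic pooling: within each k*k window, pick one activation with
    probability proportional to its value (uniform if the window is all zero)."""
    rng = rng or np.random.default_rng()
    h, w = x.shape[0] // k * k, x.shape[1] // k * k
    win = x[:h, :w].reshape(h // k, k, w // k, k).transpose(0, 2, 1, 3)
    win = win.reshape(h // k, w // k, k * k)
    sums = win.sum(axis=-1, keepdims=True)
    probs = np.where(sums > 0, win / np.maximum(sums, 1e-12), 1.0 / (k * k))
    out = np.empty((h // k, w // k), dtype=x.dtype)
    for i in range(h // k):
        for j in range(w // k):
            out[i, j] = rng.choice(win[i, j], p=probs[i, j])
    return out
```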
After the full-connection operations of the fully connected layers F6 and F7, the actual output O_k of the output layer O8 is computed. The formula is rendered only as an embedded image in the source (PCTCN2014088901-appb-000007); from the surrounding definitions it has the form

O_k = f( sum_{t=1}^{l} V_tk * x_t + θ_k ),

where k indexes the units of the output layer, θ_k is the threshold (bias) of an output unit, l is the number of units of F7, V_tk is the weight connecting the t-th output of the fully connected layer to the k-th output unit, and f(*) is the softmax function.
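As a numeric illustration of this output layer under the reconstructed form (the shapes are illustrative only):

```python
import numpy as np

def output_layer(x, V, theta):
    """O_k = softmax(sum_t V_tk * x_t + theta_k): x is the F7 activation
    vector of length l, V an (l, k) weight matrix, theta a length-k bias."""
    z = x @ V + theta
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```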
B) Back-propagation stage
The back-propagation stage uses gradient descent to adjust the weights and thresholds of every layer of the neural network in the reverse direction. The statistical error function used is rendered only as an embedded image in the source (PCTCN2014088901-appb-000008); from the surrounding definitions it has the squared-error form

E = (1/(2m)) * sum_{j=1}^{m} sum_k (d_k^(j) - O_k^(j))^2,

where d denotes the corresponding target vector, i.e. the label of a velocity sampling map or original sampling map sample, O_k is the output of the deep learning network, and m is the total number of samples.
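The reconstructed error can be checked numerically in a few lines (a sketch; D and O are assumed to be m*k arrays of target vectors and network outputs):

```python
import numpy as np

def statistical_error(O, D):
    """E = (1/(2m)) * sum_j sum_k (d_k - O_k)^2 over the m samples."""
    m = O.shape[0]
    return np.sum((D - O) ** 2) / (2.0 * m)
```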
When E < ε, where ε is a preset minimum error parameter, training ends and the obtained weights and thresholds of all layers are saved. At this point the parameters of the statistical model's convolutional neural network structure are stable.
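For concreteness, the 9-layer statistical network can be sketched in PyTorch as below. Only the kernel sizes (11, 5, 3, 3, 3), the 224*224 input and the layer inventory come from the text; the channel widths, strides, pooling placement, output width, and the substitution of max pooling for stochastic pooling are assumptions. The same architecture would be trained twice, once as the crowd counting model and once as the in/out ratio model.

```python
import torch
import torch.nn as nn

class StatNet(nn.Module):
    """Sketch of the 9-layer statistical network: input, C1-C5, F6-F7, O8."""
    def __init__(self, out_units=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 11, stride=4), nn.ReLU(), nn.MaxPool2d(2),    # C1
            nn.Conv2d(64, 128, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),  # C2
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),                  # C3
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),                  # C4
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # C5
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(1024), nn.ReLU(),    # F6
            nn.Linear(1024, 1024), nn.ReLU(),  # F7
            nn.Linear(1024, out_units),        # O8: count or in/out ratio
        )

    def forward(self, x):  # x: (batch, 3, 224, 224)
        return self.head(self.features(x))
```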
(2) Training of the classification model
The classification model likewise uses a convolutional neural network, again trained with velocity sampling maps as samples; see the sketch after this paragraph. In an embodiment of the present invention the classification model has 4 classes, so the network need not be very deep; in this embodiment it has 6 layers: an input layer, 3 convolutional layers, 1 fully connected layer and an output layer. The input layer performs no processing other than normalizing the RGB velocity sampling map samples to 96*96 before feeding them to the first convolutional layer. As with the statistical model, the classification model is initialized with random values, and the forward-propagation and back-propagation training procedures are the same as those of the statistical model and are not repeated here; the difference is that the kernels of all three convolutional layers of the classification model are 5*5. The trained classification model can then be used to classify velocity sampling maps.
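A matching sketch of the 6-layer classification network; the 96*96 RGB input, the three 5*5 convolutional layers, the single fully connected layer and the 4-way output come from the text, while the channel widths and pooling are assumptions.

```python
import torch
import torch.nn as nn

class DirectionNet(nn.Module):
    """Sketch of the 6-layer classifier: in / out / both / none."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 12 * 12, 256), nn.ReLU(),  # fully connected layer
            nn.Linear(256, 4),  # logits; softmax gives the 4 class probabilities
        )

    def forward(self, x):  # x: (batch, 3, 96, 96)
        return self.net(x)
```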
Step 4: perform crowd state analysis on a real-time video stream using the deep learning model obtained in Step 3.

Step 4 further comprises the following steps:
Step 41: similarly to Step 1, obtain a plurality of original sampling maps and velocity sampling maps from the real-time video stream;

As in Step 1, in this step the pixels at the virtual door in the image frames sampled from the real-time video stream are accumulated into an original sampling map; the optical flow method computes the speeds of the pixels at the corresponding virtual-door positions, and the computed speeds are accumulated into a velocity sampling map.
Step 42: similarly to Step 2, perform spatio-temporal correction on the velocity sampling maps obtained in Step 41, to ensure high accuracy of the crowd state analysis.
Step 43: classify each velocity sampling map with the classification model of the deep learning model, and determine the category to which it belongs;

Using the classification model of the deep learning model, each velocity sampling map is assigned to one of the categories: only entering people, only leaving people, both entering and leaving people, or nobody entering or leaving.
Step 44: according to the category of the velocity sampling map, analyze the crowd information in the original sampling map with the statistical models of the deep learning model;
Specifically, this step selects the appropriate statistical model according to the classification result. For a velocity sampling map of the nobody-entering-or-leaving category, the crowd count is zero. For the only-leaving and only-entering categories, the crowd counting model of the statistical model counts the number of people. For the both-entering-and-leaving category, the in/out ratio model of the statistical model estimates the proportion of entering people, which is combined with the count obtained from the crowd counting model to finally obtain the numbers of people entering and leaving.
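The routing logic of this step can be summarized as follows (a sketch; the model call signatures and the category labels IN, OUT, BOTH and NONE are hypothetical):

```python
def analyze_sample(cls_model, count_model, ratio_model, I, I_s_corrected):
    """Route one sampling-map pair through the models per Step 44 and
    return (entering, leaving) counts."""
    category = cls_model(I_s_corrected)    # one of "IN", "OUT", "BOTH", "NONE"
    if category == "NONE":
        return 0, 0
    total = count_model(I)                 # total people in the original map
    if category == "IN":
        return total, 0
    if category == "OUT":
        return 0, total
    ratio_in = ratio_model(I_s_corrected)  # proportion of entering people
    entering = round(total * ratio_in)
    return entering, total - entering
```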
Step 45: integrate the crowd information corresponding to the plurality of original sampling maps to obtain accurate crowd information for the corresponding period of the real-time video stream.
From the outputs of the statistical and classification models, the numbers of people entering and leaving during the corresponding period of the real-time video stream can be accumulated separately, giving the cumulative sizes of the entering and leaving crowds in that period. By detecting anomalies in crowd size, the purpose of video early warning is achieved.
The specific embodiments described above further illustrate the object, technical solution and beneficial effects of the present invention. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. An all-weather video monitoring method based on deep learning, characterized in that the method comprises the following steps:
    Step 1: collecting a video stream in real time, and obtaining a plurality of original sampling map samples and velocity sampling map samples by line sampling of the obtained video stream;
    Step 2: performing spatio-temporal correction on the obtained velocity sampling map samples;
    Step 3: training a deep learning model offline based on the original sampling maps and velocity sampling maps, the deep learning model comprising a classification model and a statistical model;
    Step 4: performing crowd state analysis on a real-time video stream using the deep learning model obtained in Step 3.
2. The method according to claim 1, characterized in that Step 1 further comprises the following steps:
    first, for each image frame of the video stream, setting, at the position where pedestrians pass through the door, a calibration line l_n with a fixed width of n pixels and a length covering the entire door, as the virtual door boundary for people entering and leaving;
    then, extracting the pixels covered by the calibration line in the image F of every f-th frame of the video stream, all the pixels sampled during each fixed time interval t forming an original sampling image I;
    while sampling the pixels covered by the calibration line, computing the speed and motion direction of each pixel by the optical flow method, the motion directions of all the pixels sampled during each fixed time interval t forming a velocity sampling map.
3. The method according to claim 1, characterized in that in the velocity sampling map, different RGB channels represent different pedestrian motion directions, wherein the R channel and the G channel represent pixels of two different motion directions and the B channel represents pixels with no motion.
4. The method according to claim 1, characterized in that in Step 2, the velocity sampling map samples are corrected spatially using the contributions of different pixels on the image plane, and corrected temporally using the speed values of different pixels.
5. The method according to claim 4, characterized in that the velocity sampling map I'_s after spatial and temporal correction is expressed as:
    I'_s = I_s * S_C(x, y) * S(F_t(l_n)),
    where I_s denotes the velocity sampling map before spatial and temporal correction, S_C(x, y) denotes the geometric contribution factor of any pixel I(x, y) on the image plane, and S(F_t(l_n)) denotes the temporal correction coefficient S(F_t(l_n)) = Speed(F_t(l_n)) / N_s, in which N_s is the standard speed value and Speed(F_t(l_n)) denotes the speed of the pixels covered by the calibration line l_n in image frame F at time t.
6. The method according to claim 1, characterized in that the classification model divides velocity sampling maps into 4 categories: only entering people, only leaving people, both entering and leaving people, and nobody entering or leaving.
7. The method according to claim 1, characterized in that the statistical model further comprises a crowd counting model and an in/out ratio model, wherein the crowd counting model counts the total number of people in the original sampling map, and the in/out ratio model estimates the proportion of entering people in velocity sampling maps of the both-entering-and-leaving category.
8. The method according to claim 1, characterized in that the statistical model is trained with a convolutional neural network, wherein the convolutional neural network for training the crowd counting model comprises an input layer, 5 convolutional layers, 2 fully connected layers and an output layer, and the convolutional neural network for training the in/out ratio model comprises an input layer, 3 convolutional layers, 1 fully connected layer and an output layer.
9. The method according to claim 1, characterized in that Step 4 further comprises the following steps:
    Step 41: similarly to Step 1, obtaining a plurality of original sampling maps and velocity sampling maps from the real-time video stream;
    Step 42: similarly to Step 2, performing spatio-temporal correction on the velocity sampling maps obtained in Step 41;
    Step 43: classifying each velocity sampling map with the classification model of the deep learning model, and determining the category to which it belongs;
    Step 44: according to the category of the velocity sampling map, analyzing the crowd information in the original sampling map with the statistical model of the deep learning model;
    Step 45: integrating the crowd information corresponding to the plurality of original sampling maps to obtain accurate crowd information for the corresponding period of the real-time video stream.
10. The method according to claim 9, characterized in that in Step 44, for the nobody-entering-or-leaving category, the crowd count is zero; for the only-leaving and only-entering categories, the crowd counting model of the statistical model counts the number of people; and for the both-entering-and-leaving category, the in/out ratio model of the statistical model estimates the proportion of entering people, which is combined with the count obtained from the crowd counting model to finally obtain the numbers of people entering and leaving, respectively.
PCT/CN2014/088901 2014-10-20 2014-10-20 All-weather video monitoring method based on deep learning WO2016061724A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/088901 WO2016061724A1 (en) 2014-10-20 2014-10-20 All-weather video monitoring method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/088901 WO2016061724A1 (en) 2014-10-20 2014-10-20 All-weather video monitoring method based on deep learning

Publications (1)

Publication Number Publication Date
WO2016061724A1

Family

ID=55760022

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/088901 WO2016061724A1 (en) 2014-10-20 2014-10-20 All-weather video monitoring method based on deep learning

Country Status (1)

Country Link
WO (1) WO2016061724A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108803328A (en) * 2018-06-14 2018-11-13 广东惠禾科技发展有限公司 Camera self-adapting regulation method, device and camera
CN111079488A (en) * 2019-05-27 2020-04-28 陕西科技大学 Bus passenger flow detection system and method based on deep learning
CN111275592A (en) * 2020-01-16 2020-06-12 浙江工业大学 Classroom behavior analysis method based on video images
CN112668532A (en) * 2021-01-05 2021-04-16 重庆大学 Crowd counting method based on multi-stage mixed attention network
CN117574133A (en) * 2024-01-11 2024-02-20 湖南工商大学 Unsafe production behavior identification method and related equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101635835A (en) * 2008-07-25 2010-01-27 深圳市信义科技有限公司 Intelligent video monitoring method and system thereof
CN101751553A (en) * 2008-12-03 2010-06-23 中国科学院自动化研究所 Method for analyzing and predicting large-scale crowd density
WO2012008176A1 (en) * 2010-07-12 2012-01-19 株式会社日立国際電気 Monitoring system and method of monitoring
CN102819764A (en) * 2012-07-18 2012-12-12 郑州金惠计算机系统工程有限公司 Method for counting pedestrian flow from multiple views under complex scene of traffic junction
CN102930248A (en) * 2012-10-22 2013-02-13 中国计量学院 Crowd abnormal behavior detection method based on machine learning
CN103218816A (en) * 2013-04-18 2013-07-24 中山大学 Crowd density estimation method and pedestrian volume statistical method based on video analysis
CN103984937A (en) * 2014-05-30 2014-08-13 无锡慧眼电子科技有限公司 Pedestrian counting method based on optical flow method
CN104320617A (en) * 2014-10-20 2015-01-28 中国科学院自动化研究所 All-weather video monitoring method based on deep learning

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101635835A (en) * 2008-07-25 2010-01-27 深圳市信义科技有限公司 Intelligent video monitoring method and system thereof
CN101751553A (en) * 2008-12-03 2010-06-23 中国科学院自动化研究所 Method for analyzing and predicting large-scale crowd density
WO2012008176A1 (en) * 2010-07-12 2012-01-19 株式会社日立国際電気 Monitoring system and method of monitoring
CN102819764A (en) * 2012-07-18 2012-12-12 郑州金惠计算机系统工程有限公司 Method for counting pedestrian flow from multiple views under complex scene of traffic junction
CN102930248A (en) * 2012-10-22 2013-02-13 中国计量学院 Crowd abnormal behavior detection method based on machine learning
CN103218816A (en) * 2013-04-18 2013-07-24 中山大学 Crowd density estimation method and pedestrian volume statistical method based on video analysis
CN103984937A (en) * 2014-05-30 2014-08-13 无锡慧眼电子科技有限公司 Pedestrian counting method based on optical flow method
CN104320617A (en) * 2014-10-20 2015-01-28 中国科学院自动化研究所 All-weather video monitoring method based on deep learning

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108803328A (en) * 2018-06-14 2018-11-13 广东惠禾科技发展有限公司 Camera self-adapting regulation method, device and camera
CN108803328B (en) * 2018-06-14 2021-11-09 广东惠禾科技发展有限公司 Camera self-adaptive adjusting method and device and camera
CN111079488A (en) * 2019-05-27 2020-04-28 陕西科技大学 Bus passenger flow detection system and method based on deep learning
CN111079488B (en) * 2019-05-27 2023-09-26 广东快通信息科技有限公司 Deep learning-based bus passenger flow detection system and method
CN111275592A (en) * 2020-01-16 2020-06-12 浙江工业大学 Classroom behavior analysis method based on video images
CN111275592B (en) * 2020-01-16 2023-04-18 浙江工业大学 Classroom behavior analysis method based on video images
CN112668532A (en) * 2021-01-05 2021-04-16 重庆大学 Crowd counting method based on multi-stage mixed attention network
CN117574133A (en) * 2024-01-11 2024-02-20 湖南工商大学 Unsafe production behavior identification method and related equipment
CN117574133B (en) * 2024-01-11 2024-04-02 湖南工商大学 Unsafe production behavior identification method and related equipment

Similar Documents

Publication Publication Date Title
CN104320617B (en) A kind of round-the-clock video frequency monitoring method based on deep learning
CN112561146B (en) Large-scale real-time traffic flow prediction method based on fuzzy logic and depth LSTM
CN109376637B (en) People counting system based on video monitoring image processing
CN108615027B (en) Method for counting video crowd based on long-term and short-term memory-weighted neural network
WO2016061724A1 (en) All-weather video monitoring method based on deep learning
CN111209892A (en) Crowd density and quantity estimation method based on convolutional neural network
CN109657581B (en) Urban rail transit gate traffic control method based on binocular camera behavior detection
CN104751486B (en) A kind of moving target relay tracking algorithm of many ptz cameras
CN112257609B (en) Vehicle detection method and device based on self-adaptive key point heat map
CN108710875A (en) A kind of take photo by plane road vehicle method of counting and device based on deep learning
US10432896B2 (en) System and method for activity monitoring using video data
CN107729799A Crowd's abnormal behaviour vision-based detection and analyzing and alarming system based on depth convolutional neural networks
CN107818326A (en) A kind of ship detection method and system based on scene multidimensional characteristic
CN107229929A (en) A kind of license plate locating method based on R CNN
Kawano et al. Road marking blur detection with drive recorder
CN103425967A (en) Pedestrian flow monitoring method based on pedestrian detection and tracking
CN113536972B (en) Self-supervision cross-domain crowd counting method based on target domain pseudo label
CN115797873B (en) Crowd density detection method, system, equipment, storage medium and robot
CN103473554A (en) People flow statistical system and people flow statistical method
Li et al. A traffic state detection tool for freeway video surveillance system
CN114267082B (en) Bridge side falling behavior identification method based on depth understanding
CN116167625B (en) Trampling risk assessment method based on deep learning
CN111540203B (en) Method for adjusting green light passing time based on fast-RCNN
CN114092866A (en) Method for predicting space-time distribution of airport passenger flow
CN110688924A (en) RFCN-based vertical monocular passenger flow volume statistical method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14904565

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14904565

Country of ref document: EP

Kind code of ref document: A1