WO2019210555A1 - Method and device for counting people based on a deep neural network, and storage medium - Google Patents

Method and device for counting people based on a deep neural network, and storage medium

Info

Publication number
WO2019210555A1
WO2019210555A1 (PCT/CN2018/091569)
Authority
WO
WIPO (PCT)
Prior art keywords
image
detected
background
human body
people
Prior art date
Application number
PCT/CN2018/091569
Other languages
English (en)
French (fr)
Inventor
袁誉乐
曹建民
崔小乐
叶青松
Original Assignee
深圳技术大学(筹)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳技术大学(筹)
Publication of WO2019210555A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53: Recognition of crowd images, e.g. recognition of crowd congestion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/048: Activation functions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]

Definitions

  • the present invention relates to the field of image processing, and in particular, to a method and device for counting people based on a deep neural network, and a storage medium.
  • methods that use computer vision technology to count the people in surveillance images or video have been realized, and can be widely applied in scenarios such as stampede early warning, traffic diversion, shop foot-traffic evaluation, and attendance statistics.
  • however, existing people counting systems often show large statistical errors in crowded environments. This is because individuals in a crowd occlude one another, so the limb features below the shoulders cannot be used reliably.
  • when feature extraction and localization rely only on the head and shoulders, the relatively simple head-and-shoulder contour is easily confused with background texture, producing a large number of missed or false detections.
  • the technical problem mainly solved by the present invention is how to overcome the deficiencies of the prior art and improve the accuracy and real-time performance of people counting in complex crowd scenes.
  • the present application provides a method for counting people based on deep neural networks.
  • an embodiment provides a method for counting people based on a deep neural network, including the following steps: acquiring an image to be detected; obtaining a background image and a foreground image from the image to be detected; performing deep neural network processing on the foreground image to count the number of human body key parts in the foreground image; and obtaining the number of people in the image to be detected by comparing the statistics of the human body key parts.
  • acquiring the image to be detected includes: acquiring a video of the crowd to be monitored; and selecting frame images one by one from the video's image sequence as the image to be detected.
  • it is determined whether the background model includes all background information of the video, the background information being image information of non-human objects;
  • if the determination is yes, all background information in the background model is used as the background image, the background image including the image information of all non-human objects in the video;
  • performing area detection on the image to be detected and constructing a background model from the area detection result includes: inputting the image to be detected into a YOLO V3-based object detection model to obtain person regions and person-free regions; constructing a background model whose pixels correspond one-to-one to those of the image to be detected; setting the pixel values of the model pixels corresponding to the person-free regions to the pixel values of the pixels in those regions; and setting the pixel values of the model pixels corresponding to the person regions to a first value.
  • determining whether the background model includes all background information of the video includes: determining whether any pixel with the first value exists in the background model; if not, the background model is considered to include all background information of the video; otherwise, it is considered not to.
  • if the determination is no, the image to be detected of the next frame is input into the YOLO V3-based object detection model to obtain new person-free regions, and the pixel values of the model pixels corresponding to the new person-free regions are updated from the pixel values in those regions, eliminating the first values among them;
  • the background model is updated repeatedly until no pixel with the first value remains in the background model.
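  • as an illustration only, a minimal Python/NumPy sketch of this background model follows; the array layout, function names, and the -1 marker convention are assumptions, not the patented implementation:

```python
import numpy as np

FIRST_VALUE = -1  # marker for model pixels still covered by a person region

def init_background_model(frame, person_mask):
    # frame: grayscale image (H, W), uint8; person_mask: bool (H, W),
    # True wherever area detection placed a person region
    model = frame.astype(np.int16)      # int16 so -1 can coexist with 0..255
    model[person_mask] = FIRST_VALUE    # person region -> first value
    return model

def update_background_model(model, frame, person_mask):
    # copy in pixels that are person-free now but were person-covered before
    revealed = (~person_mask) & (model == FIRST_VALUE)
    model[revealed] = frame[revealed]
    return model

def model_is_complete(model):
    # the model holds all background information once no first value remains
    return not np.any(model == FIRST_VALUE)
```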
  • performing deep neural network processing on the foreground image to count the number of human body key parts in the foreground image includes: setting a topology of the deep neural network; acquiring training data labeled with human body key parts; training the model parameters of the topology with the training data; inputting the foreground image into the topology and constructing the human body key parts in the foreground image from the distribution features the topology learned on the training data; and obtaining the number of human body key parts in the foreground image.
  • the topology includes a filter, multiple convolution structures, a channel, and a softmax function processor connected in series.
  • obtaining the number of people in the image to be detected by comparing the statistics of the human body key parts includes: counting the number of each kind of key part separately and computing, from each count, the number of people corresponding to that key part; then taking the maximum of the people counts over all key parts as the number of people in the image to be detected.
  • an embodiment provides a people counting device based on a deep neural network, including:
  • a to-be-detected image acquiring unit configured to acquire an image to be detected;
  • a foreground background image acquiring unit configured to obtain a background image and a foreground image from the image to be detected;
  • a neural network processing unit configured to perform deep neural network processing on the foreground image to count the number of human body key parts in the foreground image;
  • a people counting unit configured to obtain the number of people in the image to be detected by comparing the statistics of the human body key parts.
  • the people counting device further includes a display unit; the display unit is configured to display, in real time, the image to be detected and the number of people in the current image to be detected.
  • an embodiment provides a computer readable storage medium, comprising a program executable by a processor to implement the method of the first aspect.
  • the method and device for counting people based on a deep neural network, and the storage medium, involve: acquiring an image to be detected; obtaining a background image and a foreground image from it; performing deep neural network processing on the foreground image to count the human body key parts in it; and obtaining the number of people in the image to be detected by comparing those statistics.
  • because the background model is updated from the person-free regions of each image to be detected, the background image stays complete in real time, which lets the background difference method extract the foreground image from the image to be detected quickly.
  • the deep neural network is trained with data labeled with human body key parts, which improves the accuracy with which key parts are extracted from the foreground image and allows the number of people to be derived by comparing the counts of several kinds of key parts. Even when some parts of a human body are occluded, that body can still be recognized, improving the accuracy of the count.
  • FIG. 1 is a flow chart of the people counting method;
  • FIG. 2 is a flow chart of acquiring the image to be detected;
  • FIG. 3 is a flow chart of obtaining the foreground image;
  • FIG. 4 is a flow chart of constructing the background model;
  • FIG. 5 is a flow chart of the deep neural network processing;
  • FIG. 6 is a flow chart of comparing to obtain the number of people;
  • FIG. 7 is a structural diagram of the topology of the deep neural network;
  • FIG. 8 is a structural diagram of the human body key part model;
  • FIG. 9 is a structural diagram of the convolution unit;
  • FIG. 10 is a schematic structural diagram of the people counting device.
  • the present application discloses a method for counting people based on a deep neural network, which obtains the number of people in an image to be detected after deep neural network processing, quickly and accurately.
  • the method includes steps S100-S400, described separately below; an end-to-end sketch of the flow is shown next.
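  • purely as an illustrative sketch; every helper name here is hypothetical (several are fleshed out later in this section), not an API from the patent:

```python
def count_people_stream(video_path):
    # ties steps S100-S400 together; read_frames, detect_person_regions,
    # background_difference and the model helpers are sketched elsewhere in
    # this section, while count_key_parts / people_from_counts stand in for
    # the deep neural network steps S300-S400
    model = None
    for frame in read_frames(video_path):                      # step S100
        person_mask = detect_person_regions(frame)             # step S210
        if model is None:
            model = init_background_model(frame, person_mask)
        else:
            model = update_background_model(model, frame, person_mask)
        if not model_is_complete(model):                       # step S220
            continue                                           # step S230: keep filling
        background = model.astype("uint8")                     # step S240
        foreground = background_difference(frame, background)  # step S250
        counts = count_key_parts(foreground)                   # step S300
        yield people_from_counts(counts)                       # step S400
```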
  • in step S100, the image to be detected is acquired. When monitoring crowd size with electronic equipment, images of the monitored crowd usually need to be captured by an image acquisition device such as a camera or video camera.
  • S100 may include steps S110-S120, which are respectively described below.
  • in step S110, a moving camera, surveillance camera, mobile phone camera, or similar device continuously films places where crowds easily gather, such as venues and passages, to obtain video of the crowd to be monitored.
  • the video obtained here may contain no people, a few people, or many people, and the people and environmental objects in it may be continuously moving or changing posture;
  • the video should therefore have good picture quality and smoothness.
  • in step S120, the video of the crowd to be monitored consists of temporally consecutive frame images, and within any single frame the people and environmental objects are effectively static, so a frame image from the video sequence can serve as the image to be detected; the method of reading frame images belongs to the prior art and is not detailed here.
  • to monitor crowd size continuously, frame images should be selected one by one from the video's image sequence as images to be detected, and each frame processed to obtain the number of people at the current moment; processing consecutive frames in this way yields the count in real time and achieves dynamic monitoring of crowd size.
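  • a minimal frame-reading sketch with OpenCV (the generator form is a convenience choice, not mandated by the method):

```python
import cv2

def read_frames(video_path):
    # yield the video's frame images one by one as images to be detected
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:          # end of the image sequence
            break
        yield frame
    cap.release()
```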
  • in step S200, a background image and a foreground image are obtained from the image to be detected; in one embodiment, see FIG. 3, step S200 may include steps S210-S250, described below.
  • in step S210, area detection is performed on the image to be detected obtained in step S120, and a background model is constructed from the area detection result; in one embodiment, see FIG. 4, step S210 may include steps S211-S213.
  • in step S211, the image to be detected is input into an image detection program to decide which regions of the image belong to human bodies and which to environmental objects.
  • in a specific embodiment, the image to be detected is input into a YOLO V3-based object detection model to obtain person regions and person-free regions, where the person-free regions contain objects other than human bodies (such as buildings and natural scenery).
  • it should be noted that YOLO V3 is the third version released by the official YOLO project, a classic object detection algorithm with the training and learning characteristics of a deep neural network. It divides the input image into many image blocks and uses a classifier to decide whether each block contains an object and which category the object belongs to; its advantages are very fast detection, fewer background errors, and generalized learning of object categories. In this embodiment, when the image to be detected is processed by the YOLO V3-based object detection model, the person regions and person-free regions are therefore readily obtained from the generalized features of human and non-human objects.
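  • one common way to run YOLO V3 is through OpenCV's dnn module; the sketch below is hedged: the config/weights file names, the 416x416 input size, and the confidence threshold are assumptions, not values from the patent:

```python
import cv2
import numpy as np

# any standard YOLO V3 config/weights pair trained on COCO will do here
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
out_names = net.getUnconnectedOutLayersNames()

def detect_person_regions(frame, conf_thr=0.5):
    # return a boolean mask that is True inside every detected person box
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    mask = np.zeros((h, w), dtype=bool)
    for out in net.forward(out_names):
        for det in out:                  # det: [cx, cy, bw, bh, obj, class scores...]
            scores = det[5:]
            if np.argmax(scores) == 0 and scores[0] > conf_thr:  # COCO class 0 = person
                cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
                x0, y0 = max(int(cx - bw / 2), 0), max(int(cy - bh / 2), 0)
                mask[y0:int(cy + bh / 2), x0:int(cx + bw / 2)] = True
    return mask              # everything outside the mask is person-free
```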
  • in step S212, a background model whose pixels correspond one-to-one to those of the image to be detected is constructed, and the pixel values of the model pixels corresponding to the person regions are set to a first value (such as -1).
  • it should be noted that when the background model of the monitored area is first constructed, the model pixels corresponding to the person regions are set to the first value; once the background model has been constructed, step S212 may be omitted and the model updated only through step S213.
  • in step S213, since each pixel in the image to be detected has a specific pixel value (for example, in common 8-bit image coding each pixel has 256 gray levels, so values lie between 0 and 255), the pixel values of the model pixels corresponding to the person-free regions are set to the pixel values of the pixels in those regions.
  • in a specific embodiment, the pixel value of each pixel can be expressed by the following formula:

    Bg(i)[x,y] = (Cr(i)[x,y] + Bg(i-1)[x,y] + Bg(i-2)[x,y]) / 3

  • where Bg(i)[x,y] is the pixel value of the i-th frame's background at pixel coordinates [x,y], Cr(i)[x,y] is the initial pixel value of the i-th frame image at [x,y], Bg(i-1)[x,y] is the pixel value of the previous frame at [x,y], and Bg(i-2)[x,y] that of the frame before it; i is an integer frame index, x ranges over 0..w and y over 0..h, with w and h the pixel width and height of the frame.
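  • applied to whole arrays, the same three-frame average is a one-liner (a sketch; the dtype handling is an assumption):

```python
import numpy as np

def smoothed_pixel_values(cr_i, bg_prev1, bg_prev2):
    # Bg(i) = (Cr(i) + Bg(i-1) + Bg(i-2)) / 3 for every pixel at once
    total = (cr_i.astype(np.float32) + bg_prev1.astype(np.float32)
             + bg_prev2.astype(np.float32))
    return (total / 3.0).astype(np.uint8)
```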
  • in step S220, it is determined whether the background model includes all background information of the video, the background information being the image information of non-human objects; that is, whether the background model contains the image information of all environmental objects other than people within the monitored range. If the determination is no, the process proceeds to step S230; otherwise, it proceeds to step S240.
  • in a specific embodiment, the pixel values of the model pixels corresponding to the person regions are set to the first value (such as -1), and those corresponding to the person-free regions are set to the pixel values of the pixels in those regions (for example, 0 to 255).
  • the pixel value of each pixel in the background model can then be examined; see step S221 in FIG. 4: it is determined whether any pixel with the first value exists in the background model (that is, whether any pixel value is below 0). If such a pixel exists (some pixel value is -1), the model still contains pixels corresponding to a person region, so the background model is considered not to include all background information of the video and the process proceeds to step S230; if no such pixel exists (all pixel values are greater than -1), the background model is considered to include all background information of the video and the process proceeds to step S240.
  • in step S230, so that the background model comes to include all background information of the video, area detection is performed on the image to be detected of the next frame, and the background model is updated from the area detection result, until the background model is judged to include all background information of the video.
  • step S230 includes steps S231-S232.
  • in step S231, the image to be detected of the next frame is input into the YOLO V3-based object detection model to obtain new person-free regions; for the method of acquiring them, refer to step S211.
  • in step S232, according to the pixel values of the pixels in the new person-free regions, the pixel values of the corresponding model pixels are updated, eliminating the first values among the model pixels corresponding to the new person-free regions.
  • it should be noted that the people in the video of the monitored crowd move and change posture; as a body's position or posture changes, environmental objects occluded in the current frame are revealed in the next frame or in subsequent frames, so the background information of the gradually revealed objects can be written into the background model in time, gradually eliminating the first values left by the person regions.
  • in another embodiment, steps S221-S231-S232 may be executed in a loop to update the background model repeatedly, until step S221 finds no pixel with the first value, at which point the model includes all background information.
  • in another embodiment, if the site where the monitored crowd is located was filmed in advance, the video sequence of the crowd to be monitored will contain frame images with environmental objects only; such a frame can then be selected to construct the background model, so that no pixel with the first value exists in it and the judgment of step S220 proceeds directly to step S240 without looping.
  • in step S240, all background information in the background model is used as the background image, which includes the image information of all non-human objects in the video, that is, of all environmental objects within the monitored range.
  • it should be noted that the people in the video keep changing while the environmental objects tend to stay static or change only slightly, so the obtained background image is taken not to change over a short time and can serve as the base template for extracting the person-free regions of the next image to be detected.
  • in step S250, background difference processing is performed on the image to be detected according to the background image, giving the foreground image, which includes the image information of all human bodies in the image to be detected.
  • background difference processing is a common image processing method and belongs to the prior art.
  • in this embodiment, the person-free regions of the image to be detected are matched against the background image to obtain person-free regions with more precise extents; those regions are then subtracted from the image to be detected, leaving person regions with fairly accurate extents.
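  • a simple background-difference sketch with OpenCV (the binary threshold of 30 is an assumption; the patent does not fix one):

```python
import cv2

def background_difference(frame, background, thr=30):
    # keep only the pixels that differ noticeably from the background image
    diff = cv2.absdiff(frame, background)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY) if diff.ndim == 3 else diff
    _, fg_mask = cv2.threshold(gray, thr, 255, cv2.THRESH_BINARY)
    return cv2.bitwise_and(frame, frame, mask=fg_mask)  # the foreground image
```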
  • through steps S210-S250, not only is a fairly complete background image containing all background information obtained, but a more accurate foreground image is derived from it; the current background image then serves as a reference template for obtaining the foreground image of the next frame. Because step S213 updates the background model in real time, the background image is also updated in real time, so the foreground image of the next frame can be obtained by background difference against the updated background image, and the model-update process of step S230 can be omitted.
  • in another embodiment, step S200 may skip sub-steps S220-S240 and obtain the foreground image directly through sub-steps S210 and S250.
  • a first option: acquire the person regions and person-free regions of the image to be detected by the method of step S210, set the model pixels corresponding to the person-free regions to the pixel values of the pixels in those regions, and use the background information corresponding to the person-free regions as the background image; although this background image contains only the environmental objects visible in the current image to be detected, the method disclosed in step S250 can still subtract it from the image to obtain the foreground image.
  • a second option: acquire the person regions and person-free regions by the method of step S210 and, without constructing a background model, directly integrate the image information corresponding to the person regions in step S250 and use that integrated image information as the foreground image; this saves the time of building the background model, but makes the person regions less precise.
  • users can choose between these options according to actual needs.
  • in step S300, deep neural network processing is performed on the foreground image obtained in step S250 to count the number of human body key parts in it; in one embodiment, see FIG. 5, step S300 may include steps S310-S350, described below.
  • in step S310, a topology of the deep neural network (DNN) is set; as shown in FIG. 7, the topology includes a filter, multiple convolution structures (preferably seven bottleneck convolution structures), a channel, and a softmax function processor connected in series.
  • it should be noted that the filter is a common tool in image processing, with linear, high-pass, and low-pass variants; here it filters the input foreground image to remove abnormal image information from it.
  • the convolution structure is a common functional unit in neural networks; its main function is, after training, to extract the features needed for image classification or regression.
  • the convolution unit in this application adds a parallel 1x1 convolution unit on top of the bottleneck convolution concept, which enriches the extracted image features and makes the final model's recognition more accurate.
  • the softmax function is a typical classification method that decides classification or regression according to probability; it belongs to the prior art.
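  • a minimal PyTorch sketch of such a convolution unit; the channel counts, the residual-style merge, and the layer ordering are assumptions, not the patented parameters from Table 1:

```python
import torch.nn as nn

class ParallelBottleneck(nn.Module):
    # bottleneck convolution plus an added parallel 1x1 branch
    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        self.bottleneck = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1), nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1), nn.BatchNorm2d(out_ch),
        )
        self.parallel = nn.Sequential(      # the extra 1x1 branch
            nn.Conv2d(in_ch, out_ch, 1), nn.BatchNorm2d(out_ch),
        )
        self.relu = nn.ReLU(inplace=True)   # BN + RELU as in FIG. 9

    def forward(self, x):
        return self.relu(self.bottleneck(x) + self.parallel(x))
```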
  • in step S320, as shown in FIG. 8, a model of the human body key parts is constructed, and the generalized features of the head A, shoulders B and C, arms D and E, hands F and G, and legs H, I, J, and K are obtained from the model; these generalized features serve as the training data that labels the human body key parts.
  • in step S330, the model parameters of the topology are trained from the training data acquired in step S320; in one embodiment, the obtained model parameters are shown in Table 1.
  • the specific structure of each convolution unit in Table 1 is shown in FIG. 9, where BN is a normalization function used to normalize each neuron, and RELU is an activation function used to keep the training process efficient; both belong to the prior art and are not detailed here.
  • in step S340, the foreground image acquired in step S250 is input into the trained topology, and the human body key parts in the foreground image are constructed from the distribution features the topology learned on the training data.
  • in one embodiment, eleven classes of human body key parts are constructed (head A, shoulders B and C, arms D and E, hands F and G, and legs H, I, J, and K), and these key parts are marked in the foreground image.
  • in step S350, the number of human body key parts in the foreground image is obtained, the counts being denoted by the letter N.
  • the counts for the individual key parts are N_A, N_B, N_C, N_D, N_E, N_F, N_G, N_H, N_I, N_J, and N_K.
  • in step S400, the number of people in the image to be detected is obtained by comparing the statistics of the human body key parts; in one embodiment, see FIG. 6, step S400 may include steps S410-S420.
  • in step S410, the number of each kind of human body key part is counted separately, and the number of people corresponding to that key part is computed from its count.
  • for example, the maximum function max(N_B, N_C) gives the number of people corresponding to the shoulders, max(N_D, N_E) the number corresponding to the arms, max(N_F, N_G) the number corresponding to the hands, and max(N_H, N_I) the number corresponding to the legs; max(N_J, N_K) may also be used for the legs.
  • in step S420, the maximum of the people counts over the various key parts is obtained, and that maximum is used as the number of people in the image to be detected.
  • in one embodiment, the maximum is obtained by the following formula:

    maximum = max{N_A, max(N_B, N_C), max(N_D, N_E), max(N_F, N_G), max(N_H, N_I), max(N_J, N_K)}

  • that maximum is then taken as the number of people in the image to be detected.
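  • with illustrative counts (the numbers below are made up purely to show the arithmetic):

```python
# per-key-part counts as produced by step S350
counts = {"A": 3, "B": 3, "C": 2, "D": 2, "E": 3, "F": 1,
          "G": 2, "H": 3, "I": 2, "J": 3, "K": 2}

people = max(
    counts["A"],                    # head
    max(counts["B"], counts["C"]),  # shoulders
    max(counts["D"], counts["E"]),  # arms
    max(counts["F"], counts["G"]),  # hands
    max(counts["H"], counts["I"]),  # legs
    max(counts["J"], counts["K"]),  # legs (alternative pair)
)
print(people)  # -> 3 people in the image to be detected
```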
  • in an embodiment, the present application discloses a people counting device based on a deep neural network; see FIG. 10.
  • the people counting device 5 includes a to-be-detected image acquiring unit 51, a foreground background image acquiring unit 52, a neural network processing unit 53, and a people counting unit 54, described below.
  • the to-be-detected image acquiring unit 51 is configured to acquire the image to be detected; for the specific process, refer to step S100, not repeated here.
  • the foreground background image acquiring unit 52 is communicatively connected with the to-be-detected image acquiring unit 51 and is configured to obtain the background image and the foreground image from the image to be detected; for the specific process, refer to step S200, not repeated here.
  • the neural network processing unit 53 is communicatively connected with the foreground background image acquiring unit 52 and is configured to perform deep neural network processing on the foreground image to count the number of human body key parts in it; for the specific process, refer to step S300, not repeated here.
  • the people counting unit 54 is communicatively connected with the neural network processing unit 53 and is configured to obtain the number of people in the image to be detected by comparing the statistics of the human body key parts; for the specific process, refer to step S400, not repeated here.
  • it should be noted that the to-be-detected image acquiring unit 51, the foreground background image acquiring unit 52, the neural network processing unit 53, and the people counting unit 54 may each be program processing modules within a program, each realizing its function according to its own processing logic.
  • in another embodiment, the people counting device 5 may further include a display unit 55 communicatively connected with the people counting unit 54 and used to display, in real time, the image to be detected and the number of people in the current image to be detected; the display unit 55 may even mark the human body key parts in the image to be detected in real time, so that users can observe the crowd's movement on the display more intuitively and vividly.
  • the display unit 55 can be any of various types of display devices capable of showing a picture, such as a television, a display screen, or a projector.
  • those skilled in the art will understand that all or part of the functions of the methods in the embodiments above can be implemented in hardware or by a computer program. When implemented by a computer program, the program may be stored in a computer-readable storage medium, which may include a read-only memory, a random access memory, a magnetic disk, an optical disc, a hard disk, and the like, and the computer executes the program to realize the functions.
  • for example, the program is stored in the memory of a device, and all or part of the functions above are realized when the processor executes the program in the memory.
  • the program may also be stored in a storage medium of a server, another computer, a magnetic disk, an optical disc, a flash drive, or a removable hard disk, and be downloaded or copied into the memory of the local device, or used to update the local device's system; when the processor executes the program in the memory, all or part of the functions above can be realized.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

A method and device for counting people based on a deep neural network, and a storage medium, including: acquiring an image to be detected; obtaining a background image and a foreground image from the image to be detected; performing deep neural network processing on the foreground image to count the number of human body key parts in the foreground image; and obtaining the number of people in the image to be detected by comparing the statistics of the human body key parts. Because only the foreground image undergoes deep neural network processing to recognize the human body key parts, interference from background information is avoided, the time spent examining background pixels is saved, and the algorithm runs faster. The constructed deep neural network is trained with data labeled with human body key parts, which improves the accuracy of key-part extraction and allows the number of people in the image to be derived by comparing the counts of several kinds of key parts; even when some parts of a body are occluded, the body can still be recognized, improving counting accuracy.

Description

Method and device for counting people based on a deep neural network, and storage medium — Technical Field
The present invention relates to the field of image processing, and in particular to a method and device for counting people based on a deep neural network, and a storage medium.
Background Art
With the rapid population growth of modern society, panic caused by crowd gatherings has occurred more than once, and monitoring crowd size and maintaining public order have become increasingly important. Crowd counting is one of the important research directions of crowd monitoring: it focuses on counting people, usually requires the result to be accurate to the exact number of people, and the result can also serve as an input parameter for crowd density estimation. The traditional way of monitoring crowds by human effort easily causes visual fatigue and is easily affected by personal subjective factors, making the statistics inaccurate. With the rapid progress of science and technology, especially of machine vision, counting the people in an image in real time has become possible.
At present, many occasions place ever higher demands on crowd counting: counting visitors at museums or famous tourist attractions to monitor and handle crowding in time; counting people at stations and other public places to arrange evacuation routes promptly and avoid congestion caused by excessive flow; counting shoppers in malls to lay out merchandise sensibly and increase purchases; and counting the people lingering in front of advertising spaces so that advertisers can plan their strategies sensibly. In short, crowd counting has broad market demand and application prospects.
In recent years, methods that use computer vision technology to count the people in surveillance images or video have been realized, and can be widely applied in scenarios such as stampede early warning, traffic diversion, shop foot-traffic evaluation, and attendance statistics. However, existing people counting systems still often show large errors in crowded environments, because individuals in a crowd occlude one another, so the limb features below the shoulders cannot be used reliably; and when feature extraction and localization rely only on the head and shoulders, the relatively simple head-and-shoulder contour is easily confused with background texture, producing a large number of missed or false detections.
In addition, fully convolutional network models, pyramid graph models, and neural network training models have also been used for people counting; however, existing models of this kind must fuse a large number of hand-crafted features, their feature design is complex and their use cumbersome, so the computation is heavy and the output slow, and they cannot yet be applied in monitoring scenarios with high real-time requirements.
Summary of the Invention
The technical problem mainly solved by the present invention is how to overcome the deficiencies of the prior art and improve the accuracy and real-time performance of people counting in complex crowd scenes. To solve this problem, the present application provides a method for counting people based on a deep neural network.
According to a first aspect, an embodiment provides a method for counting people based on a deep neural network, including the following steps:
acquiring an image to be detected;
obtaining a background image and a foreground image from the image to be detected;
performing deep neural network processing on the foreground image to count the number of human body key parts in the foreground image;
obtaining the number of people in the image to be detected by comparing the statistics of the human body key parts.
Acquiring the image to be detected includes: acquiring a video of the crowd to be monitored; selecting frame images one by one from the video's image sequence as the image to be detected.
Obtaining a background image and a foreground image from the image to be detected includes:
performing area detection on the image to be detected, and constructing a background model from the area detection result;
determining whether the background model includes all background information of the video, the background information being the image information of non-human objects;
if the determination is yes, using all background information in the background model as the background image, the background image including the image information of all non-human objects in the video;
if the determination is no, performing area detection on the image to be detected of the next frame, and updating the background model from the area detection result, until the background model is judged to include all background information of the video;
performing background difference processing on the image to be detected according to the background image to obtain the foreground image, the foreground image including the image information of all human bodies in the image to be detected.
Performing area detection on the image to be detected and constructing a background model from the area detection result includes: inputting the image to be detected into a YOLO V3-based object detection model to obtain person regions and person-free regions; constructing a background model whose pixels correspond one-to-one to those of the image to be detected, setting the pixel values of the model pixels corresponding to the person-free regions to the pixel values of the pixels in those regions, and setting the pixel values of the model pixels corresponding to the person regions to a first value.
Determining whether the background model includes all background information of the video includes: determining whether any pixel with the first value exists in the background model; if not, the background model is considered to include all background information of the video; otherwise, the background model is considered not to include all background information of the video.
If the determination is no, performing area detection on the image to be detected of the next frame and updating the background model from the area detection result, until the background model is judged to include all background information of the video, includes:
if the determination is no, inputting the image to be detected of the next frame into the YOLO V3-based object detection model to obtain new person-free regions;
updating, according to the pixel values of the pixels in the new person-free regions, the pixel values of the model pixels corresponding to the new person-free regions, to eliminate the first values existing among them;
repeatedly updating the background model until it is judged that no pixel with the first value exists in the background model.
Performing deep neural network processing on the foreground image to count the number of human body key parts in the foreground image includes:
setting a topology of the deep neural network;
acquiring training data labeled with human body key parts;
training the model parameters of the topology with the training data;
inputting the foreground image into the topology, and constructing the human body key parts in the foreground image from the distribution features of the topology on the training data;
obtaining the number of human body key parts in the foreground image.
The topology includes a filter, multiple convolution structures, a channel, and a softmax function processor connected in series.
Obtaining the number of people in the image to be detected by comparing the statistics of the human body key parts includes: counting the number of each kind of human body key part separately and computing, from each count, the number of people corresponding to that key part; obtaining the maximum of the people counts over the various key parts, and using that maximum as the number of people in the image to be detected.
According to a second aspect, an embodiment provides a people counting device based on a deep neural network, including:
a to-be-detected image acquiring unit, configured to acquire an image to be detected;
a foreground background image acquiring unit, configured to obtain a background image and a foreground image from the image to be detected;
a neural network processing unit, configured to perform deep neural network processing on the foreground image to count the number of human body key parts in the foreground image;
a people counting unit, configured to obtain the number of people in the image to be detected by comparing the statistics of the human body key parts.
The people counting device further includes a display unit; the display unit is used to display, in real time, the image to be detected and the number of people in the current image to be detected.
According to a third aspect, an embodiment provides a computer-readable storage medium, including a program executable by a processor to implement the method of the first aspect.
The beneficial effects of the present application are:
According to the above embodiments, the method and device for counting people based on a deep neural network, and the storage medium, include acquiring an image to be detected, obtaining a background image and a foreground image from it, performing deep neural network processing on the foreground image to count the human body key parts in it, and obtaining the number of people in the image to be detected by comparing those statistics. Because the correlation between frame images is exploited when acquiring the background image, the background model is updated every time from the person-free regions of the image to be detected, so the background image stays complete in real time, which lets the background difference method extract the foreground image quickly. Moreover, only the foreground image passes through the deep neural network to recognize the human body key parts, which avoids interference from background information, saves the time spent examining background pixels, and speeds up the algorithm, so that it can run continuously on lower-performance hardware and reduce application cost. In addition, the constructed deep neural network is trained with data labeled with human body key parts, which improves the accuracy of key-part extraction in the foreground image and allows the number of people in the image to be derived by comparing the counts of several kinds of key parts; even when some parts of a body are occluded, the body can still be recognized, improving the accuracy of the counting results.
Brief Description of the Drawings
FIG. 1 is a flow chart of the people counting method;
FIG. 2 is a flow chart of acquiring the image to be detected;
FIG. 3 is a flow chart of obtaining the foreground image;
FIG. 4 is a flow chart of constructing the background model;
FIG. 5 is a flow chart of the deep neural network processing;
FIG. 6 is a flow chart of comparing to obtain the number of people;
FIG. 7 is a structural diagram of the topology of the deep neural network;
FIG. 8 is a structural diagram of the human body key part model;
FIG. 9 is a structural diagram of the convolution unit;
FIG. 10 is a schematic structural diagram of the people counting device.
Detailed Description of the Embodiments
The present invention is further described in detail below through specific embodiments in conjunction with the drawings. Similar elements in different embodiments use associated, similar reference numbers. In the following embodiments, many details are described so that the present application can be better understood. However, those skilled in the art will readily recognize that some of these features can be omitted in different cases, or be replaced by other elements, materials, or methods. In some cases, some operations related to the present application are not shown or described in the specification, so that the core of the application is not drowned in excessive description; for those skilled in the art, a detailed description of these operations is unnecessary, since they can fully understand them from the description in the specification and the general technical knowledge of the field.
In addition, the features, operations, or characteristics described in the specification can be combined in any suitable manner to form various embodiments. Meanwhile, the steps or actions in the method descriptions can be reordered or adjusted in ways obvious to those skilled in the art. Therefore, the various orders in the specification and drawings are only for clearly describing a particular embodiment and do not imply a required order, unless it is otherwise stated that a certain order must be followed.
The reference numbers assigned to components herein, such as "first" and "second", are only used to distinguish the described objects and carry no ordinal or technical meaning. "Connection" and "coupling" in this application, unless otherwise specified, include both direct and indirect connection (coupling).
Referring to FIG. 1, the present application discloses a method for counting people based on a deep neural network, which can obtain the number of people in an image to be detected after deep neural network processing, quickly and accurately. The method includes steps S100-S400, described separately below.
In step S100, the image to be detected is acquired. When electronic equipment is used to monitor crowd size, images of the monitored crowd usually need to be captured by an image acquisition device such as a camera or video camera. In one embodiment, see FIG. 2, step S100 may include steps S110-S120, described separately below.
In step S110, a moving camera, surveillance camera, mobile phone camera, or similar device continuously films places where crowds easily gather, such as venues and passages, to obtain video of the crowd to be monitored. Those skilled in the art will understand that the video obtained here may contain no people, a few people, or many people, and that the people and environmental objects in the video may be continuously moving or changing posture; the video should therefore have good picture quality and smoothness.
In step S120, the video of the crowd to be monitored consists of temporally consecutive frame images, and within each single frame the people and environmental objects are effectively static, so a frame image from the video sequence can serve as the image to be detected; the method of reading frame images belongs to the prior art and is not described in detail here. To monitor crowd size continuously, frame images should be selected one by one from the video's image sequence as images to be detected, and each frame processed to obtain the number of people in the image at the current moment; in this way, the number of people in the image to be detected is obtained in real time from consecutive frames, achieving dynamic monitoring of crowd size.
In step S200, a background image and a foreground image are obtained from the image to be detected. In one embodiment, see FIG. 3, step S200 may include steps S210-S250, described below.
In step S210, area detection is performed on the image to be detected obtained in step S120, and a background model is constructed from the area detection result. In one embodiment, see FIG. 4, step S210 may include steps S211-S213.
In step S211, the image to be detected is input into an image detection program to decide which regions of the image belong to human bodies and which to environmental objects. In a specific embodiment, the image to be detected is input into a YOLO V3-based object detection model to obtain person regions and person-free regions, where the person-free regions contain objects other than human bodies (such as buildings and natural scenery).
It should be noted that YOLO V3 is the third version released by the official YOLO project, a classic object detection algorithm with the training and learning characteristics of a deep neural network. It divides the input image into many image blocks and uses a classifier to decide whether each block contains an object and which category the object belongs to; its advantages are very fast detection, fewer background errors, and generalized learning of object categories. In this embodiment, when the image to be detected is processed by the YOLO V3-based object detection model, the person regions and person-free regions are readily obtained from the generalized features of human and non-human objects.
In step S212, a background model whose pixels correspond one-to-one to those of the image to be detected is constructed, and the pixel values of the model pixels corresponding to the person regions are set to a first value (such as -1).
It should be noted that when the background model of the monitored area is constructed for the first time, the model pixels corresponding to the person regions are set to the first value; once the background model has been constructed, step S212 may be omitted and the background model updated only through step S213.
In step S213, since each pixel in the image to be detected has a specific pixel value (for example, in common 8-bit image coding each pixel has 256 gray levels, with values between 0 and 255), the pixel values of the model pixels corresponding to the person-free regions are set to the pixel values of the pixels in those regions.
In a specific embodiment, the pixel value of each pixel can be expressed by the following formula:
Bg(i)[x,y] = (Cr(i)[x,y] + Bg(i-1)[x,y] + Bg(i-2)[x,y]) / 3
where Bg(i)[x,y] is the pixel value of the i-th frame's background at pixel coordinates [x,y], Cr(i)[x,y] is the initial pixel value of the i-th frame image at [x,y], Bg(i-1)[x,y] is the pixel value of the previous frame at [x,y], and Bg(i-2)[x,y] that of the frame before it; i is an integer denoting the frame number in the image sequence; x ranges over 0..w and y over 0..h, where w is the pixel width and h the pixel height of the frame.
Then, the average of Cr(i)[x,y], Bg(i-1)[x,y], and Bg(i-2)[x,y] is taken as the pixel value of the current frame at [x,y], or more preceding frames can be included in the average; this helps keep the per-pixel values stable from frame to frame and effectively avoids the poor extraction of person and person-free regions that sudden changes in the imaging environment would otherwise cause.
In step S220, it is determined whether the background model includes all background information of the video, the background information here being the image information of non-human objects; that is, whether the background model contains the image information of all environmental objects other than people within the video monitoring range. If the determination is no, the process proceeds to step S230; otherwise, it proceeds to step S240.
In a specific embodiment, the pixel values of the model pixels corresponding to the person regions are set to the first value (such as -1), and those corresponding to the person-free regions are set to the pixel values of the pixels in those regions (for example, 0 to 255). The pixel values in the background model can then be examined; see step S221 in FIG. 4: it is determined whether any pixel with the first value exists in the background model (that is, whether any pixel value is below 0). If such a pixel exists (some pixel value is -1), the model still contains pixels corresponding to a person region, so the background model is considered not to include all background information of the video and the process proceeds to step S230; if no such pixel exists (all pixel values are greater than -1), the background model is considered to include all background information of the video and the process proceeds to step S240.
In step S230, so that the background model comes to include all background information of the video, area detection is performed on the image to be detected of the next frame, and the background model is updated from the area detection result, until the background model is judged to include all background information of the video. In a specific embodiment, see FIG. 4, step S230 includes steps S231-S232.
In step S231, the image to be detected of the next frame is input into the YOLO V3-based object detection model to obtain new person-free regions; for the method of acquiring them, refer to step S211.
In step S232, according to the pixel values of the pixels in the new person-free regions, the pixel values of the corresponding model pixels are updated, eliminating the first values among the model pixels corresponding to the new person-free regions.
It should be noted that the people in the video of the monitored crowd move and change posture; as a body's position or posture changes, environmental objects occluded in the current frame are revealed in the next frame or in subsequent frames, so the background information of the gradually revealed objects can be written into the background model in time, gradually eliminating the values of the model pixels corresponding to the person regions.
In another embodiment, steps S221-S231-S232 can be executed in a loop to update the background model repeatedly, finally making the model include all background information, until step S221 determines that no pixel with the first value exists in the background model.
In another embodiment, if the site where the monitored crowd is located has been filmed in advance by the camera, the video sequence of the crowd to be monitored will contain frame images with environmental objects only; such a frame can then be selected to construct the background model, so that no pixel with the first value exists in it, and when step S220 performs its judgment the process goes directly to step S240 without looping through steps S221-S231-S232.
In step S240, all background information in the background model is used as the background image, which includes the image information of all non-human objects in the video, that is, of all environmental objects within the monitoring range.
It should be noted that the people in the video of the monitored crowd keep changing while the environmental objects tend to stay static or change only slightly, so the obtained background image is taken not to change over a short time and can serve as the base template for extracting the person-free regions of the next image to be detected.
In step S250, background difference processing is performed on the image to be detected according to the background image, giving the foreground image, which includes the image information of all human bodies in the image to be detected. Background difference processing is a common image processing method and belongs to the prior art; in this embodiment, the person-free regions of the image to be detected are matched against the background image to obtain person-free regions with more precise extents, and those regions are then subtracted from the image to be detected, leaving person regions with fairly accurate extents.
Those skilled in the art will understand that steps S210-S250 yield not only a fairly complete background image containing all background information but also a more accurate foreground image derived from it; the current background image then serves as a reference template for obtaining the foreground image of the next frame. Because step S213 updates the background model in real time, the background image is also updated in real time, so when the foreground image of the next image to be detected is obtained, background difference processing can be performed against the updated background image, and the model-update process of step S230 can be omitted.
In another embodiment, step S200 can skip sub-steps S220-S240 and obtain the foreground image directly through sub-steps S210 and S250. A first option: acquire the person regions and person-free regions of the image to be detected by the method disclosed in step S210, set the model pixels corresponding to the person-free regions to the pixel values of the pixels in those regions, and use the background information corresponding to the person-free regions as the background image; although this background image contains only the environmental objects visible in the current image to be detected, the method disclosed in step S250 can still subtract it from the image to obtain the foreground image. A second option: acquire the person regions and person-free regions by the method disclosed in step S210 and, without constructing a background model, directly integrate the image information corresponding to the person regions in step S250 and use the integrated image information as the foreground image; this saves the time of building the background model but makes the person regions less precise. Users can choose between these options according to actual needs.
In step S300, deep neural network processing is performed on the foreground image obtained in step S250 to count the number of human body key parts in it. In one embodiment, see FIG. 5, step S300 may include steps S310-S350, described below.
In step S310, a topology of the deep neural network (DNN) is set; as shown in FIG. 7, the topology includes a filter, multiple convolution structures (preferably seven bottleneck convolution structures), a channel, and a softmax function processor connected in series.
It should be noted that the filter is a common tool in image processing, with linear, high-pass, and low-pass variants; here it filters the input foreground image to remove abnormal image information from it. The convolution structure is a common functional unit in neural networks; its main function is, after training, to extract the features needed for image classification or regression. The convolution unit in this application adds a parallel 1x1 convolution unit on top of the bottleneck convolution concept, which enriches the extracted image features and makes the final model's recognition more accurate. The softmax function is a typical classification method that decides classification or regression according to probability, and belongs to the prior art.
In step S320, as shown in FIG. 8, a model of the human body key parts is constructed, and the generalized features of the head A, shoulders B and C, arms D and E, hands F and G, and legs H, I, J, and K are obtained from the model; these generalized features serve as the training data labeling the human body key parts.
In step S330, the model parameters of the topology are trained from the training data acquired in step S320; in one embodiment, the obtained model parameters are shown in Table 1.
Table 1: Model parameters of the topology (published as an image in the original document)
The specific structure of each convolution unit in Table 1 is shown in FIG. 9, where BN is a normalization function used to normalize each neuron, and RELU is an activation function used to keep the training process efficient; both belong to the prior art and are not described in detail here.
After step S330, the finally obtained topology is shown in FIG. 7.
In step S340, the foreground image acquired in step S250 is input into the trained topology, and the human body key parts in the foreground image are constructed from the distribution features of the topology on the training data. In one embodiment, eleven classes of human body key parts are constructed (head A, shoulders B and C, arms D and E, hands F and G, and legs H, I, J, and K), and these key parts are marked in the foreground image.
In step S350, the number of human body key parts in the foreground image is obtained and counted, denoted by the letter N; the counts for the individual key parts are N_A, N_B, N_C, N_D, N_E, N_F, N_G, N_H, N_I, N_J, and N_K.
In step S400, the number of people in the image to be detected is obtained by comparing the statistics of the human body key parts. In one embodiment, see FIG. 6, step S400 may include steps S410-S420.
In step S410, the number of each kind of human body key part is counted separately, and the number of people corresponding to that key part is computed from its count. For example, the maximum function max(N_B, N_C) gives the number of people corresponding to the shoulders, max(N_D, N_E) the number corresponding to the arms, max(N_F, N_G) the number corresponding to the hands, and max(N_H, N_I) the number corresponding to the legs; max(N_J, N_K) can also be used to obtain the number corresponding to the legs.
In step S420, the maximum of the people counts over the various key parts is obtained, and that maximum is used as the number of people in the image to be detected. In one embodiment, the maximum is obtained by the following formula:
maximum = max{N_A, max(N_B, N_C), max(N_D, N_E), max(N_F, N_G), max(N_H, N_I), max(N_J, N_K)}
That maximum is then taken as the number of people in the image to be detected.
In one embodiment, the present application discloses a people counting device based on a deep neural network; see FIG. 10. The people counting device 5 includes a to-be-detected image acquiring unit 51, a foreground background image acquiring unit 52, a neural network processing unit 53, and a people counting unit 54, described separately below.
The to-be-detected image acquiring unit 51 is configured to acquire the image to be detected; for the specific process, refer to step S100, not repeated here.
The foreground background image acquiring unit 52 is communicatively connected with the to-be-detected image acquiring unit 51 and is configured to obtain the background image and the foreground image from the image to be detected; for the specific process, refer to step S200, not repeated here.
The neural network processing unit 53 is communicatively connected with the foreground background image acquiring unit 52 and is configured to perform deep neural network processing on the foreground image to count the number of human body key parts in it; for the specific process, refer to step S300, not repeated here.
The people counting unit 54 is communicatively connected with the neural network processing unit 53 and is configured to obtain the number of people in the image to be detected by comparing the statistics of the human body key parts; for the specific process, refer to step S400, not repeated here.
It should be noted that the to-be-detected image acquiring unit 51, the foreground background image acquiring unit 52, the neural network processing unit 53, and the people counting unit 54 may each be program processing modules within a program, each realizing its corresponding function according to its own processing logic.
In another embodiment, the people counting device 5 may further include a display unit 55, communicatively connected with the people counting unit 54 and used to display, in real time, the image to be detected and the number of people in the current image to be detected; the display unit 55 may even mark the human body key parts in the image to be detected in real time, so that users can observe the crowd's movement on the display more intuitively and vividly. In addition, the display unit 55 can be any of various types of display devices capable of showing a picture, such as a television, a display screen, or a projector.
Those skilled in the art will understand that all or part of the functions of the methods in the embodiments above can be implemented in hardware or by a computer program. When implemented by a computer program, the program can be stored in a computer-readable storage medium, which may include a read-only memory, a random access memory, a magnetic disk, an optical disc, a hard disk, and the like, and the computer executes the program to realize the functions above. For example, the program is stored in the memory of a device, and all or part of the functions above are realized when the processor executes the program in the memory. In addition, the program can also be stored in a storage medium of a server, another computer, a magnetic disk, an optical disc, a flash drive, or a removable hard disk, and be downloaded or copied into the memory of the local device, or used to update the local device's system; when the processor executes the program in the memory, all or part of the functions of the embodiments above can be realized.
The above illustrates the present invention with specific examples, only to help understand it, not to limit it. For those skilled in the art to which the present invention belongs, several simple deductions, variations, or substitutions can also be made according to the idea of the present invention.

Claims (12)

  1. A method for counting people based on a deep neural network, characterized by including the following steps:
    acquiring an image to be detected;
    obtaining a background image and a foreground image from the image to be detected;
    performing deep neural network processing on the foreground image to count the number of human body key parts in the foreground image;
    obtaining the number of people in the image to be detected by comparing the statistics of the human body key parts.
  2. The method for counting people based on a deep neural network of claim 1, characterized in that acquiring the image to be detected includes:
    acquiring a video of the crowd to be monitored;
    selecting frame images one by one from the video's image sequence as the image to be detected.
  3. The method for counting people based on a deep neural network of claim 2, characterized in that obtaining a background image and a foreground image from the image to be detected includes:
    performing area detection on the image to be detected, and constructing a background model from the area detection result;
    determining whether the background model includes all background information of the video, the background information being the image information of non-human objects;
    if the determination is yes, using all background information in the background model as the background image, the background image including the image information of all non-human objects in the video;
    if the determination is no, performing area detection on the image to be detected of the next frame, and updating the background model from the area detection result, until the background model is judged to include all background information of the video;
    performing background difference processing on the image to be detected according to the background image to obtain the foreground image, the foreground image including the image information of all human bodies in the image to be detected.
  4. The method for counting people based on a deep neural network of claim 3, characterized in that performing area detection on the image to be detected and constructing a background model from the area detection result includes:
    inputting the image to be detected into a YOLO V3-based object detection model to obtain person regions and person-free regions;
    constructing a background model whose pixels correspond one-to-one to those of the image to be detected, setting the pixel values of the model pixels corresponding to the person-free regions to the pixel values of the pixels in those regions, and setting the pixel values of the model pixels corresponding to the person regions to a first value.
  5. The method for counting people based on a deep neural network of claim 4, characterized in that determining whether the background model includes all background information of the video includes:
    determining whether any pixel with the first value exists in the background model; if not, the background model is considered to include all background information of the video; otherwise, the background model is considered not to include all background information of the video.
  6. The method for counting people based on a deep neural network of claim 5, characterized in that, if the determination is no, performing area detection on the image to be detected of the next frame and updating the background model from the area detection result until the background model is judged to include all background information of the video includes:
    if the determination is no, inputting the image to be detected of the next frame into the YOLO V3-based object detection model to obtain new person-free regions;
    updating, according to the pixel values of the pixels in the new person-free regions, the pixel values of the model pixels corresponding to the new person-free regions, to eliminate the first values existing among them;
    repeatedly updating the background model until it is judged that no pixel with the first value exists in the background model.
  7. The method for counting people based on a deep neural network of claim 1, characterized in that performing deep neural network processing on the foreground image to count the number of human body key parts in the foreground image includes:
    setting a topology of the deep neural network;
    acquiring training data labeled with human body key parts;
    training the model parameters of the topology with the training data;
    inputting the foreground image into the topology, and constructing the human body key parts in the foreground image from the distribution features of the topology on the training data;
    obtaining the number of human body key parts in the foreground image.
  8. The method for counting people based on a deep neural network of claim 7, characterized in that the topology includes a filter, multiple convolution structures, a channel, and a softmax function processor connected in series.
  9. The method for counting people based on a deep neural network of claim 7, characterized in that obtaining the number of people in the image to be detected by comparing the statistics of the human body key parts includes:
    counting the number of each kind of human body key part separately, and computing, from each count, the number of people corresponding to that key part;
    obtaining the maximum of the people counts over the various key parts, and using that maximum as the number of people in the image to be detected.
  10. A people counting device based on a deep neural network, characterized by including:
    a to-be-detected image acquiring unit, configured to acquire an image to be detected;
    a foreground background image acquiring unit, configured to obtain a background image and a foreground image from the image to be detected;
    a neural network processing unit, configured to perform deep neural network processing on the foreground image to count the number of human body key parts in the foreground image;
    a people counting unit, configured to obtain the number of people in the image to be detected by comparing the statistics of the human body key parts.
  11. The people counting device based on a deep neural network of claim 10, characterized by further including a display unit;
    the display unit is used to display, in real time, the image to be detected and the number of people in the current image to be detected.
  12. A computer-readable storage medium, characterized by including a program, the program being executable by a processor to implement the method of any one of claims 1-9.
PCT/CN2018/091569 2018-05-04 2018-06-15 Method and device for counting people based on a deep neural network, and storage medium WO2019210555A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810420933.4A CN108830145B (zh) 2018-05-04 2018-05-04 Method for counting people based on a deep neural network, and storage medium
CN201810420933.4 2018-05-04

Publications (1)

Publication Number Publication Date
WO2019210555A1 (zh)

Family

ID=64147419

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/091569 WO2019210555A1 (zh) 2018-05-04 2018-06-15 Method and device for counting people based on a deep neural network, and storage medium

Country Status (2)

Country Link
CN (1) CN108830145B (zh)
WO (1) WO2019210555A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353377A (zh) * 2019-12-24 2020-06-30 浙江工业大学 Deep-learning-based method for detecting the number of elevator passengers
CN111444896A (zh) * 2020-05-09 2020-07-24 北京碧拓科技有限公司 Far-infrared thermal imaging method for locating key points of human meridians
CN111950519A (zh) * 2020-08-27 2020-11-17 重庆科技学院 Double-column convolutional neural network crowd counting method based on detection and density estimation
CN112001274A (zh) * 2020-08-06 2020-11-27 腾讯科技(深圳)有限公司 Crowd density determination method and device, storage medium, and processor
CN113239772A (zh) * 2021-05-07 2021-08-10 南京甄视智能科技有限公司 Method and system for early warning of crowd gathering in self-service bank or ATM environments
CN113688925A (zh) * 2021-08-31 2021-11-23 惠州学院 Attendance count recognition method, electronic device, and storage medium

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109598220B (zh) * 2018-11-26 2021-07-30 山东大学 People counting method based on multivariate input and multi-scale convolution
CN109886085A (zh) * 2019-01-03 2019-06-14 四川弘和通讯有限公司 Crowd counting method based on deep-learning object detection
CN110348422B (zh) * 2019-07-18 2021-11-09 北京地平线机器人技术研发有限公司 Image processing method and device, computer-readable storage medium, and electronic device
JP7118934B2 (ja) * 2019-09-04 2022-08-16 株式会社東芝 Object count estimation device, object count estimation method, and object count estimation program
CN110765964B (zh) * 2019-10-30 2022-07-15 常熟理工学院 Computer-vision-based method for detecting abnormal behavior inside an elevator car
CN112101287B (zh) * 2020-09-25 2023-11-28 北京市商汤科技开发有限公司 Image processing method, device, equipment, and storage medium
CN113139481B (zh) * 2021-04-28 2023-09-01 广州大学 Classroom people counting method based on YOLOv3
CN113268024B (zh) * 2021-05-14 2023-10-13 广东工业大学 Intelligent classroom supervision system and method
CN114495395A (zh) * 2021-12-24 2022-05-13 深圳市天视通视觉有限公司 Human-shape detection method, monitoring and early-warning method, device, and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104318263A (zh) * 2014-09-24 2015-01-28 南京邮电大学 Real-time high-precision people flow counting method
CN105447458A (zh) * 2015-11-17 2016-03-30 深圳市商汤科技有限公司 Large-scale crowd video analysis system and method

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101777114B (zh) * 2009-01-08 2013-04-24 北京中星微电子有限公司 Intelligent analysis system and method for video surveillance, with head-shoulder detection and tracking system and method
CN102682291B (zh) * 2012-05-07 2016-10-05 深圳市贝尔信智能系统有限公司 Scene people counting method, device, and system
CN103077380B (zh) * 2013-01-07 2016-06-29 信帧电子技术(北京)有限公司 Video-based people counting method and device
CN104361327B (zh) * 2014-11-20 2018-09-18 苏州科达科技股份有限公司 Pedestrian detection method and system
CN105069413B (zh) * 2015-07-27 2018-04-06 电子科技大学 Human posture recognition method based on a deep convolutional neural network
CN106570440A (zh) * 2015-10-09 2017-04-19 株式会社日立制作所 People counting method and device based on image analysis
CN105740892A (zh) * 2016-01-27 2016-07-06 北京工业大学 High-accuracy method for recognizing multiple human body parts based on a convolutional neural network
CN105787439B (zh) * 2016-02-04 2019-04-05 广州新节奏智能科技股份有限公司 Depth-image human joint localization method based on a convolutional neural network
WO2017206005A1 (zh) * 2016-05-30 2017-12-07 中国石油大学(华东) Multi-person pose recognition system based on optical flow detection and body part models
CN107145821A (zh) * 2017-03-23 2017-09-08 华南农业大学 Deep-learning-based crowd density detection method and system
CN107103299B (zh) * 2017-04-21 2020-03-06 天津大学 People counting method for surveillance video

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104318263A (zh) * 2014-09-24 2015-01-28 南京邮电大学 Real-time high-precision people flow counting method
CN105447458A (zh) * 2015-11-17 2016-03-30 深圳市商汤科技有限公司 Large-scale crowd video analysis system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI, YANLIANG. "Research on Pedestrian Detection and Density Estimation." Chinese Master's Theses, no. 2, 15 February 2018 (2018-02-15) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353377A (zh) * 2019-12-24 2020-06-30 浙江工业大学 Deep-learning-based method for detecting the number of elevator passengers
CN111444896A (zh) * 2020-05-09 2020-07-24 北京碧拓科技有限公司 Far-infrared thermal imaging method for locating key points of human meridians
CN111444896B (zh) * 2020-05-09 2023-06-30 北京碧拓科技有限公司 Far-infrared thermal imaging method for locating key points of human meridians
CN112001274A (zh) * 2020-08-06 2020-11-27 腾讯科技(深圳)有限公司 Crowd density determination method and device, storage medium, and processor
CN112001274B (zh) * 2020-08-06 2023-11-17 腾讯科技(深圳)有限公司 Crowd density determination method and device, storage medium, and processor
CN111950519A (zh) * 2020-08-27 2020-11-17 重庆科技学院 Double-column convolutional neural network crowd counting method based on detection and density estimation
CN113239772A (zh) * 2021-05-07 2021-08-10 南京甄视智能科技有限公司 Method and system for early warning of crowd gathering in self-service bank or ATM environments
CN113239772B (zh) * 2021-05-07 2022-09-06 南京甄视智能科技有限公司 Method and system for early warning of crowd gathering in self-service bank or ATM environments
CN113688925A (zh) * 2021-08-31 2021-11-23 惠州学院 Attendance count recognition method, electronic device, and storage medium
CN113688925B (zh) * 2021-08-31 2023-10-24 惠州学院 Attendance count recognition method, electronic device, and storage medium

Also Published As

Publication number Publication date
CN108830145A (zh) 2018-11-16
CN108830145B (zh) 2021-08-24

Similar Documents

Publication Publication Date Title
WO2019210555A1 (zh) Method and device for counting people based on a deep neural network, and storage medium
CN108764085B (zh) Crowd counting method based on a generative adversarial network
US9547908B1 (en) Feature mask determination for images
CN109284733B (zh) Method for monitoring negative shopping-guide behavior based on YOLO and a multi-task convolutional neural network
CN104424634B (zh) Object tracking method and device
CN110210276A (zh) Movement trajectory acquisition method and device, storage medium, and terminal
US8692830B2 (en) Automatic avatar creation
CN109344702B (zh) Pedestrian detection method and device based on depth images and color images
US10186040B2 (en) Systems and methods for detection of significant and attractive components in digital images
CN105279769B (zh) Hierarchical particle filter tracking method combining multiple features
CN106874826A (zh) Face key point tracking method and device
TW202026948A (zh) Living body detection method and device, and storage medium
WO2019071976A1 (zh) Panoramic image saliency detection method based on region growing and an eye movement model
CN104700405B (zh) Foreground detection method and system
CN110827304B (zh) Traditional Chinese medicine tongue image localization method and system based on deep convolutional networks and the level set method
CN110825900A (zh) Training method for a feature reconstruction layer, image feature reconstruction method, and related devices
CN110807759A (zh) Photo quality evaluation method and device, electronic device, and readable storage medium
WO2020171379A1 (en) Capturing a photo using a mobile device
Venkatesan et al. Face recognition system with genetic algorithm and ANT colony optimization
US11974050B2 (en) Data simulation method and device for event camera
CN111339902A (zh) Method and device for recognizing readings on the LCD screen of a digital display instrument
CN111444555B (zh) Temperature measurement information display method and device, and terminal equipment
US9940543B2 (en) Control of computer vision pre-processing based on image matching using structural similarity
CN116977674A (zh) Image matching method, related device, storage medium, and program product
CN116824641B (zh) Posture classification method and device, equipment, and computer storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18917067

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18917067

Country of ref document: EP

Kind code of ref document: A1