WO2021093276A1 - Method for generating training data based on a deformable Gaussian kernel in a crowd counting system - Google Patents

Method for generating training data based on a deformable Gaussian kernel in a crowd counting system Download PDF

Info

Publication number
WO2021093276A1
WO2021093276A1 (PCT/CN2020/086534, CN2020086534W)
Authority
WO
WIPO (PCT)
Prior art keywords
gaussian kernel
training data
gaussian
kernel
center point
Prior art date
Application number
PCT/CN2020/086534
Other languages
English (en)
French (fr)
Inventor
刘阳
倪国栋
胡卫明
李兵
沈志忠
孔祥斌
Original Assignee
通号通信信息集团有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 通号通信信息集团有限公司
Priority to US17/607,236 priority Critical patent/US20220222930A1/en
Publication of WO2021093276A1 publication Critical patent/WO2021093276A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/7747Organisation of the process, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/24Aligning, centring, orientation detection or correction of the image
    • G06V10/245Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Definitions

  • The invention relates to the field of pattern recognition, and in particular to a method for generating training data based on a deformable Gaussian kernel in a crowd counting system, applied in the field of computer vision.
  • In recent years, the basic method commonly used in crowd behavior analysis has been the crowd counting system based on deep learning with convolutional neural networks.
  • Its main principle is: through extensive training, the convolutional neural network automatically learns the main characteristics of the human head (such as an approximately circular shape and hair that is darker than the background), and the convolution map output by the network is finally compared with a pre-made crowd density map in which the position of each head is represented by a two-dimensional Gaussian kernel density function resembling the shape of a human head (hereinafter, the Gaussian kernel).
  • The system uses the difference between its estimate of the total number of people and the ground-truth value in the training data, as well as the difference between the network's output convolution map and the crowd density map of the training data, as the basis for backward error propagation; through iteration it modifies the network parameters, training the network's ability to recognize head-shaped targets.
  • Training data generation methods for crowd counting systems therefore all use a two-dimensional Gaussian kernel density function, generating a simulated head shape in the image with each head's position coordinates as the center point, so as to achieve a better training effect.
  • The most critical step is to generate the Gaussian kernel corresponding to each head, with the head's two-dimensional coordinates as the center point.
  • The expression of the continuous two-dimensional Gaussian function is:

    f(x, y) = (1 / (2π·σ_x·σ_y)) · exp(−((x − x_0)² / (2σ_x²) + (y − y_0)² / (2σ_y²)))    (Formula 1)

  • (x_0, y_0) is the position of the center point of the function, that is, the coordinates of the human head.
  • σ_x and σ_y are the variances of the function in the x-axis and y-axis directions, respectively; since a head is roughly circular, σ_x = σ_y = σ is taken by default.
  • A discrete Gaussian kernel with a scale of (2k+1)*(2k+1) can be expressed as:

    G(x, y) = A · exp(−((x − x_0)² + (y − y_0)²) / (2σ²)),  |x − x_0| ≤ k, |y − y_0| ≤ k    (Formula 2)

  • A is a constant set so that the Gaussian kernel gray values of the pixels within the truncated kernel region sum to 1; its value is not necessarily equal to the 1/(2π·σ_x·σ_y) term of Formula 1 and must be adjusted to the actual situation.
  • The purpose of the adjustment is to make the gray values of the discrete pixels belonging to the Gaussian kernel of one head sum to 1, so A is calculated as:

    A = 1 / Σ_{|x−x_0|≤k, |y−y_0|≤k} exp(−((x − x_0)² + (y − y_0)²) / (2σ²))    (Formula 3)

  • Formula 3 is called the discrete Gaussian kernel expression of the traditional crowd counting system.
  • The system repeats this process for the coordinates of every head in the training data and draws the gray values of all discrete pixels of the generated Gaussian kernels, superimposed, on the same image; training data generation is then complete.
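  • The traditional generation process described above — a truncated (2k+1)×(2k+1) kernel normalized so each head contributes exactly 1, superimposed at every head coordinate — can be sketched as follows. This is a minimal illustration; the function name and the default values of σ and k are hypothetical, not taken from the patent:

```python
import numpy as np

def gaussian_kernel_density_map(head_coords, shape, sigma=4.0, k=8):
    """Build a crowd density map by superimposing truncated, normalized
    2-D Gaussian kernels centered at each annotated head coordinate."""
    density = np.zeros(shape, dtype=np.float64)
    ys, xs = np.mgrid[-k:k + 1, -k:k + 1]      # (2k+1) x (2k+1) window
    kernel = np.exp(-(xs**2 + ys**2) / (2 * sigma**2))
    kernel /= kernel.sum()                     # constant A: gray values sum to 1
    for (x0, y0) in head_coords:
        x0, y0 = int(round(x0)), int(round(y0))
        # clip the kernel window to the image border
        top, bottom = max(y0 - k, 0), min(y0 + k + 1, shape[0])
        left, right = max(x0 - k, 0), min(x0 + k + 1, shape[1])
        density[top:bottom, left:right] += kernel[
            top - (y0 - k):bottom - (y0 - k),
            left - (x0 - k):right - (x0 - k)]
    return density
```

  • Summing the resulting map over the pixels of each kernel recovers the head count, which is what the counting network is trained against.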
  • Under existing methods, two overlapping heads are both represented by uniform circular Gaussian kernels. The head in front in the original crowd picture has a relatively complete, clear circular edge that closely resembles the circular Gaussian kernel in the training data, so the convolutional neural network can recognize it fairly easily by learning from the training data.
  • The occluded head behind it, however, shows a crescent shape whose extent depends on the degree of overlap, and its visual center of gravity moves toward the non-overlapping part.
  • If the circular Gaussian kernel continues to be used, then, by the nature of the Gaussian kernel, gray values grow toward the center of the circle, and the visual center of gravity of two overlapping circular Gaussian kernels lies near the line connecting their center points.
  • The visual center of the occluded head in the original image is then not only inconsistent with the visual center of gravity of the corresponding occluded Gaussian kernel in the training-data crowd density map, but the occluded kernel also easily merges with the Gaussian kernel of the occluding head.
  • As a result the training effect is poor, the output crowd density map is inaccurate, and the crowd counting error is large.
  • The purpose of the present invention is to provide a training data generation method based on a deformable Gaussian kernel in a crowd counting system, which effectively increases the feature similarity between the crowd density map of the training data and the real image, making it easier for the convolutional neural network to learn the correspondence between training data and real images and improving the accuracy of the crowd counting system.
  • A method for generating training data based on a deformable Gaussian kernel in a crowd counting system includes the following steps: 1) find a pair of mutually overlapping Gaussian kernels in the training data; 2) stretch the occluded Gaussian kernel; 3) rotate the occluded Gaussian kernel; 4) adjust the center point coordinates of the occluded Gaussian kernel; 5) determine whether any Gaussian kernels in the training data have not yet been selected, and output the resulting crowd density map with gray values as training data.
  • In step 1), the center point coordinates of each Gaussian kernel in the training data generated by the traditional crowd counting system are read in turn; the kernel is recorded as selected, and the center point coordinates of the nearest other Gaussian kernel are found. If the geometric distance between the two center points is less than the sum of their variances, the heads corresponding to these two kernels are considered to overlap in the original picture, that is, the two Gaussian kernels overlap.
  • Each Gaussian kernel is tested for overlap only against the one other kernel whose center point is geometrically closest to its own. If they are judged to overlap, both are marked as selected and the method proceeds to step 2); otherwise it proceeds to step 5).
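  • Step 1)'s nearest-neighbour overlap test can be sketched as below. The tuple layout (x0, y0, sigma) and the function name are illustrative choices, not from the patent; note the patent calls σ the "variance" of the circular kernel:

```python
import math

def find_overlapping_pairs(kernels):
    """kernels: list of (x0, y0, sigma) circular Gaussian kernels.
    Each kernel is compared only with its nearest neighbour (by center
    distance); the pair counts as overlapping when that distance is
    less than the sum of the two sigmas. Both members are then marked
    selected, so each kernel is deformed at most once."""
    if len(kernels) < 2:
        return []
    selected = [False] * len(kernels)
    pairs = []
    for i, (xi, yi, si) in enumerate(kernels):
        if selected[i]:
            continue
        selected[i] = True
        # nearest other kernel by geometric distance between center points
        j, dist = min(
            ((j, math.hypot(xi - xj, yi - yj))
             for j, (xj, yj, _) in enumerate(kernels) if j != i),
            key=lambda t: t[1])
        if not selected[j] and dist < si + kernels[j][2]:
            selected[j] = True
            pairs.append((i, j))
    return pairs
```

  • Because every kernel is tested against only one neighbour, the pairing cost stays near-linear in the number of heads, matching the complexity claim made later in the document.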
  • In step 2), for Gaussian kernels a and b judged to overlap, if the variance of kernel a is greater than the variance of kernel b, the head corresponding to a is considered to be at a shorter straight-line distance from the camera photographing the crowd, and a occludes b in the picture.
  • The occluded Gaussian kernel b is stretched along the coordinate axes: its variance is decomposed into two mutually independent components σ_b_x and σ_b_y along the x-axis and y-axis, with the x-axis direction taken by default as the direction of the line connecting the center points of kernels a and b; the variance component of b along the x-axis is reduced while the component along the y-axis is kept unchanged, and the stretched variances are substituted into the traditional discrete Gaussian kernel expression to obtain the stretched discrete Gaussian kernel expression.
  • In step 3), if the positive x-axis of the crowd density map differs by a counterclockwise angle θ from the ray along the line of centers of a and b pointing toward b, the Gaussian kernel b must be rotated counterclockwise by θ about its center point as origin. The coordinates (x, y) of each point of the crowd density map are transformed by the coordinate transformation rule for a plane rectangular coordinate system rotated counterclockwise by θ, giving the point's coordinates (x*, y*) in the rotated coordinate system of the occluded kernel b; substituting these coordinates into the stretched discrete Gaussian kernel expression yields the stretched and rotated discrete Gaussian kernel expression.
  • In step 4), the center point of kernel b is moved along the line connecting the center points of a and b, in the direction of b, by a distance equal to the amount by which the variance of kernel b was reduced along the x-axis in step 2).
  • In step 5), if the training data still contains Gaussian kernels not yet selected in step 1), the method returns to step 1); otherwise, for each pixel of the crowd density map, the gray values of the Gaussian kernels it belongs to are added together, the resulting crowd density map with gray values is output as training data, and the method ends.
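  • Steps 2)–4) for one occluded kernel can be sketched as below. The exact amount by which the x-variance component shrinks is given in the source only as an embedded formula image, so the shrink amount `delta` here is an assumed placeholder (half the overlap depth), not the patent's formula; the rotation and the center shift follow the steps as described:

```python
import numpy as np

def deformed_kernel(grid_x, grid_y, ka, kb):
    """Sketch of steps 2)-4) for an occluded kernel b = (x0, y0, sigma),
    occluded by a. The shrink amount `delta` is an assumption standing in
    for the patent's (image-only) formula."""
    xa, ya, sa = ka
    xb, yb, sb = kb
    d = np.hypot(xb - xa, yb - ya)
    delta = max(sa + sb - d, 0.0) / 2.0      # ASSUMED shrink of the x-component
    sx, sy = sb - delta, sb                  # step 2): stretched variance components
    theta = np.arctan2(yb - ya, xb - xa)     # angle of the a -> b center line
    # step 4): shift b's center along the a -> b line, away from a, by delta
    xc = xb + delta * np.cos(theta)
    yc = yb + delta * np.sin(theta)
    # step 3): rotate map coordinates into b's stretched frame
    xr = (grid_x - xc) * np.cos(theta) + (grid_y - yc) * np.sin(theta)
    yr = -(grid_x - xc) * np.sin(theta) + (grid_y - yc) * np.cos(theta)
    g = np.exp(-(xr**2 / (2 * sx**2) + yr**2 / (2 * sy**2)))
    return g / g.sum()                       # renormalize: the sum stays 1
```

  • The final renormalization mirrors the recomputation of the constant A in Formula 3, so the deformed kernel still contributes exactly one head to the count.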
  • Owing to the above technical solutions, the present invention has the following advantages: 1. When the circular Gaussian kernel used by traditional crowd counting systems handles an occluded head, the visual center of gravity of the occluded circular kernel is often inconsistent with the visual center of gravity of the occluded head in the actual picture, and the center points of the kernels of overlapping heads are insufficiently separated; the convolutional neural network therefore has difficulty learning the characteristics of occluded heads during training, which ultimately leads to poor training results, large errors in the output crowd density map, and reduced counting accuracy.
  • The present invention fully exploits the information contained in the known data, making maximal use of the center point coordinates and variances of the Gaussian kernels in the training data to stretch, rotate, and re-center the occluded kernels. This effectively increases the feature similarity between the crowd density map of the training data and the real image, making it easier for the convolutional neural network to learn the correspondence between them and improving the accuracy of the crowd counting system.
  • 2. The present invention can be nested directly into the Gaussian kernel generation method of the traditional crowd counting system, sharing the convolutional neural network structure and input data with the traditional method; the main workflow of the original crowd counting system requires essentially no modification, so the engineering effort is small.
  • 3. In practice, each Gaussian kernel is tested for overlap only against the one other kernel closest to itself, and the subsequent stretching, rotation, and center point adjustment are performed only when the pair is judged to overlap. This guarantees that each head's Gaussian kernel is deformed at most once, and the algorithmic complexity does not grow exponentially with the number of Gaussian kernels in the training data.
  • 4. The Gaussian kernel deformation steps rest on rigorous mathematical principles ensuring that the integral of the deformed kernel is still 1, so the deformed kernel can still be used to count its corresponding head.
  • 5. All steps of the present invention are fully automated; no additional user operation is required during execution, and no related data need be re-measured, saving manpower, material resources, and time.
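  • The "visual center of gravity" invoked throughout can be made concrete as the gray-value-weighted centroid of a kernel patch — a small helper (illustrative, not part of the patent) for checking how far a deformation has moved it:

```python
import numpy as np

def visual_center(density):
    """Gray-value-weighted centroid (x, y) of a kernel patch: the 'visual
    center of gravity' the method tries to align with the visible part
    of an occluded head."""
    ys, xs = np.mgrid[0:density.shape[0], 0:density.shape[1]]
    total = density.sum()
    return (xs * density).sum() / total, (ys * density).sum() / total
```

  • For an undeformed circular kernel the centroid sits at the kernel's center; after stretching and shifting it moves along the line of centers, which is exactly the alignment the deformation aims for.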
  • Figure 1 is a schematic diagram of the overall flow of the present invention
  • Figure 2 is a schematic diagram of the rotation principle of the plane coordinate system
  • Figure 3 is an effect diagram of a deformable Gaussian kernel.
  • the present invention provides a method for generating training data based on a deformable Gaussian kernel in a crowd counting system, which includes the following steps:
  • For the two kernels a and b found in step 1), the center point coordinates are (x_0_a, y_0_a) and (x_0_b, y_0_b), and the variances of their Gaussian kernel density functions are σ_a and σ_b (since the Gaussian kernels in the original training data are circular in the two-dimensional coordinate system, a single variance value is used for each kernel here). If the geometric distance between the center points is less than the sum of the variances, the heads corresponding to these two training-data kernels are considered to overlap in the original picture, that is, the two Gaussian kernels overlap.
  • Each Gaussian kernel is tested for overlap only against the one other kernel whose center point is geometrically closest to its own; if they are judged to overlap, both are marked as selected and the method proceeds to step 2); otherwise it proceeds to step 5).
  • For Gaussian kernels a and b judged to overlap, if the variance of kernel a is greater than that of kernel b, that is, σ_a > σ_b, the head corresponding to a is considered to be at a shorter straight-line distance from the camera photographing the crowd, and a occludes b in the picture.
  • The variance of kernel b is decomposed into two mutually independent components σ_b_x and σ_b_y along the x-axis and y-axis, with the x-axis taken by default as the direction of the line connecting the centers of a and b. The variance component of the occluded kernel b along the x-axis is then reduced according to the formula while the component along the y-axis is kept unchanged, yielding the two independent stretched variances of kernel b along the x-axis and y-axis.
  • In practice, the direction of the line connecting the centers of a and b may be any direction in the two-dimensional coordinate system of the crowd density map; it is not necessarily the x-axis direction of b itself.
  • The Gaussian kernel b therefore needs to be rotated counterclockwise by the angle θ about its center point as origin (as shown in Figure 2).
  • Because kernel b is occluded by a, the visual effect of the corresponding head in the original picture is that the geometric center of gravity of its unoccluded (visible) part has in fact shifted, moving along the line connecting the two head centers toward the head corresponding to kernel b.
  • The center point of kernel b is therefore moved in that direction by a distance equal to the amount by which b's variance was reduced along the x-axis in step 2). This completes the adjustment of the center point coordinates of the occluded kernel b, and the adjusted center point coordinates of kernel b follow accordingly.
  • If the training data still contains Gaussian kernels not yet selected in step 1), the method returns to step 1); otherwise, for each pixel of the crowd density map, the gray values of the Gaussian kernels it belongs to are added together, the resulting crowd density map with gray values is output as training data, and the method ends.
  • In summary, the present invention uses a deformable Gaussian kernel in place of the fixed circular Gaussian kernel of the traditional method.
  • When traditional circular kernels are found to overlap, the corresponding heads are considered to occlude each other.
  • For the occluded kernel, the present invention applies the deformation methods of stretching, rotation, and center point adjustment in turn, so that the visual center of gravity of the deformed kernel in the crowd density map essentially coincides with the visual center of gravity of the visible part of the occluded head in the original picture.
  • The separation between the visual centers of gravity of mutually occluding kernels is also increased, which helps the convolutional neural network learn the features of occluded heads.
  • Because the completeness of the occluded Gaussian kernel is not destroyed, the present invention ensures that the integral of each Gaussian kernel in the training data is still 1, which still satisfies the requirement of crowd counting.
  • The improvement of the present invention guarantees the consistency between the target crowd density map in the training data and the feature patterns in actual pictures, strengthens the training of the convolutional neural network, and ultimately improves the accuracy of the crowd counting system.


Abstract

A method for generating training data based on a deformable Gaussian kernel in a crowd counting system, comprising the steps of: finding a pair of mutually overlapping Gaussian kernels in the training data; stretching the occluded Gaussian kernel; rotating the occluded Gaussian kernel; adjusting the center point coordinates of the occluded Gaussian kernel; and determining whether any Gaussian kernels in the training data have not yet been selected, and outputting the resulting crowd density map with gray values as training data. The method effectively increases the feature similarity between the crowd density map of the training data and the real image, making it easier for the convolutional neural network to learn the correspondence between training data and real images and improving the accuracy of the crowd counting system. It can be widely applied in computer vision.

Description

Method for generating training data based on a deformable Gaussian kernel in a crowd counting system — Technical field
The invention relates to the field of pattern recognition, and in particular to a method for generating training data based on a deformable Gaussian kernel in a crowd counting system, applied in the field of computer vision.
Background art
In recent years, the basic method commonly used in crowd behavior analysis has been the crowd counting system based on deep learning with convolutional neural networks. Its main principle is: through extensive training, the convolutional neural network automatically learns the main characteristics of the human head (such as an approximately circular shape and hair that is darker than the background), and the convolution map output by the network is finally compared with a pre-made crowd density map in which the position of each head is represented by a two-dimensional Gaussian kernel density function resembling the shape of a human head (hereinafter, the Gaussian kernel). Because the values of a single Gaussian kernel, integrated over the pixels of the crowd density map, sum to 1, the system can estimate the total number of people in the original picture simply by summing the values at the pixels belonging to all the Gaussian kernels in the output density map. The system uses the difference between its estimate of the total number of people and the ground-truth value in the training data, as well as the difference between the network's output convolution map and the crowd density map of the training data, as the basis for backward error propagation; through iteration it modifies the network parameters, training the network's ability to recognize head-shaped targets.
Since the vast majority of existing crowd counting databases provide only the two-dimensional head coordinates in each picture as training data (that is, as the target the algorithm is trained toward), the system must convert the two-dimensional coordinates of each head in the training data into a head-like shape in the picture, so that the output crowd density map can be compared against the training data and the training of the convolutional neural network optimized. Training data generation methods for crowd counting systems therefore all use a two-dimensional Gaussian kernel density function, generating a simulated head shape in the picture with each head's position coordinates as the center point, so as to achieve a better training effect.
As stated above, the most critical step in training data generation for a crowd counting system is to generate, with the two-dimensional coordinates of a head as the center point, the corresponding Gaussian kernel. To explain the specific generation method, the expression of the continuous two-dimensional Gaussian function is first shown below:
f(x, y) = (1 / (2π·σ_x·σ_y)) · exp(−((x − x_0)² / (2σ_x²) + (y − y_0)² / (2σ_y²)))    (Formula 1)
where (x_0, y_0) is the position of the center point of the function, that is, the head coordinates, and σ_x and σ_y are the variances of the function in the x-axis and y-axis directions, respectively. Since a head can essentially be regarded as circular, the cited literature takes σ_x = σ_y by default for ease of computation.
Thus, in the discrete domain, a discrete Gaussian kernel with a scale of (2k+1)*(2k+1) can be expressed as:
G(x, y) = A · exp(−((x − x_0)² + (y − y_0)²) / (2σ²)),  |x − x_0| ≤ k, |y − y_0| ≤ k    (Formula 2)
where A is a constant set so that the Gaussian kernel gray values of the pixels within the truncated kernel region sum to 1 after integration; its value is not necessarily equal to the 1/(2π·σ_x·σ_y) term in Formula 1 and must be adjusted to the actual situation. The purpose of the adjustment is to make the gray values of the discrete pixels belonging to the Gaussian kernel of one head sum to 1, so A is calculated as:
A = 1 / Σ_{|x−x_0|≤k, |y−y_0|≤k} exp(−((x − x_0)² + (y − y_0)²) / (2σ²))    (Formula 3)
Formula 3 is called the discrete Gaussian kernel expression of the traditional crowd counting system. The system repeats the above process for the coordinates of every head in the training data and then draws the gray values of all discrete pixels of the generated Gaussian kernels, superimposed, on the same picture, completing the generation of the training data.
In real crowd pictures, however, a phenomenon occurs in large numbers: heads overlap because they occlude one another, and by the principle of perspective, the closer the center points of two mutually occluding heads are along the camera's viewing direction, the higher the proportion of their overlapping area.
Under existing methods, two overlapping heads are both represented by uniform circular Gaussian kernels. The head in front in the original crowd picture has a relatively complete, clear circular edge that closely resembles the circular Gaussian kernel in the training data, so the convolutional neural network can recognize it fairly easily by learning from the training data; the occluded head behind it shows a crescent shape whose extent depends on the degree of overlap, and its visual center of gravity moves toward the non-overlapping part. If the circular Gaussian kernel continues to be used, then, by the nature of the Gaussian kernel, gray values grow toward the center of the circle, and the visual center of gravity of two overlapping circular Gaussian kernels lies near the line connecting their center points. The visual center of the occluded head in the original picture is then not only inconsistent with the visual center of gravity of the corresponding occluded Gaussian kernel in the training-data crowd density map, but the occluded kernel also easily merges into one whole with the Gaussian kernel of the occluding head; the convolutional neural network of the crowd counting system cannot easily learn, by training on circular Gaussian kernels, the feature patterns of the occluded head and separate it from the head occluding it in front, which ultimately leads to poor training results, an inaccurate output crowd density map, and large crowd counting errors.
Summary of the invention
In view of the above problems, the purpose of the present invention is to provide a method for generating training data based on a deformable Gaussian kernel in a crowd counting system, which effectively increases the feature similarity between the crowd density map of the training data and the real image, makes it easier for the convolutional neural network to learn the correspondence between training data and real images, and improves the accuracy of the crowd counting system.
To achieve this purpose, the present invention adopts the following technical solution: a method for generating training data based on a deformable Gaussian kernel in a crowd counting system, comprising the steps of: 1) finding a pair of mutually overlapping Gaussian kernels in the training data; 2) stretching the occluded Gaussian kernel; 3) rotating the occluded Gaussian kernel; 4) adjusting the center point coordinates of the occluded Gaussian kernel; 5) determining whether any Gaussian kernels in the training data have not yet been selected, and outputting the resulting crowd density map with gray values as training data.
Further, in step 1), the center point coordinates of each Gaussian kernel in the training data generated by the traditional crowd counting system are read in turn; the kernel is recorded as a selected Gaussian kernel, and the center point coordinates of the nearest other Gaussian kernel are found. If the geometric distance between the two center points is less than the sum of their variances, the heads corresponding to these two training-data kernels are considered to overlap in the original picture, that is, the two Gaussian kernels overlap.
Further, each Gaussian kernel is tested for overlap only against the one other kernel whose center point is geometrically closest to its own; if they are judged to overlap, both are taken as selected Gaussian kernels and the method proceeds to step 2); otherwise it proceeds to step 5).
Further, in step 2), for Gaussian kernels a and b judged to overlap, if the variance of kernel a is greater than the variance of kernel b, the head corresponding to a is considered to be at a shorter straight-line distance from the camera photographing the crowd, and a occludes b in the picture.
Further, the occluded Gaussian kernel b is stretched along the coordinate axes: its variance is decomposed into two mutually independent components σ_b_x and σ_b_y along the x-axis and y-axis, with the x-axis direction taken by default as the direction of the line connecting the center points of kernels a and b; according to the following formulas, the variance component of the occluded kernel b along the x-axis is reduced while the component along the y-axis is kept unchanged:
Figure PCTCN2020086534-appb-000005
Figure PCTCN2020086534-appb-000006
Substituting the stretched variances of kernel b into the discrete Gaussian kernel expression of the traditional crowd counting system yields the stretched discrete Gaussian kernel expression.
Further, in step 3), assuming the positive x-axis of the crowd density map differs by a counterclockwise angle θ from the ray along the line of centers of a and b pointing toward b, the Gaussian kernel b must be rotated counterclockwise by θ about its center point as origin; the coordinates (x, y) of each point of the crowd density map are transformed by the coordinate transformation rule for a plane rectangular coordinate system rotated counterclockwise by θ, giving the point's coordinates (x*, y*) in the rotated coordinate system of the occluded kernel b; substituting these coordinates into the stretched discrete Gaussian kernel expression yields the stretched and rotated discrete Gaussian kernel expression.
Further, the coordinates (x*, y*) are:
Figure PCTCN2020086534-appb-000007
Figure PCTCN2020086534-appb-000008
Further, in step 4), the center point of kernel b is moved along the line connecting the center points of a and b, in the direction of b, by a distance equal to the amount by which the variance of kernel b was reduced along the x-axis in step 2):
Figure PCTCN2020086534-appb-000009
Further, the adjusted center point coordinates of the Gaussian kernel b
Figure PCTCN2020086534-appb-000010
are:
Figure PCTCN2020086534-appb-000011
Figure PCTCN2020086534-appb-000012
Further, in step 5), if the training data still contains Gaussian kernels not yet selected in step 1), the method returns to step 1); otherwise, for each pixel of the crowd density map, the gray values of the Gaussian kernels it belongs to are added together, the resulting crowd density map with gray values is output as training data, and the method ends.
Owing to the above technical solution, the present invention has the following advantages: 1. When the circular Gaussian kernel used by traditional crowd counting systems handles an occluded head, the visual center of gravity of the occluded circular kernel is often inconsistent with the visual center of gravity of the occluded head in the actual picture, and the center points of the kernels of overlapping heads are insufficiently separated, so the convolutional neural network of the crowd counting system has difficulty learning the features of occluded heads during training; this ultimately leads to poor training results and large errors in the output crowd density map, affecting counting accuracy. The present invention fully mines the information contained in the known data, making maximal use of the center point coordinates and variances of the Gaussian kernels in the training data to stretch, rotate, and re-center the occluded kernels; this effectively increases the feature similarity between the crowd density map of the training data and the real image, makes it easier for the convolutional neural network to learn the correspondence between them, and improves the accuracy of the crowd counting system. 2. The present invention can be nested directly into the Gaussian kernel generation method of the traditional crowd counting system, sharing the convolutional neural network structure and input data with the traditional method; the main workflow of the original crowd counting system requires essentially no modification, so the engineering effort is small. 3. In practice, each Gaussian kernel is tested for overlap only against the one other kernel closest to itself, and the subsequent stretching, rotation, and center point adjustment are performed only when the pair is judged to overlap, guaranteeing that each head's Gaussian kernel is deformed at most once; the algorithmic complexity does not grow exponentially with the number of Gaussian kernels in the training data. 4. The Gaussian kernel deformation steps rest on rigorous mathematical principles ensuring that the integral of the deformed kernel is still 1, so the deformed kernel can still be used to count its corresponding head. 5. All steps of the present invention are fully automated; no additional user operation is required during execution, and no related data need be re-measured, saving manpower, material resources, and time.
Brief description of the drawings
Figure 1 is a schematic diagram of the overall flow of the present invention;
Figure 2 is a schematic diagram of the rotation principle of the plane coordinate system;
Figure 3 is an effect diagram of the deformable Gaussian kernel.
Best mode for carrying out the invention
The present invention is described in detail below with reference to the drawings and embodiments.
As shown in Figure 1, the present invention provides a method for generating training data based on a deformable Gaussian kernel in a crowd counting system, comprising the following steps:
1) Find a pair of mutually overlapping Gaussian kernels in the training data:
Read in turn the center point coordinates of each Gaussian kernel (that is, the head center coordinates) in the training data generated with the Gaussian kernels of the traditional crowd counting system, record the kernel as a selected Gaussian kernel, and find the center point coordinates of the nearest other Gaussian kernel.
For the two kernels a and b above, the center point coordinates are (x_0_a, y_0_a) and (x_0_b, y_0_b), and the variances of their Gaussian kernel density functions are σ_a and σ_b (since the Gaussian kernels in the original training data are circular in the two-dimensional coordinate system, a single variance value is used for each kernel here). If the geometric distance between the center points is less than the sum of the variances, the heads corresponding to these two training-data kernels are considered to overlap in the original picture, that is, the two Gaussian kernels overlap:
√((x_0_a − x_0_b)² + (y_0_a − y_0_b)²) < σ_a + σ_b
Each Gaussian kernel is tested for overlap only against the one other kernel whose center point is geometrically closest to its own; if they are judged to overlap, both are taken as selected Gaussian kernels and the method proceeds to step 2); otherwise it proceeds to step 5).
2) Stretch the occluded Gaussian kernel:
For Gaussian kernels a and b judged to overlap, if the variance of kernel a is greater than the variance of kernel b, that is, σ_a > σ_b, the head corresponding to a is considered to be at a shorter straight-line distance from the camera photographing the crowd, and a occludes b in the picture.
At this point, kernel b must be stretched along the coordinate axes. Decompose the variance of kernel b into two mutually independent components σ_b_x and σ_b_y along the x-axis and y-axis, with the x-axis direction taken by default as the direction of the line connecting the center points of a and b. According to the following formulas, reduce the variance component of the occluded kernel b along the x-axis while keeping the component along the y-axis unchanged, obtaining the two independent stretched variances of kernel b along the x-axis and y-axis:
Figure PCTCN2020086534-appb-000014
Figure PCTCN2020086534-appb-000015
Figure PCTCN2020086534-appb-000016
Figure PCTCN2020086534-appb-000017
Substituting the stretched variances of kernel b into the discrete Gaussian kernel expression of the traditional crowd counting system (Formula 3) yields the stretched discrete Gaussian kernel expression.
3) Rotate the occluded Gaussian kernel:
In practice, the direction of the line connecting the centers of kernels a and b may be any direction in the two-dimensional coordinate system of the crowd density map; it is not necessarily the x-axis direction of b itself. Assuming the positive x-axis of the crowd density map differs by a counterclockwise angle θ from the ray along the line of centers of a and b pointing toward b, kernel b must be rotated counterclockwise by θ about its center point as origin (as shown in Figure 2).
Transforming the coordinates (x, y) of each point of the crowd density map by the coordinate transformation rule for a plane rectangular coordinate system rotated counterclockwise by θ gives the point's coordinates (x*, y*) in the rotated coordinate system of the occluded kernel b. Substituting these coordinates into the stretched discrete Gaussian kernel expression yields the stretched and rotated discrete Gaussian kernel expression:
Figure PCTCN2020086534-appb-000018
Figure PCTCN2020086534-appb-000019
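The counterclockwise frame rotation used here is the standard plane-coordinate transform; a small sketch follows (the function name is illustrative, and the convention of adding the center back onto (x*, y*) is an assumption about how the rotated coordinates feed the kernel expression):

```python
import math

def rotate_point(x, y, cx, cy, theta):
    """Coordinates of (x, y) in a frame rotated counterclockwise by theta
    about (cx, cy) -- the transform applied to density-map points before
    evaluating the stretched kernel of the occluded head b."""
    dx, dy = x - cx, y - cy
    x_star = dx * math.cos(theta) + dy * math.sin(theta) + cx
    y_star = -dx * math.sin(theta) + dy * math.cos(theta) + cy
    return x_star, y_star
```

Rotating the coordinate frame (rather than the kernel's pixel grid) keeps the stretched kernel expression unchanged; only its inputs are transformed.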
4) Adjust the center point coordinates of the occluded Gaussian kernel:
Since kernel b is occluded by a, the visual effect of the corresponding head in the original picture is that the geometric center of gravity of its unoccluded (visible) part has in fact shifted, moving along the line connecting the two head centers toward the head corresponding to kernel b. To keep the visual features of the crowd density map close to the original picture, the center point of kernel b is moved along the line connecting the center points of a and b, in the direction of b, by a distance equal to the amount by which the variance of kernel b was reduced along the x-axis in step 2):
Figure PCTCN2020086534-appb-000020
The adjustment of the center point coordinates of the occluded Gaussian kernel b is completed by the above operations. The adjusted center point coordinates of kernel b
Figure PCTCN2020086534-appb-000021
are as follows:
Figure PCTCN2020086534-appb-000022
Figure PCTCN2020086534-appb-000023
Substituting the adjusted center point coordinates of the occluded kernel b into the stretched and rotated discrete Gaussian kernel expression yields the discrete Gaussian kernel expression after stretching, rotation, and center point adjustment. The effect of the deformable Gaussian kernel is shown in Figure 3.
5) Determine whether any Gaussian kernels in the training data have not yet been selected:
If the training data still contains Gaussian kernels not yet selected in step 1), return to step 1); otherwise, for each pixel of the crowd density map, add together the gray values of the Gaussian kernels it belongs to, output the resulting crowd density map with gray values as training data, and end.
In summary, the present invention uses a deformable Gaussian kernel in place of the fixed circular Gaussian kernel of the traditional method. When traditional circular kernels are found to occlude one another, the corresponding heads are considered to occlude one another. For the occluded kernel, the present invention applies the deformation methods of stretching, rotation, and center point adjustment in turn, bringing the visual center of gravity of the deformed kernel in the crowd density map essentially into line with the visual center of gravity of the visible part of the occluded head in the original picture, while also increasing the separation between the visual centers of gravity of mutually occluding kernels, which helps the convolutional neural network learn the features of occluded heads. Because the completeness of the occluded Gaussian kernel is not destroyed, the present invention ensures that the integral of each Gaussian kernel in the training data is still 1, still satisfying the requirement of crowd counting. Through the improvements of the present invention, the consistency between the target crowd density map in the training data and the feature patterns in actual pictures is guaranteed, the training of the convolutional neural network is strengthened, and the accuracy of the crowd counting system is ultimately improved.
The above embodiments are merely illustrative of the present invention; each step may be varied, and on the basis of the technical solution of the present invention, any improvement or equivalent transformation of an individual step made according to the principles of the present invention shall not be excluded from the scope of protection of the present invention.

Claims (10)

  1. A method for generating training data based on a deformable Gaussian kernel in a crowd counting system, characterized by comprising the following steps:
    1) finding a pair of mutually overlapping Gaussian kernels in the training data;
    2) stretching the occluded Gaussian kernel;
    3) rotating the occluded Gaussian kernel;
    4) adjusting the center point coordinates of the occluded Gaussian kernel;
    5) determining whether any Gaussian kernels in the training data have not yet been selected, and outputting the resulting crowd density map with gray values as training data.
  2. The training data generation method according to claim 1, characterized in that: in step 1), the center point coordinates of each Gaussian kernel in the training data generated with the Gaussian kernels of the traditional crowd counting system are read in turn; the kernel is recorded as a selected Gaussian kernel, and the center point coordinates of the nearest other Gaussian kernel are found; if the geometric distance between the center points is less than the sum of the variances, the heads corresponding to these two training-data kernels are considered to overlap in the original picture, that is, the two Gaussian kernels overlap.
  3. The training data generation method according to claim 2, characterized in that: each Gaussian kernel is tested for overlap only against the one other kernel whose center point is geometrically closest to its own; if they are judged to overlap, both are taken as selected Gaussian kernels and the method proceeds to step 2); otherwise it proceeds to step 5).
  4. The training data generation method according to claim 1, characterized in that: in step 2), for Gaussian kernels a and b judged to overlap, if the variance of one kernel a is greater than the variance of the other kernel b, the head corresponding to a is considered to be at a shorter straight-line distance from the camera photographing the crowd, and a occludes b in the picture.
  5. The training data generation method according to claim 4, characterized in that the Gaussian kernel b is stretched along the coordinate axes:
    the variance of kernel b is decomposed into two mutually independent components σ_b_x and σ_b_y along the x-axis and y-axis, with the x-axis direction taken by default as the direction of the line connecting the center points of kernels a and b; according to the following formulas, the variance component of the occluded kernel b along the x-axis is reduced while the component along the y-axis is kept unchanged:
    Figure PCTCN2020086534-appb-100001
    Figure PCTCN2020086534-appb-100002
    substituting the stretched variances of kernel b into the discrete Gaussian kernel expression of the traditional crowd counting system yields the stretched discrete Gaussian kernel expression.
  6. The training data generation method according to claim 1, characterized in that: in step 3), assuming the positive x-axis of the crowd density map differs by a counterclockwise angle θ from the ray along the line of centers of a and b pointing toward b, the Gaussian kernel b must be rotated counterclockwise by θ about its center point as origin; the coordinates (x, y) of each point of the crowd density map are transformed by the coordinate transformation rule for a plane rectangular coordinate system rotated counterclockwise by θ, giving the point's coordinates (x*, y*) in the rotated coordinate system of the occluded kernel b; substituting these coordinates into the stretched discrete Gaussian kernel expression yields the stretched and rotated discrete Gaussian kernel expression.
  7. The training data generation method according to claim 6, characterized in that the coordinates (x*, y*) are:
    Figure PCTCN2020086534-appb-100003
    Figure PCTCN2020086534-appb-100004
  8. The training data generation method according to claim 1, characterized in that: in step 4), the center point of kernel b is moved along the line connecting the center points of a and b, in the direction of b, by a distance equal to the amount by which the variance of kernel b was reduced along the x-axis in step 2):
    Figure PCTCN2020086534-appb-100005
  9. The training data generation method according to claim 8, characterized in that the adjusted center point coordinates of the Gaussian kernel b
    Figure PCTCN2020086534-appb-100006
    are:
    Figure PCTCN2020086534-appb-100007
    Figure PCTCN2020086534-appb-100008
  10. The training data generation method according to claim 1, characterized in that: in step 5), if the training data still contains Gaussian kernels not yet selected in step 1), the method returns to step 1); otherwise, for each pixel of the crowd density map, the gray values of the Gaussian kernels it belongs to are added together, the resulting crowd density map with gray values is output as training data, and the method ends.
PCT/CN2020/086534 2019-11-12 2020-04-24 Method for generating training data based on a deformable Gaussian kernel in a crowd counting system WO2021093276A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/607,236 US20220222930A1 (en) 2019-11-12 2020-04-24 Method for generating training data on basis of deformable gaussian kernel in population counting system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911098587.3 2019-11-12
CN201911098587.3A CN111027389B (zh) 2019-11-12 2019-11-12 Method for generating training data based on a deformable Gaussian kernel in a crowd counting system

Publications (1)

Publication Number Publication Date
WO2021093276A1 true WO2021093276A1 (zh) 2021-05-20

Family

ID=70201415

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/086534 WO2021093276A1 (zh) 2019-11-12 2020-04-24 Method for generating training data based on a deformable Gaussian kernel in a crowd counting system

Country Status (3)

Country Link
US (1) US20220222930A1 (zh)
CN (1) CN111027389B (zh)
WO (1) WO2021093276A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027389B (zh) * 2019-11-12 2023-06-30 通号通信信息集团有限公司 Method for generating training data based on a deformable Gaussian kernel in a crowd counting system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110191078A1 (en) * 2010-02-01 2011-08-04 Maria Davidich Calibration of Stream Models and Stream Simulation Tools
CN102890791A (zh) * 2012-08-31 2013-01-23 浙江捷尚视觉科技有限公司 Method for counting people in complex scenes based on depth-information clustering
CN104966054A (zh) * 2015-06-11 2015-10-07 西安电子科技大学 Method for detecting weak, small targets in visible-light images from unmanned aerial vehicles
CN105740945A (zh) * 2016-02-04 2016-07-06 中山大学 Crowd counting method based on video analysis
CN107301387A (zh) * 2017-06-16 2017-10-27 华南理工大学 High-density crowd counting method for images based on deep learning
CN107967451A (zh) * 2017-11-23 2018-04-27 常州大学 Method for crowd counting on still images using a multi-scale, multi-task convolutional neural network
CN109271960A (zh) * 2018-10-08 2019-01-25 燕山大学 People-counting method based on a convolutional neural network
CN111027389A (zh) * 2019-11-12 2020-04-17 通号通信信息集团有限公司 Method for generating training data based on a deformable Gaussian kernel in a crowd counting system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077613B (zh) * 2014-07-16 2017-04-12 电子科技大学 Crowd density estimation method based on cascaded multi-level convolutional neural networks
CN106650913B (zh) * 2016-12-31 2018-08-03 中国科学技术大学 Traffic-flow density estimation method based on a deep convolutional neural network
CN108960404B (zh) * 2017-05-22 2021-02-02 浙江宇视科技有限公司 Image-based crowd counting method and device
CN108596054A (zh) * 2018-04-10 2018-09-28 上海工程技术大学 Crowd counting method based on multi-scale fully convolutional network feature fusion


Also Published As

Publication number Publication date
CN111027389A (zh) 2020-04-17
CN111027389B (zh) 2023-06-30
US20220222930A1 (en) 2022-07-14

Similar Documents

Publication Publication Date Title
US11747898B2 (en) Method and apparatus with gaze estimation
US10499046B2 (en) Generating depth maps for panoramic camera systems
US10489956B2 (en) Robust attribute transfer for character animation
US11514642B2 (en) Method and apparatus for generating two-dimensional image data describing a three-dimensional image
WO2021093275A1 (zh) Method for adaptively calculating the Gaussian kernel size in a crowd counting system
CN109919971B (zh) 图像处理方法、装置、电子设备及计算机可读存储介质
US10169891B2 (en) Producing three-dimensional representation based on images of a person
WO2023024441A1 (zh) Model reconstruction method, related apparatus, electronic device, and storage medium
JPWO2004063991A1 (ja) Multi-parameter high-precision simultaneous estimation method and program for sub-pixel matching of images
WO2020186385A1 (zh) Image processing method, electronic device, and computer-readable storage medium
CN109767381A (zh) Method for constructing rectangular panoramic images with shape optimization based on feature selection
Yung et al. Efficient feature-based image registration by mapping sparsified surfaces
CN114787828A (zh) Inference or training of artificial-intelligence neural networks using an imager with intentionally controlled distortion
WO2021093276A1 (zh) Method for generating training data based on a deformable Gaussian kernel in a crowd counting system
JP4887491B2 (ja) Medical image processing method, apparatus, and program
Hu et al. Towards effective learning for face super-resolution with shape and pose perturbations
CN111652807B (zh) Eye adjustment and live-streaming method, apparatus, electronic device, and storage medium
US20230169755A1 (en) Apparatus and method with image processing
CN111369425A (zh) Image processing method, apparatus, electronic device, and computer-readable medium
WO2021166574A1 (ja) Image processing apparatus, image processing method, and computer-readable recording medium
US11967178B2 (en) Progressive transformation of face information
CN113642354B (zh) Method for determining face pose, computer device, and computer-readable storage medium
Nguyen Panorama Image Stitching Techniques

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20887155

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20887155

Country of ref document: EP

Kind code of ref document: A1