WO2018059408A1 - Cross-line counting method, neural network training method, apparatus, and electronic device - Google Patents

Cross-line counting method, neural network training method, apparatus, and electronic device

Info

Publication number
WO2018059408A1
WO2018059408A1 (PCT/CN2017/103530, CN2017103530W)
Authority
WO
WIPO (PCT)
Prior art keywords
frame image
neural network
map
current frame
crowd
Prior art date
Application number
PCT/CN2017/103530
Other languages
English (en)
French (fr)
Inventor
王晓刚 (Xiaogang Wang)
赵倬毅 (Zhuoyi Zhao)
李鸿升 (Hongsheng Li)
赵瑞 (Rui Zhao)
Original Assignee
北京市商汤科技开发有限公司 (Beijing SenseTime Technology Development Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司 (Beijing SenseTime Technology Development Co., Ltd.)
Publication of WO2018059408A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Definitions

  • the present application relates to computer vision technology, and more particularly to a cross-line counting method and a neural network training method, apparatus and electronic device.
  • the current mainstream cross-line counting methods are based on temporal slicing.
  • the so-called temporal-slice method accumulates, along the time dimension, the pixel vectors (three-channel vectors for color images) extracted on the line in each frame of the video, forming a two-dimensional time-slice image; the manually annotated number of people crossing the line is then used as a supervised signal.
  • a regression model is learned to estimate the number of people in the time-slice image, thereby obtaining the number of people crossing the line within a given period of time.
  • a cross-line counting method, including:
  • the neural network includes a convolutional neural network and an element-wise multiplication network.
  • a method is provided for performing cross-line counting of people in a video based on a neural network trained by the above neural network training method.
  • a cross-line counting device including:
  • a second acquiring unit configured to take each frame image of the plurality of original frame images in turn as the current frame image and, for a line of interest (LOI) to be cross-line counted in the video, obtain from the crowd count map of the current frame image the number of people in the current frame image passing the LOI from at least one direction;
  • a third acquiring unit configured to accumulate, for the at least one direction, the numbers of people in the plurality of original frame images passing the LOI, to obtain the number of people crossing the LOI in the at least one direction within the time period T to be analyzed.
  • another electronic device including:
  • a further electronic device including:
  • a further electronic device including: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with each other via the communication bus;
  • the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the method described in any of the above embodiments of the present application.
  • a computer program comprising computer readable code which, when run on a device, causes a processor in the device to execute instructions implementing the steps of the method in any of the above embodiments.
  • a computer readable storage medium for storing computer readable instructions which, when executed, implement the operations of the steps in the method of any of the above embodiments of the present application.
  • a neural network training method and a technical solution for performing crowd cross-line counting based on a trained neural network are proposed.
  • the pre-labeled crowd count maps are used as a supervised signal, and the neural network is iteratively trained until a preset condition is met, yielding the trained neural network; by inputting to the neural network a plurality of original frame images corresponding to the time period T to be analyzed in a video requiring cross-line counting,
  • a crowd count map can be output for each of the original frame images; that is, at each position, the numbers of people passing between the current frame image and the adjacent previous frame image in each counting direction (for example, at least one coordinate direction of the x-axis and y-axis of a two-dimensional coordinate plane).
  • the embodiment of the present application directly uses the original frame images of the original video as input, rather than time-slice images, and therefore has better robustness and can be applied to various scenarios. This helps avoid the failure cases of time-slice methods, in which high crowd density, low or stationary crowd movement, or a low surveillance camera angle makes pedestrians barely recognizable in the time-slice image, so that the number of people cannot be estimated.
  • the method remains applicable even when the crowd density is high and the crowd movement speed is low.
  • FIG. 2 is a flow chart of another embodiment of a cross-line counting method of the present application.
  • FIG. 3 is a flow chart of an embodiment of a neural network training method of the present application.
  • FIG. 6 is a schematic structural diagram of another embodiment of a cross-line counting device of the present application.
  • FIG. 7 is a schematic structural diagram of an embodiment of a neural network training apparatus according to the present application.
  • FIG. 8 is a schematic structural diagram of an embodiment of an electronic device according to the present application.
  • FIG. 9 is a schematic structural diagram of another embodiment of an electronic device according to the present application.
  • Embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems, servers, etc., which can operate with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known computing systems, environments, and/or configurations suitable for use with computer systems, servers, and terminal devices include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, small computer systems, mainframe computer systems, and distributed cloud computing environments including any of the above.
  • Computer systems, servers, terminal devices can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
  • program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
  • program modules may be located on a local or remote computing system storage medium including storage devices.
  • a crowd count map of each frame image is obtained from the original video; the crowd count map of each frame image is then accumulated on a line of interest (LOI) to be cross-line counted, yielding the instantaneous cross-line count values in the two directions on the LOI, that is, the number of people passing the LOI in that frame; the instantaneous cross-line count values are then accumulated over the time period T to be analyzed, in the two directions respectively, to obtain the crowd cross-line count values for the time period T.
  • FIG. 1 is a flow chart of an embodiment of a cross-line counting method of the present application. As shown in FIG. 1, the cross-line counting method of this embodiment includes:
  • at least two frames may be sequentially extracted from the plurality of original frame images corresponding to the time period T to be analyzed in the video; the later of the at least two frames is taken as the current frame image, and a crowd count map of the current frame image is generated.
  • the at least two sequentially extracted frames may be consecutive original frame images, non-consecutive original frame images, or partly consecutive and partly non-consecutive original frame images.
  • the crowd cross-line counting may be performed on all the original frame images corresponding to the time period T to be analyzed in the video, or some original frame images may be extracted for cross-line counting, so that not all original frame images corresponding to the time period T need participate in the count.
  • the neural network in various embodiments of the present application may be a deep neural network.
  • operation 102 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first acquisition unit that is executed by the processor.
  • each frame image of the plurality of original frame images is taken in turn as the current frame image; for the LOI to be cross-line counted in the video, the number of people in the current frame image passing the LOI from at least one direction is obtained from the crowd count map of the current frame image, for example, the number of people passing the LOI from one direction, or the numbers passing the LOI from both directions.
  • the LOI in the embodiments of the present application may be set according to the application requirements of the crowd counting, and may be any position in the video scene that needs to be counted by the crowd.
  • for example, the line connecting the two sides of a subway entrance, or the line connecting the two sides of a mall door.
  • the number of people passing a given LOI from each direction may be counted according to demand; the number of people passing the LOI from a given direction is also referred to as the number of people crossing the line in that direction of the LOI.
  • taking the line connecting the two sides of a subway entrance as the LOI, counting the number of people crossing the LOI gives the number of people entering the subway station during the period, the number leaving the station, or the total number entering and leaving through the subway entrance.
  • operation 104 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a second acquisition unit executed by the processor.
  • the numbers of people passing the LOI in each of the plurality of original frame images are accumulated to obtain the number of people crossing the LOI in the at least one direction within the time period T to be analyzed, for example, the number of people passing the LOI from one direction, or the numbers passing the LOI from both directions.
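The per-period accumulation described above can be sketched in a few lines. This is an illustrative helper; the function name and the per-frame `(c1, c2)` pair layout are assumptions, not anything specified in the patent:

```python
import numpy as np

def crossline_counts_over_period(per_frame_counts):
    """Accumulate instantaneous per-frame crossing counts over the whole
    analysed period T, separately for the two directions.

    per_frame_counts : iterable of (c1_t, c2_t) pairs, one per frame
    Returns (C1, C2): total crossings in each direction over T.
    """
    arr = np.asarray(list(per_frame_counts), dtype=float)
    c1_total, c2_total = arr.sum(axis=0)  # sum each direction over all frames
    return c1_total, c2_total
```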
  • operation 106 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a third acquisition unit executed by the processor.
  • because the embodiment of the present application directly uses each frame image of the original video as input, rather than time-slice images, it is more robust and applicable to various scenarios, including the extreme cases of high crowd density and low or stationary crowd movement, and it can be applied across scenes. Moreover, the embodiment performs cross-line counting based on the crowd count map rather than only the total number of people, thereby also taking the distribution of the crowd into account, which helps improve the objectivity and accuracy of the cross-line counting result.
  • the numbers of people passing the LOI from both directions in the current frame image may also be acquired.
  • the two directions may be: perpendicular to the LOI, the direction from one side of the LOI to the other side and the opposite direction; alternatively, a direction not perpendicular to the LOI, from one side of the LOI to the other side, and its opposite.
  • within the time period T to be analyzed, the numbers of people passing the LOI in each of the plurality of original frame images can then be accumulated to obtain the numbers of people crossing the LOI in each of the two directions, giving a complete picture of cross-line traffic on the LOI.
  • the crowd density map and the crowd velocity map of the frame image are obtained from at least two frames of the video; assuming that the density distribution and walking speed of pedestrians remain unchanged between the two frames, the crowd density map of the current frame image and the crowd velocity map are multiplied at corresponding positions to obtain the crowd count map of the frame image, realizing accurate acquisition of the crowd count map.
  • the numbers of people passing the LOI in the two directions in the current frame may be obtained as follows:
  • the positive scalar values and the negative scalar values on the LOI are accumulated separately, giving the numbers of people in the current frame image passing the LOI in each of the two directions.
  • the positive and negative scalar values on the LOI may be accumulated by the following formulas:

$$c_{1,t} = \sum_{p \in \mathrm{LOI}} \max\big(\lVert C_t(p) \rVert \cos\theta_p,\ 0\big), \qquad c_{2,t} = \sum_{p \in \mathrm{LOI}} \max\big(-\lVert C_t(p) \rVert \cos\theta_p,\ 0\big)$$

  • where c_{1,t} and c_{2,t} are the instantaneous cross-line count values at time t in the two directions of the LOI in the current frame image, θ_p is the angle between the count vector (C_{t,x}(p), C_{t,y}(p)) at position p and the normal direction of the LOI, and t is any time within the time period T to be analyzed.
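The projection-and-accumulation step above can be sketched as follows. This is a minimal illustration; the function name, the `(H, W, 2)` count-map layout, and the argument names are assumptions rather than anything specified in the patent:

```python
import numpy as np

def instantaneous_line_counts(count_map, loi_points, loi_normal):
    """Project the 2-D count vectors sampled on the LOI onto the LOI's
    unit normal and accumulate positive / negative projections separately.

    count_map  : (H, W, 2) array, per-pixel count vectors (C_x, C_y)
    loi_points : iterable of (row, col) pixel positions lying on the LOI
    loi_normal : normal vector (n_x, n_y) of the LOI
    Returns (c1, c2): instantaneous counts in the two crossing directions.
    """
    n = np.asarray(loi_normal, dtype=float)
    n = n / np.linalg.norm(n)           # ensure unit length
    c1 = c2 = 0.0
    for (r, c) in loi_points:
        s = float(count_map[r, c] @ n)  # signed projection = |C| * cos(theta_p)
        if s > 0:
            c1 += s                     # crossings along +normal
        else:
            c2 -= s                     # crossings along -normal
    return c1, c2
```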
  • FIG. 2 is a flow chart of another embodiment of a cross-line counting method of the present application. As shown in FIG. 2, the cross-line counting method of this embodiment includes:
  • the neural network sequentially extracts at least two frames from the plurality of original frame images corresponding to the time period T to be analyzed in the video requiring crowd cross-line counting, takes the later of the at least two frames as the current frame image, and generates a crowd count map of the current frame image.
  • the at least two frames of the sequentially extracted images may be consecutive original frame images, or may be discontinuous original frame images, or may be partially continuous original frame images and partially discontinuous original frame images.
  • the crowd count map includes a count vector for each position in the frame image; that is, each position of the crowd count map records a two-dimensional count vector representing the numbers of people passing between the current frame image and the adjacent previous frame image in the x-axis and y-axis directions.
  • each frame image of the plurality of original frame images is taken in turn as the current frame image; for each LOI of the video to be cross-line counted, the count vector of at least one position on the LOI in the crowd count map is projected onto the normal direction of the LOI, yielding a scalar value for at least one position on the LOI; the positive and negative signs of the scalar value indicate the two directions of the LOI respectively.
  • operation 210 may be performed by a processor invoking a corresponding instruction stored in a memory or by a computing unit executed by the processor.
  • an initial neural network may be trained in advance to obtain the neural network; the obtained neural network may be used in the cross-line counting method of the foregoing embodiments of the present application, or in other applications where a crowd count map is required.
  • a neural network may be preset, the neural network including a convolutional neural network (CNN) and an element-wise multiplication network.
  • the operation may be performed by a processor invoking a corresponding instruction stored in a memory or by a network training unit executed by the processor.
  • the neural network in each embodiment of the present application may be a deep neural network.
  • the original sample video is input to the neural network, and the crowd count maps pre-labeled for the plurality of original frame images in the sample video are used as a supervised signal to iteratively train the neural network.
  • crowd crossing counting based on the trained neural network.
  • the neural network is trained directly on the original frame images of the original video as input, rather than on time-slice images, so it is more robust and applicable to various scenarios. This avoids the situation in which high crowd density, low or stationary crowd movement, or a low surveillance camera angle makes pedestrians barely recognizable in a time-slice image, so that their number cannot be estimated; the method remains applicable when crowd density is high and crowd movement is slow or stationary, and it can be applied across scenes. Moreover, the neural network performs cross-line counting based on the crowd count map rather than only the total number of people, also taking the distribution of the crowd into account, so the cross-line counting results obtained with the trained neural network are more objective and accurate.
  • the plurality of original frame images are respectively labeled with a crowd density map, a crowd velocity map, and a crowd count map.
  • multiple original frame images of the sample video are input to the neural network, and the crowd count maps pre-labeled for the plurality of original frame images are used as a supervised signal; the neural network is iteratively trained until the preset condition is met.
  • each pair of adjacent frames among the plurality of original frame images in the sample video is input into the neural network as a training sample, the pre-labeled crowd count map is used as a supervised signal, and the neural network is iteratively trained until a second preset convergence condition is met, yielding the final neural network.
  • the neural network is trained in two stages. In the first stage, the crowd density map and the crowd velocity map are estimated separately; these are two relatively simple tasks with relatively clear semantic information. In the second stage, a direct estimate of the crowd count map is produced.
  • in the embodiment of the present application, the second-stage training follows the completion of the first-stage training.
  • positional mismatch is corrected, effectively ensuring that the crowd density map and the velocity map match in spatial position; in the second stage, the crowd count map is used as the supervised signal, which is more conducive to the learning of a complex neural network, so that the trained neural network has a stronger and more accurate counting capability.
  • the following operations may also be performed before the neural network performs iterative training:
  • for each frame image of the plurality of original frame images in the sample video, pedestrian information is calibrated for each pedestrian, the pedestrian information including a pedestrian position and a pedestrian ID.
  • the geometric perspective map gives the correspondence between the number of pixels at different positions in the sample video and the real physical size of the scene.
  • according to the pedestrian positions and the geometric perspective map, pedestrian information for each pedestrian is marked in each frame image of the plurality of original frame images in the sample video; the pedestrian position may be marked with a correspondingly sized icon in the sample video scene according to the pedestrian's position in the real scene and the pedestrian's physical size.
  • alternatively, not every frame image in the sample video need be calibrated; frames may be labeled at a preset interval (for example, 1 second) according to the movement and speed of pedestrians, and the pedestrian positions and pedestrian IDs of intermediate frames may be approximated by interpolating the pedestrians and their IDs in the calibrated frames before and after, so as to reduce the labeling workload.
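The interpolation of intermediate-frame annotations from two calibrated keyframes, matched by pedestrian ID, might be sketched as follows (a hypothetical helper; the dict-based annotation format and linear interpolation are assumptions):

```python
def interpolate_annotations(frame_a, frame_b, alpha):
    """Approximate per-pedestrian positions in an intermediate frame by
    linearly interpolating between two annotated keyframes.

    frame_a / frame_b : dict mapping pedestrian ID -> (x, y) position
    alpha             : interpolation weight in [0, 1] (0 -> frame_a)
    Only pedestrians present in both keyframes are interpolated; others
    (who entered or left the scene between keyframes) are skipped.
    """
    out = {}
    for pid, (xa, ya) in frame_a.items():
        if pid in frame_b:
            xb, yb = frame_b[pid]
            out[pid] = ((1 - alpha) * xa + alpha * xb,
                        (1 - alpha) * ya + alpha * yb)
    return out
```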
  • all the original frame images in the sample video may participate in the neural network training, or some of the original frame images may be extracted to participate; for a given training task, the more sample videos and the larger the total number of frame images participating in training, the better the training effect of the neural network and the better the robustness of the trained neural network.
  • the convolutional neural network takes the later frame image in the current training sample as the current frame image, generates a crowd density map of the current frame image according to the pedestrian information calibrated for each frame image, and generates a crowd velocity map of the current frame image according to the pedestrian information and the geometric perspective map in the two frames of the current training sample. It is then checked whether the deviation between the crowd density map and crowd velocity map generated by the convolutional neural network and the labeled crowd density map and crowd velocity map is less than a preset condition, or whether the number of iterative training rounds of the convolutional neural network has reached a preset threshold. If the deviation is not less than the preset condition and the number of iterations has not reached the threshold, the network parameters of the convolutional neural network are adjusted and operation 306 is repeated until the deviation is less than the preset condition or the number of iterations reaches the threshold.
  • the convolutional neural network can generate a population density map of the current frame image as follows:
  • the crowd density value of at least one position in the frame image can thereby be obtained;
  • the crowd density map of the frame image can be calculated by the following formula:

$$D_t(p) = \sum_{P \in \mathcal{P}_t} \mathcal{N}\big(p;\ P,\ \sigma_P^2\big)$$

  • where D_t(p) is the crowd density value at position p in the frame image, the sum runs over all annotated head positions P in the frame, and N(p; P, σ_P²) is a normalized two-dimensional Gaussian distribution centered at the head annotation P (that is, a Gaussian kernel is used to represent the position of a pedestrian's head); σ_P² is the variance of the Gaussian distribution, whose value can be determined from the geometric perspective map of each sample video scene to ensure that every person has the same physical size.
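As an illustration of the density-map construction, here is a sketch using one normalized Gaussian per annotated head position. In practice σ_P would be derived from the geometric perspective map; here it is simply passed in, and the function name and argument layout are assumptions:

```python
import numpy as np

def density_map(shape, heads, sigmas):
    """Build D_t as a sum of normalised 2-D Gaussians, one per annotated
    head position.

    shape  : (H, W) of the frame
    heads  : list of (row, col) head positions
    sigmas : per-head Gaussian std-dev (same length as heads); in the
             patent's setting these would come from the perspective map
    """
    H, W = shape
    rows, cols = np.mgrid[0:H, 0:W]
    D = np.zeros(shape, dtype=float)
    for (hr, hc), s in zip(heads, sigmas):
        g = np.exp(-((rows - hr) ** 2 + (cols - hc) ** 2) / (2 * s ** 2))
        D += g / g.sum()   # normalise so each pedestrian integrates to 1
    return D
```

Because each kernel is normalized, the density map integrates to the number of pedestrians in the frame.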
  • the convolutional neural network may generate a population speed map of the current frame image by:
  • a crowd speed map of the current frame image is generated based on the crowd speed and the geometric perspective of at least one location in the current frame image.
  • the crowd velocity map can be calculated by the following formula:

$$V_t(p) = \sum_{P \in \mathcal{P}_t} v_P \, K\big(p;\ P,\ r_P\big)$$

  • where V_t(p) is the crowd velocity value at position p; v_P is the moving speed of the head annotation P in the current frame image, which can be obtained from the position difference of P between two adjacent frames; and K(p; P, r_P) is a disc-shaped function whose center is the head annotation P and whose radius is r_P. The radius r_P can be chosen as the number of pixels at the corresponding position that corresponds to an empirically set physical size of the human head.
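The disc-based velocity map can be sketched as follows: each head's frame-to-frame displacement is stamped into a disc of radius r_P around its current position. This is a simplified illustration (overlapping discs are resolved by overwriting, which the patent does not specify, and all names are assumptions):

```python
import numpy as np

def velocity_map(shape, heads_prev, heads_curr, radii):
    """Build V_t by stamping each head's frame-to-frame displacement into a
    disc K(p; P, r_P) around its current position.

    heads_prev / heads_curr : matched (row, col) head positions per pedestrian
    radii                   : per-head disc radius r_P in pixels
    Returns an (H, W, 2) array of per-pixel velocity vectors.
    """
    H, W = shape
    rows, cols = np.mgrid[0:H, 0:W]
    V = np.zeros((H, W, 2), dtype=float)
    for (pr, pc), (cr, cc), r in zip(heads_prev, heads_curr, radii):
        v = np.array([cr - pr, cc - pc], dtype=float)   # position difference
        disc = (rows - cr) ** 2 + (cols - cc) ** 2 <= r ** 2
        V[disc] = v                                     # constant inside disc
    return V
```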
  • the convolutional neural network in the neural network takes the later frame image in the current training sample as the current frame image, generates the crowd density map of the current frame image according to the pedestrian information and geometric perspective map of each frame image in the sample video, and generates the crowd velocity map according to the pedestrian information and geometric perspective map in the two frames of the current training sample.
  • the crowd density map and crowd velocity map of the frame image are obtained from at least two frames of the plurality of original frame images in the sample video and the geometric perspective map; assuming that the density distribution and walking speed of pedestrians remain unchanged between the two frames, the crowd density map and the crowd velocity map of the current frame image are multiplied at corresponding positions to obtain the crowd count map of the frame image, realizing convenient acquisition of the crowd count map.
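Under the stated assumption that density and walking speed are constant between the two frames, the crowd count map is the position-wise product of the two maps. A one-line sketch (names are illustrative):

```python
import numpy as np

def count_map(density, velocity):
    """Element-wise product of the crowd density map and the crowd velocity
    map, yielding a per-pixel 2-D count-vector map C_t.

    density  : (H, W) crowd density map D_t
    velocity : (H, W, 2) crowd velocity map V_t
    """
    return density[..., None] * velocity   # broadcast density over x/y
```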
  • the neural network is trained in two stages.
  • the first stage corresponds to operation 306, in which the convolutional neural network provides estimates of the crowd density map and the crowd velocity map, two relatively simple tasks with relatively clear semantic information;
  • the second stage corresponds to operation 308, in which the element-wise multiplication network gives a direct estimate of the crowd count map.
  • FIG. 4 is a schematic diagram of training the neural network in two stages in the embodiment of the present application.
  • the two adjacent frames of the sample video are input as one training sample into the convolutional neural network of the neural network.
  • the crowd density map and the crowd velocity map are output by the convolutional neural network; the crowd density map and the crowd velocity map are then input to the element-wise multiplication network of the neural network.
  • the element-wise multiplication network outputs the crowd count map.
  • the ratio of the number of frames for which the crowd density map and crowd velocity map output by the convolutional neural network are consistent with the pre-labeled crowd density map and crowd velocity map, to the number of frames of the sample video input to the convolutional neural network, reaches a first preset threshold; that is, both the density-map consistency ratio and the velocity-map consistency ratio reach the first preset threshold;
  • the similarity between the crowd density map output by the convolutional neural network and the pre-labeled crowd density map, and the similarity between the crowd velocity map output by the convolutional neural network and the pre-labeled crowd velocity map, are each greater than a second preset threshold;
  • the average similarity between the crowd density maps output by the convolutional neural network and the pre-labeled crowd density maps, and between the output crowd velocity maps and the pre-labeled crowd velocity maps, is greater than a third preset threshold;
  • the number of iterative training rounds of the convolutional neural network reaches a fourth preset threshold.
  • the second preset convergence condition may be considered to be satisfied when any one or more of the following conditions are met:
  • the ratio of the number of frames for which the crowd count map output by the element-wise multiplication network is consistent with the pre-labeled crowd count map, to the number of the plurality of original frame images, reaches a fifth preset threshold;
  • the similarity between the crowd count map output by the element-wise multiplication network and the pre-labeled crowd count map is greater than a sixth preset threshold;
  • the average similarity between the crowd count maps output by the element-wise multiplication network and the pre-labeled crowd count maps is greater than a seventh preset threshold;
  • the number of iterative training rounds of the second part of the neural network reaches an eighth preset threshold.
  • the population density map and the pre-labeled population density map are considered to be consistent when any one or more conditions including but not limited to the following are met:
  • the crowd density map output by the convolutional neural network has identical image features to the pre-labeled crowd density map (or the crowd speed map output by the convolutional neural network has identical image features to the pre-labeled crowd speed map);
  • the image features of the pre-labeled crowd density map include, but exceed, the image features of the crowd density map output by the convolutional neural network (or the image features of the pre-labeled crowd speed map include, but exceed, the image features of the crowd speed map output by the convolutional neural network);
  • the number of identical image features between the crowd density map output by the convolutional neural network and the pre-labeled crowd density map (or between the crowd speed map output by the convolutional neural network and the pre-labeled crowd speed map) reaches a preset number or proportion;
  • the crowd count map output by the element multiplication network has identical image features to the pre-labeled crowd count map;
  • the image features of the pre-labeled crowd count map include, but exceed, the image features of the crowd count map output by the element multiplication network;
  • the number of identical image features between the crowd count map output by the element multiplication network and the pre-labeled crowd count map reaches a preset number or proportion.
  • The similarity between two maps, for example between the crowd density map output by the convolutional neural network and the pre-labeled crowd density map, between the crowd speed map output by the convolutional neural network and the pre-labeled crowd speed map, or between the crowd count map output by the element multiplication network and the pre-labeled crowd count map, can be measured by the Euclidean (L2) distance between the two maps. The Euclidean distance between the two maps may be obtained first and compared against a preset distance threshold; since a smaller distance corresponds to a higher similarity, this determines whether the similarity between the two maps is greater than the preset similarity threshold.
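A minimal sketch of the L2-distance measurement (the function names, the example maps, and the threshold are assumptions made for illustration, not from the patent):

```python
import math

def euclidean_distance(map_a, map_b):
    """Euclidean (L2) distance between two maps given as 2-D lists."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(map_a, map_b)
                         for a, b in zip(row_a, row_b)))

def is_similar(map_a, map_b, distance_thresh):
    """A smaller L2 distance means a higher similarity, so the similarity
    exceeds its preset threshold when the distance falls below the
    corresponding distance threshold."""
    return euclidean_distance(map_a, map_b) < distance_thresh

# Toy 2x2 density maps; only one position differs by 0.1.
predicted = [[0.1, 0.2], [0.3, 0.4]]
labeled   = [[0.1, 0.2], [0.3, 0.5]]
```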
  • The foregoing embodiments of the present application further provide a method for performing cross-line counting of a crowd in a video based on a neural network trained by the above neural network training method. The cross-line counting of the crowd in the video may be performed by the cross-line counting method of any of the above embodiments of the present application; details are not repeated here.
  • Specifically, the crowd count maps of the frame images in the video can be acquired based on the neural network, so as to perform cross-line counting of the crowd in the video: the original frame images of the video that needs cross-line counting are input to the neural network, and the neural network can output a crowd count map for each frame image by, but not limited to, the operations described in any of the above embodiments of the present application.
  • The neural network used in the cross-line counting method of the embodiments of the present application may be obtained based on the neural network training method of any of the above embodiments, or by other training methods, as long as the trained neural network can output a crowd count map of a frame image from the input original frame image.
  • Any cross-line counting method and neural network training method provided by the embodiments of the present application may be performed by any suitable device having data processing capability, including but not limited to: a terminal device, a server, and the like.
  • Alternatively, any cross-line counting method and neural network training method provided by the embodiments of the present application may be executed by a processor; for example, the processor executes any cross-line counting method and neural network training method mentioned in the embodiments of the present application by calling corresponding instructions stored in a memory. Details are not repeated below.
  • FIG. 5 is a schematic structural diagram of an embodiment of a cross-line counting device of the present application.
  • the cross-line counting device of this embodiment can be used to implement the embodiments of the above-described cross-line counting methods of the present application.
  • The cross-line counting device of this embodiment includes: a first acquiring unit, a second acquiring unit, and a third acquiring unit. Specifically:
  • The first acquiring unit, implemented as a neural network, is configured to receive a plurality of original frame images corresponding to the time period T to be analyzed in the video that needs cross-line counting, and to output a crowd count map for each original frame image.
  • The crowd count map includes a count vector of at least one position in the frame image; the count vector is used to indicate the number of people passing, in a counting direction (for example, the two coordinate directions of a two-dimensional coordinate plane), between each frame image of the plurality of original frame images and the adjacent previous frame image.
  • Exemplarily, the first acquiring unit may be configured to sequentially extract at least two frames of images from the plurality of original frame images corresponding to the time period T to be analyzed in the video, use the latter of the at least two frames as the current frame image, and generate a crowd count map of the current frame image.
  • the neural network in various embodiments of the present application may be a deep neural network.
  • The second acquiring unit is configured to use each frame image of the plurality of original frame images as the current frame image and, for the LOI of the video on which cross-line counting is to be performed, obtain, according to the crowd count map of the current frame image, the number of people in the current frame image passing through the LOI in at least one direction (for example, the number of people passing the LOI from one direction of the current frame image, or the numbers of people passing the LOI from both directions).
  • Exemplarily, the second acquiring unit may be configured to project the count vector of at least one position on the LOI in the crowd count map onto the normal direction of the LOI to obtain a scalar value of at least one position on the LOI, where the positive and negative signs of the scalar value represent the two crossing directions of the LOI; and to accumulate the positive scalar values and the negative scalar values on the LOI respectively, to obtain the numbers of people in the current frame image passing through the LOI in the two directions.
  • The third acquiring unit is configured to accumulate, for the at least one direction, the numbers of people of the plurality of original frame images passing through the LOI, to obtain the number of people crossing the LOI in the at least one direction within the time period T to be analyzed.
  • Exemplarily, when the second acquiring unit acquires the number of people passing through the LOI in one direction, the third acquiring unit accumulates, over the plurality of original frame images, the numbers of people passing through the LOI in that direction, to obtain the number of people crossing the LOI in that direction within the time period T to be analyzed. When the second acquiring unit acquires the numbers of people passing through the LOI from both directions in the current frame image, the third acquiring unit accumulates, over the plurality of original frame images, the numbers of people passing through the LOI in the two directions respectively, to obtain the numbers of people crossing the LOI in the two directions within the time period T to be analyzed.
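The projection and accumulation performed by the second and third acquiring units can be sketched as follows. This is an illustrative toy example, not the patent's implementation: the LOI is taken to be vertical so that its normal is the x-axis, and all names and numbers are invented:

```python
def frame_counts(loi_count_vectors, normal):
    """Project each count vector at an LOI position onto the LOI normal;
    accumulate positive and negative scalar values separately, giving the
    per-frame numbers of people crossing the LOI in the two directions."""
    pos_total, neg_total = 0.0, 0.0
    for vx, vy in loi_count_vectors:
        s = vx * normal[0] + vy * normal[1]  # dot product with the normal
        if s > 0:
            pos_total += s
        else:
            neg_total += -s
    return pos_total, neg_total

# One count vector per LOI position, for two consecutive frames; the LOI
# is vertical, so its normal is the x-axis direction (1, 0).
frame1 = [(0.4, 0.1), (-0.2, 0.0), (0.1, 0.3)]
frame2 = [(0.5, 0.0), (-0.1, 0.2), (0.0, 0.1)]

# Third acquiring unit: accumulate the per-frame numbers over the period T.
totals = [frame_counts(f, normal=(1.0, 0.0)) for f in [frame1, frame2]]
crossed_pos = sum(p for p, _ in totals)
crossed_neg = sum(n for _, n in totals)
```

Fractional per-frame counts are expected here: a pedestrian crossing the line over several frames contributes partial counts that sum to one person over the period.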
  • In this embodiment, the crowd count map of each frame image corresponding to the time period T to be analyzed in the video is acquired by the neural network; for each frame image, the number of people passing through the LOI in at least one direction is obtained from the crowd count map, and the per-frame numbers of the plurality of original frame images are accumulated to obtain the number of people crossing the LOI in the at least one direction within the time period T to be analyzed. Since the embodiments of the present application directly use each frame image of the original video as input rather than temporal slice images, they are more robust, apply to various scenarios including the extreme cases where the crowd density is high and the crowd moves slowly or is stationary, and can be applied across scenes. Moreover, because cross-line counting is performed on the basis of the crowd count map rather than only the total number of people, the distribution of the crowd is also taken into account, which helps improve the objectivity and accuracy of the cross-line counting results.
  • FIG. 6 is a schematic structural diagram of another embodiment of a cross-line counting device of the present application.
  • Compared with the embodiment shown in FIG. 5, in this embodiment the first acquiring unit may optionally include a convolutional neural network and an element multiplication network. Specifically:
  • The convolutional neural network is configured to receive at least two frames of the input images, use the latter frame image of the at least two frames as the current frame image, and generate a crowd density map and a crowd speed map of the current frame image.
  • The crowd density map is used to represent the crowd density of at least one position in the current frame image.
  • the crowd speed map is used to indicate the speed at which each pedestrian in the current frame image moves from the adjacent previous frame image to the current frame image.
  • When generating the crowd density map of the current frame image, the convolutional neural network may acquire the crowd density value of at least one position in the current frame image according to the pedestrian information in the current frame image, and generate the crowd density map of the current frame image from the crowd density value of the at least one position. When generating the crowd speed map of the current frame image, it may acquire the moving speed of each pedestrian in the current frame image according to the position difference of the pedestrian between the previous frame image and the current frame image in the current training sample and the time difference between the two frames; acquire the crowd speed of at least one position in the current frame image according to the moving speed and position of each pedestrian; and generate the crowd speed map of the current frame image from the crowd speed of the at least one position.
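The per-pedestrian speed computation described above (position difference divided by the time difference of the two frames) can be sketched as follows; the function name and the numbers are illustrative only:

```python
def pedestrian_speed(prev_pos, cur_pos, dt):
    """Moving speed of a pedestrian between two adjacent frames: the
    position difference divided by the time difference of the frames,
    giving a 2-D velocity vector (e.g. in pixels per second)."""
    return ((cur_pos[0] - prev_pos[0]) / dt,
            (cur_pos[1] - prev_pos[1]) / dt)

# A pedestrian moves from (100, 40) to (104, 42) over a 0.04 s frame gap.
speed = pedestrian_speed((100.0, 40.0), (104.0, 42.0), 0.04)
```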
  • The element multiplication network is configured to multiply the crowd density map and the crowd speed map of the current frame image at corresponding positions to obtain a crowd count map of the current frame image.
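The element multiplication network's core operation reduces to a per-position product: the scalar density multiplied by the two-dimensional speed vector gives the two-dimensional count vector. A minimal sketch (plain nested lists stand in for the network's feature maps; all values are invented):

```python
def element_multiply(density_map, speed_map):
    """Crowd count map: at each position, the scalar density value
    multiplied by the 2-D speed vector yields a 2-D count vector."""
    return [[(d * vx, d * vy) for d, (vx, vy) in zip(d_row, v_row)]
            for d_row, v_row in zip(density_map, speed_map)]

density = [[0.5, 0.0],
           [1.0, 0.2]]
speed   = [[(2.0, 0.0), (1.0, 1.0)],
           [(0.5, -1.0), (0.0, 3.0)]]
count_map = element_multiply(density, speed)
```

Note that positions with zero density contribute a zero count vector regardless of the speed there, which is why the product rather than either map alone is used for counting.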
  • Optionally, a calculation unit may be further included, configured to accumulate the numbers of people crossing the LOI in the two directions, to obtain the total number of people crossing the LOI within the time period T to be analyzed.
  • FIG. 7 is a schematic structural diagram of an embodiment of a neural network training apparatus according to the present application.
  • the neural network training device of the embodiments of the present application can be used to implement the foregoing neural network training method embodiments of the present application.
  • The neural network training apparatus of this embodiment includes a network training unit, configured to input a plurality of original frame images of a sample video to the neural network and, using the pre-labeled crowd count maps of the plurality of original frame images in the sample video as supervision signals, perform iterative training on the neural network until a preset condition is met, to obtain the final neural network.
  • The neural network includes a convolutional neural network (CNN) and an element multiplication network.
  • the neural network in various embodiments of the present application may be a deep neural network.
  • The neural network training device inputs the original sample video to the neural network and, using the pre-labeled crowd count maps of the plurality of original frame images in the sample video as supervision signals, performs iterative training on the neural network until a preset condition is met, so that crowd cross-line counting can be performed based on the trained neural network. Since the neural network directly uses the original frame images of the video as input rather than temporal slice images (in which pedestrians have low recognizability when the crowd density is high and the crowd moves slowly, making the number of people impossible to estimate), it is more robust, remains applicable when the crowd density is high and the crowd moves slowly or is stationary, and can be applied across scenes. Moreover, the embodiments of the present application train the neural network on crowd count maps for cross-line counting rather than only the total number of people, thereby taking the crowd distribution into account, which helps improve the objectivity and accuracy of the cross-line counting results.
  • Exemplarily, the network training unit may train the neural network by: inputting each two adjacent frames of the plurality of original frame images in the sample video to the convolutional neural network as a training sample and, using the pre-labeled crowd density map and crowd speed map of the training sample as supervision signals, iteratively training the convolutional neural network until a first preset convergence condition is met, to obtain the final convolutional neural network; and then inputting each two adjacent frames of the plurality of original frame images in the sample video to the neural network as a training sample and, using the pre-labeled crowd count map of the training sample as a supervision signal, iteratively training the neural network until a second preset convergence condition is met, to obtain the final neural network.
  • For the first preset convergence condition and the second preset convergence condition, refer to the description of the embodiment shown in FIG. 3; details are not repeated here.
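The two-stage schedule can be sketched as a generic training loop. The step functions below are toy stand-ins (a geometric loss decay), invented purely for illustration, not the patent's actual optimization:

```python
def train_until(step_fn, converged, max_iters):
    """Run training steps until the convergence predicate holds or the
    iteration budget is exhausted; return (iterations used, final loss)."""
    loss = float("inf")
    for i in range(1, max_iters + 1):
        loss = step_fn()
        if converged(loss):
            return i, loss
    return max_iters, loss

# Stage 1: train the CNN with density/speed supervision (toy loss decay).
cnn_state = {"loss": 1.0}
def cnn_step():
    cnn_state["loss"] *= 0.5
    return cnn_state["loss"]
stage1_iters, _ = train_until(cnn_step, lambda l: l < 0.01, max_iters=1000)

# Stage 2: train the full network (CNN + element multiplication network)
# with count-map supervision, starting from the stage-1 weights.
full_state = {"loss": 0.5}
def full_step():
    full_state["loss"] *= 0.8
    return full_state["loss"]
stage2_iters, _ = train_until(full_step, lambda l: l < 0.05, max_iters=1000)
```

In a real implementation the convergence predicate would test the preset conditions described above (similarity thresholds, consistent-frame ratio, or iteration budget) rather than a raw loss value.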
  • the embodiment of the present application further provides an electronic device, including the cross-line counting device or the neural network training device provided by any of the foregoing embodiments of the present application.
  • The electronic device in the embodiments of the present application may be any device having a data processing function, and may include, but is not limited to, an advanced RISC machine (ARM), a central processing unit (CPU), a graphics processing unit (GPU), etc.
  • The embodiments of the present application further provide another electronic device, including a processor and the cross-line counting device of any of the above embodiments of the present application; when the processor runs the cross-line counting device, the units in the cross-line counting device are run.
  • The embodiments of the present application further provide another electronic device, including a processor and the neural network training device of any of the above embodiments of the present application; when the processor runs the neural network training device, the units in the neural network training device are run.
  • the embodiment of the present application further provides another electronic device, including: a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface complete communication with each other through the communication bus;
  • the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the method of any of the above embodiments of the present application.
  • Exemplarily, the electronic device may include a processor, a communication interface, a memory, and a communication bus. Specifically:
  • the processor, the communication interface, and the memory complete communication with each other through the communication bus.
  • a communication interface for communicating with network elements of other devices such as other clients or servers.
  • The processor may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), one or more integrated circuits configured to implement the embodiments of the present application, or a graphics processing unit (GPU).
  • The one or more processors included in the terminal device may be processors of the same type, such as one or more CPUs or one or more GPUs, or processors of different types, such as one or more CPUs and one or more GPUs.
  • The memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the cross-line counting method and the neural network training method of any of the above embodiments of the present application.
  • The memory may include a high-speed random access memory (RAM), and may also include a non-volatile memory, such as at least one disk memory.
  • FIG. 9 is a schematic structural diagram of another embodiment of an electronic device according to the present application. As shown in FIG. 9, an electronic device for implementing an embodiment of the present application includes a central processing unit (CPU) or a graphics processing unit (GPU), which may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) or executable instructions loaded from a storage portion into a random access memory (RAM).
  • The central processing unit or the graphics processing unit can communicate with the read-only memory and/or the random access memory to execute the executable instructions so as to complete the operations corresponding to the cross-line counting method provided by the embodiments of the present application, for example: inputting, to the neural network, a plurality of original frame images corresponding to the time period T to be analyzed in the video that needs cross-line counting, and outputting, by the neural network, crowd count maps of the plurality of original frame images, where the crowd count map includes a count vector for each position, the count vector indicating the number of people passing, in the counting direction, between each frame image of the plurality of original frame images and the adjacent previous frame image; using each frame image of the plurality of original frame images as the current frame image and, for the line of interest LOI on which cross-line counting is to be performed in the video, obtaining, according to the crowd count map of the current frame image, the number of people in the current frame image passing through the LOI from at least one direction; and accumulating, for the at least one direction, the numbers of people of the plurality of original frame images passing through the LOI, to obtain the number of people crossing the LOI in the at least one direction within the time period T to be analyzed.
  • Alternatively, the central processing unit or the graphics processing unit can communicate with the read-only memory and/or the random access memory to execute the executable instructions so as to complete the operations corresponding to the neural network training method provided by the embodiments of the present application, for example: inputting a plurality of original frame images of the sample video to the neural network, and using the pre-labeled crowd count maps of the plurality of original frame images as supervision signals, iteratively training the neural network until a preset condition is met, where the neural network includes a convolutional neural network and an element multiplication network.
  • the CPU, GPU, ROM, and RAM are connected to each other through a bus.
  • An input/output (I/O) interface is also connected to the bus.
  • The following components are connected to the I/O interface: an input portion including a keyboard, a mouse, and the like; an output portion including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion including a hard disk and the like; and a communication portion including a network interface card such as a LAN card or a modem.
  • the communication section performs communication processing via a network such as the Internet.
  • A drive is also connected to the I/O interface as needed. A removable medium, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive as needed, so that a computer program read therefrom is installed into the storage portion as needed.
  • The embodiments of the present application further provide a computer program, including computer readable code; when the computer readable code runs on a device, a processor in the device executes instructions for implementing the steps of the method of any of the above embodiments of the present application.
  • For example, an embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program including program code for executing the method illustrated in the flowchart. The program code may include instructions corresponding to the steps of the cross-line counting method provided by the embodiments of the present application, for example: an instruction for inputting, to the neural network, a plurality of original frame images corresponding to the time period T to be analyzed in the video that needs cross-line counting, and outputting, by the neural network, crowd count maps of the plurality of original frame images, where the crowd count map includes a count vector for each position, the count vector representing the number of people passing, in the counting direction, between each frame image of the plurality of original frame images and the adjacent previous frame image; and an instruction for using each frame image of the plurality of original frame images as the current frame image.
  • The program code may further include instructions corresponding to the steps of any neural network training method provided by the embodiments of the present application, for example: an instruction for inputting a plurality of original frame images of the sample video to the neural network, and using the pre-labeled crowd count maps of the plurality of original frame images as supervision signals, iteratively training the neural network until a preset condition is met, where the neural network includes a convolutional neural network and an element multiplication network.
  • In such embodiments, the computer program can be downloaded and installed from a network via the communication portion, and/or installed from a removable medium. The above-described functions defined in the methods of the present application are performed when the computer program is executed by a central processing unit (CPU) or a graphics processing unit (GPU).
  • The embodiments of the present application further provide a computer storage medium for storing computer readable instructions which, when executed, implement the operations of the steps of the method of any of the above embodiments of the present application.
  • Exemplarily, the instructions include: an instruction for inputting, to the neural network, a plurality of original frame images corresponding to the time period T to be analyzed in the video that needs cross-line counting, and outputting, by the neural network, crowd count maps of the plurality of original frame images, where the crowd count map includes a count vector for each position, the count vector representing the number of people passing, in the counting direction, between each frame image of the plurality of original frame images and the adjacent previous frame image; an instruction for using each frame image of the plurality of original frame images as the current frame image and, for the line of interest LOI on which cross-line counting is to be performed in the video, acquiring, according to the crowd count map of the current frame image, the number of people in the current frame image passing through the LOI from at least one direction; and an instruction for accumulating, for the at least one direction, the numbers of people of the plurality of original frame images passing through the LOI, to obtain the number of people crossing the LOI in the at least one direction within the time period T to be analyzed.
  • Alternatively, the instructions include: an instruction for inputting a plurality of original frame images of the sample video to the neural network, and using the pre-labeled crowd count maps of the plurality of original frame images as supervision signals, performing iterative training on the neural network until a preset condition is met, where the neural network includes a convolutional neural network and an element multiplication network.
  • Scenario 1: When the number of people crossing subway entrances and exits within the time period T to be analyzed needs to be counted, videos of the subway entrances and exits are collected by surveillance cameras, the entrances and exits are set as LOIs, and the videos of the subway entrances and exits within the time period T to be analyzed are input to the neural network of the embodiments of the present application. Through the cross-line counting method of the embodiments of the present application, the numbers of people crossing the subway entrances and exits within the time period T can be obtained, realizing statistics and analysis of the flow of people at each entrance and exit. This makes it possible to estimate the operating load of the subway, facilitate the optimization of vehicle dispatching, and reasonably control the flow of people so as to ensure passenger safety and smooth travel.
  • Scenario 2: For an urban mass parade, the video of the parade street is collected by street surveillance cameras, an LOI is set across the width of the parade street, and the video of the parade street within the time period T to be analyzed is input to the neural network of the embodiments of the present application. Through the cross-line counting method of the embodiments of the present application, the number of parade participants and the movement state of the crowd can be obtained, facilitating the arrangement of police forces to protect parade order and public safety.
  • Scenario 3: For scenic spots or public sports venues, videos can likewise be collected by surveillance cameras, LOIs can be set at the entrances and exits of the scenic spot or venue, and the video can be input to the neural network of the embodiments of the present application. Through the cross-line counting method of the embodiments of the present application, statistics can be collected on the people entering or leaving the scenic spot or stadium, so that the flow of people can be reasonably controlled, avoiding the danger of overcrowding and stampede accidents.
  • the methods, apparatus, and devices of the present application may be implemented in a number of ways.
  • the methods, apparatus, and devices of the present application can be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware.
  • The above-described sequence of steps of the method is for illustrative purposes only, and the steps of the method of the present application are not limited to the order specifically described above unless otherwise specifically stated.
  • the present application can also be implemented as a program recorded in a recording medium, the programs including machine readable instructions for implementing the method according to the present application.
  • the present application also covers a recording medium storing a program for executing the method according to the present application.


Abstract

The embodiments of the present application disclose a cross-line counting method, a neural network training method, an apparatus, and an electronic device. The cross-line counting method includes: inputting, to a neural network, a plurality of original frame images corresponding to a time period T to be analyzed in a video that needs cross-line counting, and outputting, by the neural network, crowd count maps of the plurality of original frame images, where the crowd count map includes a count vector for each position; using each frame image of the plurality of original frame images as a current frame image, and for a line of interest (LOI) in the video on which cross-line counting is to be performed, acquiring, according to the crowd count map of the current frame image, the number of people in the current frame image passing through the LOI from at least one direction; and accumulating, for the at least one direction, the numbers of people of the plurality of original frame images passing through the LOI, to obtain the number of people crossing the LOI in the at least one direction within the time period T to be analyzed. The embodiments of the present application can be applied to a variety of different scenarios and help improve the objectivity and accuracy of cross-line counting results.

Description

Cross-line counting method, neural network training method, apparatus, and electronic device
This application claims priority to Chinese Patent Application No. 201610867834.1, filed with the Chinese Patent Office on September 29, 2016 and entitled "Cross-line counting method and deep neural network training method, apparatus, and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to computer vision technology, and in particular to a cross-line counting method, a neural network training method, an apparatus, and an electronic device.
Background
Automatic crowd counting in video plays an increasingly important role in crowd flow monitoring, public safety, and other areas. In particular, cross-line counting methods can be used to count, in real time, the flow of people on key roads or at entrances and exits, and thereby estimate the total number of people in a region.
Current mainstream cross-line counting methods are based on temporal slice images. In a temporal-slice method, the pixel vectors on a line extracted from each video frame (three-channel vectors for color images) are accumulated along the time dimension to form a two-dimensional temporal slice image; then, using manually labeled cross-line counts directly as supervision signals, a regression model is learned on the temporal slice image to estimate the number of people in it, thereby obtaining the number of people crossing the line within a certain time period.
Summary
The embodiments of the present application provide a technical solution for cross-line counting.
According to one aspect of the embodiments of the present application, a cross-line counting method is provided, including:
inputting, to a neural network, a plurality of original frame images corresponding to a time period T to be analyzed in a video that needs cross-line counting, and outputting, by the neural network, crowd count maps of the plurality of original frame images, where the crowd count map includes a count vector of at least one position in a frame image, the count vector being used to represent the number of people passing, in a counting direction, between each frame image of the plurality of original frame images and the adjacent previous frame image;
using each frame image of the plurality of original frame images as a current frame image, and for a line of interest (LOI) in the video on which cross-line counting is to be performed, acquiring, according to the crowd count map of the current frame image, the number of people in the current frame image passing through the LOI from at least one direction; and
accumulating, for the at least one direction, the numbers of people of the plurality of original frame images passing through the LOI, to obtain the number of people crossing the LOI in the at least one direction within the time period T to be analyzed.
According to another aspect of the embodiments of the present application, a neural network training method is provided, including:
inputting a plurality of original frame images of a sample video to a neural network, and using pre-labeled crowd count maps of the plurality of original frame images as supervision signals, performing iterative training on the neural network until a preset condition is met, where the neural network includes a convolutional neural network and an element multiplication network.
According to yet another aspect of the embodiments of the present application, a method for performing cross-line counting of a crowd in a video based on a neural network trained by the above neural network training method is provided.
According to yet another aspect of the embodiments of the present application, a cross-line counting apparatus is provided, including:
a first acquiring unit, implemented as a neural network, configured to receive a plurality of original frame images corresponding to a time period T to be analyzed in a video that needs cross-line counting, and output crowd count maps of the original frame images, where the crowd count map includes a count vector of at least one position in a frame image, the count vector being used to represent the number of people passing, in a counting direction, between each frame image of the plurality of original frame images and the adjacent previous frame image;
a second acquiring unit, configured to use each frame image of the plurality of original frame images as a current frame image and, for a line of interest (LOI) in the video on which cross-line counting is to be performed, acquire, according to the crowd count map of the current frame image, the number of people in the current frame image passing through the LOI from at least one direction; and
a third acquiring unit, configured to accumulate, for the at least one direction, the numbers of people of the plurality of original frame images passing through the LOI, to obtain the number of people crossing the LOI in the at least one direction within the time period T to be analyzed.
根据本申请实施例的又一个方面,提供一种神经网络训练装置,包括:
网络训练单元,用于将样本视频的多个原始帧图像输入至神经网络,以所述多个原始帧图像预先标注的人群计数图作为监督信号,对所述神经网络进行迭代训练直至满足预设条件;所述神经网络包括卷积神经网络CNN和元素相乘网络。
根据本申请实施例的又一个方面,提供的一种电子设备,包括本申请上述任一实施例所述的跨线计数装置或者神经网络训练装置。
根据本申请实施例的再一个方面,提供的另一种电子设备,包括:
处理器和本申请上述任一实施例所述的跨线计数装置;
在处理器运行本申请上述任一实施例所述的跨线计数装置时,本申请上述任一实施例所述的跨线计数装置中的单元被运行。
根据本申请实施例的再一个方面,提供的又一种电子设备,包括:
处理器和本申请上述任一实施例所述的神经网络训练装置;
在处理器运行本申请上述任一实施例所述的神经网络训练装置时,本申请上述任一实施例所述的神经网络训练装置中的单元被运行。
根据本申请实施例的再一个方面,提供的再一种电子设备,包括:处理器、存储器、通信接口和通信总线,所述处理器、所述存储器和所述通信接口通过所述通信总线完成相互间的通信;
所述存储器用于存放至少一可执行指令,所述可执行指令使所述处理器执行本申请上述任一实施例所述方法对应的操作。
根据本申请实施例的再一个方面,提供的一种计算机程序,包括计算机可读代码,当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现本申请上述任一实施例所述方法中各步骤的指令。
根据本申请实施例的再一个方面,提供的一种计算机可读存储介质,用于存储计算机可读取的指令,所述指令被执行时实现本申请上述任一实施例所述方法中各步骤的操作。
基于本申请上述实施例提供的跨线计数方法和神经网络训练方法、装置和电子设备,提出了一种神经网络的训练方法,以及基于训练好的神经网络的进行人群跨线计数的技术方案,将样本视频输入至神经网络,以该样本视频中的多个原始帧图像预先标注的人群计 数图作为监督信号,对神经网络进行迭代训练直至满足预设条件,获得神经网络;通过向该神经网络输入需要进行跨线计数的视频中与待分析时间段T对应的多个原始帧图像,可输出原始帧图像中各帧图像的人群计数图,即:每个位置上,当前帧图像与相邻的前帧图像之间在计数方向(例如,二维坐标平面x轴和y轴的至少一个坐标方向)分别通过的人数;分别针对各帧图像,根据人群计数图获取从至少一个方向通过LOI的人数,并分别累积至少一个方向上多个原始帧图像通过LOI的人数,获得待分析时间段T内LOI在至少一个方向上的跨线人数。本申请实施例直接以原始视频中的原始帧图像作为输入而未使用时序切片图像,鲁棒性更好,可以应用于各种不同的场景,有利于避免视频中人群密度较大、人群移动速度低或静止不动、或者监控摄像头视角比较低等情形时,时序切片图像中的行人的可辨识度低、无法估计时序切片图像中的人群数量的问题,对于人群密度较大、人群移动速度低或静止不动的情形同样适用,并且可以跨场景应用;并且,本申请实施例基于人群计数图进行跨线计数而非只使用人群总数,还考虑了人群的分布情况,有利于提高跨线计数结果的客观性和准确性。
下面通过附图和实施例,对本申请的技术方案做进一步的详细描述。
Brief Description of the Drawings
The accompanying drawings, which form a part of the specification, illustrate embodiments of the present application and, together with the description, serve to explain the principles of the present application.
The present application can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
FIG. 1 is a flowchart of one embodiment of the cross-line counting method of the present application.
FIG. 2 is a flowchart of another embodiment of the cross-line counting method of the present application.
FIG. 3 is a flowchart of one embodiment of the neural network training method of the present application.
FIG. 4 is a schematic diagram of training the neural network in two stages in an embodiment of the present application.
FIG. 5 is a schematic structural diagram of one embodiment of the cross-line counting apparatus of the present application.
FIG. 6 is a schematic structural diagram of another embodiment of the cross-line counting apparatus of the present application.
FIG. 7 is a schematic structural diagram of one embodiment of the neural network training apparatus of the present application.
FIG. 8 is a schematic structural diagram of one embodiment of the electronic device of the present application.
FIG. 9 is a schematic structural diagram of another embodiment of the electronic device of the present application.
Detailed Description
Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present application.
At the same time, it should be understood that, for ease of description, the dimensions of the parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present application or its application or uses.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be regarded as part of the specification.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.
Embodiments of the present application may be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate together with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations suitable for use with computer systems, servers, and terminal devices include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the above systems, and the like.
Computer systems, servers, and terminal devices may be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and the like, which perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices linked through a communications network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
In embodiments of the present application, a crowd counting map is obtained for each frame image from the raw video; the crowd counting map of each frame image is then accumulated along the line of interest (LOI) across which counting is to be performed, yielding the instantaneous cross-line counts on the LOI in the two directions, i.e., the numbers of people crossing the LOI; then, for each of the two directions, the instantaneous cross-line counts within the time period T to be analyzed are accumulated, yielding the cross-line crowd count for that time period.
FIG. 1 is a flowchart of one embodiment of the cross-line counting method of the present application. As shown in FIG. 1, the cross-line counting method of this embodiment includes:
102: Input into a neural network multiple original frame images, corresponding to a time period T to be analyzed, of a video on which cross-line counting is to be performed, and output, by the neural network, crowd counting maps of the multiple original frame images.
Embodiments of the present application introduce the crowd counting map, which contains a counting vector for at least one position in a frame image: each position of the crowd counting map records a two-dimensional counting vector representing the number of people passing that position between the current frame image and its adjacent preceding frame image in the counting directions, for example the numbers passing in the two coordinate directions (the x-axis and y-axis directions) of a two-dimensional coordinate plane. The crowd counting map is an approximation in the mathematical sense: the value of the counting vector at each position is usually less than 1, and it represents what fraction of one person passed that position between the current frame image and the adjacent preceding frame image.
As an optional example of the embodiments of the present application, in operation 102, after the multiple original frame images corresponding to the time period T are input into the neural network, at least two frame images may further be sequentially extracted, in order, from the multiple original frame images, with the later of the at least two frame images taken as the current frame image, and the crowd counting map of the current frame image is generated. The sequentially extracted frame images may be consecutive original frame images, non-consecutive original frame images, or partly consecutive and partly non-consecutive. That is, in embodiments of the present application, cross-line crowd counting may be performed on all original frame images corresponding to the time period T, or only on a subset extracted from them; it is not necessary for all original frame images corresponding to the time period T to participate in the counting.
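The ordered extraction of frame pairs described above, consecutive or strided, can be sketched in a few lines. This is a minimal sketch, in which the `frame_pairs` helper and its `stride` parameter are illustrative assumptions, not part of the patented method:

```python
def frame_pairs(frames, stride=1):
    """Yield (preceding, current) frame pairs in order.

    With stride == 1 the pairs are consecutive original frames;
    with stride > 1 intermediate frames are skipped, so the two
    frames of a pair need not be consecutive in the video.
    """
    for i in range(stride, len(frames), stride):
        yield frames[i - stride], frames[i]

# Five frames, sampled every second frame: pairs (0, 2) and (2, 4).
pairs = list(frame_pairs([0, 1, 2, 3, 4], stride=2))
```

In each pair, the later frame plays the role of the "current frame image" and the earlier one the "adjacent preceding frame image" of the description.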
As an optional example of the embodiments of the present application, the neural network in the embodiments of the present application may be a deep neural network.
In an optional example, operation 102 may be performed by a processor invoking corresponding instructions stored in a memory, or by a first obtaining unit run by the processor.
104: Take each of the multiple original frame images as the current frame image and, for the LOI in the video across which counting is to be performed, obtain, according to the crowd counting map of the current frame image, the number of people in the current frame image crossing the LOI in at least one direction, for example the number crossing the LOI in one direction, or the numbers crossing the LOI in each of two directions.
The LOI in the embodiments of the present application can be set according to the application's crowd-counting requirements, and may be any line connecting positions in the video scene where crowd counting is needed, for example a line across a subway entrance/exit or across the doorway of a shopping mall. In embodiments of the present application, the number of people crossing an LOI from each direction can be counted as needed; the number of people crossing the LOI in a given direction is also referred to as the cross-line count of the LOI in that direction. For example, with the line across a subway entrance/exit set as the LOI, counting the people crossing this LOI within a time period gives the number of people entering the subway entrance, the number leaving it, or the total number entering and leaving it within that period.
In an optional example, operation 104 may be performed by a processor invoking corresponding instructions stored in a memory, or by a second obtaining unit run by the processor.
106: Accumulate, for each of the at least one direction, the numbers of people crossing the LOI over the frame images of the multiple original frame images, to obtain the number of people crossing the LOI in the at least one direction within the time period T, for example the number crossing the LOI in one direction, or the numbers crossing the LOI in each of two directions.
In an optional example, operation 106 may be performed by a processor invoking corresponding instructions stored in a memory, or by a third obtaining unit run by the processor.
Based on the above embodiments of the present application, a technical solution for cross-line crowd counting based on a neural network is proposed: the crowd counting map of each frame image corresponding to the time period T is obtained through the neural network; for each frame image, the number of people crossing the LOI in at least one direction is obtained; and these numbers are accumulated per direction over the frame images, obtaining the cross-line counts of the LOI in the at least one direction within the time period T. Since the embodiments of the present application take the frame images of the raw video directly as input, without using temporal slice images, they are more robust, applicable to a wide variety of scenes, equally applicable in extreme cases where the crowd is dense, moves slowly, or stands still, and can be applied across scenes. Moreover, the embodiments perform cross-line counting based on the crowd counting map rather than only the total number of people, thereby also taking the crowd's distribution into account, which helps improve the objectivity and accuracy of the counting results.
In an optional example of the cross-line counting method embodiments of the present application, in operation 104 the number of people in the current frame image crossing the LOI in one direction may be obtained. Correspondingly, in operation 106, accumulating the numbers of people crossing the LOI in that direction over the frame images of the multiple original frame images yields the cross-line count of the LOI in that direction within the time period T.
Alternatively, in operation 104 the numbers of people in the current frame image crossing the LOI in each of two directions may be obtained. In an optional example of the embodiments of the present invention, the two directions include: perpendicular to the LOI, the direction from one side of the LOI to the other side and the direction from the other side back to the first; or, not perpendicular to the LOI, the direction from one side of the LOI to the other side and its opposite direction. Correspondingly, in operation 106, accumulating per direction the numbers of people crossing the LOI over the frame images of the multiple original frame images yields the cross-line counts of the LOI in both directions within the time period T, giving a complete picture of the bidirectional cross-line counts of the LOI.
In another optional example of the cross-line counting method embodiments of the present application, the crowd counting map of the current frame image may be generated as follows:
inputting the multiple original frame images into the neural network, and generating, with the convolutional neural network in the neural network, a crowd density map and a crowd velocity map of the current frame image, where the crowd density map represents the crowd density at at least one position in the current frame image, and the crowd velocity map represents the velocity at which each pedestrian in the current frame image moved from the adjacent preceding frame image to the current frame image;
inputting the crowd density map and crowd velocity map of the current frame image into the elementwise product network in the neural network, and multiplying, with the elementwise product network, the elements of the crowd density map and the crowd velocity map of the current frame image at corresponding positions, to obtain the crowd counting map of the current frame image.
In the above embodiments of the present application, based on at least two frame images of the video, the crowd density map and crowd velocity map of a frame image are obtained first; assuming that the density distribution and walking speed of pedestrians remain unchanged across the two frames, the crowd counting map of the frame image is obtained by multiplying the elements of the crowd density map and crowd velocity map at corresponding positions, achieving accurate acquisition of the crowd counting map.
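The density-times-velocity construction above can be sketched in NumPy. This is a minimal sketch, assuming the density map is an H×W array and the velocity map an H×W×2 array of per-pixel (x, y) displacements per frame; the array layouts and names are illustrative, not the patent's actual network tensors:

```python
import numpy as np

def counting_map(density, velocity):
    """Multiply density (H, W) and velocity (H, W, 2) elementwise.

    Each position of the result holds a 2-D counting vector: the
    fraction of one person passing that position between the two
    frames, along the x and y axes respectively.
    """
    assert density.shape == velocity.shape[:2]
    return density[..., None] * velocity  # broadcast to (H, W, 2)

# Toy example: one unit of person-density at (2, 2), moving one
# pixel per frame in the +x direction.
d = np.zeros((4, 4)); d[2, 2] = 1.0
v = np.zeros((4, 4, 2)); v[2, 2] = (1.0, 0.0)
c = counting_map(d, v)
```

The broadcast `density[..., None]` scales both velocity components by the local density, which is exactly the "elements at corresponding positions" product of the elementwise product network.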
In another optional example of the cross-line counting method embodiments of the present application, after the crowd counting map of a frame image is obtained based on the embodiments of the present application, for any LOI across which counting is to be performed, the numbers of people in the current frame image crossing the LOI in each of the two directions may be obtained as follows:
projecting the counting vector at at least one position on the LOI in the crowd counting map onto the normal direction of the LOI, to obtain a scalar value at the at least one position on the LOI, the sign of the scalar value indicating the two directions of the LOI, for example the direction entering a subway entrance/exit LOI versus the direction leaving it;
accumulating the positive scalar values and the negative scalar values on the LOI separately, to obtain the numbers of people in the current frame image crossing the LOI in its two directions.
For example, the positive and negative scalar values on the LOI may be accumulated with the following formulas:

c_{1,t} = Σ_{p ∈ LOI, cos θ_p > 0} ‖(C_{t,x}(p), C_{t,y}(p))‖ · cos θ_p

c_{2,t} = −Σ_{p ∈ LOI, cos θ_p < 0} ‖(C_{t,x}(p), C_{t,y}(p))‖ · cos θ_p

where c_{1,t} and c_{2,t} respectively denote the instantaneous cross-line counts at time t in the two directions of the LOI in the current frame image, θ_p denotes the angle between the counting vector (C_{t,x}(p), C_{t,y}(p)) at the current position p and the normal direction of the LOI, and t is any time within the time period T to be analyzed.
After the instantaneous cross-line counts c_{1,t} and c_{2,t} in the two directions on the LOI are obtained for a frame image, c_{1,t} and c_{2,t} can be accumulated over the times t within the time period T by the formulas c_1 = Σ_{t ∈ T} c_{1,t} and c_2 = Σ_{t ∈ T} c_{2,t}, yielding the cross-line counts within the time period T, where c_1 and c_2 respectively denote the numbers of people crossing the LOI in the two directions within the time period T.
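The projection of counting vectors onto the LOI normal and the two accumulations (first along the line, then over time) can be sketched as follows. This is a minimal sketch, assuming the LOI is given as a list of pixel coordinates plus a normal vector; the sign convention for the two directions is an assumption of this sketch:

```python
import numpy as np

def instant_counts(counting_map, loi_pixels, normal):
    """Project the counting vectors on the LOI onto its unit normal
    and accumulate positive and negative scalars separately.

    Returns (c1_t, c2_t): the instantaneous counts in the two
    crossing directions for this frame.
    """
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)           # unit normal of the LOI
    c1 = c2 = 0.0
    for (y, x) in loi_pixels:
        s = float(np.dot(counting_map[y, x], n))  # signed projection
        if s > 0:
            c1 += s
        else:
            c2 += -s
    return c1, c2

def accumulate(per_frame_counts):
    """Sum instantaneous counts over the period T: c = sum_t c_t."""
    c1 = sum(c for c, _ in per_frame_counts)
    c2 = sum(c for _, c in per_frame_counts)
    return c1, c2

# Vertical LOI at x == 2, normal pointing in +x. Half a person
# crosses left-to-right in each of two frames; one whole person
# crosses right-to-left in the second frame.
cm = np.zeros((4, 4, 2)); cm[1, 2] = (0.5, 0.0)
loi = [(y, 2) for y in range(4)]
f1 = instant_counts(cm, loi, (1.0, 0.0))
cm2 = cm.copy(); cm2[3, 2] = (-1.0, 0.0)
f2 = instant_counts(cm2, loi, (1.0, 0.0))
total = accumulate([f1, f2])
```

The projection `np.dot(counting_map[y, x], n)` equals ‖C_t(p)‖·cos θ_p from the formulas above, so its sign separates the two crossing directions.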
In yet another embodiment of the cross-line counting method of the present application, after the numbers of people crossing the LOI in its two directions are obtained for the current frame image, the cross-line counts of the LOI in the two directions may further be added together, obtaining the total number of people crossing the LOI within the time period T.
FIG. 2 is a flowchart of another embodiment of the cross-line counting method of the present application. As shown in FIG. 2, the cross-line counting method of this embodiment includes:
202: The neural network sequentially extracts, in order, at least two frame images from the multiple original frame images, corresponding to the time period T, of the video on which cross-line crowd counting is to be performed, takes the later of the at least two frame images as the current frame image, and generates the crowd counting map of the current frame image.
The sequentially extracted frame images may be consecutive original frame images, non-consecutive original frame images, or partly consecutive and partly non-consecutive. The crowd counting map contains a counting vector for each position in the frame image: each position of the crowd counting map records a two-dimensional counting vector representing the numbers of people passing that position between the current frame image and the adjacent preceding frame image in the x-axis and y-axis directions, respectively.
In an optional example, operation 202 may be performed by a processor invoking corresponding instructions stored in a memory, or by a first obtaining unit run by the processor.
204: Take each of the multiple original frame images as the current frame image and, for the LOI in the video across which counting is to be performed, project the counting vector at at least one position on the LOI in the crowd counting map onto the normal direction of the LOI, obtaining a scalar value at the at least one position on the LOI, the sign of the scalar value indicating the two directions of the LOI.
206: Accumulate the positive scalar values and the negative scalar values on the LOI separately, obtaining the numbers of people in the current frame image crossing the LOI in its two directions, i.e., the instantaneous cross-line counts in the two directions on the LOI at the time t corresponding to the current frame image.
In an optional example, operations 204-206 may be performed by a processor invoking corresponding instructions stored in a memory, or by a second obtaining unit run by the processor.
208: Accumulate, per direction, the numbers of people crossing the LOI in the two directions over the multiple original frame images within the time period T, obtaining the cross-line counts of the LOI in the two directions within the time period T.
In an optional example, operation 208 may be performed by a processor invoking corresponding instructions stored in a memory, or by a third obtaining unit run by the processor.
210: Add the cross-line counts of the LOI in the two directions, obtaining the total number of people crossing the LOI within the time period T.
In an optional example, operation 210 may be performed by a processor invoking corresponding instructions stored in a memory, or by a computing unit run by the processor.
Before the cross-line counting methods of the above embodiments of the present application, an initial neural network may be trained in advance to obtain the neural network; the obtained network can be used for the cross-line counting methods of the above embodiments, and also for other applications that require crowd counting maps. In an optional example of the embodiments of the present application, a neural network may be preset that includes a convolutional neural network (CNN) and an elementwise product network. Multiple original frame images of one or more sample videos are input into the neural network, and the network is iteratively trained, using the crowd counting maps pre-annotated for the multiple original frame images of the sample video as the supervision signal, until a preset condition is met, yielding the final neural network. In an optional example, this operation may be performed by a processor invoking corresponding instructions stored in a memory, or by a network training unit run by the processor.
As an optional example of the embodiments of the present application, the neural network in the embodiments of the present application may be a deep neural network.
Based on the neural network training method provided by the above embodiment of the present application, the raw sample video is input into the neural network, and the network is iteratively trained, with the crowd counting maps pre-annotated for the multiple original frame images of the sample video as the supervision signal, so that cross-line crowd counting can then be performed based on the trained network. Since the network is trained directly on the original frame images of the raw video, without using temporal slice images, it is more robust and applicable to a wide variety of scenes; it avoids the problem that, when the crowd density is high, the crowd moves slowly or stands still, or the surveillance camera has a low viewing angle, pedestrians in temporal slice images become hard to distinguish and the number of people in the slice cannot be estimated; it remains applicable to dense, slow-moving, or stationary crowds and can be applied across scenes. Moreover, training the network based on the crowd counting map rather than only the total number of people also takes the crowd's distribution into account, making the cross-line counting results of the trained network more objective and accurate.
In an optional example of the embodiments of the present application, the multiple original frame images are each annotated with a crowd density map, a crowd velocity map, and a crowd counting map. Correspondingly, in this embodiment, inputting the multiple original frame images of the sample video into the neural network and iteratively training the network, with the pre-annotated crowd counting maps as the supervision signal, until a preset condition is met may include:
inputting each pair of adjacent frame images among the multiple original frame images of the sample video into the convolutional neural network as one training sample, and iteratively training the convolutional neural network, using the crowd density maps and crowd velocity maps pre-annotated for the training samples as the supervision signal, until a first preset convergence condition is met, yielding the final convolutional neural network, where the two adjacent frame images may be two consecutive original frames of the raw video, or non-consecutive original frames extracted from the raw video at a certain time interval or frame interval; and
inputting each pair of adjacent frame images among the multiple original frame images of the sample video into the neural network as one training sample, and iteratively training the neural network, using the pre-annotated crowd counting maps as the supervision signal, until a second preset convergence condition is met, yielding the final neural network.
To learn a better neural network, in the above embodiments of the present application the network is trained in two stages. The first stage estimates the crowd density map and the crowd velocity map separately, two relatively simple tasks with relatively clear semantics; the second stage estimates the crowd counting map directly.
It is worth noting that after the first-stage training is completed, the crowd counting map can already be obtained by multiplying the crowd density map and the velocity map. In practice, however, because no spatial-alignment constraint between the density map and the velocity map is imposed during first-stage training, the resulting density and velocity maps may be spatially misaligned. Since the second-stage training target is obtained by multiplying, at corresponding positions, the elements of the crowd density map and crowd velocity map output by the first stage, the embodiments of the present application correct this spatial misalignment through the second-stage training after the first stage is completed, effectively ensuring the spatial alignment of the density and velocity maps. Moreover, using the crowd counting map as the supervision signal in the second stage is more conducive to learning a complex neural network, giving the trained network a stronger and more accurate counting capability.
In another embodiment of the neural network training method of the present application, the following operations may also be performed before the iterative training of the neural network:
performing pedestrian localization on each of the multiple original frame images of the sample video, obtaining the pedestrian positions in each frame image of the sample video and assigning each pedestrian a pedestrian identifier (ID), where each pedestrian ID uniquely identifies one pedestrian in the video; and
annotating, in each of the multiple original frame images of the sample video, the pedestrian information of each pedestrian, the pedestrian information including the pedestrian position and the pedestrian ID.
Since the geometric perspective map contains the correspondence between the number of pixels at different positions in the sample video and the real physical dimensions of the scene, when annotating the pedestrian information in each of the multiple original frame images of the sample video according to the pedestrian positions and the geometric perspective map, each pedestrian position can be marked in the sample-video scene with an icon of corresponding size, according to the pedestrian's position in the real scene and the pedestrian's physical dimensions. For example, in the frame image of the sample video corresponding to time t, the positions of pedestrian heads may be annotated as P_t = {P_t^1, …, P_t^n}, where t denotes the time and {1, …, n} denote the pedestrian IDs, here illustratively represented by sequence numbers.
During the training of the neural network, when annotating pedestrians in the sample video and assigning pedestrian IDs, it is not necessary to annotate every frame image of the sample video; annotation may be performed at a preset interval (for example, 1 second) according to the crowd's motion and moving speed, and the pedestrians and their IDs in intermediate frame images may be approximated by interpolating between the pedestrians and IDs annotated in the two frame images before and after, simplifying the annotation workload. In addition, all original frame images of the sample video may participate in the neural network training, or only a subset of them may be extracted to participate, which allows more sample videos to be trained for a given training budget; the larger the total number of frame images participating in the training, the better the training effect and the more robust the trained network.
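The keyframe interpolation described above can be sketched as follows. This is a minimal sketch with a hypothetical `interpolate_annotations` helper, assuming linear interpolation of head positions for pedestrian IDs present in both annotated keyframes (pedestrians that enter or leave between keyframes are simply dropped here):

```python
def interpolate_annotations(frame_a, frame_b, n_between):
    """Linearly interpolate {pedestrian_id: (y, x)} annotations of
    two keyframes, producing n_between intermediate frames for the
    pedestrians present in both keyframes."""
    shared = frame_a.keys() & frame_b.keys()
    frames = []
    for k in range(1, n_between + 1):
        t = k / (n_between + 1)  # fraction of the way from a to b
        frames.append({
            pid: (frame_a[pid][0] + t * (frame_b[pid][0] - frame_a[pid][0]),
                  frame_a[pid][1] + t * (frame_b[pid][1] - frame_a[pid][1]))
            for pid in shared
        })
    return frames

a = {1: (0.0, 0.0), 2: (10.0, 10.0)}
b = {1: (4.0, 8.0)}               # pedestrian 2 left the scene
mid = interpolate_annotations(a, b, 1)[0]
```

One intermediate frame between the keyframes puts pedestrian 1 halfway along its straight-line path, which is the approximation the annotation scheme relies on.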
FIG. 3 is a flowchart of one embodiment of the neural network training method of the present application. The preset neural network may include a CNN and an elementwise product network; the neural network obtained after training likewise includes the corresponding CNN and elementwise product network. In an optional example, the embodiment shown in FIG. 3 may be performed by a processor invoking corresponding instructions stored in a memory, or by a network training unit run by the processor. As shown in FIG. 3, the neural network training method of this embodiment includes:
302: Set, in advance and for the scene of the sample video, the geometric perspective map of the sample video, containing the correspondence between the number of pixels at different positions in the sample video and the real physical dimensions of the scene; perform pedestrian localization on each of the multiple original frame images of the sample video that participate in network training, obtaining the pedestrian positions in each frame image and assigning each pedestrian a pedestrian ID.
Since pedestrians differ in body size and heads are less likely to be occluded, to represent pedestrian positions more accurately and objectively, the position of each pedestrian's head may be used as that pedestrian's position in embodiments of the present application.
304: Annotate, in each of the multiple original frame images of the sample video and according to the pedestrian positions in each frame image, the pedestrian information of each pedestrian, the pedestrian information including the pedestrian position and the pedestrian ID.
306: Input each pair of adjacent frame images among the multiple original frame images of the sample video into the convolutional neural network as one training sample, and iteratively train the convolutional neural network, using the crowd density maps and crowd velocity maps pre-annotated for the training samples as the supervision signal, until the first preset convergence condition is met, yielding the final convolutional neural network.
The crowd density map represents the crowd density at at least one position in the current frame image; the crowd velocity map represents the velocity at which each pedestrian in the current frame image moved from the preceding frame image to the current frame image.
In an optional example, after each pair of adjacent frame images among the multiple original frame images of the sample video is input into the convolutional neural network as one training sample, the convolutional neural network takes the later frame image of the current training sample as the current frame image, generates the crowd density map of the current frame image according to the pedestrian information annotated in each frame image, and generates the crowd velocity map of the current frame image according to the pedestrian information in the two frame images of the current training sample and the geometric perspective map. It is then checked whether the deviation between the crowd density map and crowd velocity map generated by the convolutional neural network and the annotated crowd density map and crowd velocity map is smaller than a preset condition, or whether the number of training iterations of the convolutional neural network has reached a preset iteration threshold. If the deviation is not smaller than the preset condition, or the number of iterations has not reached the threshold, the network parameters of the convolutional neural network are adjusted and operation 306 is continued, until the deviation is smaller than the preset condition or the number of iterations reaches the threshold, at which point the training of the convolutional neural network ends and the final convolutional neural network is obtained.
Optionally, the two adjacent frame images among the multiple original frame images of the sample video may be two consecutive original frames, two consecutive frames sequentially extracted from three or more consecutive original frames, two non-consecutive original frames, two frames sequentially extracted from three or more non-consecutive frame images, or optical-flow images of the original images. When more than two frame images are extracted, the current frame image and the preceding frame image are respectively the later and the earlier of the two frame images in the original sample video; their frame numbers need not be consecutive.
In one optional example, the convolutional neural network may generate the crowd density map of the current frame image as follows:
obtaining, according to the pedestrian information in the current frame image, the crowd density value at at least one position in the current frame image; and
generating the crowd density map of the current frame image according to the crowd density value at the at least one position in the current frame image and the geometric perspective map.
For example, after the pedestrian positions are annotated in each frame image according to the pedestrian positions in each frame image, the crowd density value at at least one position in the frame image can be obtained, and the crowd density map of the frame image can be computed with the following formula:

D_t(p) = Σ_{P ∈ P_t} N(p; P, σ_P²)

where D_t(p) denotes the crowd density value at position p in the frame image, and N(p; P, σ_P²) denotes the value at position p of a normalized two-dimensional Gaussian distribution centered at the head annotation P, i.e., one pedestrian head position is represented by one Gaussian kernel. σ_P denotes the variance of the Gaussian distribution; its value can be determined from the geometric perspective map of each sample-video scene, so that every person has the same physical size.
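A density map of this form can be generated as sketched below. This is a minimal sketch, assuming a constant σ rather than one derived per position from the geometric perspective map; the head positions, grid size, and function name are illustrative:

```python
import numpy as np

def density_map(heads, shape, sigma=1.0):
    """Sum one normalized 2-D Gaussian per annotated head position.

    Each kernel is normalized so that one person contributes 1 to
    the map's total; the sum of the map therefore approximates the
    number of people in the frame.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    d = np.zeros(shape, dtype=float)
    for (py, px) in heads:
        g = np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2 * sigma ** 2))
        d += g / g.sum()  # normalize: each head integrates to 1
    return d

dm = density_map([(10, 10), (20, 30)], (40, 40), sigma=2.0)
```

With a perspective map available, `sigma` would instead vary with the head's image position, matching the constant-physical-size requirement stated above.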
In another optional example, the convolutional neural network may generate the crowd velocity map of the current frame image as follows:
obtaining the moving velocity of each pedestrian in the current frame image according to the position difference of each pedestrian between the preceding frame image and the current frame image and the time difference between the preceding frame image and the current frame image;
obtaining, according to the moving velocities and positions of the pedestrians in the current frame image, the crowd velocity at at least one position in the current frame image; and
generating the crowd velocity map of the current frame image according to the crowd velocity at the at least one position in the current frame image and the geometric perspective map.
For example, the crowd velocity map of a frame image can be computed with the following formula:

V_t(p) = Σ_{P ∈ P_t} v_P · K(p; P, r_P)

where V_t(p) denotes the crowd velocity value at position p; v_P denotes the moving velocity of head annotation P in the current frame image, which can be obtained from the position difference of P between the two adjacent frame images, i.e., v_P = (P_t − P_{t−1}) / Δt; and K(p; P, r_P) is a disk-shaped function whose disk is centered at head annotation P with radius r_P, i.e., K(p; P, r_P) = 1(‖p − P‖₂ ≤ r_P). The radius r_P may be chosen as follows: by means of the geometric perspective map, an empirically set real physical size of a human head is converted into the corresponding number of pixels at that position; for example, the value of r_P may be chosen empirically based on a head size of 0.15 m.
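The disk-kernel velocity map can be sketched as follows. This is a minimal sketch, assuming a fixed pixel radius instead of one converted from the perspective map, with per-head velocities computed from the annotated positions in the two frames; all names are illustrative:

```python
import numpy as np

def velocity_map(prev_heads, cur_heads, shape, radius, dt=1.0):
    """Paint each head's velocity (cur - prev) / dt onto a disk of
    the given radius around its current position.

    prev_heads / cur_heads map pedestrian ID -> (y, x) position;
    the disk is the indicator kernel K(p; P, r_P)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    v = np.zeros((h, w, 2), dtype=float)
    for pid, (cy, cx) in cur_heads.items():
        py, px = prev_heads[pid]
        vel = np.array([(cx - px) / dt, (cy - py) / dt])  # (vx, vy)
        disk = (ys - cy) ** 2 + (xs - cx) ** 2 <= radius ** 2
        v[disk] = vel  # boolean-mask assignment fills the disk
    return v

prev = {1: (5, 5)}
cur = {1: (5, 7)}   # moved 2 pixels in +x between the two frames
vm = velocity_map(prev, cur, (20, 20), radius=2)
```

Multiplying this map elementwise with the density map (as in the elementwise product network) then yields the counting map.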
308: Input each pair of adjacent frame images among the multiple original frame images of the sample video into the neural network as one training sample, and iteratively train the neural network, using the pre-annotated crowd counting maps as the supervision signal, until the second preset convergence condition is met, yielding the final neural network.
In an optional example, after each pair of adjacent frame images among the multiple original frame images of the sample video is input into the neural network as one training sample, the convolutional neural network in the neural network takes the later frame image of the current training sample as the current frame image, generates the crowd density map of the current frame image according to the pedestrian information annotated in each frame image of the sample video and the geometric perspective map, generates the crowd velocity map of the current frame image according to the pedestrian information in the two frame images of the current training sample and the geometric perspective map, and inputs both into the elementwise product network in the neural network; the elementwise product network multiplies, at corresponding positions, the elements of the crowd density map and crowd velocity map input by the convolutional neural network, obtaining the crowd counting map of the current frame image. It is then checked whether the deviation between the crowd counting map output by the elementwise product network and the pre-annotated crowd counting map is smaller than a preset condition, or whether the number of training iterations of the neural network has reached a preset iteration threshold. If the deviation is not smaller than the preset condition, or the number of iterations has not reached the threshold, the network parameters of the elementwise product network are adjusted and operation 308 is continued, until the deviation is smaller than the preset condition or the number of iterations reaches the threshold, at which point the training of the neural network ends and the final elementwise product network, and thus the final neural network, is obtained.
To obtain the crowd counting map, in the above embodiments of the present application the crowd density map and crowd velocity map of a frame image are obtained first, based on at least two frame images among the multiple original frame images of the sample video and the geometric perspective map; assuming that the density distribution and walking speed of pedestrians remain unchanged across the two frames, the crowd counting map of the frame image is obtained by multiplying the elements of the crowd density map and crowd velocity map at corresponding positions, achieving convenient acquisition of the crowd counting map.
In the embodiment shown in FIG. 3 of the present application, a deep-learning model, the neural network, is introduced. The raw video is used directly as the training sample video, its frame images are used as the input of the convolutional neural network, and semantically rich, pixel-level crowd density maps together with the annotated crowd density maps and crowd counting maps are used as the supervision signal. Moreover, training performs cross-line counting based on the crowd counting map rather than only the total number of people, thereby also taking the crowd's distribution into account. The neural network thus trained for cross-line counting is highly robust, equally applicable in extreme cases where the crowd is dense, moves slowly, or stands still, can be applied across scenes, and does not suffer from the problem of temporal slice images in which pedestrians are hard to distinguish and the number of people cannot be estimated, making the cross-line counting results more objective and accurate.
To learn a better neural network, in the embodiment shown in FIG. 3 the network is trained in two stages. The first stage corresponds to operation 306: the convolutional neural network estimates the crowd density map and the crowd velocity map separately, two relatively simple tasks with relatively clear semantics. The second stage corresponds to operation 308: the elementwise product network estimates the crowd counting map directly.
FIG. 4 is a schematic diagram of training the neural network in two stages in an embodiment of the present application. Two adjacent frame images of the sample video are input as one training sample into the convolutional neural network in the neural network; in the first stage, the convolutional neural network outputs the crowd density map and crowd velocity map; these are input into the elementwise product network in the neural network, and in the second stage the elementwise product network outputs the crowd counting map.
It is worth noting that after the first-stage training is completed, the crowd counting map can already be obtained by multiplying the crowd density and velocity maps. In practice, however, because no spatial-alignment constraint between the density map and the velocity map is imposed during first-stage training, the resulting density and velocity maps may be spatially misaligned. Since the second-stage training target is obtained by multiplying, at corresponding positions, the elements of the crowd density map and crowd velocity map output by the first stage, the embodiments of the present application correct this spatial misalignment through the second-stage training, effectively ensuring the spatial alignment of the density and velocity maps. Moreover, using the crowd counting map as the supervision signal in the second stage is more conducive to learning a complex neural network, giving the trained network a stronger and more accurate counting capability.
In an optional example of the embodiment shown in FIG. 3, the first preset convergence condition may be deemed met, for example, when any one or more of the following conditions are satisfied:
for the multiple original frame images of each sample video, the ratio of the number of frames for which the crowd density map and crowd velocity map output by the convolutional neural network are consistent with the pre-annotated crowd density map and crowd velocity map, to the number of the multiple original frame images, reaches a first preset threshold; that is, both the ratio of frames whose output crowd density map is consistent with the pre-annotated crowd density map, and the ratio of frames whose output crowd velocity map is consistent with the pre-annotated crowd velocity map, to the number of frame images of the sample video input into the convolutional neural network, reach the first preset threshold;
for each of the multiple original frame images of each sample video, the similarity between the crowd density map output by the convolutional neural network and the pre-annotated crowd density map, and the similarity between the crowd velocity map output by the convolutional neural network and the pre-annotated crowd velocity map, are greater than a second preset threshold;
for the multiple original frame images of each sample video, the average similarity between the crowd density maps output by the convolutional neural network and the pre-annotated crowd density maps, and the average similarity between the crowd velocity maps output by the convolutional neural network and the pre-annotated crowd velocity maps, are greater than a third preset threshold;
the number of training iterations of the convolutional neural network reaches a fourth preset threshold.
In another optional example of the embodiment shown in FIG. 3, the second preset convergence condition may be deemed met, for example, when any one or more of the following conditions are satisfied:
for the multiple original frame images of each sample video, the ratio of the number of frames for which the crowd counting map output by the elementwise product network is consistent with the pre-annotated crowd counting map, to the number of the multiple original frame images, reaches a fifth preset threshold;
for each frame image of each sample video, the similarity between the crowd counting map output by the elementwise product network and the pre-annotated crowd counting map is greater than a sixth preset threshold;
for all frame images of each sample video, the average similarity between the crowd counting maps output by the elementwise product network and the manually annotated crowd counting maps is greater than a seventh preset threshold;
the number of training iterations of the second part of the neural network reaches an eighth preset threshold.
According to actual requirements, a crowd density map may be deemed consistent with the pre-annotated crowd density map (or a crowd velocity map with the pre-annotated crowd velocity map) when any one or more of, but not limited to, the following conditions are satisfied:
the image features of the crowd density map output by the convolutional neural network and of the pre-annotated crowd density map (or of the output crowd velocity map and of the pre-annotated crowd velocity map) are identical;
the image features of the pre-annotated crowd density map include, and exceed, those of the crowd density map output by the convolutional neural network (or the image features of the pre-annotated crowd velocity map include, and exceed, those of the output crowd velocity map);
the shared image features between the crowd density map output by the convolutional neural network and the pre-annotated crowd density map (or between the output and pre-annotated crowd velocity maps) reach a certain number or a preset proportion;
the shared image features between the crowd density map output by the convolutional neural network and the pre-annotated crowd density map (or between the output and pre-annotated crowd velocity maps) satisfy other preset conditions.
Likewise, according to actual requirements, the crowd counting map output by the elementwise product network may be deemed consistent with the pre-annotated crowd counting map when any one or more of, but not limited to, the following conditions are satisfied:
the image features of the crowd counting map output by the elementwise product network and of the pre-annotated crowd counting map are identical;
the image features of the pre-annotated crowd counting map include, and exceed, those of the crowd counting map output by the elementwise product network;
the shared image features between the crowd counting map output by the elementwise product network and the pre-annotated crowd counting map reach a certain number or a preset proportion;
the shared image features between the crowd counting map output by the elementwise product network and the pre-annotated crowd counting map satisfy other preset conditions.
In addition, in an optional example of the embodiments of the present application, the similarity between two maps, for example between the crowd density map output by the convolutional neural network and the pre-annotated crowd density map, between the crowd velocity map output by the convolutional neural network and the pre-annotated crowd velocity map, or between the crowd counting map output by the elementwise product network and the pre-annotated crowd counting map, can be measured by the Euclidean (L2) distance between the two maps. Accordingly, the Euclidean distance between the two maps can be computed first and checked against a preset threshold; since a smaller distance means a higher similarity, a distance below the threshold confirms that the similarity between the two maps exceeds the corresponding similarity threshold.
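The Euclidean-distance similarity check can be sketched as follows. This is a minimal sketch in which the `maps_similar` helper and the threshold value are illustrative assumptions:

```python
import numpy as np

def maps_similar(pred, target, max_l2):
    """Deem two maps similar when their Euclidean (L2) distance is
    below a preset threshold; smaller distance = higher similarity.

    Returns (is_similar, distance)."""
    dist = float(np.linalg.norm(pred - target))
    return dist < max_l2, dist

pred = np.array([[0.1, 0.2], [0.3, 0.4]])
target = np.array([[0.1, 0.2], [0.3, 0.5]])
ok, dist = maps_similar(pred, target, max_l2=0.2)
```

`np.linalg.norm` on the elementwise difference of the two 2-D arrays gives their Frobenius norm, i.e., the Euclidean distance between the maps viewed as vectors.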
In addition, the above embodiments of the present application further provide a method for cross-line crowd counting in video based on a neural network trained with the above neural network training method.
Illustratively, a neural network trained with the above neural network training method can perform cross-line counting of crowds in video through the cross-line counting method embodiment of any of the above embodiments of the present application, which is not repeated here.
After a neural network is trained with the above neural network training method of the present application, crowd counting maps of the frame images of a video can be obtained based on that network for cross-line crowd counting in the video. The original frame images of the video on which cross-line counting is to be performed are input into the neural network, and the network can output the crowd counting maps of the frame images through, but not limited to, the operations described in any of the above embodiments of the present application. In addition, the neural network used by the cross-line counting methods of the above embodiments may be obtained with the neural network training method of any of the above embodiments of the present application, or with other training methods, as long as the trained network can output the crowd counting maps of frame images for the input original frame images.
Any of the cross-line counting methods and neural network training methods provided by the embodiments of the present application may be performed by any suitable device with data-processing capability, including but not limited to terminal devices and servers; alternatively, any of them may be performed by a processor, for example a processor that performs any of the cross-line counting methods and neural network training methods mentioned in the embodiments of the present application by invoking corresponding instructions stored in a memory. This is not repeated below.
Those of ordinary skill in the art will understand that all or part of the steps implementing the above method embodiments can be completed by hardware related to program instructions; the aforementioned program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments; the aforementioned storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, or optical discs.
FIG. 5 is a schematic structural diagram of one embodiment of the cross-line counting apparatus of the present application. The cross-line counting apparatus of this embodiment can be used to implement the above cross-line counting method embodiments of the present application. As shown in FIG. 5, the cross-line counting apparatus of this embodiment includes a first obtaining unit, a second obtaining unit, and a third obtaining unit, wherein:
the first obtaining unit, serving as a neural network, is configured to receive multiple original frame images, corresponding to a time period T to be analyzed, of a video on which cross-line counting is to be performed, and to output crowd counting maps of the original frame images.
The crowd counting map contains a counting vector for at least one position in a frame image, the counting vector representing the numbers of people passing, in the counting directions (for example, the two coordinate directions of a two-dimensional coordinate plane), between each of the multiple original frame images and its adjacent preceding frame image.
Illustratively, the first obtaining unit may be configured to sequentially extract, in order, at least two frame images from the multiple original frame images corresponding to the time period T, take the later of the at least two frame images as the current frame image, and generate the crowd counting map of the current frame image.
As an optional example of the embodiments of the present application, the neural network in the embodiments of the present application may be a deep neural network.
The second obtaining unit is configured to take each of the multiple original frame images as the current frame image and, for the LOI in the video across which counting is to be performed, obtain, according to the crowd counting map of the current frame image, the number of people in the current frame image crossing the LOI in at least one direction; for example, it may obtain the number crossing the LOI in one direction, or the numbers crossing the LOI in each of two directions.
Illustratively, the second obtaining unit may be configured to project the counting vector at at least one position on the LOI in the crowd counting map onto the normal direction of the LOI, obtaining a scalar value at the at least one position on the LOI, the sign of the scalar value indicating the two directions of the LOI, and to accumulate the positive scalar values and the negative scalar values on the LOI separately, obtaining the numbers of people in the current frame image crossing the LOI in its two directions.
The third obtaining unit is configured to accumulate, for each of the at least one direction, the numbers of people crossing the LOI over the multiple original frame images, obtaining the number of people crossing the LOI in the at least one direction within the time period T.
Illustratively, when the second obtaining unit obtains the number of people in the current frame image crossing the LOI in one direction, the third obtaining unit accordingly accumulates the numbers of people crossing the LOI in that direction over the frame images of the multiple original frame images, obtaining the cross-line count of the LOI in that direction within the time period T. When the second obtaining unit obtains the numbers crossing the LOI in each of two directions, the third obtaining unit accumulates, per direction, the numbers of people crossing the LOI over the frame images of the multiple original frame images, obtaining the cross-line counts of the LOI in the two directions within the time period T.
Based on the cross-line crowd counting apparatus of the embodiments of the present application, the crowd counting map of each frame image corresponding to the time period T is obtained through the neural network; for each frame image, the number of people crossing the LOI in at least one direction is obtained from the crowd counting map, and these numbers are accumulated per direction over the multiple original frame images, obtaining the number of people crossing the LOI in the at least one direction within the time period T. Since the embodiments of the present application take the frame images of the raw video directly as input, without using temporal slice images, they are more robust, applicable to a wide variety of scenes, equally applicable in extreme cases where the crowd is dense, moves slowly, or stands still, and can be applied across scenes. Moreover, the embodiments perform cross-line counting based on the crowd counting map rather than only the total number of people, thereby also taking the crowd's distribution into account, which helps improve the objectivity and accuracy of the counting results.
FIG. 6 is a schematic structural diagram of another embodiment of the cross-line counting apparatus of the present application. As shown in FIG. 6, compared with the embodiment shown in FIG. 5, in the cross-line counting apparatus of this embodiment the first obtaining unit optionally includes a convolutional neural network and an elementwise product network, wherein:
the convolutional neural network is configured to receive the input at least two frame images, take the later of the at least two frame images as the current frame image, and generate the crowd density map and crowd velocity map of the current frame image, the crowd density map representing the crowd density at at least one position in the current frame image and the crowd velocity map representing the velocity at which each pedestrian in the current frame image moved from the adjacent preceding frame image to the current frame image.
Illustratively, when generating the crowd density map of the current frame image, the convolutional neural network may be configured to obtain, according to the pedestrian information in the current frame image, the crowd density value at at least one position in the current frame image, and to generate the crowd density map of the current frame image according to the crowd density value at the at least one position. When generating the crowd velocity map of the current frame image, it may be configured to obtain the moving velocity of each pedestrian in the current frame image according to the position difference of each pedestrian between the preceding frame image and the current frame image in the current training sample and the time difference between the preceding frame image and the current frame image, to obtain, according to the moving velocities and positions of the pedestrians in the current frame image, the crowd velocity at at least one position in the current frame image, and to generate the crowd velocity map of the current frame image according to the crowd velocity at the at least one position.
The elementwise product network is configured to multiply the elements of the crowd density map and the crowd velocity map of the current frame image at corresponding positions, obtaining the crowd counting map of the current frame image.
Further, referring again to FIG. 6, another embodiment of the cross-line counting apparatus of the present application may further include a computing unit configured to add the cross-line counts of the LOI in the two directions, obtaining the total number of people crossing the LOI within the time period T.
FIG. 7 is a schematic structural diagram of one embodiment of the neural network training apparatus of the present application. The neural network training apparatus of the embodiments of the present application can be used to implement the above neural network training method embodiments of the present application. As shown in FIG. 7, the neural network training apparatus of this embodiment includes a network training unit configured to input multiple original frame images of a sample video into a neural network and iteratively train the network, using the crowd counting maps pre-annotated for the multiple original frame images of the sample video as the supervision signal, until a preset condition is met, yielding the final neural network; the neural network includes a CNN and an elementwise product network.
As an optional example of the embodiments of the present application, the neural network in the embodiments of the present application may be a deep neural network.
Based on the neural network training apparatus provided by the above embodiment of the present application, the raw sample video is input into the neural network and the network is iteratively trained, with the crowd counting maps pre-annotated for the multiple original frame images of the sample video as the supervision signal, until a preset condition is met, so that cross-line crowd counting can then be performed based on the trained network. Since the network takes the original frame images of the raw video directly as input, without using temporal slice images, it is more robust and applicable to a wide variety of scenes; it avoids the problem that, when the crowd density is high, the crowd moves slowly or stands still, or the surveillance camera has a low viewing angle, pedestrians in temporal slice images become hard to distinguish and the number of people in the slice cannot be estimated; it remains applicable to dense, slow-moving, or stationary crowds and can be applied across scenes. Moreover, training based on the crowd counting map rather than only the total number of people also takes the crowd's distribution into account, which helps improve the objectivity and accuracy of the cross-line counting results.
In an optional example of the above neural network training apparatus embodiment, the multiple original frame images are each annotated with a crowd density map, a crowd velocity map, and a crowd counting map. Correspondingly, in this embodiment the network training unit may illustratively train the neural network as follows:
inputting each pair of adjacent frame images among the multiple original frame images of the sample video into the convolutional neural network as one training sample, and iteratively training the convolutional neural network, using the crowd density maps and crowd velocity maps pre-annotated for the training samples as the supervision signal, until the first preset convergence condition is met, yielding the final convolutional neural network; and
inputting each pair of adjacent frame images among the multiple original frame images of the sample video into the neural network as one training sample, and iteratively training the neural network, using the crowd counting maps pre-annotated for the training samples as the supervision signal, until the second preset convergence condition is met, yielding the final neural network.
For the training process of the neural network and the implementations of meeting the first and second preset convergence conditions, reference may be made to the description of the embodiment shown in FIG. 3 above, which is not repeated here.
An embodiment of the present application further provides an electronic device including the cross-line counting apparatus or the neural network training apparatus provided by any of the above embodiments of the present application.
Optionally, the electronic device of the embodiments of the present application may be any device with a data-processing function, and may include, but is not limited to, an Advanced RISC Machine (ARM), a central processing unit (CPU), or a graphics processing unit (GPU).
In addition, an embodiment of the present application further provides another electronic device, including: a processor and the cross-line counting apparatus of any of the above embodiments of the present application; when the processor runs the cross-line counting apparatus of any of the above embodiments, the units in that cross-line counting apparatus are run.
In addition, an embodiment of the present application further provides yet another electronic device, including:
a processor and the neural network training apparatus of any of the above embodiments of the present application;
when the processor runs the neural network training apparatus of any of the above embodiments, the units in that neural network training apparatus are run.
In addition, an embodiment of the present application further provides still another electronic device, including: a processor, a memory, a communication interface, and a communication bus, the processor, the memory, and the communication interface communicating with one another via the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to perform the operations corresponding to the method of any of the above embodiments of the present application.
Referring to FIG. 8, a schematic structural diagram of an embodiment of the electronic device of the present application is shown; the specific embodiments of the present application do not limit the specific implementation of the electronic device. As shown in FIG. 8, the electronic device may include: a processor, a communications interface, a memory, and a communication bus, wherein:
the processor, the communication interface, and the memory communicate with one another via the communication bus;
the communication interface is configured to communicate with network elements of other devices, such as other clients or servers;
the processor may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), one or more integrated circuits configured to implement the embodiments of the present application, or a graphics processing unit (GPU); the one or more processors included in the terminal device may be processors of the same type, such as one or more CPUs or one or more GPUs, or processors of different types, such as one or more CPUs together with one or more GPUs;
the memory is configured to store at least one executable instruction that causes the processor to perform the operations corresponding to the cross-line counting method and neural network training method of any of the above embodiments of the present application. The memory may include a high-speed random access memory (RAM) and may further include a non-volatile memory, for example at least one disk memory.
FIG. 9 is a schematic structural diagram of another embodiment of the electronic device of the present application. As shown in FIG. 9, the electronic device for implementing the embodiments of the present application includes a central processing unit (CPU) or a graphics processing unit (GPU), which can perform various appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) or loaded from a storage section into a random access memory (RAM). The central processing unit or graphics processing unit may communicate with the read-only memory and/or the random access memory to execute the executable instructions, thereby performing the operations corresponding to the cross-line counting method provided by the embodiments of the present application, for example: inputting into a neural network multiple original frame images, corresponding to a time period T to be analyzed, of a video on which cross-line counting is to be performed, and outputting, by the neural network, crowd counting maps of the multiple original frame images, the crowd counting map containing a counting vector for each position, the counting vector representing the numbers of people passing, in the counting directions, between each of the multiple original frame images and its adjacent preceding frame image; taking each of the multiple original frame images as the current frame image and, for the line of interest LOI in the video across which counting is to be performed, obtaining, according to the crowd counting map of the current frame image, the number of people in the current frame image crossing the LOI in at least one direction; and accumulating, per direction, the numbers of people crossing the LOI over the multiple original frame images, to obtain the number of people crossing the LOI in the at least one direction within the time period T. Alternatively, the central processing unit or graphics processing unit may communicate with the read-only memory and/or random access memory to execute the executable instructions, thereby performing the operations corresponding to the neural network training method provided by the embodiments of the present application, for example: inputting multiple original frame images of a sample video into a neural network, and iteratively training the network, using the crowd counting maps pre-annotated for the multiple original frame images as the supervision signal, until a preset condition is met, the neural network including a convolutional neural network and an elementwise product network.
In addition, the RAM may also store various programs and data required for system operation. The CPU, GPU, ROM, and RAM are connected to one another via a bus, and an input/output (I/O) interface is also connected to the bus.
The following components are connected to the I/O interface: an input section including a keyboard, a mouse, and the like; an output section including a cathode-ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section including a hard disk and the like; and a communication section including a network interface card such as a LAN card or a modem. The communication section performs communication processing via a network such as the Internet. A drive is also connected to the I/O interface as needed. A removable medium, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the drive as needed, so that a computer program read therefrom is installed into the storage section as needed.
In addition, an embodiment of the present application further provides a computer program including computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the steps of the method of any of the above embodiments of the present application.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowcharts may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the method shown in the flowcharts. The program code may include instructions corresponding to the steps of any cross-line counting method provided by the embodiments of the present application, for example: instructions for inputting into a neural network multiple original frame images, corresponding to a time period T to be analyzed, of a video on which cross-line counting is to be performed, and outputting, by the neural network, the crowd counting maps of the multiple original frame images, the crowd counting map containing a counting vector for each position, the counting vector representing the numbers of people passing, in the counting directions, between each frame image and its adjacent preceding frame image; instructions for taking each of the multiple original frame images as the current frame image and obtaining, for the line of interest LOI in the video across which counting is to be performed and according to the crowd counting map of the current frame image, the number of people in the current frame image crossing the LOI in at least one direction; and instructions for accumulating, per direction, the numbers of people crossing the LOI over the multiple original frame images to obtain the number of people crossing the LOI in the at least one direction within the time period T. The program code may also include instructions corresponding to the steps of any neural network training method provided by the embodiments of the present application, for example: instructions for inputting multiple original frame images of a sample video into a neural network and iteratively training the network, with the crowd counting maps pre-annotated for the multiple original frame images as the supervision signal, until a preset condition is met, the neural network including a convolutional neural network and an elementwise product network. The computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. When the computer program is executed by the central processing unit (CPU) or graphics processing unit (GPU), the above functions defined in the method of the present application are performed.
An embodiment of the present application further provides a computer storage medium storing computer-readable instructions that, when executed, implement the operations of the steps of the method of any of the above embodiments of the present application. In an optional example, the instructions include: instructions for inputting into a neural network multiple original frame images, corresponding to a time period T to be analyzed, of a video on which cross-line counting is to be performed, and outputting, by the neural network, the crowd counting maps of the multiple original frame images, the crowd counting map containing a counting vector for each position, the counting vector representing the numbers of people passing, in the counting directions, between each frame image and its adjacent preceding frame image; instructions for taking each of the multiple original frame images as the current frame image and obtaining, for the line of interest LOI in the video across which counting is to be performed and according to the crowd counting map of the current frame image, the number of people in the current frame image crossing the LOI in at least one direction; and instructions for accumulating, per direction, the numbers of people crossing the LOI over the multiple original frame images to obtain the numbers of people crossing the LOI in the at least one direction within the time period T. Alternatively, the instructions include: instructions for inputting multiple original frame images of a sample video into a neural network and iteratively training the network, with the crowd counting maps pre-annotated for the multiple original frame images as the supervision signal, until a preset condition is met, the neural network including a convolutional neural network and an elementwise product network.
The embodiments of the present application can be applied to any scene that requires crowd-flow statistics, for example:
Scene 1: When the number of people crossing subway entrances/exits within a time period T needs to be counted, video of each subway entrance/exit is captured by surveillance cameras, each entrance/exit is taken as an LOI, and the video of each entrance/exit for the time period T is input into the neural network of the embodiments of the present application. Through the cross-line counting method of the embodiments of the present application, the cross-line counts of each entrance/exit within the time period T are obtained, so that the flow of people at each entrance/exit can be counted and analyzed, the operating load of the subway can be estimated, and vehicle scheduling can be optimized and crowd flow reasonably managed to ensure passengers' safe and smooth travel.
Scene 2: For a public parade in a city, video of the parade streets is captured by street surveillance cameras, an LOI is set across the width of the parade street, and the video of the parade street at the LOI for the time period T is input into the neural network of the embodiments of the present application. Through the cross-line counting method of the embodiments of the present application, the number of parade participants and the crowd's movement state are obtained, facilitating the deployment of police to keep the parade orderly and the public safe.
Scene 3: For a scenic area or public stadium, video of the scenic area or stadium is likewise captured by surveillance cameras, LOIs are set at its entrances and exits, and the video is input into the neural network of the embodiments of the present application. Through the cross-line counting method of the embodiments of the present application, the crowds entering and leaving the scenic area or venue can be counted, so that crowd flow can be reasonably managed and dangers such as stampedes caused by overcrowding can be avoided.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. As the apparatus and device embodiments substantially correspond to the method embodiments, their description is relatively brief; for relevant parts, reference may be made to the description of the method embodiments.
The methods, apparatuses, and devices of the present application may be implemented in many ways, for example by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the methods is for illustration only; the steps of the methods of the present application are not limited to the order optionally described above, unless otherwise specifically stated. Furthermore, in some embodiments the present application may also be implemented as programs recorded on a recording medium, the programs including machine-readable instructions for implementing the methods according to the present application. Thus, the present application also covers recording media storing programs for performing the methods according to the present application.
The description of the present application is given for the sake of example and description and is not exhaustive or intended to limit the present application to the disclosed form. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to better explain the principles and practical applications of the present application, and to enable those of ordinary skill in the art to understand the present application and thereby design various embodiments, with various modifications, suited to particular uses.

Claims (34)

  1. A cross-line counting method, comprising:
    inputting into a neural network multiple original frame images, corresponding to a time period T to be analyzed, of a video on which cross-line counting is to be performed, and outputting, by the neural network, crowd counting maps of the multiple original frame images, wherein a crowd counting map comprises a counting vector for at least one position in a frame image, the counting vector representing the number of people passing, in the counting direction, between each of the multiple original frame images and its adjacent preceding frame image;
    taking each of the multiple original frame images as the current frame image and, for a line of interest (LOI) in the video across which counting is to be performed, obtaining, according to the crowd counting map of the current frame image, the number of people in the current frame image crossing the LOI in at least one direction; and
    accumulating, for each of the at least one direction, the numbers of people crossing the LOI over the multiple original frame images, to obtain the number of people crossing the LOI in the at least one direction within the time period T.
  2. The method according to claim 1, wherein the counting direction comprises two coordinate directions of a two-dimensional coordinate plane.
  3. The method according to claim 1 or 2, wherein obtaining the number of people in the current frame image crossing the LOI in at least one direction comprises: obtaining the numbers of people in the current frame image crossing the LOI in each of two directions; and
    wherein accumulating, for each of the at least one direction, the numbers of people crossing the LOI over the multiple original frame images to obtain the number of people crossing the LOI in the at least one direction within the time period T comprises:
    accumulating, for each of the two directions, the numbers of people crossing the LOI over the multiple original frame images, to obtain the numbers of people crossing the LOI in the two directions within the time period T.
  4. The method according to any one of claims 1 to 3, further comprising, after inputting into the neural network the multiple original frame images corresponding to the time period T to be analyzed:
    sequentially extracting, by the neural network and in order, at least two frame images from the multiple original frame images corresponding to the time period T, taking the later of the at least two frame images as the current frame image, and generating the crowd counting map of the current frame image.
  5. The method according to claim 4, wherein generating the crowd counting map of the current frame image comprises:
    generating, with the convolutional neural network in the neural network, a crowd density map and a crowd velocity map of the current frame image, wherein the crowd density map represents the crowd density at at least one position in the current frame image, and the crowd velocity map represents the velocity at which each pedestrian in the current frame image moved from the adjacent preceding frame image to the current frame image; and
    multiplying, with the elementwise product network in the neural network, the elements of the crowd density map and the crowd velocity map of the current frame image at corresponding positions, to obtain the crowd counting map of the current frame image.
  6. The method according to any one of claims 3 to 5, wherein obtaining the numbers of people in the current frame image crossing the LOI in each of the two directions comprises:
    projecting the counting vector at at least one position on the LOI in the crowd counting map onto the normal direction of the LOI, to obtain a scalar value at the at least one position on the LOI, the sign of the scalar value indicating the two directions of the LOI; and
    accumulating the positive scalar values and the negative scalar values on the LOI separately, to obtain the numbers of people in the current frame image crossing the LOI in its two directions.
  7. The method according to any one of claims 3 to 6, further comprising:
    adding the numbers of people crossing the LOI in the two directions, to obtain the total number of people crossing the LOI within the time period T.
  8. A neural network training method, comprising:
    inputting multiple original frame images of a sample video into a neural network, and iteratively training the neural network, using crowd counting maps pre-annotated for the multiple original frame images as the supervision signal, until a preset condition is met, wherein the neural network comprises a convolutional neural network and an elementwise product network.
  9. The method according to claim 8, wherein the multiple original frame images are each annotated with a crowd density map, a crowd velocity map, and a crowd counting map; and
    wherein inputting the multiple original frame images of the sample video into the neural network and iteratively training the neural network, with the crowd counting maps pre-annotated for the multiple original frame images as the supervision signal, until a preset condition is met comprises:
    inputting each pair of adjacent frame images among the multiple original frame images into the convolutional neural network as one training sample, and iteratively training the convolutional neural network, using the crowd density maps and crowd velocity maps pre-annotated for the training samples as the supervision signal, until a first preset convergence condition is met; and
    inputting each pair of adjacent frame images among the multiple original frame images into the neural network as one training sample, and iteratively training the neural network, using the crowd counting maps pre-annotated for the training samples as the supervision signal, until a second preset convergence condition is met.
  10. The method according to claim 9, further comprising:
    performing pedestrian localization on each of the multiple original frame images, obtaining the pedestrian positions in each frame image and assigning each pedestrian a pedestrian identifier (ID), wherein a pedestrian ID uniquely identifies one pedestrian in the video; and
    annotating, in each of the frame images, the pedestrian information of each pedestrian, the pedestrian information comprising the pedestrian position and the pedestrian ID.
  11. The method according to claim 10, further comprising:
    setting, in advance and for the scene of the sample video, a geometric perspective map of the sample video, the geometric perspective map comprising the correspondence between the number of pixels at different positions in the sample video and the real physical dimensions of the scene; and
    after inputting each pair of adjacent frame images among the multiple original frame images into the convolutional neural network as one training sample:
    taking, by the convolutional neural network, the later frame image of the current training sample as the current frame image, generating the crowd density map of the current frame image according to the pedestrian information annotated in each frame image and the geometric perspective map, and generating the crowd velocity map of the current frame image according to the pedestrian information in the two frame images of the current training sample and the geometric perspective map.
  12. The method according to claim 11, wherein generating the crowd density map of the current frame image comprises:
    obtaining, according to the pedestrian information in the current frame image and the geometric perspective map, the crowd density value at at least one position in the current frame image; and
    generating the crowd density map of the current frame image according to the crowd density value at the at least one position in the current frame image.
  13. The method according to claim 11 or 12, wherein generating the crowd velocity map of the current frame image comprises:
    obtaining the moving velocity of each pedestrian in the current frame image according to the position difference of each pedestrian between the preceding frame image and the current frame image in the current training sample, and the time difference between the preceding frame image and the current frame image;
    obtaining, according to the moving velocities and positions of the pedestrians in the current frame image, the crowd velocity at at least one position in the current frame image; and
    generating the crowd velocity map of the current frame image according to the crowd velocity at the at least one position in the current frame image and the geometric perspective map.
  14. The method according to any one of claims 9 to 13, wherein the first preset convergence condition being met comprises:
    for the multiple original frame images, the ratio of the number of frames for which the crowd density map and crowd velocity map output by the convolutional neural network are consistent with the pre-annotated crowd density map and crowd velocity map, to the number of the multiple original frame images, reaching a first preset threshold; and/or
    for each of the multiple original frame images, the similarity between the crowd density map output by the convolutional neural network and the pre-annotated crowd density map, and the similarity between the crowd velocity map output by the convolutional neural network and the pre-annotated crowd velocity map, being greater than a second preset threshold; and/or
    for the multiple original frame images, the average similarity between the crowd density maps output by the convolutional neural network and the pre-annotated crowd density maps, and the average similarity between the crowd velocity maps output by the convolutional neural network and the pre-annotated crowd velocity maps, being greater than a third preset threshold; and/or
    the number of training iterations of the convolutional neural network reaching a fourth preset threshold.
  15. The method according to any one of claims 9 to 13, wherein the second preset convergence condition being met comprises:
    for the multiple original frame images, the ratio of the number of frames for which the crowd counting map output by the elementwise product network is consistent with the pre-annotated crowd counting map, to the number of the multiple original frame images, reaching a fifth preset threshold; and/or
    for each of the multiple original frame images, the similarity between the crowd counting map output by the elementwise product network and the pre-annotated crowd counting map being greater than a sixth preset threshold; and/or
    for the multiple original frame images, the average similarity between the crowd counting maps output by the elementwise product network and the pre-annotated crowd counting maps being greater than a seventh preset threshold; and/or
    the number of training iterations of the second part of the neural network reaching an eighth preset threshold.
  16. A method for cross-line crowd counting in video based on a neural network trained with the neural network training method according to any one of claims 8 to 15.
  17. A cross-line counting apparatus, comprising:
    a first obtaining unit, serving as a neural network, configured to receive multiple original frame images, corresponding to a time period T to be analyzed, of a video on which cross-line counting is to be performed, and to output crowd counting maps of the original frame images, wherein a crowd counting map comprises a counting vector for at least one position in a frame image, the counting vector representing the number of people passing, in the counting direction, between each of the multiple original frame images and its adjacent preceding frame image;
    a second obtaining unit, configured to take each of the multiple original frame images as the current frame image and, for a line of interest (LOI) in the video across which counting is to be performed, obtain, according to the crowd counting map of the current frame image, the number of people in the current frame image crossing the LOI in at least one direction; and
    a third obtaining unit, configured to accumulate, for each of the at least one direction, the numbers of people crossing the LOI over the multiple original frame images, to obtain the number of people crossing the LOI in the at least one direction within the time period T.
  18. The apparatus according to claim 17, wherein the counting direction comprises two coordinate directions of a two-dimensional coordinate plane.
  19. The apparatus according to claim 17 or 18, wherein the second obtaining unit is configured to obtain the numbers of people in the current frame image crossing the LOI in each of two directions; and
    the third obtaining unit is configured to accumulate, for each of the two directions, the numbers of people crossing the LOI over the multiple original frame images, to obtain the numbers of people crossing the LOI in the two directions within the time period T.
  20. The apparatus according to any one of claims 17 to 19, wherein the first obtaining unit is configured to sequentially extract, in order, at least two frame images from the multiple original frame images corresponding to the time period T, take the later of the at least two frame images as the current frame image, and generate the crowd counting map of the current frame image.
  21. The apparatus according to claim 20, wherein the first obtaining unit comprises:
    a convolutional neural network, configured to receive the input at least two frame images, take the later of the at least two frame images as the current frame image, and generate a crowd density map and a crowd velocity map of the current frame image, wherein the crowd density map represents the crowd density at at least one position in the current frame image, and the crowd velocity map represents the velocity at which each pedestrian in the current frame image moved from the adjacent preceding frame image to the current frame image; and
    an elementwise product network, configured to multiply the elements of the crowd density map and the crowd velocity map of the current frame image at corresponding positions, to obtain the crowd counting map of the current frame image.
  22. The apparatus according to any one of claims 17 to 21, wherein the second obtaining unit is configured to:
    project the counting vector at at least one position on the LOI in the crowd counting map onto the normal direction of the LOI, to obtain a scalar value at the at least one position on the LOI, the sign of the scalar value indicating the two directions of the LOI; and
    accumulate the positive scalar values and the negative scalar values on the LOI separately, to obtain the numbers of people in the current frame image crossing the LOI in its two directions.
  23. The apparatus according to any one of claims 19 to 22, further comprising:
    a computing unit, configured to add the numbers of people crossing the LOI in the two directions, to obtain the total number of people crossing the LOI within the time period T.
  24. A neural network training apparatus, comprising:
    a network training unit, configured to input multiple original frame images of a sample video into a neural network and iteratively train the neural network, using crowd counting maps pre-annotated for the multiple original frame images as the supervision signal, until a preset condition is met, wherein the neural network comprises a convolutional neural network (CNN) and an elementwise product network.
  25. The apparatus according to claim 24, wherein the multiple original frame images are each annotated with a crowd density map, a crowd velocity map, and a crowd counting map; and
    the network training unit is configured to:
    input each pair of adjacent frame images among the multiple original frame images into the convolutional neural network as one training sample, and iteratively train the convolutional neural network, using the crowd density maps and crowd velocity maps pre-annotated for the training samples as the supervision signal, until a first preset convergence condition is met; and
    input each pair of adjacent frame images among the multiple original frame images into the neural network as one training sample, and iteratively train the neural network, using the crowd counting maps pre-annotated for the training samples as the supervision signal, until a second preset convergence condition is met.
  26. The apparatus according to claim 25, wherein the scene of the sample video is pre-annotated with a geometric perspective map, the geometric perspective map comprising the correspondence between the number of pixels at different positions in the sample video and the real physical dimensions of the scene; the pedestrian information of each pedestrian is pre-annotated in the multiple original frame images, the pedestrian information comprising the pedestrian position and a pedestrian ID, the pedestrian ID uniquely identifying one pedestrian; and
    the convolutional neural network is configured to take the later frame image of the current training sample as the current frame image, generate the crowd density map of the current frame image according to the pedestrian information annotated in each frame image and the geometric perspective map, and generate the crowd velocity map of the current frame image according to the pedestrians in the two frame images of the current training sample and the geometric perspective map.
  27. The apparatus according to claim 26, wherein, when generating the crowd density map of the current frame image, the convolutional neural network is configured to obtain, according to the pedestrian information in the current frame image and the geometric perspective map, the crowd density value at at least one position in the current frame image, and to generate the crowd density map of the current frame image according to the crowd density value at the at least one position in the current frame image.
  28. The apparatus according to any one of claims 25 to 27, wherein, when generating the crowd velocity map of the current frame image, the convolutional neural network is configured to:
    obtain the moving velocity of each pedestrian in the current frame image according to the position difference of each pedestrian between the preceding frame image and the current frame image in the current training sample, and the time difference between the preceding frame image and the current frame image;
    obtain, according to the moving velocities and positions of the pedestrians in the current frame image, the crowd velocity at at least one position in the current frame image; and
    generate the crowd velocity map of the current frame image according to the crowd velocity at the at least one position in the current frame image and the geometric perspective map.
  29. An electronic device, comprising: the cross-line counting apparatus according to any one of claims 17 to 23; or the neural network training apparatus according to any one of claims 24 to 28.
  30. An electronic device, comprising:
    a processor and the cross-line counting apparatus according to any one of claims 17 to 23,
    wherein, when the processor runs the cross-line counting apparatus according to any one of claims 17 to 23, the units in the cross-line counting apparatus according to any one of claims 17 to 23 are run.
  31. An electronic device, comprising:
    a processor and the neural network training apparatus according to any one of claims 24 to 28,
    wherein, when the processor runs the neural network training apparatus according to any one of claims 24 to 28, the units in the neural network training apparatus according to any one of claims 24 to 28 are run.
  32. An electronic device, comprising: a processor, a memory, a communication interface, and a communication bus, the processor, the memory, and the communication interface communicating with one another via the communication bus,
    wherein the memory is configured to store at least one executable instruction that causes the processor to perform the operations corresponding to the method according to any one of claims 1 to 16.
  33. A computer program comprising computer-readable code, wherein, when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the steps of the method according to any one of claims 1 to 16.
  34. A computer-readable storage medium storing computer-readable instructions, wherein the instructions, when executed, implement the operations of the steps of the method according to any one of claims 1 to 16.
PCT/CN2017/103530 2016-09-29 2017-09-26 Cross-line counting method, neural network training method, apparatus, and electronic device WO2018059408A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610867834.1 2016-09-29
CN201610867834.1A CN106407946B (zh) 2016-09-29 2016-09-29 Cross-line counting method and deep neural network training method, apparatus, and electronic device

Publications (1)

Publication Number Publication Date
WO2018059408A1 true WO2018059408A1 (zh) 2018-04-05

Family

ID=59228726

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/103530 WO2018059408A1 (zh) 2016-09-29 2017-09-26 跨线计数方法和神经网络训练方法、装置和电子设备

Country Status (2)

Country Link
CN (1) CN106407946B (zh)
WO (1) WO2018059408A1 (zh)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109615140A (zh) * 2018-12-14 2019-04-12 中国科学技术大学 Method and apparatus for predicting pedestrian motion
CN109726658A (zh) * 2018-12-21 2019-05-07 上海科技大学 Crowd counting and localization method, system, electronic terminal, and storage medium
CN109948500A (zh) * 2019-03-13 2019-06-28 西安科技大学 Method for accurately monitoring personnel entry and exit in coal mines
CN110263643A (zh) * 2019-05-20 2019-09-20 上海兑观信息科技技术有限公司 Fast video crowd counting method based on temporal relationships
CN110674729A (zh) * 2019-09-20 2020-01-10 澳门理工学院 People-counting method based on thermal energy estimation, computer apparatus, and computer-readable storage medium
CN110866453A (zh) * 2019-10-22 2020-03-06 同济大学 Real-time crowd stable-state recognition method and apparatus based on a convolutional neural network
CN110909648A (zh) * 2019-11-15 2020-03-24 华东师范大学 People-flow monitoring method implemented on edge computing devices using a neural network
CN110941999A (zh) * 2019-11-12 2020-03-31 通号通信信息集团有限公司 Method for adaptively computing Gaussian kernel size in a crowd counting system
CN111062275A (zh) * 2019-12-02 2020-04-24 汇纳科技股份有限公司 Multi-level supervised crowd counting method, apparatus, medium, and electronic device
CN111178276A (zh) * 2019-12-30 2020-05-19 上海商汤智能科技有限公司 Image processing method, image processing device, and computer-readable storage medium
CN112333431A (zh) * 2020-10-30 2021-02-05 深圳市商汤科技有限公司 Scene monitoring method and apparatus, electronic device, and storage medium
CN113297983A (zh) * 2021-05-27 2021-08-24 上海商汤智能科技有限公司 Crowd localization method and apparatus, electronic device, and storage medium
CN113378608A (zh) * 2020-03-10 2021-09-10 顺丰科技有限公司 Crowd counting method, apparatus, device, and storage medium
CN113807274A (zh) * 2021-09-23 2021-12-17 山东建筑大学 Crowd counting method and system based on inverse perspective transformation of images

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407946B (zh) 2016-09-29 2020-03-03 北京市商汤科技开发有限公司 Cross-line counting method and deep neural network training method, apparatus, and electronic device
CN108052984B (zh) * 2017-10-30 2019-11-08 中国科学院计算技术研究所 Counting method and apparatus
CN109472291A (zh) * 2018-10-11 2019-03-15 浙江工业大学 People-counting classification method based on a DNN algorithm
JP6703679B1 (ja) * 2019-02-01 2020-06-03 株式会社計数技研 Counting device, learner manufacturing device, counting method, learner manufacturing method, and program
CN110135325B (zh) * 2019-05-10 2020-12-08 山东大学 Crowd counting method and system based on a scale-adaptive network
CN110458114B (zh) * 2019-08-13 2022-02-01 杜波 Method and apparatus for determining the number of people, and storage medium
JP7383435B2 (ja) * 2019-09-17 2023-11-20 キヤノン株式会社 Image processing apparatus, image processing method, and program
CN110991225A (zh) * 2019-10-22 2020-04-10 同济大学 Crowd counting and density estimation method and apparatus based on a multi-column convolutional neural network
CN111428551B (zh) * 2019-12-30 2023-06-16 杭州海康威视数字技术股份有限公司 Density detection method, and density detection model training method and apparatus
CN112232257B (zh) * 2020-10-26 2023-08-11 青岛海信网络科技股份有限公司 Traffic anomaly determination method, apparatus, device, and medium
CN112364788B (zh) * 2020-11-13 2021-08-03 润联软件系统(深圳)有限公司 Deep-learning-based method for monitoring crowd size in surveillance video, and related components
CN113239882B (zh) * 2021-06-03 2022-06-03 成都鼎安华智慧物联网股份有限公司 Deep-learning-based people counting method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050147291A1 (en) * 1999-09-13 2005-07-07 Microsoft Corporation Pose-invariant face recognition system and process
CN102542289A (zh) * 2011-12-16 2012-07-04 重庆邮电大学 People-flow statistics method based on a multi-Gaussian counting model
CN105160313A (zh) * 2014-09-15 2015-12-16 中国科学院重庆绿色智能技术研究院 Method and apparatus for crowd behavior analysis in video surveillance
CN105590094A (zh) * 2015-12-11 2016-05-18 小米科技有限责任公司 Method and apparatus for determining the number of human bodies
CN105740894A (zh) * 2016-01-28 2016-07-06 北京航空航天大学 Semantic annotation method for hyperspectral remote-sensing images
CN106407946A (zh) * 2016-09-29 2017-02-15 北京市商汤科技开发有限公司 Cross-line counting method and deep neural network training method, apparatus, and electronic device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102148959B (zh) * 2010-02-09 2016-01-20 北京中星微电子有限公司 Video surveillance system and moving-target detection method for its images
CN103778442B (zh) * 2014-02-26 2017-04-05 哈尔滨工业大学深圳研究生院 Central air-conditioning control method based on video people-counting analysis

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109615140A (zh) * 2018-12-14 2019-04-12 中国科学技术大学 Method and apparatus for predicting pedestrian motion
CN109615140B (zh) * 2018-12-14 2024-01-09 中国科学技术大学 Method and apparatus for predicting pedestrian motion
CN109726658A (zh) * 2018-12-21 2019-05-07 上海科技大学 Crowd counting and localization method, system, electronic terminal, and storage medium
CN109726658B (zh) * 2018-12-21 2022-10-04 上海科技大学 Crowd counting and localization method, system, electronic terminal, and storage medium
CN109948500A (zh) * 2019-03-13 2019-06-28 西安科技大学 Method for accurately monitoring personnel entering and exiting a coal mine
CN109948500B (zh) * 2019-03-13 2022-12-27 西安科技大学 Method for accurately monitoring personnel entering and exiting a coal mine
CN110263643A (zh) * 2019-05-20 2019-09-20 上海兑观信息科技技术有限公司 Fast video crowd counting method based on temporal relationships
CN110263643B (zh) * 2019-05-20 2023-05-16 上海兑观信息科技技术有限公司 Fast video crowd counting method based on temporal relationships
CN110674729A (zh) * 2019-09-20 2020-01-10 澳门理工学院 People counting method based on thermal energy estimation, computer device, and computer-readable storage medium
CN110866453A (zh) * 2019-10-22 2020-03-06 同济大学 Method and apparatus for real-time crowd stability state recognition based on a convolutional neural network
CN110866453B (zh) * 2019-10-22 2023-05-02 同济大学 Method and apparatus for real-time crowd stability state recognition based on a convolutional neural network
CN110941999A (zh) * 2019-11-12 2020-03-31 通号通信信息集团有限公司 Method for adaptively computing the Gaussian kernel size in a crowd counting system
CN110941999B (zh) * 2019-11-12 2023-02-17 通号通信信息集团有限公司 Method for adaptively computing the Gaussian kernel size in a crowd counting system
CN110909648A (zh) * 2019-11-15 2020-03-24 华东师范大学 Crowd flow monitoring method implemented on an edge computing device using a neural network
CN110909648B (zh) * 2019-11-15 2023-08-25 华东师范大学 Crowd flow monitoring method implemented on an edge computing device using a neural network
CN111062275A (zh) * 2019-12-02 2020-04-24 汇纳科技股份有限公司 Multi-level supervised crowd counting method, apparatus, medium, and electronic device
CN111178276A (zh) * 2019-12-30 2020-05-19 上海商汤智能科技有限公司 Image processing method, image processing device, and computer-readable storage medium
CN111178276B (zh) * 2019-12-30 2024-04-02 上海商汤智能科技有限公司 Image processing method, image processing device, and computer-readable storage medium
CN113378608A (zh) * 2020-03-10 2021-09-10 顺丰科技有限公司 Crowd counting method, apparatus, device, and storage medium
CN113378608B (zh) * 2020-03-10 2024-04-19 顺丰科技有限公司 Crowd counting method, apparatus, device, and storage medium
CN112333431B (zh) * 2020-10-30 2022-06-07 深圳市商汤科技有限公司 Scene monitoring method, apparatus, electronic device, and storage medium
CN112333431A (zh) * 2020-10-30 2021-02-05 深圳市商汤科技有限公司 Scene monitoring method, apparatus, electronic device, and storage medium
CN113297983A (zh) * 2021-05-27 2021-08-24 上海商汤智能科技有限公司 Crowd localization method and apparatus, electronic device, and storage medium
CN113807274A (zh) * 2021-09-23 2021-12-17 山东建筑大学 Crowd counting method and system based on inverse perspective transformation of images
CN113807274B (zh) * 2021-09-23 2023-07-04 山东建筑大学 Crowd counting method and system based on inverse perspective transformation of images

Also Published As

Publication number Publication date
CN106407946B (zh) 2020-03-03
CN106407946A (zh) 2017-02-15

Similar Documents

Publication Publication Date Title
WO2018059408A1 (zh) Cross-line counting method and neural network training method, apparatus, and electronic device
US10880524B2 (en) System and method for activity monitoring using video data
Xu et al. Predicting head movement in panoramic video: A deep reinforcement learning approach
US10380431B2 (en) Systems and methods for processing video streams
Zhao et al. Crossing-line crowd counting with two-phase deep neural networks
US8582816B2 (en) Method and apparatus for video analytics based object counting
EP2918071B1 (en) System and method for processing visual information for event detection
US9639747B2 (en) Online learning method for people detection and counting for retail stores
US20170161591A1 (en) System and method for deep-learning based object tracking
CN107025658A (zh) Method and system for detecting moving objects using a single camera
US10043078B2 (en) Virtual turnstile system and method
KR102335045B1 (ko) Method and apparatus for identifying human objects based on a depth camera
US20150286853A1 (en) Eye gaze driven spatio-temporal action localization
US8615107B2 (en) Method and apparatus for multiple object tracking with K-shortest paths
US9972094B2 (en) Foreground extracting method and apparatus
AU2018379393A1 (en) Monitoring systems, and computer implemented methods for processing data in monitoring systems, programmed to enable identification and tracking of human targets in crowded environments
CN111553947A (zh) Method and apparatus for locating a target object
Himeur et al. Deep visual social distancing monitoring to combat COVID-19: A comprehensive survey
Acharya et al. Real-time detection and tracking of pedestrians in CCTV images using a deep convolutional neural network
Hayakawa et al. Block-matching-based motion field generation utilizing directional edge displacement
KR20170006356A (ko) Method and apparatus for customer analysis based on two-dimensional images
Ince et al. Child and adult classification using ratio of head and body heights in images
US20230410524A1 (en) Information processing apparatus, control method, and program
US10636153B2 (en) Image processing system, image processing apparatus, and image processing method for object tracking
US20210312182A1 (en) Systems and methods for processing video data for activity monitoring

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17854858

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17854858

Country of ref document: EP

Kind code of ref document: A1