KR101675692B1 - Method and apparatus for crowd behavior recognition based on structure learning

Method and apparatus for crowd behavior recognition based on structure learning

Info

Publication number
KR101675692B1
Authority
KR
South Korea
Prior art keywords
image feature
crowd behavior
feature vector
neural network
crowd
Prior art date
Application number
KR1020150091594A
Other languages
Korean (ko)
Inventor
김문현
김진평
Original Assignee
성균관대학교산학협력단
Priority date
Filing date
Publication date
Application filed by 성균관대학교산학협력단 filed Critical 성균관대학교산학협력단
Priority to KR1020150091594A priority Critical patent/KR101675692B1/en
Application granted granted Critical
Publication of KR101675692B1 publication Critical patent/KR101675692B1/en

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08B SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B13/00 Burglar, theft or intruder alarms
    • G08B13/18 Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength
    • G08B13/189 Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems
    • G08B13/194 Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems
    • G08B13/196 Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems using television cameras
    • G08B13/19602 Image analysis to detect motion of the intruder, e.g. by frame subtraction
    • G08B13/19606 Discriminating between target movement or movement in an area of interest and other non-significative movements, e.g. target movements induced by camera shake or movements of pets, falling leaves, rotating fan
    • G08B13/19608 Tracking movement of a target, e.g. by detecting an object predefined as a target, using target direction and or velocity to predict its new position
    • G08B13/19613 Recognition of a predetermined image pattern or behaviour pattern indicating theft or intrusion
    • G08B13/19615 Recognition of a predetermined image pattern or behaviour pattern indicating theft or intrusion wherein said pattern is defined by the user

Abstract

A computer-based structure-learning crowd behavior recognition method includes: (i) extracting, from a video signal composed of unit images acquired for each video section, an image feature vector composed of a plurality of elements for each unit image; (ii) inputting the image feature vector extracted during a video section into a classifier set to determine a crowd behavior class based on the plurality of elements of the image feature vector, and determining a crowd behavior class for the video section; (iii) preprocessing the extracted image feature vectors, or the image feature vectors extracted from a training data set, when no crowd behavior class was determined in step (ii) or before any crowd behavior class has been established; (iv) generating context networks in which the dependencies between the elements constituting the image feature vectors extracted in one video section are expressed as a graph; (v) extracting one or more path patterns from the context networks for each crowd behavior class; and (vi) setting the classifier, based on the extracted path patterns, so that a corresponding crowd behavior class is determined when image feature vectors are input.

Description

METHOD AND APPARATUS FOR CROWD BEHAVIOR RECOGNITION BASED ON STRUCTURE LEARNING

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to video surveillance and, more particularly, to monitoring pedestrian behavior in images.

In a large-scale CCTV system, monitoring personnel must watch dozens to hundreds of video feeds. This requires a considerable number of people, important situations are often missed due to lapses in concentration, fatigue, carelessness, or arbitrary judgment on the part of the monitoring personnel, and storing the enormous volume of video is also difficult.

There is therefore a growing demand for intelligent video surveillance systems that photograph a public place with a CCTV camera, automatically analyze the video to extract an unspecified number of objects and analyze their motion, and transmit information to an automation system when an abnormal motion is detected.

Conventional intelligent video surveillance systems segment foreground regions identified in an image based on a previously provided pedestrian pattern model to extract and count individual pedestrians.

Since this method relies on a pedestrian model obtained for a specific place viewed from a specific viewpoint, the permissible camera direction, angle of view, and focal distance are very limited. Identification of obscured pedestrians is also very likely to fail when "occlusion" occurs, that is, when pedestrians cover other pedestrians in an ordinary congestion situation.

Although these conventional systems consume a great deal of computational resources on detailed algorithms such as foreground region extraction, model analysis, similarity determination, and tracking, their accuracy is not sufficient.

For example, it is not easy to accurately detect abnormal movements in a crowd using conventional intelligent video surveillance systems when monitoring crowded places such as squares, parks, train stations, and playgrounds.

An, Tae-Ki, and Moon-Hyun Kim. "Context-aware Video Surveillance System." Journal of Electrical Engineering & Technology (2012): 115-123.

An object of the present invention is to provide a method and apparatus for structure-learning-based crowd behavior recognition that can recognize the collective behavior of a crowd without being affected by the modeling of individual pedestrians.

Another object of the present invention is to provide a method and apparatus for structure-learning-based crowd behavior recognition that can improve accuracy and usability through automated crowd behavior learning.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the following description.

According to an aspect of the present invention, there is provided a method for recognizing crowd behavior based on structure learning using a computer, the method comprising: (i) extracting, from a video signal composed of unit images acquired for each video section, an image feature vector composed of a plurality of elements for each unit image; (ii) inputting the image feature vector extracted during a video section into a classifier set to determine a crowd behavior class based on the plurality of elements of the image feature vector, and determining a crowd behavior class for the image feature vector of the video section; (iii) preprocessing the extracted image feature vectors, or the image feature vectors extracted from a training data set, if no crowd behavior class was determined in step (ii) or if no crowd behavior class has yet been established; (iv) generating context networks in which the dependencies between the elements constituting the image feature vectors extracted in one video section are expressed as a graph; (v) extracting one or more path patterns from the context networks for each crowd behavior class; and (vi) setting the classifier, based on the extracted path patterns, so that a corresponding crowd behavior class is determined when the image feature vectors are input.

According to an exemplary embodiment, the image feature vector may be a Histogram of Oriented Optical Flows (HOOF) vector, and the element values of the image feature vector may be the values of the optical flow histogram bins for the respective azimuth angle intervals.

According to one embodiment, the context network may be a Bayesian network expressed as a directed acyclic graph.

According to one embodiment, the Bayesian network may be obtained through a K2 algorithm.

According to one embodiment, each path pattern may be one of all possible paths leading from a root node to a leaf node of the context network.

According to one embodiment, the classifier is a neural-network-based classifier, and step (vi) includes: selecting, for each crowd behavior class, path features as the path patterns having a relatively high appearance probability among the extracted path patterns; constructing the neural network such that the image feature vector elements constituting the selected path features are each assigned to input nodes of the neural network and the crowd behavior classes are each assigned to output nodes of the neural network; and determining the weight values of the neural network such that, when the elements of the image feature vectors extracted from the video signal corresponding to each of the crowd behavior classes included in the training data set are input to the input nodes of the neural network, the corresponding crowd behavior class is determined.

According to one embodiment, the classifier is a neural-network-based classifier, and the method may further include, between step (ii) and step (iii), adding a new crowd behavior class if no crowd behavior class was determined in step (ii) or if no crowd behavior class has yet been established.

According to another aspect of the present invention, there is provided an apparatus for recognizing crowd behavior based on structure learning, comprising: a training data set storage unit for storing a training data set composed, for each predetermined video section, of a video signal and a crowd behavior class related to the crowd behavior observed in that video section; an image feature vector extraction unit for extracting an image feature vector composed of a plurality of elements for each unit image from a received video signal or from the video signal in the stored training data set; a structure learning unit for generating context networks in which the dependencies between the elements of the extracted image feature vectors are expressed as a graph, and for extracting path patterns from the context networks for each crowd behavior class; and a classifier set, based on the path patterns extracted from the image feature vectors that require a new crowd behavior class or from the image feature vectors extracted from the training data set, to determine one corresponding crowd behavior class when the elements of an image feature vector are input.

According to one embodiment, the image feature vector is a HOOF vector, and the element values of the image feature vector may be values of optical flow histogram bins for each azimuth interval.

According to one embodiment, the context network may be a Bayesian network represented by a directed acyclic graph.

According to one embodiment, the Bayesian network may be obtained through a K2 algorithm.

According to one embodiment, each path pattern may be one of all possible paths leading from a root node to a leaf node of the context network.

According to one embodiment, the classifier is a neural-network-based classifier and includes: a path feature selection unit that selects, for each crowd behavior class, path features as the path patterns having a relatively high appearance probability; a neural network setting unit that sets the weight values of the neural network so that the corresponding crowd behavior class is determined when the elements of an image feature vector are input to the input nodes of the neural network; and a crowd behavior class table that stores the crowd behavior classes included in the training data set or newly added crowd behavior classes, together with the context networks and path patterns extracted for each crowd behavior class. The image feature vector elements constituting the selected path features may each be assigned to input nodes of the neural network, and the crowd behavior classes may each be assigned to output nodes.

According to one embodiment, the classifier may further include a neural network operation unit that inputs the elements of an image feature vector to the input nodes of the set neural network, determines the crowd behavior class based on the output values of the output nodes of the neural network, and additionally registers a new crowd behavior class in the crowd behavior class table if all output values of the output nodes are lower than a predetermined threshold value.

According to the method and apparatus for recognizing crowd behavior based on structure learning of the present invention, the collective behavior of a crowd can be recognized without being affected by occlusion, since the method does not depend on modeling individual pedestrians.

According to the structure learning based crowd behavior recognition method and apparatus of the present invention, accuracy and usability can be improved through automated crowd behavior learning.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned can be clearly understood by those skilled in the art from the following description.

FIG. 1 is a flowchart illustrating a structure learning-based crowd behavior recognition method according to an embodiment of the present invention.
FIG. 2 is a schematic diagram illustrating context networks extracted from a context extraction step of a structure learning-based crowd behavior recognition method according to an exemplary embodiment of the present invention.
FIG. 3 is a schematic diagram illustrating a procedure for extracting path patterns for a specific context network in a path pattern extraction step of a structure learning based crowd behavior recognition method according to an exemplary embodiment of the present invention.
FIG. 4 is a flowchart illustrating a classifier setting step of a crowd behavior recognition method based on structure learning according to an exemplary embodiment of the present invention.
FIG. 5 is a schematic diagram illustrating how a particular path feature appears differently in different crowd behavior classes in the context classification step of the structure learning-based crowd behavior recognition method according to an exemplary embodiment of the present invention.
FIG. 6 is a schematic diagram illustrating a neural network constructed in the classifying step of the structure learning based crowd behavior recognition method according to an embodiment of the present invention.
FIG. 7 is a block diagram illustrating a structure learning based crowd behavior recognition apparatus according to an embodiment of the present invention.

Specific structural and functional descriptions of the embodiments of the invention disclosed herein are set forth only for the purpose of describing those embodiments. The embodiments of the invention may be practiced in various forms, and the present invention should not be construed as limited to the embodiments described herein.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The same reference numerals are used for the same constituent elements in the drawings and redundant explanations for the same constituent elements are omitted.

Throughout this specification, unless otherwise defined, a "frame" is a unit of an image to be displayed on a display device, a "picture" is a unit of an image viewed from the perspective of encoding or decoding, a "unit image" refers to a frame or a picture depending on the context, and a "block" is a set of pixels on which a predetermined operation is performed.

FIG. 1 is a flowchart illustrating a structure learning-based crowd behavior recognition method according to an embodiment of the present invention.

The structure learning based crowd behavior recognition method according to an embodiment of the present invention can realize unsupervised crowd behavior recognition without prior learning.

Referring to FIG. 1, in the structure-learning-based crowd behavior recognition method using a computer, in step S11 the computer can extract an image feature vector composed of a plurality of elements for each unit image from a video signal.

The video signal is composed of unit images obtained for each video section.

According to the embodiment, the image feature vector extracted by the computer for each unit image of the video signal in step S11 is a HOOF (Histogram of Oriented Optical Flow) vector, and the elements of the image feature vector are optical flow histogram bin values.

The following references can be referred to for HOOF vectors.

Chaudhry, Rizwan, et al. "Histograms of oriented optical flow and bin-cauchy kernels on nonlinear dynamical systems for the recognition of human actions." Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.

The HOOF vector may be composed of a plurality of optical flow histogram bin values. A histogram bin represents the portion of a histogram corresponding to a certain range, and the value of a histogram bin is the number of cases belonging to that range. In a HOOF vector, each optical flow histogram bin value is the number of optical flows counted for the corresponding azimuth range. In practice, the number of histogram bins is two or more.

In order to obtain the HOOF vector, block optical flows must first be calculated.

For this purpose, the unit image may be divided into m non-overlapping blocks of n × n pixels, e.g., m blocks of 8 × 8 pixels.

The block optical flow representing each block may be computed, for example, as the average of the n × n pixel optical flows in the block. Pixel optical flows can be extracted using well-known algorithms such as the Lucas-Kanade algorithm, the Horn-Schunck algorithm, or the Combined Local-Global algorithm.

At this time, the block optical flow can be expressed by the magnitude and the azimuth angle according to the polar coordinate system.

The block optical flows represented by the polar coordinate system can be grouped at predetermined angular intervals according to the azimuth angle. The number of block optical flows belonging to each azimuth range is the value of each optical flow histogram bin. If the azimuth is equally divided, for example, at 40 ° intervals, nine optical flow histogram bins can be prepared. In this case, the HOOF vector as the image feature vector is a vector composed of nine optical flow histogram bin values.
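As an illustration, the binning just described can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: it assumes OpenCV's Farneback dense optical flow as a stand-in for the pixel optical flow algorithms mentioned above, and the 8 × 8 block size and nine 40° bins are the illustrative values from the text.

import numpy as np
import cv2

def hoof_vector(prev_gray, cur_gray, block=8, num_bins=9):
    # Dense per-pixel optical flow between two consecutive grayscale frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = flow.shape[:2]
    h, w = h - h % block, w - w % block  # drop partial border blocks
    # Average the pixel flows in each non-overlapping block -> block optical flow.
    bx = flow[:h, :w, 0].reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    by = flow[:h, :w, 1].reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    # Polar form: azimuth angle of each block flow, mapped into [0, 2*pi).
    angles = np.arctan2(by, bx) % (2 * np.pi)
    # Count block flows per equal azimuth interval (40 degrees when num_bins = 9).
    hist, _ = np.histogram(angles, bins=num_bins, range=(0.0, 2 * np.pi))
    return hist  # the K optical flow histogram bin values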

According to the embodiment, the azimuth angle may not be evenly divided. Considering the optical distortion due to the angle between the camera and the ground, the azimuth intervals in the direction toward the camera may be made relatively large, and the azimuth intervals in the direction away from the camera relatively small.

Accordingly, the HOOF vector extracted for each unit image as the image feature vector can be expressed as H_i = (h_i1, h_i2, ..., h_iK), where H_i is the HOOF vector of the i-th unit image, h_ik is the value of the k-th optical flow histogram bin (k is a natural number with 1 ≤ k ≤ K), and K is the number of optical flow histogram bins.

According to the embodiment, the unit image may be one frame or one picture, or may be an image obtained by temporally averaging a plurality of frames, or a partial image obtained by spatially dividing a frame or a picture.

Accordingly, a plurality of image feature vectors can be obtained from a plurality of unit images during one image section. If necessary, two or more image sections may be merged, and image feature vectors may be obtained in the merged image section.

For example, from the 15 unit images constituting one video section, 15 image feature vectors can be obtained, as shown in Table 1 below. In Table 1, the elements of each image feature vector are the optical flow histogram bin values.

(Table 1 is provided as an image in the original document.)

A plurality of image feature vectors corresponding to one video segment as shown in Table 1 may constitute an image feature sequence for the corresponding video segment.

In other words, the image feature sequence is composed of the image feature vectors of T consecutive unit images ending at the i-th unit image, and can be expressed as S_i = (H_{i-T+1}, ..., H_{i-1}, H_i), where S_i is the image feature sequence of the i-th unit image and H_{i-T+1} is the image feature vector, in particular the HOOF vector, of the unit image T-1 positions before the i-th unit image.

The image feature sequence tells how the direction of optical flows changes roughly over time during a given video segment.
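A minimal sketch of assembling the image feature sequence S_i from per-unit-image HOOF vectors; the function name and the 0-based indexing are illustrative assumptions.

def feature_sequence(hoof_vectors, i, T):
    # S_i = (H_{i-T+1}, ..., H_{i-1}, H_i): the HOOF vectors of the T
    # consecutive unit images ending at the i-th unit image (0-indexed).
    return hoof_vectors[max(0, i - T + 1) : i + 1]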

For each video section, the image feature sequence carries information about the behavior of the crowd. The expression that most concisely describes the behavior of the crowd in each video section, distinguishing it from other phenomena, is the crowd behavior class.

Crowd behavior classes include, for example, walking, in which a crowd moves at a regular speed in a certain direction along a walkway; running, in which a crowd moves along a walkway but at high speed; gathering, in which a crowd converges on a region; and evacuation, in which a crowd scatters in various directions. Crowd behavior classes can be appropriately defined according to the place where the behavior of the crowd is observed and the nature of the crowd.

In step S12, the computer inputs the image feature vector extracted during the video section into a classifier set to determine a crowd behavior class based on the plurality of elements of the image feature vector, and determines the crowd behavior class related to the image feature vector of the video section.

The classifier receives all or a part of the elements of the image feature vector having the extracted plural elements and determines one of the one or more crowd behavior classes, the details of which will be described later.

At this time, if the image feature vector extracted from a certain video section corresponds to one of the crowd behavior classes established before that section, the classifier can determine the crowd behavior class related to the extracted image feature vector.

On the other hand, if the image feature vector extracted from a video section does not match any of the crowd behavior classes established before that section, the classifier cannot determine which existing crowd behavior class applies; the existence of such image feature vectors means that a new crowd behavior class is needed.

Also, in the special case where no crowd behavior class has yet been established, for example in the initial state, the first crowd behavior class should be generated for the image feature vector extracted in the first video section.

Accordingly, in step S13, when the classifier has determined one of the previously established crowd behavior classes in step S12, the computer can terminate the crowd behavior recognition process.

Also in step S13, if the classifier has failed to determine any one of the previously established crowd behavior classes, or if no crowd behavior class has yet been established, so that a new crowd behavior class is needed, the computer adds a new crowd behavior class and step S14 is performed.

Here, the classifier failing to determine the crowd behavior class means, for example, that based on the scores the classifier evaluates for each of the crowd behavior classes, the score of the highest-ranked crowd behavior class does not exceed a predetermined threshold value, so that it is difficult to determine the crowd behavior class.

In step S14, the computer preprocesses the image feature vector extracted from the input image signal or the image feature vector extracted from the image signal included in the training data set.

In other words, in step S12 the computer inputs the elements of the extracted image feature vector into the classifier without preprocessing; however, if the image feature vector extracted in a certain video section caused the addition of a new crowd behavior class in step S12, the computer may preprocess such image feature vectors in step S14 in order to extract a new path pattern matching the new crowd behavior class.

In the same sense, in step S14 the computer preprocesses the elements of the image feature vector extracted in the first video section, for which no crowd behavior class has yet been generated.

According to the embodiment, in step S14, the computer can always preprocess the elements of the extracted image feature vector for extraction of the path pattern.

On the other hand, the crowd behavior classes may be incrementally added according to their own learning and needs, but may be specified in advance by the designer depending on the application field and the embodiment.

In this embodiment, the training data set is a data set that pairs image feature vectors of the video signal obtained in past video intervals with one of the pre-established crowd behavior classes. That is, the training data set may be composed of an image feature vector extracted during each of the past video segments and a crowd behavior class specified in advance for the crowd behavior observed in each video segment.

For training, training data sets corresponding to at least several tens to several hundred video intervals for each crowd behavior class may be prepared.

The elements of the image feature vector may additionally be quantized. The quantization level can be chosen so that an appropriate number of nodes is generated in the subsequent context network generation step and an appropriate number of patterns in the pattern extraction step. Here, "appropriate" means a number that can be determined according to the designer's intention and the design specification, in consideration of the computation amount, processing speed, cost, and accuracy.

The quantization may be performed according to a given quantization level with respect to the optical flow histogram bins; specifically, it may be an operation that rounds the optical flow histogram bin values to the nearest multiple of ten.
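A minimal sketch of this quantization step, assuming rounding to the nearest multiple of ten as described above:

import numpy as np

def quantize_bins(hoof, step=10):
    # e.g., a bin value of 23 becomes 20, and 48 becomes 50.
    return (np.round(np.asarray(hoof) / step) * step).astype(int)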

For example, the elements of the image feature vector obtained in one image interval as shown in Table 1, that is, the optical flow histogram bin values, can be quantized as shown in Table 2 below.

(Table 2 is provided as an image in the original document.)

In step S15, the computer generates, for each video section (or each merged video section), context networks in which the dependencies between the elements constituting the image feature vectors extracted in that section are expressed as a graph.

Here, a graph refers to a structure composed of vertices (or nodes) and edges (or lines, arcs), and particularly refers to objects treated mathematically by graph theory.

According to an embodiment, the graph is a Bayesian network.

The Bayesian network is also called a belief network or a directed acyclic graphical model. It is defined as a probabilistic graphical model that expresses the conditional independence of a set of random variables using a directed acyclic graph. For example, the probabilistic causal relationship or dependency between a disease and a symptom can be given as the direction and weight of the edge between the node corresponding to the disease and the node corresponding to the symptom. Thus, a Bayesian network can be configured with nodes, and with directed, weighted edges between the nodes.

Methodologies for learning a Bayesian network from a data set consisting of a given set of variables and the values of each variable fall largely into constraint-based approaches and score-based approaches; in the present invention it is more appropriate to create the Bayesian network by a score-based method.

According to an embodiment, the Bayesian network may be generated by the K2 algorithm. The K2 algorithm determines, as the Bayesian network of a given data set, the graph that attains the highest score computed with a given score function. For the K2 algorithm, see Gregory F. Cooper and Edward Herskovits, "A Bayesian Method for the Induction of Probabilistic Networks from Data," Machine Learning, Volume 9, Number 4, 309-347, 1992.
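As an illustration only, a context network can be learned with an off-the-shelf score-based structure search. The sketch below uses pgmpy's hill-climbing search with the K2 score as a stand-in for the K2 algorithm proper (which additionally assumes a node ordering); the column names O1..O9 for the histogram bins, and the pgmpy API names, are assumptions that may vary with the library version.

import pandas as pd
from pgmpy.estimators import HillClimbSearch, K2Score

def learn_context_network(quantized_vectors):
    # One row per unit image in the video section, one column per
    # quantized optical flow histogram bin.
    data = pd.DataFrame(quantized_vectors,
                        columns=[f"O{k}" for k in range(1, 10)])
    search = HillClimbSearch(data)
    # The learned DAG's edges encode dependencies between bin values.
    return search.estimate(scoring_method=K2Score(data))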

The Bayesian network generated in step S15 using the K2 algorithm or the like is a context network in which the causal relationships or dependencies between the elements of the image feature vectors obtained in the corresponding video section are expressed as a graph, using the values of those elements.

FIG. 2 is a schematic diagram illustrating context networks generated in the context network generation step of the structure-learning-based crowd behavior recognition method according to an exemplary embodiment of the present invention.

In Fig. 2, context networks are shown that can be generated for each of the four illustrated crowd behavior classes.

The nine nodes of the context networks illustrated in FIG. 2 correspond to the nine optical flow histogram bins constituting the image feature vector, respectively. Since the training data set includes video signals and crowd behavior classes accumulated over a period corresponding to a plurality of video sections, a large number of context networks can be generated for each crowd behavior class, and the number of context networks may differ between crowd behavior classes.

In general, a context network can be expressed as G = (V, E) for mathematical manipulation, where V is a set of nodes n and E is a set of edges e. The edge e between a start node n_s and an end node n_e can be expressed as <n_s, n_e>, and means that the end node n_e is in a dependency relation to the start node n_s. Each node constituting the context network corresponds to one of the plurality of elements of the image feature vector.

Let C be the set of all crowd behavior classes, let c denote a crowd behavior class (i.e., c ∈ C), let α_c be the number of context networks classified into crowd behavior class c, let G_i^c be the i-th context network of class c (i is an integer with 1 ≤ i ≤ α_c), and let G^c be the context network set of class c (i.e., G_i^c ∈ G^c). The context network set of each class can then be expressed as follows. Illustratively, the crowd behavior class c is one of the walking behavior class Wa, the running behavior class Ru, the gathering behavior class Me, and the evacuation behavior class Ev.

Context network set of the walking behavior class: G^Wa = {G_1^Wa, G_2^Wa, ..., G_{α_Wa}^Wa}

Context network set of the running behavior class: G^Ru = {G_1^Ru, G_2^Ru, ..., G_{α_Ru}^Ru}

Context network set of the gathering behavior class: G^Me = {G_1^Me, G_2^Me, ..., G_{α_Me}^Me}

Context network set of the evacuation behavior class: G^Ev = {G_1^Ev, G_2^Ev, ..., G_{α_Ev}^Ev}

If an image feature vector that does not correspond to any of the existing crowd behavior classes is input in step S12, the context network derived from such image feature vectors belongs to the context network set of the newly established crowd behavior class.

The newly created crowd behavior class and its context network set can be treated the same as the existing crowd behavior classes and context network sets.
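A minimal sketch of maintaining the per-class context network sets G^c, in which a newly added class is handled exactly like an existing one; the class labels are the abbreviations used above.

context_network_sets = {"Wa": [], "Ru": [], "Me": [], "Ev": []}

def add_context_network(cls, network):
    # An unseen class label creates a new network set, treated like the others.
    context_network_sets.setdefault(cls, []).append(network)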

Referring back to FIG. 1, in step S16 the computer extracts one or more path patterns from the context networks for each crowd behavior class. The extracted path patterns are classified by crowd behavior class.

Context networks generated from image feature vectors obtained in the same kind of situation, or from image feature vectors having the same crowd behavior class in the training data set, will each have certain structural features, which can be obtained by analyzing the topological characteristics of each graph.

These structural features are referred to as path patterns, and FIG. 3 can be referred to in order to describe them.

FIG. 3 is a schematic diagram illustrating a procedure for extracting path patterns for a specific context network in a path pattern extraction step of a structure learning based crowd behavior recognition method according to an exemplary embodiment of the present invention.

In FIG. 3, a plurality of path patterns are extracted from a context network consisting of nine nodes (O_1 through O_9).

In one embodiment, a path pattern refers to each of all paths that start from a root node of the context network and end at a leaf node from which no further move is possible.

Referring to FIG. 3, from the context network shown, which is one of the context networks of the walking behavior class illustrated in FIG. 2, the path O_1 → O_2 → O_4 → O_8 (or "1-2-4-8") is extracted as one of all possible paths. All other possible paths, for example O_1 → O_2 → O_5 → O_7 (or "1-2-5-7") and O_1 → O_3 → O_6 → O_9 (or "1-3-6-9"), are also extracted as path patterns belonging to this walking behavior class context network.
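A minimal sketch of the root-to-leaf enumeration described above, over a context network given as an adjacency mapping from each node to its children; the example edges mirror the three walking-class paths just listed and are otherwise assumed.

def extract_path_patterns(children):
    # Collect all nodes and identify the roots (nodes with no parent).
    nodes = set(children) | {c for kids in children.values() for c in kids}
    has_parent = {c for kids in children.values() for c in kids}
    roots = [n for n in sorted(nodes) if n not in has_parent]
    patterns = []
    def dfs(node, path):
        kids = children.get(node, [])
        if not kids:              # leaf node: one complete path pattern
            patterns.append(path)
        for child in kids:
            dfs(child, path + [child])
    for root in roots:
        dfs(root, [root])
    return patterns

# Edges assumed for illustration, matching FIG. 3's walking-class example:
print(extract_path_patterns({1: [2, 3], 2: [4, 5], 3: [6], 4: [8], 5: [7], 6: [9]}))
# -> [[1, 2, 4, 8], [1, 2, 5, 7], [1, 3, 6, 9]]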

If the number of path patterns that can be extracted from the i-th context network G_i^c belonging to a crowd behavior class c is K_i, the j-th path pattern extracted from G_i^c is denoted P_ij^c (j is an integer with 1 ≤ j ≤ K_i). P_21^Wa in FIG. 3 means the first path pattern of the second context network belonging to the walking behavior class.

For example, each of the context networks G_i^Wa belonging to the context network set G^Wa of the walking behavior class Wa can have path patterns P_ij^Wa, as shown in Table 3 below.

(Table 3 is provided as an image in the original document.)

More specifically, the path patterns of each of the context networks of the walking behavior class can be derived as shown in Table 4 below.

(Table 4 is provided as an image in the original document.)

Although only fourteen context networks are illustrated in Table 4, the number of context networks may vary depending on the number of entire video segments of the training data set.

In Table 4, for example, the 1-2-4-5-6 path pattern appears frequently, in networks 1, 3, 6, 8, 10, and 13. The values of the nodes constituting such a recurring path pattern (the elements of the image feature vector, for example the optical flow histogram bin values) are expected to be quite similar to each other. Such element values are illustrated in Table 5 below.

(Table 5 is provided as an image in the original document.)

Accordingly, the structure-learning-based crowd behavior recognition method according to the embodiment of the present invention uses not only the topology of the path pattern but also the values of the nodes constituting the path pattern, that is, the values of the image feature vector elements, so the crowd behavior class can be determined more accurately.

Context networks and path patterns can likewise be extracted for the context network set G^Ru of the running behavior class Ru, the context network set G^Me of the gathering behavior class Me, and the context network set G^Ev of the evacuation behavior class Ev, respectively.

Referring back to FIG. 1, in step S17 the computer sets up a classifier that determines the crowd behavior class when the image feature vectors extracted from the video signal of one video section are input, on the basis of the path patterns extracted for each crowd behavior class.

According to an embodiment, the classifier is a neural network, specifically a layered feed-forward neural network, and more specifically a two-layer neural network.

The detailed procedure of step S17 will be described with reference to FIG. 4, which is a flowchart specifically illustrating the classifier setting step of the structure-learning-based crowd behavior recognition method according to an exemplary embodiment of the present invention.

In step S171 of FIG. 4, based on the frequencies of the path patterns extracted for each crowd behavior class, the computer selects as path features those path patterns having a relatively high appearance probability among the path patterns of each crowd behavior class.

According to recognition and classification studies using neural networks, low-frequency data sets are known to tend to degrade accuracy. Thus, the path features may be selected as the path patterns ranking higher by frequency, or as the remaining path patterns after excluding some of the lowest-ranking ones.
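A minimal sketch of this frequency-based selection; keeping the top four patterns per class mirrors the Table 6 example below, and duplicates across classes collapse into a single feature.

from collections import Counter

def select_path_features(patterns_by_class, top_k=4):
    selected = set()
    for cls, patterns in patterns_by_class.items():
        counts = Counter(tuple(p) for p in patterns)
        for pattern, _freq in counts.most_common(top_k):
            selected.add(pattern)   # cross-class duplicates merge here
    return sorted(selected)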

For example, path features can be selected from the path patterns shown in Table 4 as shown in Table 6 below.

(Table 6 is provided as an image in the original document.)

Referring to Table 6, the path patterns extracted from the entire training data set and their frequencies are listed in order of frequency for each crowd behavior class. For each crowd behavior class, only the top four path patterns by frequency are selected as path features for neural network learning. Thus, in Table 6, four path features are selected for each of the four crowd behavior classes; considering that some are selected redundantly in two or more crowd behavior classes, a total of ten path features may be selected: 1-2-4-5-6, 1-3-5, 1-2-4-8, 1-2-3-9, 1-5, 1-4-8, 1-5-7-8, 1-7-8, 1-6-9, and 1-2-7-9.

In Table 6, for example, the "1-3-5" path feature appears in the walking, running, and evacuation behavior classes, and the "1-4-8" path feature appears in the running and evacuation behavior classes. Such overlapping path features appear redundantly in several crowd behavior classes, making it difficult to learn or determine the crowd behavior class from topology alone.

However, as exemplified above in Table 5, the element values of the image feature vectors constituting a path feature of the same topology within the same crowd behavior class are observed to be similar across various context networks, whereas the element values constituting a path feature of the same topology in different crowd behavior classes can differ. Using this point, neural network learning can be performed more accurately.

FIG. 5 illustrates the possibility of distinguishing crowd behavior classes by a path feature that occurs commonly in different crowd behavior classes; it is a schematic diagram showing how the values of the image feature vectors are distributed differently in different crowd behavior classes in the structure-learning-based crowd behavior recognition method according to an embodiment of the present invention.

Referring to FIG. 5, since the optical flow histogram bin values for each azimuth section of the HOOF vector have distinct distributions among the respective crowd behavior classes, the crowd behavior classes can be distinguished from each other.

Referring back to FIG. 4, in step S172 the computer constructs the neural network such that the image feature vector elements constituting the selected path features are each assigned to input nodes of the neural network, and the crowd behavior classes are each assigned to output nodes.

The method of assigning image feature vector elements to the input nodes of the neural network will be described with reference to FIG. 6, which is a schematic diagram illustrating a neural network constructed in the classifier setting step of the structure-learning-based crowd behavior recognition method according to an embodiment of the present invention.

Referring to FIG. 6, the selected path features IP_1, IP_2, ..., IP_N are arranged in a row on the input side of the neural network, and the image feature vector elements constituting each of the path features IP_1 to IP_N are listed in order; these correspond to the input nodes of the neural network.

On the output side of the neural network, four crowd behavior classes are assigned to the output nodes, respectively.

Although common path features exist in different crowd behavior classes, different values will be input to the image feature vector element nodes constituting those path features, so the weights of the hidden layer of the neural network can be determined appropriately.

Referring to FIG. 4 again, in step S173 the computer determines the weight values of the neural network such that, when the elements of the image feature vectors extracted from the video signal corresponding to each of the crowd behavior classes included in the training data set are input to the input nodes of the neural network, the corresponding crowd behavior class is determined.

The elements of the image feature vectors for neural network learning may be input to the input nodes as the extracted values. Alternatively, they may first be processed, for example by taking, for each element of the image feature vector, the average value of that element over the video section, or the average of the maximum values of each element over a plurality of consecutive video sections, and then input to the input nodes of the neural network.

In addition, if there is no value for an image feature vector element assigned to an input node, a default value, for example "0", can be input.
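A hedged sketch of steps S172 and S173: the input row concatenates, per selected path feature, the image feature vector elements of its nodes (with the "0" default for missing values), and scikit-learn's MLPClassifier stands in for the two-layer feed-forward network. The hidden-layer size and training parameters are illustrative assumptions, not values from the patent.

import numpy as np
from sklearn.neural_network import MLPClassifier

def build_input_row(hoof, path_features):
    row = []
    for feature in path_features:        # e.g., (1, 2, 4, 8)
        for node in feature:
            # Element values of the path feature's nodes; default 0 if absent.
            row.append(hoof[node - 1] if node - 1 < len(hoof) else 0)
    return row

def train_classifier(train_hoofs, train_classes, path_features):
    X = np.array([build_input_row(h, path_features) for h in train_hoofs])
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000)
    clf.fit(X, train_classes)            # determines the weight values
    return clf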

In this way, through the procedure of steps S171 to S173, the structure-learning-based crowd behavior recognition method using a computer of the present invention can train a neural network that determines the crowd behavior class according to the path features of the context networks and the element values of the image feature vectors.

Returning to FIG. 1, when the setting of the classifier is completed or updated in step S17, the computer may return to step S11 and determine the crowd behavior class with the image feature vectors extracted from a real-time video signal.

As described above, in step S12 the computer inputs the image feature vectors extracted from the video signal to the classifier, and the classifier determines the crowd behavior class.

Specifically, in step S12 the computer inputs the image feature vector elements constituting the path features, among all the image feature vector elements, to the respective input nodes of the neural network, and determines the crowd behavior class based on the output values of the output nodes of the neural network. Since each path feature is selected from the path patterns, each node constituting a path feature corresponds to one of the elements of the image feature vector.

For example, in step S12 the crowd behavior class corresponding to the largest output value among the output values of the output nodes of the neural network may be determined to be the behavior of the crowd in the scene currently being photographed.

FIG. 7 is a block diagram illustrating a structure-learning-based crowd behavior recognition apparatus according to an embodiment of the present invention.

Referring to FIG. 7, the structure-learning-based crowd behavior recognition apparatus 70 may include a training data set storage unit 71, an image feature vector extraction unit 72, a structure learning unit 73, and a classifier 74.

The training data set storage unit 71 stores a training data set consisting, for each predetermined video section, of a video signal and the crowd behavior class related to the crowd behavior observed in that section. For learning, training data sets may be prepared such that the video signals of at least several tens to several hundreds of video sections are included for each crowd behavior class.

The image feature vector extracting unit 72 may extract an image feature vector composed of a plurality of elements for each unit image from the received image signal or the image signal in the stored training data set.

According to an embodiment, the image feature vector is a HOOF vector, and the element values of the image feature vector are values of optical flow histogram bins for each azimuth angle section.

The structure learning unit 73 generates context networks in which the dependencies between the elements of the extracted image feature vectors are expressed as a graph, and extracts path patterns from the context networks for each crowd behavior class. As described above, context networks and path patterns are composed of a plurality of nodes and one or more edges, and each node corresponds to one of the elements of the image feature vector.

To this end, the structure learning unit 73 may include a preprocessing unit 731, a context network generation unit 732, and a path pattern extraction unit 733.

The preprocessing unit 731 appropriately preprocesses, for example quantizes, the image feature vectors extracted from the video signal in the stored training data set, or those image feature vectors, among the image feature vectors extracted from the received video signal, that require a new crowd behavior class.

The context network generation unit 732 generates, from the preprocessed image feature vectors, context networks in which the dependencies between the elements of the image feature vectors are expressed as a graph, for each video section or each merged video section.

According to an embodiment, the graph is a Bayesian network.

According to the embodiment, the context network generation unit 732 can generate a Bayesian network derived by the K2 algorithm as the context network.

The path pattern extraction unit 733 extracts path patterns from the context networks for each crowd behavior class.

According to an embodiment, each path pattern is a path beginning at a root node of the context network and terminating at a leaf node.

The classifier 74 is set, on the basis of the path patterns extracted from the image feature vectors that require a new crowd behavior class or from the image feature vectors extracted from the training data set, to determine the corresponding one of the crowd behavior classes when the element values of image feature vectors are input.

According to an embodiment, the classifier 74 may comprise a multilayer feedforward neural network.

In operation, the classifier 74 determines and outputs one crowd behavior class based on the element values of the image feature vectors of the input video signal.

To this end, the classifier 74 may include a path feature selection unit 741, a neural network setting unit 742, a crowd behavior class table 743, and a neural network operation unit 744.

The path feature selection unit 741 refers to the crowd behavior class table 743 and, based on the frequencies of the path patterns extracted for each crowd behavior class, selects as path features those path patterns having a relatively high appearance probability among the path patterns of each crowd behavior class.

The neural network setting unit 742 constructs the neural network such that the elements of the image feature vectors constituting the path features selected by the path feature selection unit 741 are assigned to the input nodes of the neural network, and the crowd behavior classes are each assigned to the output nodes.

Further, the neural network setting unit 742 can set the weight values of the neural network such that, when the elements of the image feature vectors extracted from the video signals of the respective crowd behavior classes included in the training data set, or of the image feature vectors that caused the addition of a new crowd behavior class, are input to the input nodes of the neural network, the corresponding crowd behavior class is determined.

The crowd behavior class table 743 stores the crowd behavior classes included in the training data set or newly added crowd behavior classes, and stores the context networks and path patterns extracted for each crowd behavior class.

The neural network operation unit 744 inputs the elements of the image feature vectors extracted from the actually obtained video signal to the input nodes of the neural network set by the neural network setting unit 742, and determines the crowd behavior class based on the output values of the output nodes.

Specifically, the neural network operation unit 744 inputs the values of the elements constituting the path features, among the elements of the image feature vector, to the respective input nodes of the neural network, and determines the crowd behavior class corresponding to the largest output value among the output nodes of the neural network.

According to an embodiment, the neural network operation unit 744 registers a new crowd behavior class in the crowd behavior class table 743 if all of the output values of the output nodes of the neural network are below a predetermined threshold.
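A minimal sketch of this decision rule; the threshold value is illustrative.

import numpy as np

def decide_class(output_values, class_names, threshold=0.5):
    best = int(np.argmax(output_values))
    if output_values[best] < threshold:
        return None   # all outputs below threshold: register a new class
    return class_names[best]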

It is to be understood that both the foregoing general description and the following detailed description of the present invention are exemplary and explanatory and are intended to provide further explanation of the invention as claimed. Variations and specific embodiments that may occur to those skilled in the art are included within the scope of the present invention.

Further, the apparatus according to the present invention can be implemented as a computer-readable code on a computer-readable recording medium. A computer-readable recording medium includes all kinds of recording apparatuses in which data that can be read by a computer system is stored. Examples of the recording medium include ROM, RAM, optical disk, magnetic tape, floppy disk, hard disk, nonvolatile memory and the like. The computer-readable recording medium may also be distributed over a networked computer system so that computer readable code can be stored and executed in a distributed manner.

70 Structure learning based crowd behavior recognition device
71 Training data set storage unit
72 Image feature vector extraction unit
73 Structure learning unit
731 Preprocessing unit
732 Context network generation unit
733 Path pattern extraction unit
74 Classifier
741 Path feature selection unit
742 Neural network setting unit
743 Crowd Action Class Table
744 Neural network operation unit

Claims (15)

As a method for recognizing crowd behavior based on computer learning,
The computer comprising:
(i) extracting an image feature vector composed of a plurality of elements for each unit image from an image signal including a unit image obtained for each imaging window;
(ii) inputting the image feature vector extracted during the video section into a classifier set to judge a crowd behavior class based on a plurality of elements of the image feature vector, Determining a crowd behavior class for an image feature vector of the image feature vector;
(iii) if the crowd behavior class has not been determined in the step (ii) or if no crowd behavior class has yet been established, the extracted image feature vectors or the image feature vector extracted from the training data set Preprocessing them;
(iv) creating context networks in which a graphical representation of a dependency between elements constituting the image feature vectors extracted in one video segment;
(v) extracting at least one path patterns from the situation networks for each of the crowd behavior classes; And
and (vi) setting the classifier based on the extracted path patterns so that a corresponding crowd behavior class is determined when the image feature vectors are input.
The method of claim 1, wherein the image feature vector is a Histogram of Oriented Optical Flows (HOOF)
Wherein the element values of the image feature vector are values of optical flow histogram bins for each azimuth angle section.
The method of claim 1, wherein the context network is a Bayesian network expressed as a directed acyclic graph. 4. The method of claim 3, wherein the bezier network is obtained through a K2 algorithm. 4. The method of claim 3, wherein the path pattern is each of all possible paths from the root node to the end node of the context network. The method of claim 1, wherein the classifier is a neural network-
The step (vi)
selecting, for each crowd behavior class, path features as the path patterns whose appearance probability, computed from the frequencies of the path patterns, corresponds to the upper two thirds;
constructing the neural network such that the image feature vector elements constituting the selected path features are respectively assigned to input nodes of the neural network and the crowd behavior classes are respectively assigned to output nodes of the neural network; and
determining the weight values of the neural network such that, when the elements of the image feature vectors extracted from the image signal corresponding to each of the crowd behavior classes included in the training data set are input to the input nodes of the neural network, the corresponding crowd behavior class is determined.
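A hedged sketch of the "upper two thirds" selection in claim 6, assuming the path patterns of one crowd behavior class arrive as tuples of element names; the cut-off arithmetic is an illustrative reading of the claim, and select_path_features is an invented name.

    from collections import Counter

    def select_path_features(path_patterns):
        # Rank the distinct path patterns of one class by frequency and keep
        # the most frequent two thirds; the elements appearing in the kept
        # patterns become the neural network's input nodes.
        ranked = Counter(map(tuple, path_patterns)).most_common()
        keep = max(1, (2 * len(ranked)) // 3)
        selected = [pattern for pattern, _ in ranked[:keep]]
        input_elements = sorted({e for pattern in selected for e in pattern})
        return selected, input_elements

    patterns = [("e0", "e1", "e3"), ("e0", "e1", "e3"), ("e0", "e2", "e3"),
                ("e0", "e2"), ("e1", "e3"), ("e0", "e1", "e3")]
    print(select_path_features(patterns))
    # -> ([('e0', 'e1', 'e3'), ('e0', 'e2', 'e3')], ['e0', 'e1', 'e2', 'e3'])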
7. The method of claim 1, wherein the classifier is a neural network, the method further comprising, between step (ii) and step (iii):
adding a new crowd behavior class if a crowd behavior class has not been determined in step (ii) or before any crowd behavior class has been established.
8. A computer program recorded on a recording medium, the computer program being implemented to perform the steps of the structure learning based crowd behavior recognition method according to any one of claims 1 to 7.
9. An apparatus for crowd behavior recognition based on structure learning, comprising:
a training data set storage unit for storing a training data set including a video signal and, for each predetermined video region, a crowd behavior class related to the crowd behavior observed in that video region;
an image feature vector extracting unit for extracting, for each unit image, an image feature vector composed of a plurality of elements from a received image signal or from the image signal in the stored training data set;
a structure learning unit for generating context networks that graphically represent the dependency relationships between the elements of the extracted image feature vectors, and for extracting path patterns from the context networks for each crowd behavior class; and
a classifier set, based on the path patterns extracted from image feature vectors that require a new crowd behavior class or from the image feature vectors extracted from the training data set, to determine the corresponding one of the crowd behavior classes when the elements of an image feature vector are input.
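Reading claim 9 structurally, the apparatus decomposes into four cooperating units (cf. reference numerals 71 to 74 above); the skeletal Python wiring below is only a hypothetical visualization of that decomposition, with every class, attribute, and method name invented.

    from dataclasses import dataclass, field

    @dataclass
    class TrainingDataSetStorage:            # cf. unit 71
        video_signal: list = field(default_factory=list)
        region_classes: dict = field(default_factory=dict)   # video region -> class

    @dataclass
    class CrowdBehaviorApparatus:            # cf. unit 70
        storage: TrainingDataSetStorage
        extract_features: callable           # unit 72: frames -> feature vectors
        learn_structure: callable            # unit 73: vectors -> path patterns
        classify: callable                   # unit 74: vectors -> class, or None

        def process(self, frames):
            vectors = self.extract_features(frames)
            label = self.classify(vectors)
            if label is None:                # no class determined: fall back to learning
                return self.learn_structure(vectors)
            return label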
10. The apparatus of claim 9, wherein the image feature vector is a HOOF vector, and wherein the element values of the image feature vector are the values of the optical flow histogram bins for each azimuth angle section.
11. The apparatus of claim 9, wherein the context network is a Bayesian network represented by a directed acyclic graph.
12. The apparatus of claim 11, wherein the Bayesian network is obtained through a K2 algorithm.
13. The apparatus of claim 11, wherein each path pattern is one of all possible paths from the root node to an end node of the context network.
14. The apparatus of claim 9, wherein the classifier is a neural network-based classifier comprising:
a path feature selecting unit for selecting, for each crowd behavior class, path features as the path patterns having an appearance probability corresponding to the upper two thirds;
a neural network setting unit configured to set the weight values of the neural network such that a corresponding crowd behavior class is determined when the elements of an image feature vector are input to the input nodes of the neural network; and
a crowd behavior class table storing the crowd behavior classes included in the training data set or newly added crowd behavior classes, together with the context networks and path patterns extracted for each crowd behavior class,
wherein the neural network is configured such that the image feature vector elements constituting the selected path features are respectively assigned to the input nodes of the neural network and the crowd behavior classes are respectively assigned to the output nodes.
15. The apparatus of claim 14, further comprising a neural network operation unit configured to input the elements of an image feature vector to the input nodes of the set neural network, determine a crowd behavior class based on the output values of the output nodes of the neural network, and additionally register a new crowd behavior class in the crowd behavior class table if all of the output values of the output nodes of the neural network are below a predetermined threshold.
KR1020150091594A 2015-06-26 2015-06-26 Method and apparatus for crowd behavior recognition based on structure learning KR101675692B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150091594A KR101675692B1 (en) 2015-06-26 2015-06-26 Method and apparatus for crowd behavior recognition based on structure learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020150091594A KR101675692B1 (en) 2015-06-26 2015-06-26 Method and apparatus for crowd behavior recognition based on structure learning

Publications (1)

Publication Number Publication Date
KR101675692B1 true KR101675692B1 (en) 2016-11-14

Family

ID=57528579

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150091594A KR101675692B1 (en) 2015-06-26 2015-06-26 Method and apparatus for crowd behavior recognition based on structure learning

Country Status (1)

Country Link
KR (1) KR101675692B1 (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010224922A (en) * 2009-03-24 2010-10-07 Denso Corp Class discriminator generating device
KR101179276B1 (en) * 2011-06-13 2012-09-03 고려대학교 산학협력단 Device and method for detecting abnormal crowd behavior
JP2013127716A (en) * 2011-12-19 2013-06-27 Nippon Signal Co Ltd:The Abnormal state detection system for congestion
KR20130082308A (en) * 2012-01-11 2013-07-19 삼성테크윈 주식회사 Apparatus for adjusting image, method thereof and image stabilization apparatus having the apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
An, Tae-Ki, and Moon-Hyun Kim. "Context-aware Video Surveillance System." Journal of Electrical Engineering & Technology (2012): 115-123.

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190063735A (en) 2017-11-30 2019-06-10 (주)아이엠시티 A method and system for enhancing the accuracy of circumstantial judgment using audiovisual detection
CN109697815A (en) * 2019-01-24 2019-04-30 广州市天河区保安服务公司 Anti-theft communication network alarming method, appliance arrangement and storage medium
KR20220026127A (en) * 2020-08-25 2022-03-04 한국전자통신연구원 Apparatus and method for online action detection
KR102504321B1 (en) * 2020-08-25 2023-02-28 한국전자통신연구원 Apparatus and method for online action detection
US11935296B2 (en) 2020-08-25 2024-03-19 Electronics And Telecommunications Research Institute Apparatus and method for online action detection
KR20230018784A (en) 2021-07-30 2023-02-07 서강대학교산학협력단 Action recognition device based on video image and action recognition system including the same

Similar Documents

Publication Publication Date Title
Lu et al. Future frame prediction using convolutional vrnn for anomaly detection
Sun et al. Online growing neural gas for anomaly detection in changing surveillance scenes
Coşar et al. Toward abnormal trajectory and event detection in video surveillance
US10007850B2 (en) System and method for event monitoring and detection
CN112639873A (en) Multi-object pose tracking device and method based on single-object pose estimator
US10140508B2 (en) Method and apparatus for annotating a video stream comprising a sequence of frames
KR101912569B1 (en) The object tracking system of video images
Lian et al. Spatial–temporal consistent labeling of tracked pedestrians across non-overlapping camera views
KR101675692B1 (en) Method and apparatus for crowd behavior recognition based on structure learning
Favaretto et al. Detecting crowd features in video sequences
Lin et al. A new network-based algorithm for human activity recognition in videos
KR101762010B1 (en) Method of modeling a video-based interactive activity using the skeleton posture datset
Wiliem et al. Detecting uncommon trajectories
CN111539320B (en) Multi-view gait recognition method and system based on mutual learning network strategy
Tomar et al. Crowd analysis in video surveillance: A review
US20170053172A1 (en) Image processing apparatus, and image processing method
Hu et al. Dense crowd counting based on perspective weight model using a fisheye camera
Yang et al. A two-stream information fusion approach to abnormal event detection in video
KR101529620B1 (en) Method and apparatus for counting pedestrians by moving directions
Ma et al. Anomaly detection in crowded scenes using dense trajectories
Ramachandra et al. Perceptual metric learning for video anomaly detection
Wibowo et al. Tracking failures detection and correction for face tracking by detection approach based on fuzzy coding histogram and point representation
CN113822240B (en) Method and device for extracting abnormal behaviors from power field operation video data
Alafif et al. Hybrid classifiers for spatio-temporal real-time abnormal behaviors detection, tracking, and recognition in massive hajj crowds
Santhini et al. Crowd scene analysis using deep learning network

Legal Events

Date Code Title Description
E701 Decision to grant or registration of patent right
GRNT Written decision to grant
FPAY Annual fee payment

Payment date: 20190905

Year of fee payment: 4