LU505427B1 - Group behavior analysis method and device - Google Patents

Group behavior analysis method and device

Info

Publication number
LU505427B1
Authority
LU
Luxembourg
Prior art keywords
target
preset
group behavior
neural network
loss function
Prior art date
Application number
LU505427A
Other languages
French (fr)
Inventor
Baochang Zhang
Jinhu Lyu
Ying Luo
Kexin Liu
Original Assignee
Acad Of Mathematics And Systems Science Chinese Acad Of Sciences
Univ Beihang
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Acad Of Mathematics And Systems Science Chinese Acad Of Sciences, Univ Beihang
Priority to LU505427A
Application granted granted Critical
Publication of LU505427B1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G06N20/10 - Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the invention provide a group behavior analysis method and a group behavior analysis device. The method includes the following steps: constructing a target loss function based on a network loss function and a variance minimization method, and constructing a target supervision model by in-depth supervision based on the target loss function; constructing a target depth neural network based on the multi-channel encoder, the multi-channel decoder and the target supervision model; and performing group behavior analysis through the target depth neural network to determine the group behavior state according to the analysis result. By adopting the group behavior analysis method provided by the invention, the accuracy and reliability of the analysis results can be effectively improved, the analysis efficiency of group behavior can be improved, and the consumption of human resources can be reduced.

Description

DESCRIPTION
GROUP BEHAVIOR ANALYSIS METHOD AND DEVICE
TECHNICAL FIELD
The application relates to the technical field of behavior analysis, and in particular to a group behavior analysis method and a group behavior analysis device.
BACKGROUND
With the rapid development of urbanization, the number of abnormal group behaviors has also increased greatly, and the occurrence of abnormal group behaviors usually poses harm to social public safety. Therefore, how to analyze group behavior has become particularly important.
At this stage, the analysis of group behavior is usually carried out by staff. Specifically, real-time images of the corresponding places are first obtained through monitoring equipment installed in various public places. The staff then monitor the images obtained by the various monitoring devices and manually analyze whether abnormal group behaviors occur based on the monitored images, so as to deal with abnormal group behaviors according to the analysis results. However, manual analysis of group behavior depends on the professional level of the staff, which leads to low accuracy of the analysis results.
Therefore, there is an urgent need for an accurate group behavior analysis method to solve the above problems.
SUMMARY
As the existing methods have the above problems, the embodiment of the application provides a group behavior analysis method and a group behavior analysis device.
In a first aspect, an embodiment of the present application provides a group behavior analysis method, the method including: constructing a target loss function based on a network loss function and a variance minimization method, and constructing a target supervision model by in-depth supervision based on the target loss function; constructing a target depth neural network based on the multi-channel encoder, the multi-channel decoder and the target supervision model; and performing group behavior analysis through the target depth neural network to determine the group behavior state according to the analysis result.
Optionally, the construction of the target supervision model based on the target loss function and the depth supervision mode includes: obtaining a preset collocation coefficient, and constructing a target supervision model by a depth supervision mode based on the preset collocation coefficient and the target loss function, where the depth supervision mode includes a spatial abstract loss function and a pixel space position loss function.
Optionally, the construction of the target depth neural network based on the multi-channel encoder and the multi-channel decoder and the target supervision model includes: constructing a multi-scale residual encoder based on a first preset number of maximally pooled initial Inception convolutional blocks and a second preset number of multi-channel encoders with different scales; constructing a target multi-channel codec based on the multi-scale residual encoder and multi-channel decoder, and constructing a target depth neural network based on the target multi-channel codec and the target supervision model.
Optionally, the group behavior analysis through the target depth neural network to determine the group behavior state according to the analysis result includes: obtaining the pixel values of the target monitoring area through the target depth neural network, and determining the sum of pixel values of the target monitoring area based on the target monitoring area; obtaining a preset pixel threshold of the target scene corresponding to the target monitoring area through the target depth neural network, and judging whether the sum of the pixel values is greater than the preset pixel threshold; if it is greater than the preset pixel threshold, the target depth neural network determines that there is abnormal group behavior in the target monitoring area.
In a second aspect, an embodiment of the present application provides a group behavior analysis device, the device includes a model construction module, a network construction module and a behavior analysis module, where: the model construction module is used for constructing a target loss function based on a network loss function and a variance minimization method, and constructing a target supervision model by in-depth supervision based on the target loss function; the network construction module is used for constructing a target depth neural network based on the multi-channel encoder, the multi-channel decoder and the target supervision model; the behavior analysis module is used for performing group behavior analysis through the target depth neural network to determine the group behavior state according to the analysis result.
Optionally, the model construction module is used for: obtaining a preset collocation coefficient, and constructing a target supervision model by a depth supervision mode based on the preset collocation coefficient and the target loss function, where the depth supervision mode includes a spatial abstract loss function and a pixel space position loss function.
Optionally, the network construction module is used for: constructing a multi-scale residual encoder based on a first preset number of maximally pooled initial Inception convolutional blocks and a second preset number of multi-channel encoders with different scales; constructing a target multi-channel codec based on the multi-scale residual encoder and multi-channel decoder, and constructing a target depth neural network based on the target multi-channel codec and the target supervision model.
Optionally, the behavior analysis module is used for: obtaining the pixel values of the target monitoring area through the target depth neural network, and determining the sum of pixel values of the target monitoring area based on the target monitoring area; obtaining a preset pixel threshold of the target scene corresponding to the target monitoring area through the target depth neural network, and judging whether the sum of the pixel values is greater than the preset pixel threshold; if it is greater than the preset pixel threshold, the target depth neural network determines that there is abnormal group behavior in the target monitoring area.
In the third aspect, the embodiment of the present application also provides an electronic device, which includes a memory, a processor and a computer program stored on the memory and executable on the processor, and the processor implements the steps of the group behavior analysis method as described in the first aspect when executing the program.
In the fourth aspect, the embodiment of the present application also provides a non-transient computer-readable storage medium, which stores a computer program, and the computer program, when executed by a processor, implements the steps of the group behavior analysis method according to the first aspect.
As can be seen from the above technical scheme, the group behavior analysis method and the group behavior analysis device provided by the embodiment of the present application analyze the group behavior based on the target depth neural network by constructing the target depth neural network, so as to determine the group behavior state according to the analysis result. In this way, analyzing the group behavior based on the target depth neural network can avoid the influence of the professional level of the staff on the analysis results, thus effectively improving the accuracy and reliability of the analysis results and improving the analysis efficiency of the group behavior. At the same time, analyzing group behavior based on the target depth neural network, that is, realizing automatic analysis of group behavior through the target depth neural network, can also reduce the consumption of human resources and further improve the analysis efficiency of group behavior.
BRIEF DESCRIPTION OF THE FIGURES
In order to more clearly explain the embodiments of the present application or the technical scheme in the prior art, the drawings needed in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings in the following description are some embodiments of the present application. For those of ordinary skill in the art, other drawings can be obtained according to these drawings without creative work.
FIG. 1 is a flow diagram of a group behavior analysis method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of an analysis result based on a network test set provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a target depth neural network provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a group behavior analysis device provided by an embodiment of the present application;
FIG. 5 is a logic block diagram of an electronic device provided by an embodiment of the present application.
DESCRIPTION OF THE APPLICATION
In order to make the purpose, technical scheme and advantages of the embodiments of the application clearer, the technical scheme in the embodiments of the application will be described clearly and completely with the attached drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the application.
Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work belong to the scope of protection of the present application.
FIG. 1 shows a flow diagram of a group behavior analysis method provided by this embodiment, including:
S101, constructing a target loss function based on a network loss function and a variance minimization method, and constructing a target supervision model by in-depth supervision based on the target loss function.
The target loss function refers to a new loss function based on the network loss function and the variance minimization method.
The target supervision model refers to a new supervision model constructed by in-depth supervision based on the above-mentioned target loss function.
In practice, a target supervision model can be constructed based on the network loss function, the variance minimization method and the depth supervision mode, a target depth neural network can be constructed based on a multi-channel encoder, a multi-channel decoder and the above-mentioned target supervision model, and the group behavior can be analyzed through the above-mentioned target depth neural network to determine whether abnormal group behavior occurs. Specifically, first of all, based on the redundancy measurement principle of automatic control systems, the variance minimization method can be introduced into the network loss function to construct the target loss function. That is to say, $q_{loss}=\sum_{i}\sum_{j}(Z_i-Z_j)^2$ can be introduced into the network loss function as a part of the target loss function to construct the target loss function $Q_{loss}$. In this way, the introduction of the variance minimization method can train the weights and reduce the noise variance, thus improving the traditional in-depth supervision method and further narrowing the gap between pictures. Then, based on the above-mentioned target loss function, the target supervision model can be constructed by in-depth supervision.
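As a concrete illustration, the variance-minimization term can be sketched as follows. This is a minimal PyTorch sketch, not the application's own code; the pairwise squared-difference form, the mean reduction and the tensor shapes are assumptions.

```python
# Hedged sketch of q_loss = sum_i sum_j (Z_i - Z_j)^2 over the decoded channel outputs.
import torch

def variance_minimization_loss(outputs):
    """outputs: list of per-channel density maps, each of shape (B, 1, H, W)."""
    q = outputs[0].new_zeros(())
    for i in range(len(outputs)):
        for j in range(i + 1, len(outputs)):
            # penalize disagreement between every pair of channel outputs
            q = q + ((outputs[i] - outputs[j]) ** 2).mean()
    return q
```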
S102, constructing a target depth neural network based on the multi-channel encoder, the multi-channel decoder and the target supervision model; where the target depth neural network refers to a depth neural network constructed based on the target supervision model, the multi-channel encoder and the multi-channel decoder.
In practice, after constructing the target supervision model, the target depth neural network can be constructed based on the multi-channel encoder, the multi-channel decoder and the above-mentioned target supervision model. Specifically, the multi-channel encoder and the multi-channel decoder can be integrated together, and then the target depth neural network can be constructed based on the multi-channel encoder, the multi-channel decoder and the target supervision model.
S103, performing group behavior analysis through the target depth neural network to determine the group behavior state according to the analysis result.
In practice, after the target depth neural network is constructed, the group behavior can be analyzed based on the target depth neural network to determine the group behavior state according to the analysis results, where the group behavior state can at least include abnormal group behavior and normal group behavior, so that the staff can take different measures to deal with different group behavior states directly according to the analysis results.
It can be understood that because the color of the density map is related to the size of the pixel values, in order to enable the staff to see the group behavior state of the target monitoring area more intuitively, after determining the group behavior state of the target monitoring area, the corresponding density map can also be generated according to the pixel values. Specifically, different pixel thresholds can be set for each scene. For example, a first preset pixel threshold can be set. When the sum of real-time pixel values is greater than or equal to the first preset pixel threshold, it can be considered that the probability of abnormal group behavior in the target monitoring area is higher or the scale of the abnormal group behavior is larger, that is, the severity of the abnormal group behavior is higher. At this time, the density map corresponding to the target monitoring area can be expressed in conspicuous colors such as red, so that the staff can see the group behavior state and the corresponding severity of the target monitoring area more intuitively and deal with it accordingly.
At the same time, a second preset pixel threshold can be set. When the sum of real-time pixel values is greater than or equal to the second preset pixel threshold and less than the first preset pixel threshold, it can be considered that the probability of abnormal group behavior in the target monitoring area is lower than when the sum of real-time pixel values is greater than the first preset pixel threshold, or the scale of the group behavior is smaller; that is, the severity of the abnormal group behavior is lower than when the sum of the real-time pixel values is greater than the first preset pixel threshold. At this time, the density map corresponding to the target monitoring area can be expressed in a relatively conspicuous color such as yellow, so that the staff can see the group behavior state and the corresponding severity of the target monitoring area more intuitively and deal with it accordingly.
Further, a third preset pixel threshold can be set. When the sum of real-time pixel values is less than the third preset pixel threshold, it can be considered that a viewing-angle fault may have occurred in the target monitoring area, such as a poor camera viewing angle or the camera being blocked. At this time, the density map corresponding to the target monitoring area can be expressed in blue or similar colors, so that the staff can see the group behavior state and the corresponding severity of the target monitoring area more intuitively and deal with it accordingly. If the staff determine that the camera's viewing angle is normal and there is no occlusion, it can be considered that there is no abnormal group behavior in the target monitoring area, that is, the current group behavior state is normal group behavior. In this way, the time the staff spend handling abnormal group behaviors can be effectively reduced, and the group behavior state of the monitoring area can be controlled as a whole, further improving the processing efficiency.
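To make the three-threshold rule above concrete, the following minimal sketch maps the sum of pixel values to a severity color; the numeric thresholds and the color names are illustrative assumptions chosen per scene in practice, not values fixed by the application.

```python
def classify_region(pixel_sum, t1=5000.0, t2=2000.0, t3=100.0):
    """Map the sum of predicted density-map pixel values to a severity color.

    t1, t2, t3 are the first, second and third preset pixel thresholds
    (placeholder values; in practice they are set per target scene).
    """
    if pixel_sum >= t1:
        return "red"     # high probability / large scale of abnormal group behavior
    if pixel_sum >= t2:
        return "yellow"  # abnormal group behavior likely, lower severity
    if pixel_sum < t3:
        return "blue"    # possible viewing-angle fault (poor angle or blocked camera)
    return "normal"      # no abnormal group behavior detected
```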
As shown in FIG. 2, in order to verify the accuracy of the target depth neural network provided by this application, a verification experiment is conducted based on the ShanghaiTech Part A dataset.
In this verification, the input image is processed into a 3-channel picture of size 512×512, and the network consists of Inception encoding plus max-pooled downsampling (repeated once). After four encoding stages, the output information at different levels is decoded and fused across multiple channels; a 4-channel feature map of size 128×128 is obtained after 10 decoding operations, and the predicted density map is output after two nearest-neighbor interpolation upsampling steps.
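The output stage of this configuration can be sketched as follows. Assumptions in this sketch: a 1×1 convolution fuses the four decoded channels into one, and each nearest-neighbor step doubles the resolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

fuse = nn.Conv2d(4, 1, kernel_size=1)           # fuse the 4 decoded channels into 1
feat = torch.randn(1, 4, 128, 128)              # 4-channel 128x128 decoder output
density = fuse(feat)
density = F.interpolate(density, scale_factor=2, mode="nearest")  # -> 256x256
density = F.interpolate(density, scale_factor=2, mode="nearest")  # -> 512x512
print(density.shape)                            # torch.Size([1, 1, 512, 512])
```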
In the verification experiment, the application first divides the ShanghaiTech Part A dataset into two parts: a training set and a test set. For the training set, the labels are used to obtain the corresponding true density maps; the target depth neural network is then used to obtain the predicted density maps, and the weights are constantly updated. For the test set, crowd pictures are input, the final output of the target depth neural network is the corresponding predicted density map and number of people, and whether the behavior is abnormal is then judged based on the target depth neural network. The uniform distribution in FIG. 2 refers to the case where the group behavior state is normal group behavior, and the abnormal distribution refers to the case where the group behavior state is abnormal group behavior. It can be seen from FIG. 2 that the density map generated by the target depth neural network based on the above target multi-channel codec has better quality and higher accuracy, thus effectively improving the accuracy of the analysis results of the target depth neural network and improving the analysis efficiency.
As can be seen from the above technical scheme, the group behavior analysis method and the group behavior analysis device provided by the embodiment of the present application analyze the group behavior based on the target depth neural network by constructing the target depth neural network, so as to determine the group behavior state according to the analysis result. In this way, analyzing the group behavior based on the target depth neural network can avoid the influence of the professional level of the staff on the analysis results, thus effectively improving the accuracy and reliability of the analysis results and improving the analysis efficiency of the group behavior. At the same time, analyzing group behavior based on the target depth neural network, that is, realizing automatic analysis of group behavior through the target depth neural network, can also reduce the consumption of human resources and further improve the analysis efficiency of group behavior.
Further, on the basis of the above-mentioned method embodiment, a target supervision model can be constructed based on the preset collocation coefficient, and the corresponding part of the processing in S101 can be as follows: obtaining the preset collocation coefficient, and constructing the target supervision model through in-depth supervision based on the preset collocation coefficient and the target loss function.
Where the depth supervision method includes the spatial abstraction loss function (SAL) and the spatial correlation loss function (SCL).
The preset collocation coefficient refers to the coefficient of a preset loss function, such as 0.2.
In practice, when constructing the target supervision model, the preset collocation coefficient can also be obtained. Then, based on the preset collocation coefficient and the target loss function, the target supervision model can be constructed by depth supervision methods such as the spatial abstract loss function and the pixel spatial position loss function. In this way, the accuracy of the analysis results of the target depth neural network can be further improved.
As shown in FIG. 3, input x_p represents the image input; Encoder represents the encoder; Decoder represents the whole decoding process, which can be composed of various Deconv blocks; x represents the feature map obtained after encoding; W_i represents the decoding and up-sampling modules, in which the number of up-sampling modules can be set according to actual requirements; Z_i represents the output map of the i-th decoding channel; W_{i,0} represents the channel fusion module of the i-th channel; l_i represents the difference between the real density map (ground truth), after it is down-sampled to the same scale as each output map, and the output maps, including MSE, SAL and SCL; d_i represents the coefficient when adding l_i; y_i represents the output result after the channels of each output map are fused into 1; q_i represents the value obtained by q_loss for each output result; P_i represents the coefficient when each q_loss is added, i = 1, 2, 3, 4, ..., I (that is, Q_loss can be composed of multiple q_loss terms); Q_P represents the value obtained by adding the q_loss of all decoded channels; Deconv block refers to a module of the decoder; Upsampling block refers to the upsampling link; Intermediate output module refers to the decoding result of each channel before channel fusion, whose number of channels is generally not 1; MSE loss refers to the value obtained by MSE between each output characteristic map and the ground truth; Structure variance loss refers to the structural error expressed by variance; Reinforced deep supervision refers to the enhanced deep supervision model proposed in this application, that is, the target supervision model; and deep supervision refers to the deep supervision process.
First of all, in order to further narrow the gap between pictures, the variance minimization method $q_{loss}=\sum_{i}\sum_{j}(Z_i-Z_j)^2$ can be added to the network loss function.
Specifically, suppose that for the density map obtained by downsampling the same true value (denoted as Y, of size 128×128) there exist at the same time two different measurement systems S1 and S2, whose k-th measurement values are Z1(k) and Z2(k) respectively, with corresponding noises V1(k) and V2(k) respectively; then:
Z1(k)=Y+V1(k)
Z2(k)=Y+V2(k)
It can be concluded that the measurement noise variance of the measurement system S1 (the system with relatively high measurement noise) can be estimated from the self-differenced measurement sequences: for zero-mean, uncorrelated noise, ΔZ1(k) = Z1(k+1) − Z1(k) = V1(k+1) − V1(k), so that Var[V1(k)] = E[(ΔZ1(k))²]/2, where Var[V1(k)] represents the measurement noise variance of the measurement channel S1 (with relatively high measurement noise), ΔZ2(k) represents the self-differenced sequence of the measurement values of Z2, ΔZ1(k) represents the self-differenced sequence of the measurement values of Z1, and k = 1, 2, …
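As a quick numerical illustration of this self-differencing idea (the white-noise assumption and all numbers here are illustrative, not taken from the application):

```python
import torch

torch.manual_seed(0)
y = torch.zeros(10000)                 # the common true value Y
z1 = y + 0.5 * torch.randn(10000)      # channel S1 with noise std 0.5
dz1 = z1[1:] - z1[:-1]                 # self-differenced sequence dZ1(k)
var_v1 = (dz1 ** 2).mean() / 2         # Var[V1] ~ E[dZ1^2] / 2
print(var_v1)                          # ~ 0.25 = 0.5**2
```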
Therefore, $q_{loss}=\sum_{i}\sum_{j}(Z_i-Z_j)^2$ can be added to the network loss function as a part of the target loss function for training weights and reducing noise variance. The specific form of $Q_{loss}$ is:

$$Q_{loss} = \sum_{i=1}^{I} P_i\, q_i$$

where $q_i$ is the $q_{loss}$ value of the i-th output result and $P_i$ is the coefficient with which it is added.
Then, the target loss function can be constructed with the SAL based on the multi-channel network structure; that is, the auxiliary outputs Z1, Z2, Z3, Z4, etc. are used in the construction of the target loss function, so as to alleviate the problem of gradient disappearance and enhance the gradient flow through the network. For example, the error caused by the spatial abstraction between the encoded result and the real density map can be recorded as L_SA (spatial abstraction loss), and each Z_i and the real density map obtained by processing the dataset are processed as follows:
$$L_{SA} = \frac{1}{N} \sum_{j \in K} \left\| \varphi_j(Z) - \varphi_j(Y) \right\|^2$$
Where N represents the total number of pixels in a density map, K represents the set of downsampling operations, Z represents the output characteristic map of each output channel (the number of channels is 1), Y represents the ground truth, $\varphi_j(Z)$ represents the downsampling operation with a kernel size of j×j applied to each output characteristic map Z, and $\varphi_j(Y)$ represents the downsampling operation with a kernel size of j×j applied to each real density map Y. $\varphi_j(\cdot)$ refers to downsampling the image with a kernel of size j×j, and in the code j = 0, 2, 4, 8; the downsampled image values are obtained after maximum pooling with different kernel sizes and a stride of 2.
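A minimal sketch of L_SA under the stated setting (kernel sizes j in {0, 2, 4, 8}, maximum pooling with stride 2); using the per-term mean reduction to stand in for the 1/N factor is an assumption:

```python
import torch
import torch.nn.functional as F

def spatial_abstraction_loss(z, y, kernels=(0, 2, 4, 8)):
    """z, y: (B, 1, H, W) predicted and ground-truth density maps."""
    loss = z.new_zeros(())
    for j in kernels:
        if j == 0:
            zj, yj = z, y                                  # j = 0: no downsampling
        else:
            zj = F.max_pool2d(z, kernel_size=j, stride=2)  # phi_j(Z)
            yj = F.max_pool2d(y, kernel_size=j, stride=2)  # phi_j(Y)
        loss = loss + F.mse_loss(zj, yj)
    return loss
```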
After that, mean square error processing can be performed between the downsampled results of Z1 to ZI and the correspondingly downsampled real image, and mean square error processing can be performed between each final output Z_i and the real value Y to increase the local correlation between the output and the real value. At the same time, the loss of pixel spatial position can be calculated, which is recorded as L_SC (spatial correlation loss).
$$L_{SC} = 1 - \frac{\sum_{i=1}^{I}\sum_{j=1}^{J} Z_{ij}\, Y_{ij}}{\sqrt{\sum_{i=1}^{I}\sum_{j=1}^{J} Z_{ij}^2}\,\sqrt{\sum_{i=1}^{I}\sum_{j=1}^{J} Y_{ij}^2}}$$

where Z_ij represents the pixel value of each output map, Y_ij represents the pixel value at the position in the ground truth corresponding to Z_ij, I represents the total number of pixels in the horizontal direction, J represents the total number of pixels in the vertical direction, and i and j index the pixels in the horizontal and vertical directions. Y_ij and Z_ij respectively represent the pixels in the real density map and in the output density map predicted by the network; the total number of pixels is I*J, where i and j are the horizontal and vertical coordinates of a pixel. SCL measures the difference between two density maps based on normalized cross-correlation (NCC) similarity, which is insensitive to linear changes of the density map intensity; compared with the losses of traditional supervision methods (such as the mean square error, MSE), SCL is easier to compute and to program experimentally.
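A minimal sketch of the NCC-based SCL; the exact normalization (for example, whether the means are subtracted before correlating) is an assumption:

```python
import torch

def spatial_correlation_loss(z, y, eps=1e-8):
    """z, y: (B, 1, H, W); returns 1 minus normalized cross-correlation."""
    z = z.flatten(1)                                   # (B, H*W)
    y = y.flatten(1)
    ncc = (z * y).sum(dim=1) / (z.norm(dim=1) * y.norm(dim=1) + eps)
    return (1.0 - ncc).mean()
```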
Then, the preset collocation parameters of the target loss function can be obtained, and the specific formula of the target supervision model can be obtained as follows:
$$L = \sum_{i} a_i \left( \left\| Z_i - Y \right\|^2 + 0.1 \sum_{j} \left\| Z_i^{(j)} - Y^{(j)} \right\|^2 + 10\, L_{SC,i} \right) + \lambda\, Q_{loss}$$

where L represents the value of the combined loss; i represents the serial number of the output channel, i = 0, 1, 2, 3, ...; a_i represents the coefficient used when the loss values calculated between the output map and the ground truth are combined; j represents the serial number of the SAL downsampling kernel, j = 0, 1, 2, 3, ...; Z_i represents the output characteristic map of the i-th output channel; Y represents the ground truth; $Z_i^{(j)}$ represents the result of the downsampling operation with a kernel of j×j applied to the output of the i-th output channel; $Y^{(j)}$ represents the result of the downsampling operation with a kernel size of j×j applied to the ground truth; L_SC,i represents the calculation result of the SCL on the i-th output channel; and λ is the preset collocation parameter of the target loss function Q_loss. 0.1 and 10 are the parameters of the SAL and SCL terms determined by experiments, and the experimental results show that the effect is best when λ = 0.2.
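Reusing the helper functions sketched above, the combined loss can be assembled as follows; equal channel coefficients a_i = 1 and λ = 0.2 are assumptions consistent with the reported experiments:

```python
import torch
import torch.nn.functional as F

def combined_loss(outputs, y, lam=0.2):
    """outputs: list of per-channel predicted density maps; y: ground truth."""
    total = y.new_zeros(())
    for z in outputs:
        total = total + (
            F.mse_loss(z, y)                         # ||Z_i - Y||^2
            + 0.1 * spatial_abstraction_loss(z, y)   # SAL term
            + 10.0 * spatial_correlation_loss(z, y)  # SCL term
        )
    return total + lam * variance_minimization_loss(outputs)  # + lambda * Q_loss
```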
Further, on the basis of the above method embodiment, a target multi-channel codec can be constructed and then a target depth neural network can be constructed, and the corresponding processing in S102 can be as follows: a multi-scale residual encoder can be constructed based on a first preset number of maximally pooled initial Inception convolutional blocks and a second preset number of multi-channel encoders with different scales; a target multi-channel codec can be constructed based on the multi-scale residual encoder and a multi-channel decoder; and a target depth neural network can be constructed based on the target multi-channel codec and the target supervision model.
The first preset number refers to the preset maximum number of initial Inception convolution blocks, such as two.
The second preset number refers to the preset number of multi-channel encoders with different scales, such as four.
In practice, first, a first preset number of maximally pooled initial Inception convolutional blocks can be set, and a second preset number of multi-channel encoders with different scales can be set. Then, the above-mentioned multi-channel encoders with different scales can be interconnected by shortcuts, and a multi-scale residual encoder can be constructed based on the above-mentioned first preset number of maximally pooled initial Inception convolutional blocks and the second preset number of multi-channel encoders with different scales. After that, a multi-channel decoder can be set up, and the multi-scale residual encoder and the multi-channel decoder can be integrated together to obtain the target multi-channel codec. Then, based on the above-mentioned target multi-channel codec and combined with the target supervision model, a target depth neural network can be constructed.
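A minimal structural sketch of such an encoder follows. The channel widths, the simplified Inception branches and the additive shortcuts are assumptions; the application fixes only the counts (two max-pooled Inception blocks and four shortcut-connected encoder stages):

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Simplified Inception-style block: parallel 1x1 / 3x3 / 5x5 branches."""
    def __init__(self, c_in, c_out):
        super().__init__()
        b = c_out // 4
        self.b1 = nn.Conv2d(c_in, b, kernel_size=1)
        self.b3 = nn.Conv2d(c_in, b, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(c_in, c_out - 2 * b, kernel_size=5, padding=2)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1))

class MultiScaleResidualEncoder(nn.Module):
    def __init__(self, width=64, num_stages=4):
        super().__init__()
        self.stem = nn.Sequential(                   # first preset number: 2 blocks
            InceptionBlock(3, width), nn.MaxPool2d(2),
            InceptionBlock(width, width), nn.MaxPool2d(2),
        )
        # second preset number: 4 encoder stages, interconnected by shortcuts
        self.stages = nn.ModuleList([InceptionBlock(width, width) for _ in range(num_stages)])

    def forward(self, x):
        x = self.stem(x)
        for stage in self.stages:
            x = x + stage(x)                         # residual shortcut connection
        return x
```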
In this way, multi-scale multi-channel encoders can be fully integrated, thus further improving the analysis efficiency and accuracy of the target depth neural network.
Further, on the basis of the above-mentioned method embodiment, the group behavior analysis can be performed through the target depth neural network to determine the group behavior state according to the analysis result, and the corresponding processing of the above-mentioned S103 can be as follows: obtaining the pixel values of the target monitoring area through the target depth neural network, and determining the sum of the pixel values of the target monitoring area based on the pixel values of the target monitoring area; obtaining a preset pixel threshold of the target scene corresponding to the target monitoring area through the target depth neural network, and judging whether the sum of the pixel values is greater than the preset pixel threshold; if it is greater than the preset pixel threshold, the target depth neural network determines that there is abnormal group behavior in the target monitoring area.
The target monitoring area refers to any area where group behavior analysis is needed.
The preset pixel threshold refers to the upper limit of the sum of corresponding pixel values in different scenes (such as stations and shopping malls). When the sum of actual pixel values is greater than this value, it can be considered that abnormal group behavior has occurred.
In practice, the group behavior can be analyzed through the above-mentioned target depth neural network, and the analysis results can be obtained, so that the staff can carry out the corresponding treatment according to the analysis results. Specifically, first of all, the pixel values of the target monitoring area at the current moment can be obtained through the target depth neural network, and the pixel values in the target monitoring area can be summed to obtain the sum of the pixel values in the target monitoring area. Then, the target scene corresponding to the target monitoring area can be determined, the preset pixel threshold of the target scene can be obtained, and the sum of pixel values of the target monitoring area can be compared with the preset pixel threshold of the target scene to judge whether the sum of pixel values of the target monitoring area is greater than the preset pixel threshold of the target scene. If the sum of the pixel values of the target monitoring area is greater than the preset pixel threshold of the target scene, the target depth neural network can determine that abnormal group behavior has occurred in the target monitoring area at the current moment.
Otherwise, it is determined that there is no abnormal group behavior in the target monitoring area. When the target depth neural network determines that abnormal group behavior has occurred in the target monitoring area, it can also send a warning message to the terminal, so that the staff can handle it accordingly according to the warning message. In this way, the processing time of abnormal group behaviors can be effectively reduced, and the processing efficiency of abnormal group behaviors can be further improved.
Further, on the basis of the above method embodiment, the embodiment of the present application also provides a group behavior analyzing device, as shown in FIG. 4, which includes a model construction module 401, a network construction module 402 and a behavior analyzing module 403, where:
The model construction module 401 is used for constructing a target loss function based on a network loss function and a variance minimization method, and constructing a target supervision model by in-depth supervision based on the target loss function; the network construction module 402 is used for constructing a target depth neural network based on the multi-channel encoder, the multi-channel decoder and the target supervision model; the behavior analysis module 403 is used for performing group behavior analysis through the target depth neural network to determine the group behavior state according to the analysis result.
Optionally, the model construction module 401 is used for: obtaining a preset collocation coefficient, and constructing a target supervision model by a depth supervision mode based on the preset collocation coefficient and the target loss function, where the depth supervision mode includes a spatial abstract loss function and a pixel space position loss function.
Optionally, the network construction module 402 is used for: constructing a multi-scale residual encoder based on a first preset number of maximally pooled initial Inception convolutional blocks and a second preset number of multi-channel encoders with different scales; constructing a target multi-channel codec based on the multi-scale residual encoder and multi-channel decoder, and constructing a target depth neural network based on the target multi-channel codec and the target supervision model.
Optionally, the behavior analysis module 403 is used for: obtaining the pixel values of the target monitoring area through the target depth neural network, and determining the sum of pixel values of the target monitoring area based on the target monitoring area; obtaining a preset pixel threshold of the target scene corresponding to the target monitoring area through the target depth neural network, and judging whether the sum of the pixel values is greater than the preset pixel threshold; if it is greater than the preset pixel threshold, the target depth neural network determines that there is abnormal group behavior in the target monitoring area.
The group behavior analysis device described in this embodiment can be used to execute the above method embodiment, and its principle and technical effect are similar, so the details are not repeated here.
As shown in FIG. 5, the embodiment of the present application also provides an electronic device; the electronic device may include a processor 501, a memory 502 and a bus 503, where the processor 501 and the memory 502 communicate with each other through the bus 503.
The processor 501 is used to call the program instructions in the memory 502 to execute the methods provided by the above method embodiment.
In addition, the above-mentioned logic instructions in the memory 502 can be realized in the form of software functional units and can be stored in a computer-readable storage medium when they are sold or used as independent products. Based on this understanding, the technical solution of the present application can be embodied in the form of a software product, which is stored in a storage medium and includes several instructions to make a computer device (which can be a personal computer, a server, a network device, etc.) execute all or part of the steps of the method described in the various embodiments of the present application. The above-mentioned storage media include: USB flash disk, mobile hard disk, Read-Only Memory (ROM), Random Access Memory (RAM), magnetic disk, optical disk and other media that can store program codes.
On the other hand, the embodiment of the present application also provides a non-transient computer-readable storage medium, on which a computer program is stored, which is implemented when executed by a processor to execute the method provided by the above method embodiment.
The device embodiments described above are only schematic, in which the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed to multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of this embodiment. Those of ordinary skill in the art can understand and implement it without creative work.
From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be realized by means of software plus a necessary general hardware platform, and of course can also be realized by hardware. Based on this understanding, the essence of the above technical scheme, or the part that contributes to the prior art, can be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for making a computer device (which can be a personal computer, a server, or a network device, etc.) execute the methods described in the various embodiments or in some parts of the embodiments.
Finally, it should be explained that the above embodiments are only used to illustrate the technical scheme of the present application, but not to limit it. Although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that the technical schemes described in the foregoing embodiments can still be modified, or some technical features thereof can be replaced by equivalents; however, these modifications or substitutions do not make the essence of the corresponding technical solutions deviate from the spirit and scope of the technical solutions of the various embodiments of the present application.

Claims (4)

CLAIMS
1. A group behavior analysis method, characterized in that the method comprises: constructing a target loss function based on a network loss function and a variance minimization method, and constructing a target supervision model by in-depth supervision based on the target loss function; where the target supervision model can be obtained as follows:

$$L = \sum_{i} a_i \left( \left\| Z_i - Y \right\|^2 + 0.1 \sum_{j} \left\| Z_i^{(j)} - Y^{(j)} \right\|^2 + 10\, L_{SC,i} \right) + \lambda\, Q_{loss}$$

where L represents the value of the combined loss; i represents the serial number of the output channel, i = 0, 1, 2, 3, ...; a_i represents the coefficient used when the loss values calculated between the output map and the ground truth are combined; j represents the serial number of the SAL downsampling kernel, j = 0, 1, 2, 3, ...; Z_i represents the output characteristic map of the i-th output channel; Y represents the ground truth; $Z_i^{(j)}$ represents the result of the downsampling operation with a kernel of j×j applied to the output of the i-th output channel; $Y^{(j)}$ represents the result of the downsampling operation with a kernel size of j×j applied to the ground truth; L_SC,i represents the calculation result of the SCL on the i-th output channel; and λ is the preset collocation parameter of the target loss function Q_loss; constructing a target depth neural network based on the multi-channel encoder, the multi-channel decoder and the target supervision model; performing group behavior analysis through the target depth neural network to determine the group behavior state according to the analysis result; where the construction of the target supervision model based on the target loss function and the depth supervision mode includes: obtaining a preset collocation coefficient, and constructing the target supervision model by a depth supervision mode based on the preset collocation coefficient and the target loss function, where the depth supervision mode includes a spatial abstract loss function and a pixel space position loss function; where the construction of the target depth neural network based on the multi-channel encoder, the multi-channel decoder and the target supervision model includes: constructing a multi-scale residual encoder based on a first preset number of maximally pooled initial Inception convolutional blocks and a second preset number of multi-channel encoders with different scales;
constructing a target multi-channel codec based on the multi-scale residual encoder and the multi-channel decoder, and constructing a target depth neural network based on the target multi-channel codec and the target supervision model; where the group behavior analysis through the target depth neural network to determine the group behavior state according to the analysis result includes: obtaining the pixel values of the target monitoring area through the target depth neural network, and determining the sum of pixel values of the target monitoring area based on the target monitoring area; obtaining a preset pixel threshold of the target scene corresponding to the target monitoring area through the target depth neural network, and judging whether the sum of the pixel values is greater than the preset pixel threshold; if it is greater than the preset pixel threshold, the target depth neural network determines that there is abnormal group behavior in the target monitoring area; where different pixel thresholds are set for each scene, and a first preset pixel threshold is set; when the sum of real-time pixel values is greater than or equal to the first preset pixel threshold, it is considered that the probability of abnormal group behavior in the target monitoring area is higher; a second preset pixel threshold is set; when the sum of real-time pixel values is greater than or equal to the second preset pixel threshold and less than the first preset pixel threshold, it is considered that the probability of abnormal group behavior in the target monitoring area is lower than when the sum of real-time pixel values is greater than the first preset pixel threshold; and a third preset pixel threshold is set; when the sum of real-time pixel values is less than the third preset pixel threshold, it is considered that a viewing-angle fault may have occurred in the target monitoring area.
2. A group behavior analysis device, characterized in that the device comprises a model construction module, a network construction module and a behavior analysis module, where: the model construction module is used for constructing a target loss function based on a network loss function and a variance minimization method, and constructing a target supervision model by in-depth supervision based on the target loss function; the network construction module is used for constructing a target depth neural network based on the multi-channel encoder, the multi-channel decoder and the target supervision model;
the behavior analysis module is used for performing group behavior analysis through the target depth neural network to determine the group behavior state according to the analysis result,
where the target supervision model can be obtained as follows:
$$L = \sum_{i} a_i \left( \left\| Z_i - Y \right\|^2 + 0.1 \sum_{j} \left\| Z_i^{(j)} - Y^{(j)} \right\|^2 + 10\, L_{SC,i} \right) + \lambda\, Q_{loss}$$

where L represents the value of the combined loss; i represents the serial number of the output channel, i = 0, 1, 2, 3, ...; a_i represents the coefficient used when the loss values calculated between the output map and the ground truth are combined; j represents the serial number of the SAL downsampling kernel, j = 0, 1, 2, 3, ...; Z_i represents the output characteristic map of the i-th output channel; Y represents the ground truth; $Z_i^{(j)}$ represents the result of the downsampling operation with a kernel of j×j applied to the output of the i-th output channel; $Y^{(j)}$ represents the result of the downsampling operation with a kernel size of j×j applied to the ground truth; L_SC,i represents the calculation result of the SCL on the i-th output channel; and λ is the preset collocation parameter of the target loss function Q_loss;
where the model construction module is used for obtaining a preset collocation coefficient, and constructing a target supervision model by a depth supervision mode based on the preset collocation coefficient and the target loss function, where the depth supervision mode includes a spatial abstract loss function and a pixel space position loss function;
where the network construction module is used for constructing a multi-scale residual encoder based on a first preset number of maximally pooled initial inception convolutional blocks and a second preset number of multi-channel encoders with different scales; constructing a target multi-channel codec based on the multi-scale residual encoder and multi-channel decoder, and constructing a target depth neural network based on the target multi-channel codec and the target supervision model;
where the behavior analysis module is used for obtaining the pixel values of the target monitoring area through the target depth neural network, and determining the sum of pixel values of the target monitoring area based on the target monitoring area; obtaining a preset pixel threshold of the target scene corresponding to the target monitoring area through the target depth neural network, and judging whether the sum of the pixel values is greater than the preset pixel threshold; if it is greater than the preset pixel threshold, the target depth neural network determines that there is abnormal group behavior in the target monitoring area;
where different pixel thresholds are set for each scene, and a first preset pixel threshold is set; when the sum of real-time pixel values is greater than or equal to the first preset pixel threshold, it is considered that the probability of abnormal group behavior in the target monitoring area is higher; a second preset pixel threshold is set; when the sum of real-time pixel values is greater than or equal to the second preset pixel threshold and less than the first preset pixel threshold, it is considered that the probability of abnormal group behavior in the target monitoring area is lower than when the sum of real-time pixel values is greater than the first preset pixel threshold; and a third preset pixel threshold is set; when the sum of real-time pixel values is less than the third preset pixel threshold, it is considered that a viewing-angle fault may have occurred in the target monitoring area.
3. An electronic device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the group behavior analysis method according to claim 1 when executing the program.
4. A non-transient computer-readable storage medium, storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the group behavior analysis method according to claim 1.

Priority Applications (1)

Application Number Priority Date Filing Date Title
LU505427A LU505427B1 (en) 2023-11-03 2023-11-03 Group behavior analysis method and device


Publications (1)

Publication Number Publication Date
LU505427B1 true LU505427B1 (en) 2024-05-03

Family

ID=90971282


Legal Events

Date Code Title Description
FG Patent granted

Effective date: 20240503