CN115830515A - Video target behavior identification method based on spatial grid - Google Patents

Video target behavior identification method based on spatial grid Download PDF

Info

Publication number
CN115830515A
Authority
CN
China
Prior art keywords
target
gaussian
pixel
frame
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310047339.6A
Other languages
Chinese (zh)
Other versions
CN115830515B (en)
Inventor
施晓东
徐俊瑜
刘佳
韩东
谢诏光
孙镱诚
陆中祥
丁阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 28 Research Institute
Original Assignee
CETC 28 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 28 Research Institute filed Critical CETC 28 Research Institute
Priority to CN202310047339.6A priority Critical patent/CN115830515B/en
Publication of CN115830515A publication Critical patent/CN115830515A/en
Application granted granted Critical
Publication of CN115830515B publication Critical patent/CN115830515B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a video target behavior identification method based on a spatial grid, which comprises: establishing a data set containing the type and the state of each target; identifying the type and the state of a target in a video frame through a target recognition algorithm, and detecting moving targets in the video frame through a moving target detection algorithm; and, based on spatial grid positioning and in combination with the situation around the grid, analyzing the behavior and the action of the target in the video frame through target detection and motion detection. The invention is aimed at video target behavior identification scenarios based on spatial grids: the action of a target is identified through target detection and recognition together with moving target detection, and its behavior is identified through spatial grid positioning combined with the situation around the grid.

Description

Video target behavior identification method based on spatial grid
Technical Field
The invention relates to the fields of geographic space rasterization processing and situation perception, in particular to a video target behavior identification method based on a spatial grid.
Background
Video target behavior identification based on spatial grids is a very useful method for studying battlefield target behavior. By analyzing the behavior of battlefield targets over a long period, the data obtained are more scientific, more objective and of greater reference value. Although a great deal of research and innovation has been carried out on behavior analysis and identification, most existing methods are based on traditional evaluation approaches and do not meet actual use requirements in terms of accuracy, timeliness and practicability.
For example, prior-art document CN111222487A discloses a video target behavior recognition method and an electronic device, the method comprising: acquiring a video to be identified, the video comprising image frames to be identified; acquiring one or more local target images through a target detection model; matching the obtained local target images through a target tracking model to obtain one or more target image sequences; scoring the quality of the target behavior in each target image sequence through a target behavior quality scoring model to obtain high-quality target image subsequences; and performing behavior recognition on the obtained high-quality target image subsequences through a behavior recognition model to obtain a behavior recognition result. That method performs behavior recognition only on the high-quality target image subsequences of the video target image sequences: on the one hand, the influence of low-quality target behavior on the overall recognition result is eliminated; on the other hand, recognition efficiency is improved because only high-quality target behavior is recognized. However, spatial information is not processed, so when the method is applied to analyzing battlefield target behavior the results deviate and the requirements cannot be met.
Therefore, there is a need to solve the above problems.
Disclosure of Invention
The purpose of the invention is as follows: the invention aims to provide a video target behavior recognition method based on a spatial grid.
The technical scheme is as follows: in order to achieve the above purpose, the invention discloses a video target behavior recognition method based on a spatial grid, which comprises the following steps:
(1) Establishing a data set, wherein the data set comprises the type and the state of a target;
(2) The type and the state of the target in the video frame are identified through a target identification algorithm;
(3) Detecting a moving object of the video frame by a moving object detection algorithm;
(4) Based on spatial grid positioning, the behavior and the action of the target in the video frame are analyzed through target detection and motion detection by combining the peripheral conditions of the grid.
The data set in the step (1) comprises the type and the state of each target: in the process of producing the data set, the type of each target to be identified is annotated, and if a target is in a fighting posture, its label also notes that it is in a fighting state.
Preferably, the step (2) specifically comprises the following steps:
(2.1) detecting through a multi-scale feature map;
(2.2) extracting features by performing convolution calculations directly on feature maps of different sizes with a convolutional network, and classifying and regressing these features;
(2.3) training the network using prior boxes.
Furthermore, the specific steps of detecting through the multi-scale feature map in the step (2.1) are as follows: the neural network structure used for calculation is divided into six layers of feature maps on which image classification and regression are carried out; the feature maps of each layer differ in size, the feature maps at the front end of the network being larger and becoming smaller toward the back as pooling layers are added; larger-scale feature maps are used to process smaller targets, and smaller-scale feature maps are used to process larger targets.
Further, the specific steps of adopting prior boxes for network training in the step (2.3) are as follows:
setting boxes with different sizes and aspect ratios by taking pixels of the feature map as centers, wherein each pixel is provided with a plurality of prior boxes with different sizes and aspect ratios for detecting targets with different sizes and aspect ratios; training a network model by using a prior frame which is most suitable for the detection target in the picture; the size of the prior frame is linearly increased, and the following formula is satisfied:
s_k = s_min + (s_max − s_min) / (m − 1) × (k − 1),  k ∈ [1, m]
wherein m is the number of feature maps and takes the value 5, s_k represents the ratio of the size of the k-th prior box to the picture size, and s_min and s_max respectively represent the minimum and maximum values of s_k;
Matching the generated prior boxes with the real detection targets follows two criteria: the first criterion is to find, among the feature maps, the prior box with the greatest degree of overlap with a real detection target in the picture, the overlap being expressed by the IOU (Intersection over Union), and the prior box with the largest IOU value is then matched with that real detection target; the second matching criterion is intended to avoid an excessively large difference between the numbers of positive and negative samples: for the remaining prior boxes whose IOU value is not the maximum, if the IOU value between a prior box and a real target exceeds the set threshold, that prior box is also considered to match the real target. The final output of the network is the class confidence and position coordinate information of the predicted targets, so the loss function is the weighted sum of the class confidence error and the predicted position error of the predicted targets:
L(x, c, l, g) = (1/N) · [ L_conf(x, c) + α · L_loc(x, l, g) ]
wherein,
L_conf(x, c) = − Σ_{i∈Pos} x_ij^p · log(ĉ_i^p) − Σ_{i∈Neg} log(ĉ_i^0)
L_loc(x, l, g) = Σ_{i∈Pos} Σ_{m∈{cx, cy, w, h}} x_ij^p · smooth_L1(l_i^m − ĝ_j^m)
N: represents the number of positive samples, i.e. the number of prior boxes matched to real targets;
α: the weight balancing the predicted position error against the category confidence error;
x_ij^p: takes only the value 0 or 1; if it equals 1, the j-th real target in the picture is matched with the i-th prior box and the type of the real target is p;
c: the confidence of the target category;
l: the predicted values for the real target;
g: the position information of the real target;
L_conf: the category confidence error;
L_loc: the predicted position error;
L: the loss function;
Pos: the positive sample set;
Neg: the negative sample set;
ĉ_i^p: the confidence that the i-th prior box belongs to category p, where:
ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p);
(cx, cy, w, h): the center position coordinates, the width and the height of a box;
l_i^m: the predicted value of the m-th coordinate output by the network for the i-th prior box;
ĝ_j^m: the position of the j-th real target encoded relative to the matched prior box d_i, expressed as:
ĝ_j^cx = (g_j^cx − d_i^cx) / d_i^w,  ĝ_j^cy = (g_j^cy − d_i^cy) / d_i^h,
ĝ_j^w = log(g_j^w / d_i^w),  ĝ_j^h = log(g_j^h / d_i^h).
moreover, in the moving object detection in the step (3), the gray value of each pixel point is represented by a plurality of gaussian distributions, and each gaussian distribution function has different weights; if the pixel in the current video frame accords with the established Gaussian model, the pixel is considered as the background, otherwise, the pixel is considered as the foreground; and then updating parameters of the Gaussian model, sequencing different Gaussian distributions according to the priority, and selecting the consistent Gaussian distribution as a background model through a set threshold.
Further, in the step (3) moving object detection is performed on the video frame through a moving object detection algorithm: K Gaussian distribution functions are selected to represent the gray value of each pixel point in the image, and M of the K Gaussian distributions are selected as models describing the background; different Gaussian distributions are given different weights ω_{i,t}, where i indexes the Gaussian distributions, so i ≤ K; appropriate weights and a threshold are selected, and when the weights satisfy the threshold, the pixels conforming to those Gaussian distributions are regarded as background and the rest as foreground. Let the gray value of the pixel at a time t be X_t and express its probability density function as a combination of K Gaussian distribution functions:
P(X_t) = Σ_{i=1}^{K} ω_{i,t} · η(X_t; μ_{i,t}, σ²_{i,t})
wherein:
ω_{i,t}: represents the weight of the i-th Gaussian of the mixture model at time t, and the sum of all the weights is 1;
μ_{i,t}: represents the mean of the pixel gray value of the i-th model of the Gaussian mixture at time t;
σ²_{i,t}: represents the variance of the pixel gray value of the i-th model of the Gaussian mixture at time t;
X_t: represents the gray value of the pixel at time t;
η(·): the Gaussian probability density function.
The K Gaussian distribution functions are arranged in descending order, and the first M Gaussian distributions are then selected as background according to a preset threshold. When a new image is processed, the pixel points of the image are compared and matched against the established Gaussian mixture model; if a pixel point satisfies, for the i-th Gaussian distribution in the mixture model,
| X_t − μ_{i,t−1} | ≤ 2.5 · σ_{i,t−1}
wherein X_t represents the gray value of the pixel at time t, μ_{i,t−1} represents the mean of the pixel gray value of the i-th model of the Gaussian mixture at the previous time step, and σ_{i,t−1} represents its standard deviation, then the point is considered to match the i-th Gaussian distribution, and the parameters of the successfully matched function are updated as follows:
ω_{i,t} = (1 − α) · ω_{i,t−1} + α
μ_{i,t} = (1 − ρ) · μ_{i,t−1} + ρ · X_t
σ²_{i,t} = (1 − ρ) · σ²_{i,t−1} + ρ · (X_t − μ_{i,t})²,  with ρ = α · η(X_t; μ_{i,t}, σ²_{i,t})
wherein α (0 ≤ α ≤ 1) represents the learning rate; the larger the value, the more frequently the background in the video is updated; ω_{i,t} represents the weight of the i-th Gaussian of the mixture model at time t, and the sum of all the weights is 1; μ_{i,t} represents the mean of the pixel gray value of the i-th model at time t; σ²_{i,t} represents the variance of the pixel gray value of the i-th model at time t.
If a Gaussian distribution function is not matched by the pixel, its parameters do not need to be changed and only its weight is updated, according to:
ω_{i,t} = (1 − α) · ω_{i,t−1}
If the pixel does not match any of its Gaussian distribution functions, the pixel is judged as foreground and the Gaussian model with the smallest weight among the established models is replaced; the mean of the replacing new Gaussian function is set to the gray value of the current pixel. The weights of the updated background model are normalized, and the Gaussian distribution functions are sorted in descending order of ω/σ; the foreground is then screened according to the set threshold T: the first M Gaussian distributions satisfying the condition are set as background, and the remaining Gaussian distributions as foreground.
Preferably, the combat zone corresponding to the high-scale single map range in the step (4) is composed of R × C low-scale single maps, and the combat zone and the combat basic zones are divided accordingly, that is, one combat zone includes R × C combat basic zones, and the boundary lines of the combat basic zones are connected to form attack-and-defense lines and cooperative lines; according to the positions of the combat zone and the combat basic zone to which the target belongs, the behavior of the target is judged by combining the sea, land and air environment and an analysis of the three-dimensional environment around the grid; if the target is detected at the same position in the previous frame and the current frame and no moving target is detected at that position, the detected target is in a static state; if the target is detected in the previous frame and detected near the same position in the current frame, and a moving target is detected in that area at the same time, the detected target is in a moving state; if the target is detected in a normal state in the previous frame, detected in a fighting state at the same position in the current frame, and a moving target is detected at that position, the detected target is in a fighting state; if the target is static or moving in our area, it is identified as an intrusion behavior; if the target is fighting in our area, it is identified as an attack.
Beneficial effects: compared with the prior art, the invention has the following remarkable advantages: the invention combines the spatial grid with target recognition, target detection and motion detection algorithms to analyze battlefield target behavior, so that the results are more accurate, the speed is higher, and the method is better suited to the requirements of the current battlefield.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of multi-scale feature map detection according to the present invention;
FIG. 3 is a schematic diagram of action recognition in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
As shown in fig. 1, the invention relates to a video target behavior recognition method based on spatial grid, which comprises the following steps:
(1) Establishing a data set, wherein the data set comprises the type and the state of each target: in the process of producing the data set, the type of each target to be identified is annotated, and if a target is in a fighting posture, its label also notes that it is in a fighting state;
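For illustration, one annotation entry in such a data set might look like the minimal sketch below; the field names and the JSON-style layout are assumptions made only for this example and are not prescribed by the method.

```python
# Hypothetical annotation entry for one target in one frame; the concrete label
# format (field names, file layout, class names) is an assumption for illustration only.
annotation = {
    "frame_id": 120,                 # index of the video frame
    "bbox": [342, 118, 80, 46],      # x, y, width, height of the target in pixels
    "type": "tank",                  # type of the target to be identified
    "state": "fighting",             # state noted in the label when the target is in a fighting posture
}
```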
(2) Identifying the type and the state of the target in the video frame through a target recognition algorithm, which specifically comprises the following steps:
(2.1) Detecting through a multi-scale feature map: as shown in fig. 2, the neural network structure used for calculation is divided into six layers of feature maps on which image classification and regression are carried out; the feature maps of each layer differ in size, the feature maps at the front end of the network being larger and becoming smaller toward the back as pooling layers are added; larger-scale feature maps are used to process smaller targets, and smaller-scale feature maps are used to process larger targets;
(2.2) Extracting features by performing convolution calculations directly on feature maps of different sizes with a convolutional network, and classifying and regressing these features. When a general neural network is used to detect a target, a convolutional network usually extracts the features of a picture and the extracted features are then fed into a fully connected network for classification or regression; the invention instead applies the convolutional network directly to feature maps of different sizes and uses the extracted features for classification and regression, as shown in fig. 2;
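As an illustration of this design choice, the following minimal PyTorch-style sketch applies a separate 3×3 convolutional head to each of the six feature maps to produce class scores and box offsets; the channel counts, number of classes and number of prior boxes per pixel are assumptions chosen only for illustration.

```python
import torch.nn as nn

num_classes = 5          # assumed number of target types/states
priors_per_pixel = 4     # assumed number of prior boxes per feature-map pixel
feature_channels = [512, 1024, 512, 256, 256, 256]   # assumed channels of the six feature maps

# One classification head and one regression head per feature-map scale.
cls_heads = nn.ModuleList(
    nn.Conv2d(c, priors_per_pixel * num_classes, kernel_size=3, padding=1)
    for c in feature_channels
)
reg_heads = nn.ModuleList(
    nn.Conv2d(c, priors_per_pixel * 4, kernel_size=3, padding=1)
    for c in feature_channels
)

def predict(feature_maps):
    """Apply the convolutional heads directly to each of the six feature maps."""
    cls_out = [head(f) for head, f in zip(cls_heads, feature_maps)]
    reg_out = [head(f) for head, f in zip(reg_heads, feature_maps)]
    return cls_out, reg_out
```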
(2.3) the network training adopts a prior frame, the pixels of the characteristic diagram are taken as the center, the boxes with different sizes and length-width ratios are arranged, and each pixel is provided with a plurality of prior frames with different sizes and length-width ratios for detecting the targets with different sizes and length-width ratios; training a network model by using a prior frame which is most suitable for a detection target in the picture; the size of the prior frame is linearly increased, and the following formula is satisfied:
s_k = s_min + (s_max − s_min) / (m − 1) × (k − 1),  k ∈ [1, m]
wherein m is the number of feature maps and takes the value 5, s_k represents the ratio of the size of the k-th prior box to the picture size, and s_min and s_max respectively represent the minimum and maximum values of s_k;
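A minimal sketch of this linear scale rule follows; the concrete values s_min = 0.2 and s_max = 0.9 are illustrative assumptions only.

```python
# Compute the prior-box scales s_k = s_min + (s_max - s_min) / (m - 1) * (k - 1) for k = 1..m.
def prior_box_scales(m=5, s_min=0.2, s_max=0.9):
    """Return the ratio of each prior-box size to the picture size."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]

print(prior_box_scales())   # [0.2, 0.375, 0.55, 0.725, 0.9]
```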
Matching the generated prior boxes with the real detection targets follows two criteria: the first criterion is to find, among the feature maps, the prior box with the greatest degree of overlap with a real detection target in the picture, the overlap being expressed by the IOU (Intersection over Union), and the prior box with the largest IOU value is then matched with that real detection target; the second matching criterion is intended to avoid an excessively large difference between the numbers of positive and negative samples: for the remaining prior boxes whose IOU value is not the maximum, if the IOU value between a prior box and a real target exceeds the set threshold, that prior box is also considered to match the real target. The final output of the network is the class confidence and position coordinate information of the predicted targets, so the loss function is the weighted sum of the class confidence error and the predicted position error of the predicted targets:
L(x, c, l, g) = (1/N) · [ L_conf(x, c) + α · L_loc(x, l, g) ]
wherein,
L_conf(x, c) = − Σ_{i∈Pos} x_ij^p · log(ĉ_i^p) − Σ_{i∈Neg} log(ĉ_i^0)
L_loc(x, l, g) = Σ_{i∈Pos} Σ_{m∈{cx, cy, w, h}} x_ij^p · smooth_L1(l_i^m − ĝ_j^m)
N: represents the number of positive samples, i.e. the number of prior boxes matched to real targets;
α: the weight balancing the predicted position error against the category confidence error;
x_ij^p: takes only the value 0 or 1; if it equals 1, the j-th real target in the picture is matched with the i-th prior box and the type of the real target is p;
c: the confidence of the target category;
l: the predicted values for the real target;
g: the position information of the real target;
L_conf: the category confidence error;
L_loc: the predicted position error;
L: the loss function;
Pos: the positive sample set;
Neg: the negative sample set;
ĉ_i^p: the confidence that the i-th prior box belongs to category p, where:
ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p);
(cx, cy, w, h): the center position coordinates, the width and the height of a box;
l_i^m: the predicted value of the m-th coordinate output by the network for the i-th prior box;
ĝ_j^m: the position of the j-th real target encoded relative to the matched prior box d_i, expressed as:
ĝ_j^cx = (g_j^cx − d_i^cx) / d_i^w,  ĝ_j^cy = (g_j^cy − d_i^cy) / d_i^h,
ĝ_j^w = log(g_j^w / d_i^w),  ĝ_j^h = log(g_j^h / d_i^h).
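The two matching criteria described above can be sketched as follows; the IOU threshold of 0.5 and the (x1, y1, x2, y2) box representation are illustrative assumptions rather than values prescribed by the method.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match_priors(priors, gt_boxes, iou_threshold=0.5):
    """Match prior boxes to real targets using the two criteria described above:
    (1) each real target is matched to the prior box with the largest IOU;
    (2) any remaining prior box whose IOU with some real target exceeds the threshold
        is also treated as a positive match, limiting the positive/negative imbalance."""
    ious = np.array([[iou(p, g) for g in gt_boxes] for p in priors])
    matches = {}                                  # prior index -> real-target index
    for j in range(len(gt_boxes)):                # criterion 1
        matches[int(ious[:, j].argmax())] = j
    for i in range(len(priors)):                  # criterion 2
        j = int(ious[i].argmax())
        if i not in matches and ious[i, j] > iou_threshold:
            matches[i] = j
    return matches
```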
(3) Performing moving object detection on the video frame through a moving object detection algorithm, wherein in the moving object detection, the gray value of each pixel point is represented by a plurality of Gaussian distributions, and each Gaussian distribution function has different weights; if the pixel in the current video frame accords with the established Gaussian model, the pixel is considered as the background, otherwise, the pixel is considered as the foreground; then updating parameters of the Gaussian model, sequencing different Gaussian distributions according to priorities, and selecting the consistent Gaussian distribution through a set threshold value to serve as a background model;
Moving object detection is performed on the video frame through the moving object detection algorithm as follows: K Gaussian distribution functions are selected to represent the gray value of each pixel point in the image, and M of the K Gaussian distributions are selected as models describing the background; different Gaussian distributions are given different weights ω_{i,t}, where i indexes the Gaussian distributions, so i ≤ K; appropriate weights and a threshold are selected, and when the weights satisfy the threshold, the pixels conforming to those Gaussian distributions are regarded as background and the rest as foreground. Let the gray value of the pixel at a time t be X_t and express its probability density function as a combination of K Gaussian distribution functions:
P(X_t) = Σ_{i=1}^{K} ω_{i,t} · η(X_t; μ_{i,t}, σ²_{i,t})
wherein:
ω_{i,t}: represents the weight of the i-th Gaussian of the mixture model at time t, and the sum of all the weights is 1;
μ_{i,t}: represents the mean of the pixel gray value of the i-th model of the Gaussian mixture at time t;
σ²_{i,t}: represents the variance of the pixel gray value of the i-th model of the Gaussian mixture at time t;
X_t: represents the gray value of the pixel at time t;
η(·): the Gaussian probability density function.
The K Gaussian distribution functions are arranged in descending order, and the first M Gaussian distributions are then selected as background according to a preset threshold. When a new image is processed, the pixel points of the image are compared and matched against the established Gaussian mixture model; if a pixel point satisfies, for the i-th Gaussian distribution in the mixture model,
| X_t − μ_{i,t−1} | ≤ 2.5 · σ_{i,t−1}
wherein X_t represents the gray value of the pixel at time t, μ_{i,t−1} represents the mean of the pixel gray value of the i-th model of the Gaussian mixture at the previous time step, and σ_{i,t−1} represents its standard deviation, then the point is considered to match the i-th Gaussian distribution, and the parameters of the successfully matched function are updated as follows:
ω_{i,t} = (1 − α) · ω_{i,t−1} + α
μ_{i,t} = (1 − ρ) · μ_{i,t−1} + ρ · X_t
σ²_{i,t} = (1 − ρ) · σ²_{i,t−1} + ρ · (X_t − μ_{i,t})²,  with ρ = α · η(X_t; μ_{i,t}, σ²_{i,t})
wherein α (0 ≤ α ≤ 1) represents the learning rate; the larger the value, the more frequently the background in the video is updated; ω_{i,t} represents the weight of the i-th Gaussian of the mixture model at time t, and the sum of all the weights is 1; μ_{i,t} represents the mean of the pixel gray value of the i-th model at time t; σ²_{i,t} represents the variance of the pixel gray value of the i-th model at time t.
If a Gaussian distribution function is not matched by the pixel, its parameters do not need to be changed and only its weight is updated, according to:
ω_{i,t} = (1 − α) · ω_{i,t−1}
If the pixel does not match any of its Gaussian distribution functions, the pixel is judged as foreground and the Gaussian model with the smallest weight among the established models is replaced; the mean of the replacing new Gaussian function is set to the gray value of the current pixel. The weights of the updated background model are normalized, and the Gaussian distribution functions are sorted in descending order of ω/σ; the foreground is then screened according to the set threshold T: the first M Gaussian distributions satisfying the condition are set as background, and the remaining Gaussian distributions as foreground.
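The per-pixel mixture-of-Gaussians procedure described above can be sketched as follows; the number of Gaussians K, the number of background components M, the learning rate, the initial variance and the choice of the first matching Gaussian are illustrative assumptions, and the learning rate ρ is simplified to a constant rather than α·η(X_t; μ, σ²).

```python
import numpy as np

K, ALPHA, MATCH_SIGMA = 3, 0.01, 2.5   # assumed number of Gaussians, learning rate, matching band

class PixelMixture:
    """Mixture-of-Gaussians background model for a single gray-value pixel."""
    def __init__(self, first_gray):
        self.w = np.full(K, 1.0 / K)              # weights, sum to 1
        self.mu = np.full(K, float(first_gray))   # means of the K Gaussians
        self.var = np.full(K, 225.0)              # variances (initial value assumed)

    def update(self, gray):
        """Update the model with the new gray value; return True for background, False for foreground."""
        matched = np.where(np.abs(gray - self.mu) <= MATCH_SIGMA * np.sqrt(self.var))[0]
        if matched.size:
            i = int(matched[0])                   # take the first matching Gaussian (simplification)
            rho = ALPHA                           # simplified learning rate for the matched Gaussian
            self.w = (1 - ALPHA) * self.w         # unmatched weights decay by (1 - alpha)
            self.w[i] += ALPHA                    # matched weight gets (1 - alpha)*w + alpha
            self.mu[i] = (1 - rho) * self.mu[i] + rho * gray
            self.var[i] = (1 - rho) * self.var[i] + rho * (gray - self.mu[i]) ** 2
        else:
            # no Gaussian matches: replace the least-weighted one, mean set to the current gray value
            i = int(self.w.argmin())
            self.mu[i], self.var[i] = float(gray), 225.0
        self.w /= self.w.sum()                    # renormalize the weights
        order = np.argsort(-(self.w / np.sqrt(self.var)))   # sort by w / sigma, descending
        background = order[: K - 1]               # first M distributions kept as background (M = K - 1 assumed)
        return matched.size > 0 and matched[0] in background
```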
(4) Based on spatial grid positioning, combining the peripheral conditions of grids, and analyzing the behavior and the action of a target in a video frame through target detection and motion detection;
the method comprises the following steps of 1, forming a combat region corresponding to the range of 100 ten thousand single maps by 144-frame 1; according to the positions of the combat zone and the combat basic zone to which the target belongs, the behavior of the target is judged by combining the sea, land and air environment and the surrounding three-dimensional environment analysis of the grid; the schematic diagram of motion recognition is shown in fig. 3, if an object is detected at the same position in the previous frame and the current frame and no moving object is detected at the same position, the detected object is in a static state; if the target is detected in the previous frame and the target is detected near the same position in the current frame, and the moving target is detected in the area at the same time, the detected target is in a moving state; if the target is detected to be in a normal state in the previous frame, the target is detected to be in a fighting state in the same position of the current frame, and the moving target is detected to be in the same position, the detected target is in the fighting state; if the target is static or moving in the area, the target is identified as an intrusion behavior; if the target fights in the area, the target is identified as an attack.

Claims (8)

1. A video target behavior identification method based on spatial grids is characterized by comprising the following steps:
(1) Establishing a data set, wherein the data set comprises the type and the state of a target;
(2) The type and the state of the target in the video frame are identified through a target identification algorithm;
(3) Detecting a moving object of the video frame by a moving object detection algorithm;
(4) Based on spatial grid positioning, the behavior and the action of the target in the video frame are analyzed through target detection and motion detection in combination with the peripheral situation of the grid.
2. The method for identifying the behavior of the video target based on the spatial grid as claimed in claim 1, wherein: the data set in the step (1) comprises the type and the state of each target: in the process of producing the data set, the type of each target to be identified is annotated, and if a target is in a fighting posture, its label also notes that it is in a fighting state.
3. The method of claim 2, wherein the video object behavior recognition based on spatial grid is characterized in that: the step (2) specifically comprises the following steps:
(2.1) detecting through a multi-scale feature map;
(2.2) extracting features by performing convolution calculations directly on feature maps of different sizes with a convolutional network, and classifying and regressing these features;
(2.3) training the network using prior boxes.
4. The method according to claim 3, wherein the video target behavior recognition method based on the spatial grid is characterized in that: the specific steps of detecting through the multi-scale feature map in the step (2.1) are as follows: the neural network structure used for calculation is divided into six layers of feature maps on which image classification and regression are carried out; the feature maps of each layer differ in size, the feature maps at the front end of the network being larger and becoming smaller toward the back as pooling layers are added; larger-scale feature maps are used to process smaller targets, and smaller-scale feature maps are used to process larger targets.
5. The method according to claim 4, wherein the video target behavior recognition method based on the spatial grid is characterized in that: the specific steps of adopting prior boxes for network training in the step (2.3) are as follows:
setting boxes with different sizes and aspect ratios by taking pixels of the feature map as centers, wherein each pixel is provided with a plurality of prior boxes with different sizes and aspect ratios for detecting targets with different sizes and aspect ratios; training a network model by using a prior frame which is most suitable for a detection target in the picture; the size of the prior frame is linearly increased, and the following formula is satisfied:
s_k = s_min + (s_max − s_min) / (m − 1) × (k − 1),  k ∈ [1, m]
wherein m is the number of feature maps and takes the value 5, s_k represents the ratio of the size of the k-th prior box to the picture size, and s_min and s_max respectively represent the minimum and maximum values of s_k;
Matching the generated prior boxes with the real detection targets follows two criteria: the first criterion is to find, among the feature maps, the prior box with the greatest degree of overlap with a real detection target in the picture, the overlap being expressed by the IOU (Intersection over Union), and the prior box with the largest IOU value is then matched with that real detection target; the second matching criterion is intended to avoid an excessively large difference between the numbers of positive and negative samples: for the remaining prior boxes whose IOU value is not the maximum, if the IOU value between a prior box and a real target exceeds the set threshold, that prior box is also considered to match the real target. The final output of the network is the class confidence and position coordinate information of the predicted targets, so the loss function is the weighted sum of the class confidence error and the predicted position error of the predicted targets:
L(x, c, l, g) = (1/N) · [ L_conf(x, c) + α · L_loc(x, l, g) ]
wherein,
L_conf(x, c) = − Σ_{i∈Pos} x_ij^p · log(ĉ_i^p) − Σ_{i∈Neg} log(ĉ_i^0)
L_loc(x, l, g) = Σ_{i∈Pos} Σ_{m∈{cx, cy, w, h}} x_ij^p · smooth_L1(l_i^m − ĝ_j^m)
N: represents the number of positive samples, i.e. the number of prior boxes matched to real targets;
α: the weight balancing the predicted position error against the category confidence error;
x_ij^p: takes only the value 0 or 1; if it equals 1, the j-th real target in the picture is matched with the i-th prior box and the type of the real target is p;
c: the confidence of the target category;
l: the predicted values for the real target;
g: the position information of the real target;
L_conf: the category confidence error;
L_loc: the predicted position error;
L: the loss function;
Pos: the positive sample set;
Neg: the negative sample set;
ĉ_i^p: the confidence that the i-th prior box belongs to category p, where:
ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p);
(cx, cy, w, h): the center position coordinates, the width and the height of a box;
l_i^m: the predicted value of the m-th coordinate output by the network for the i-th prior box;
ĝ_j^m: the position of the j-th real target encoded relative to the matched prior box d_i, expressed as:
ĝ_j^cx = (g_j^cx − d_i^cx) / d_i^w,  ĝ_j^cy = (g_j^cy − d_i^cy) / d_i^h,
ĝ_j^w = log(g_j^w / d_i^w),  ĝ_j^h = log(g_j^h / d_i^h).
6. The method according to claim 5, wherein the video target behavior recognition method based on the spatial grid is characterized in that: in the moving object detection in the step (3), the gray value of each pixel point is represented by a plurality of Gaussian distributions, and each Gaussian distribution function has a different weight; if a pixel in the current video frame conforms to the established Gaussian model, the pixel is considered as background, otherwise it is considered as foreground; the parameters of the Gaussian model are then updated, the different Gaussian distributions are sorted according to priority, and the conforming Gaussian distributions are selected through a set threshold as the background model.
7. The method according to claim 6, wherein the video target behavior recognition method based on the spatial grid is characterized in that: in the step (3), moving object detection is performed on the video frame through a moving object detection algorithm; K Gaussian distribution functions are selected to represent the gray value of each pixel point in the image, and M of the K Gaussian distributions are selected as models describing the background; different Gaussian distributions are given different weights ω_{i,t}, where i indexes the Gaussian distributions, so i ≤ K; appropriate weights and a threshold are selected, and when the weights satisfy the threshold, the pixels conforming to those Gaussian distributions are regarded as background and the rest as foreground. Let the gray value of the pixel at a time t be X_t and express its probability density function as a combination of K Gaussian distribution functions:
P(X_t) = Σ_{i=1}^{K} ω_{i,t} · η(X_t; μ_{i,t}, σ²_{i,t})
wherein:
ω_{i,t}: represents the weight of the i-th Gaussian of the mixture model at time t, and the sum of all the weights is 1;
μ_{i,t}: represents the mean of the pixel gray value of the i-th model of the Gaussian mixture at time t;
σ²_{i,t}: represents the variance of the pixel gray value of the i-th model of the Gaussian mixture at time t;
X_t: represents the gray value of the pixel at time t;
η(·): the Gaussian probability density function.
The K Gaussian distribution functions are arranged in descending order, and the first M Gaussian distributions are then selected as background according to a preset threshold. When a new image is processed, the pixel points of the image are compared and matched against the established Gaussian mixture model; if a pixel point satisfies, for the i-th Gaussian distribution in the mixture model,
| X_t − μ_{i,t−1} | ≤ 2.5 · σ_{i,t−1}
wherein X_t represents the gray value of the pixel at time t, μ_{i,t−1} represents the mean of the pixel gray value of the i-th model of the Gaussian mixture at the previous time step, and σ_{i,t−1} represents its standard deviation, then the point is considered to match the i-th Gaussian distribution, and the parameters of the successfully matched function are updated as follows:
ω_{i,t} = (1 − α) · ω_{i,t−1} + α
μ_{i,t} = (1 − ρ) · μ_{i,t−1} + ρ · X_t
σ²_{i,t} = (1 − ρ) · σ²_{i,t−1} + ρ · (X_t − μ_{i,t})²,  with ρ = α · η(X_t; μ_{i,t}, σ²_{i,t})
wherein α (0 ≤ α ≤ 1) represents the learning rate; the larger the value, the more frequently the background in the video is updated; ω_{i,t} represents the weight of the i-th Gaussian of the mixture model at time t, and the sum of all the weights is 1; μ_{i,t} represents the mean of the pixel gray value of the i-th model at time t; σ²_{i,t} represents the variance of the pixel gray value of the i-th model at time t.
If a Gaussian distribution function is not matched by the pixel, its parameters do not need to be changed and only its weight is updated, according to:
ω_{i,t} = (1 − α) · ω_{i,t−1}
If the pixel does not match any of its Gaussian distribution functions, the pixel is judged as foreground and the Gaussian model with the smallest weight among the established models is replaced; the mean of the replacing new Gaussian function is set to the gray value of the current pixel. The weights of the updated background model are normalized, and the Gaussian distribution functions are sorted in descending order of ω/σ; the foreground is then screened according to the set threshold T: the first M Gaussian distributions satisfying the condition are set as background, and the remaining Gaussian distributions as foreground.
8. The method according to claim 7, wherein the video target behavior recognition method based on the spatial grid is characterized in that: the combat zone corresponding to the high-scale single map range in the step (4) consists of R × C low-scale single maps, and the combat zone and the combat basic zones are divided accordingly, namely one combat zone comprises R × C combat basic zones, and the boundary lines of the combat basic zones are connected to form attack-and-defense lines and cooperative lines; according to the positions of the combat zone and the combat basic zone to which the target belongs, the behavior of the target is judged by combining the sea, land and air environment and an analysis of the three-dimensional environment around the grid; if the target is detected at the same position in the previous frame and the current frame and no moving target is detected at that position, the detected target is in a static state; if the target is detected in the previous frame and detected near the same position in the current frame, and a moving target is detected in that area at the same time, the detected target is in a moving state; if the target is detected in a normal state in the previous frame, detected in a fighting state at the same position in the current frame, and a moving target is detected at that position, the detected target is in a fighting state; if the target is static or moving in our area, it is identified as an intrusion behavior; if the target is fighting in our area, it is identified as an attack.
CN202310047339.6A 2023-01-31 2023-01-31 Video target behavior recognition method based on space grid Active CN115830515B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310047339.6A CN115830515B (en) 2023-01-31 2023-01-31 Video target behavior recognition method based on space grid

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310047339.6A CN115830515B (en) 2023-01-31 2023-01-31 Video target behavior recognition method based on space grid

Publications (2)

Publication Number Publication Date
CN115830515A true CN115830515A (en) 2023-03-21
CN115830515B CN115830515B (en) 2023-05-02

Family

ID=85520637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310047339.6A Active CN115830515B (en) 2023-01-31 2023-01-31 Video target behavior recognition method based on space grid

Country Status (1)

Country Link
CN (1) CN115830515B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103258332A (en) * 2013-05-24 2013-08-21 浙江工商大学 Moving object detection method resisting illumination variation
CN111477034A (en) * 2020-03-16 2020-07-31 中国电子科技集团公司第二十八研究所 Large-scale airspace use plan conflict detection and release method based on grid model
CN112070035A (en) * 2020-09-11 2020-12-11 联通物联网有限责任公司 Target tracking method and device based on video stream and storage medium
CN115098993A (en) * 2022-05-16 2022-09-23 南京航空航天大学 Unmanned aerial vehicle conflict detection method and device for airspace digital grid and storage medium
CN115493591A (en) * 2022-06-13 2022-12-20 中国人民解放军海军航空大学 Multi-route planning method
CN115578668A (en) * 2022-09-15 2023-01-06 浙江大华技术股份有限公司 Target behavior recognition method, electronic device, and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103258332A (en) * 2013-05-24 2013-08-21 浙江工商大学 Moving object detection method resisting illumination variation
CN111477034A (en) * 2020-03-16 2020-07-31 中国电子科技集团公司第二十八研究所 Large-scale airspace use plan conflict detection and release method based on grid model
CN112070035A (en) * 2020-09-11 2020-12-11 联通物联网有限责任公司 Target tracking method and device based on video stream and storage medium
CN115098993A (en) * 2022-05-16 2022-09-23 南京航空航天大学 Unmanned aerial vehicle conflict detection method and device for airspace digital grid and storage medium
CN115493591A (en) * 2022-06-13 2022-12-20 中国人民解放军海军航空大学 Multi-route planning method
CN115578668A (en) * 2022-09-15 2023-01-06 浙江大华技术股份有限公司 Target behavior recognition method, electronic device, and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
机器学习算法那些事: "Object Detection | SSD Principle and Implementation" *
杨超宇: "Research on Object Detection, Tracking and Feature Classification Based on Computer Vision" *

Also Published As

Publication number Publication date
CN115830515B (en) 2023-05-02

Similar Documents

Publication Publication Date Title
CN109766830B (en) Ship target identification system and method based on artificial intelligence image processing
CN111914664A (en) Vehicle multi-target detection and track tracking method based on re-identification
CN111259930A (en) General target detection method of self-adaptive attention guidance mechanism
CN111767882A (en) Multi-mode pedestrian detection method based on improved YOLO model
CN107633226B (en) Human body motion tracking feature processing method
CN113034548A (en) Multi-target tracking method and system suitable for embedded terminal
CN113221787B (en) Pedestrian multi-target tracking method based on multi-element difference fusion
CN106933816A (en) Across camera lens object retrieval system and method based on global characteristics and local feature
CN112836639A (en) Pedestrian multi-target tracking video identification method based on improved YOLOv3 model
CN112949572A (en) Slim-YOLOv 3-based mask wearing condition detection method
CN112818905B (en) Finite pixel vehicle target detection method based on attention and spatio-temporal information
CN110334584A (en) A kind of gesture identification method based on the full convolutional network in region
CN111274964B (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
CN110633727A (en) Deep neural network ship target fine-grained identification method based on selective search
CN117333948A (en) End-to-end multi-target broiler behavior identification method integrating space-time attention mechanism
CN116740652A (en) Method and system for monitoring rust area expansion based on neural network model
CN113095332B (en) Saliency region detection method based on feature learning
CN116309270A (en) Binocular image-based transmission line typical defect identification method
CN115272778A (en) Recyclable garbage classification method and system based on RPA and computer vision
CN115439926A (en) Small sample abnormal behavior identification method based on key region and scene depth
CN115830515B (en) Video target behavior recognition method based on space grid
CN114943873A (en) Method and device for classifying abnormal behaviors of construction site personnel
CN114170625A (en) Context-aware and noise-robust pedestrian searching method
CN114581769A (en) Method for identifying houses under construction based on unsupervised clustering
CN111046861B (en) Method for identifying infrared image, method for constructing identification model and application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant