CN115830515B - Video target behavior recognition method based on space grid - Google Patents

Video target behavior recognition method based on space grid

Info

Publication number
CN115830515B
CN115830515B · Application CN202310047339.6A
Authority
CN
China
Prior art keywords
target
gaussian
frame
pixel
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310047339.6A
Other languages
Chinese (zh)
Other versions
CN115830515A (en)
Inventor
施晓东
徐俊瑜
刘佳
韩东
谢诏光
孙镱诚
陆中祥
丁阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 28 Research Institute
Original Assignee
CETC 28 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 28 Research Institute filed Critical CETC 28 Research Institute
Priority to CN202310047339.6A
Publication of CN115830515A
Application granted
Publication of CN115830515B
Legal status: Active

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a video target behavior recognition method based on a spatial grid, which comprises: establishing a data set that contains the types and states of targets; identifying the type and state of a target in a video frame through a target recognition algorithm, and detecting moving targets in the video frame through a moving target detection algorithm; and, based on spatial grid positioning and in combination with the conditions around the grid, analyzing the behavior and action of the target in the video frame through target detection and motion detection. The invention is aimed at video target behavior recognition scenes based on a spatial grid: the action of the target is recognized through target detection and recognition together with moving target detection, and the behavior of the target is recognized through spatial grid positioning combined with the conditions around the grid.

Description

Video target behavior recognition method based on space grid
Technical Field
The invention relates to the field of geographic space rasterization processing and situation awareness, in particular to a video target behavior recognition method based on a space grid.
Background
Video target behavior recognition based on a spatial grid is a useful method for studying battlefield target behavior. By analyzing the behavior of battlefield targets over a long period, the data obtained are more scientific, more objective and of greater reference value. Although much research and innovation has been carried out on behavior analysis and recognition methods, most current methods are still based on traditional evaluation approaches and do not meet practical requirements in terms of accuracy, timeliness and practicability.
For example, CN111222487A discloses a video target behavior recognition method and an electronic device. The method includes: acquiring a video to be recognized, the video comprising image frames to be recognized; acquiring one or more local target images through a target detection model; matching the acquired local target images through a target tracking model to obtain one or more target image sequences; scoring the quality of the target behaviors in each target image sequence through a target behavior quality scoring model to obtain high-quality target image subsequences; and performing behavior recognition on the obtained high-quality target image subsequences through a behavior recognition model to obtain behavior recognition results. Because behavior recognition is carried out only on the high-quality subsequences of the video target image sequence, the influence of low-quality target behavior recognition results on the overall video result is eliminated on the one hand, and recognition efficiency is improved on the other hand, since only high-quality target behaviors are recognized. However, spatial information is not processed, so when the method is applied to the analysis of battlefield target behavior, the results deviate and cannot meet the requirements.
There is therefore a need to solve the above problems.
Disclosure of Invention
The invention aims to: the first object of the invention is to provide a video target behavior recognition method based on a space grid, which recognizes the action of a target through target detection and recognition together with moving target detection, recognizes the behavior of the target through spatial grid positioning combined with the conditions around the grid, and analyzes the behavior of targets on the battlefield, so that the results are more accurate and obtained faster, better suiting the requirements of the modern battlefield.
The technical scheme is as follows: in order to achieve the above purpose, the invention discloses a video target behavior recognition method based on a space grid, which comprises the following steps:
(1) Establishing a data set, wherein the data set contains the type and the state of the target;
(2) The type and state of the object in the video frame are identified by the object identification algorithm,
(3) Detecting a moving target of the video frame through a moving target detection algorithm;
(4) Based on space grid positioning, and combining with grid surrounding conditions, analyzing the behavior and action of the target in the video frame through target detection and motion detection.
Wherein, in the step (1), the data set contains the types and states of the targets; in the process of making the data set, the data set contains the types of the targets to be identified, and if a target is in a combat attitude, its label notes that it is in the combat state.
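As a non-authoritative illustration (the patent does not prescribe any annotation format, so the field names and values below are assumptions), one labelled target in a video frame could be recorded as a simple dictionary carrying both the category and the combat-state flag:

```python
# Hypothetical annotation record for one target in one video frame.
# Field names and values are illustrative; the patent only requires that
# each target carry a type and a state (e.g. "combat" when the target is
# in a combat attitude).
label = {
    "frame_id": 1024,
    "bbox": [120, 85, 240, 160],  # [x_min, y_min, x_max, y_max] in pixels
    "type": "tank",               # category of the target to be identified
    "state": "combat",            # "normal" or "combat"
}
```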
Preferably, the step (2) specifically includes the following steps:
(2.1) detecting by a multi-scale feature map,
(2.2) classifying and regressing the features extracted by the convolution calculation of the feature graphs with different sizes directly through a convolution network;
(2.3) network training employs prior frames.
Furthermore, the specific steps of the detection through the multi-scale feature maps in the step (2.1) are as follows: the neural network structure used for calculation is divided into six layers of feature maps for classifying and regressing the picture; the feature maps of each layer differ in size, with the feature map at the front end of the network being larger and becoming smaller after each pooling layer; the larger-scale feature maps are used to process smaller targets, and the smaller-scale feature maps are used to process larger targets.
Further, the specific steps of adopting a priori frame for network training in the step (2.3) are as follows:
setting boxes with different sizes and length-width ratios by taking pixels of the feature map as centers, wherein each pixel is provided with a plurality of prior boxes with different sizes and length-width ratios for detecting targets with different sizes and length-width ratios; the detection target in the picture can use the prior frame most suitable for the detection target to train the network model; the size of the prior frame linearly increases, satisfying the following formula:
$$s_{k}=s_{\min}+\frac{s_{\max}-s_{\min}}{m-1}(k-1),\quad k\in[1,m]$$
wherein m is the number of feature maps, and m has a value of 5; $s_{k}$ represents the proportion of the size of the k-th prior frame to the picture size; $s_{\min}$ and $s_{\max}$ respectively represent the minimum and maximum values of $s_{k}$;
the generated prior frames are matched with the real detection targets according to two criteria: the first criterion is to find, for each real detection target in the picture, the prior frame in the feature maps with the largest degree of overlap with it, measured by the intersection-over-union (IOU), and to match the prior frame with the largest IOU value to that real target; the second criterion, which avoids an excessive imbalance between the numbers of positive and negative samples, is that any remaining prior frame whose IOU with a real target exceeds a set threshold is also considered matched to that real target. The network finally outputs the category confidence and position coordinate information of the predicted targets, so the loss function is a weighted sum of the category confidence error and the predicted position error:
$$L(x,c,l,g)=\frac{1}{N}\bigl(L_{conf}(x,c)+\alpha L_{loc}(x,l,g)\bigr)$$
wherein
$$L_{conf}(x,c)=-\sum_{i\in Pos}x_{ij}^{p}\log\hat{c}_{i}^{p}-\sum_{i\in Neg}\log\hat{c}_{i}^{0}$$
$$L_{loc}(x,l,g)=\sum_{i\in Pos}\sum_{e\in\{cx,cy,w,h\}}x_{ij}^{p}\,\mathrm{smooth}_{L1}\bigl(l_{i}^{e}-\hat{g}_{j}^{e}\bigr)$$
N: the number of positive samples among the prior frames;
$x_{ij}^{p}$: takes only the value 0 or 1; a value of 1 indicates that the j-th real target in the picture is matched with the i-th prior frame and that the real target is of category p;
c: the confidence of the target categories;
l: the predicted values for the real targets;
g: the position information of the real targets;
$L_{conf}$: the category confidence error;
$L_{loc}$: the predicted position error;
L: the loss function;
$\alpha$: the weight balancing the two error terms;
Pos: the positive sample set;
Neg: the negative sample set;
$\hat{c}_{i}^{p}$: the confidence that the i-th prior frame belongs to category p, obtained by softmax normalisation:
$$\hat{c}_{i}^{p}=\frac{\exp(c_{i}^{p})}{\sum_{p}\exp(c_{i}^{p})}$$
(cx, cy, w, h): the centre coordinates, width and height of a prediction frame;
$l_{i}^{e}$: the predicted value of coordinate e for the detection target predicted by the i-th prior frame in the image;
$\hat{g}_{j}^{e}$: the encoded position of the j-th real detection target relative to the i-th prior frame (whose centre coordinates, width and height are $d_{i}^{cx}$, $d_{i}^{cy}$, $d_{i}^{w}$, $d_{i}^{h}$), calculated as:
$$\hat{g}_{j}^{cx}=\frac{g_{j}^{cx}-d_{i}^{cx}}{d_{i}^{w}},\quad \hat{g}_{j}^{cy}=\frac{g_{j}^{cy}-d_{i}^{cy}}{d_{i}^{h}},\quad \hat{g}_{j}^{w}=\log\frac{g_{j}^{w}}{d_{i}^{w}},\quad \hat{g}_{j}^{h}=\log\frac{g_{j}^{h}}{d_{i}^{h}}$$
in the moving object detection in the step (3), the gray value of each pixel point is represented by a plurality of gaussian distributions, and each gaussian distribution function has different weights; if a pixel in the current video frame meets the established Gaussian model, the pixel is considered to be background, otherwise the pixel is considered to be foreground; and then updating parameters of the Gaussian model, sorting different Gaussian distributions according to priority, and selecting the Gaussian distribution which accords with the priority through a set threshold value to serve as a background model.
In the step (3), moving object detection is performed on the video frame through a moving object detection algorithm: K Gaussian distribution functions are selected to represent the gray value of each pixel point in the image, and M models that describe the background are selected from the K Gaussian distribution functions. Different Gaussian distributions are given different weights $\omega_{a,t}$, where a denotes the a-th Gaussian distribution, so $a\le K$. Suitable weights and a threshold are selected; the pixels matching the Gaussian distributions whose weights satisfy the threshold are considered background, and the remaining pixels are considered foreground. Let the gray value of the pixel at a certain time t be $X_{t}$; its probability density function is represented by a combination of the K Gaussian distribution functions:
$$P(X_{t})=\sum_{a=1}^{K}\omega_{a,t}\,\eta\bigl(X_{t},\mu_{a,t},\Sigma_{a,t}\bigr)$$
wherein:
$\omega_{a,t}$ represents the weight of the a-th Gaussian model at time t, and the sum of all the weights is 1;
$\mu_{a,t}$ represents the mean of the pixel gray values of the a-th Gaussian model at time t;
$\Sigma_{a,t}$ represents the covariance matrix of the a-th Gaussian model at time t;
$X_{t}$ represents the pixel gray value at time t;
$\eta$ denotes the Gaussian probability density function.
The K Gaussian distribution functions are arranged in descending order of priority, and the first M Gaussian distributions serving as the background are selected according to a preset threshold. When a new image is processed, each pixel point of the image is compared and matched against the established Gaussian mixture model; if a pixel point and the a-th Gaussian distribution in the mixture model satisfy
$$\bigl|X_{t}-\mu_{a,t-1}\bigr|\le 2.5\,\sigma_{a,t-1}$$
wherein $X_{t}$ is the gray value of the pixel at time t and $\sigma_{a,t-1}$ is the standard deviation of the pixel gray values of the a-th Gaussian model at time t−1, then the point is considered to match the a-th Gaussian distribution, and the successfully matched function is updated with the following parameters:
$$\omega_{a,t}=(1-\alpha)\,\omega_{a,t-1}+\alpha$$
$$\mu_{a,t}=(1-\alpha)\,\mu_{a,t-1}+\alpha X_{t}$$
$$\sigma_{a,t}^{2}=(1-\alpha)\,\sigma_{a,t-1}^{2}+\alpha\bigl(X_{t}-\mu_{a,t}\bigr)^{2}$$
wherein $\alpha$ represents the learning rate, $0<\alpha<1$; the larger its value, the more frequently the background in the video is updated; $\sigma_{a,t}^{2}$ represents the variance of the pixel gray values of the a-th Gaussian model at time t.
If a Gaussian distribution function is not matched by the pixel, its distribution parameters are left unchanged and only the corresponding weight is updated, according to
$$\omega_{a,t}=(1-\alpha)\,\omega_{a,t-1}$$
If the pixel does not match any of its corresponding Gaussian distribution functions, the pixel point is judged to be foreground, and the Gaussian model with the smallest weight in the established model is replaced; the mean of the replacing new Gaussian function is the gray value of the current pixel. The weights of the updated background model are normalised, and the Gaussian distribution functions are arranged in descending order of
$$\omega_{a,t}/\sigma_{a,t}$$
The foreground is then screened according to the set threshold T: the first M Gaussian distributions satisfying the condition are set as the background, and the rest are set as the foreground.
Preferably, in the step (4), the combat zone corresponds to the range of a single map sheet at the coarser scale and is composed of R×C map sheets at the finer scale; the coarse-scale sheet is marked as the combat zone and each fine-scale sheet as a combat basic zone, i.e. one combat zone comprises R×C combat basic zones, and the boundary lines of the combat basic zones are connected to form attack-defence lines and coordination lines. The behavior of the target is judged according to the positions of the combat zone and combat basic zone to which the target belongs, in combination with analysis of the three-dimensional sea, land and air environment around the grid. If a target is detected at the same position in the previous frame and the current frame and no moving target is detected at that position, the detected target is in a stationary state; if a target is detected in the previous frame, a target is detected near the same position in the current frame, and a moving target is detected in that region at the same time, the detected target is in a moving state; if the target detected in the previous frame is in a normal state, the target detected at the same position in the current frame is in a combat posture, and a moving target is detected at that position, the target is in a combat state. If the target is stationary or moving within Zone I, its behavior is characterized as intrusion; if the target is in a combat state within Zone I, its behavior is characterized as an attack.
The beneficial effects are that: compared with the prior art, the invention has the following remarkable advantages: by combining the spatial grid with target recognition, target detection and motion detection algorithms, the invention analyzes battlefield target behavior, so that the results are more accurate and obtained faster, better suiting the requirements of the modern battlefield.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a schematic diagram of a multi-scale feature map detection in accordance with the present invention;
FIG. 3 is a schematic diagram of motion recognition according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As shown in fig. 1, the method for identifying video target behavior based on spatial grid of the present invention comprises the following steps:
(1) Establishing a data set, wherein the data set comprises the type and the state of the target, and in the process of manufacturing the data set, the data set comprises the type of the target to be identified, and if the target is in a combat attitude, the target is noted to be in the combat state in the tag;
(2) The method for identifying the type and the state of the target in the video frame through the target identification algorithm specifically comprises the following steps:
(2.1) detecting through multi-scale feature maps: as shown in fig. 2, the neural network structure used for calculation is divided into six layers of feature maps for classifying and regressing the picture; the feature maps of each layer differ in size, with the feature map at the front end of the network being larger and becoming smaller towards the back of the network; the larger-scale feature maps are used to process smaller targets, and the smaller-scale feature maps are used to process larger targets;
(2.2) classifying and regressing, directly through a convolutional network, the features extracted by convolution from the feature maps of different sizes; in general, when a neural network is used for target detection, a convolutional network first extracts the features of the picture, and the extracted features are then fed into a fully connected network for classification or regression;
(2.3) adopting prior frames for network training, the specific steps being as follows:
setting boxes with different sizes and length-width ratios by taking pixels of the feature map as centers, wherein each pixel is provided with a plurality of prior boxes with different sizes and length-width ratios for detecting targets with different sizes and length-width ratios; the detection target in the picture can use the prior frame most suitable for the detection target to train the network model; the size of the prior frame linearly increases, satisfying the following formula:
$$s_{k}=s_{\min}+\frac{s_{\max}-s_{\min}}{m-1}(k-1),\quad k\in[1,m]$$
wherein m is the number of feature maps, and m has a value of 5; $s_{k}$ represents the proportion of the size of the k-th prior frame to the picture size; $s_{\min}$ and $s_{\max}$ respectively represent the minimum and maximum values of $s_{k}$;
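As a minimal sketch of this linear scale schedule (the values s_min = 0.2 and s_max = 0.9 are assumptions made only for illustration; the patent fixes only m = 5), the prior-frame scales could be computed as:

```python
def prior_frame_scales(m=5, s_min=0.2, s_max=0.9):
    """Linearly increasing prior-frame scales s_k (fraction of the picture size).
    s_min and s_max are illustrative assumptions, not values from the patent."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]

print(prior_frame_scales())  # [0.2, 0.375, 0.55, 0.725, 0.9]
```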
the generated prior frames are matched with the real detection targets according to two criteria: the first criterion is to find, for each real detection target in the picture, the prior frame in the feature maps with the largest degree of overlap with it, measured by the intersection-over-union (IOU), and to match the prior frame with the largest IOU value to that real target; the second criterion, which avoids an excessive imbalance between the numbers of positive and negative samples, is that any remaining prior frame whose IOU with a real target exceeds a set threshold is also considered matched to that real target. The network finally outputs the category confidence and position coordinate information of the predicted targets, so the loss function is a weighted sum of the category confidence error and the predicted position error:
$$L(x,c,l,g)=\frac{1}{N}\bigl(L_{conf}(x,c)+\alpha L_{loc}(x,l,g)\bigr)$$
wherein
$$L_{conf}(x,c)=-\sum_{i\in Pos}x_{ij}^{p}\log\hat{c}_{i}^{p}-\sum_{i\in Neg}\log\hat{c}_{i}^{0}$$
$$L_{loc}(x,l,g)=\sum_{i\in Pos}\sum_{e\in\{cx,cy,w,h\}}x_{ij}^{p}\,\mathrm{smooth}_{L1}\bigl(l_{i}^{e}-\hat{g}_{j}^{e}\bigr)$$
N: the number of positive samples among the prior frames;
$x_{ij}^{p}$: takes only the value 0 or 1; a value of 1 indicates that the j-th real target in the picture is matched with the i-th prior frame and that the real target is of category p;
c: the confidence of the target categories;
l: the predicted values for the real targets;
g: the position information of the real targets;
$L_{conf}$: the category confidence error;
$L_{loc}$: the predicted position error;
L: the loss function;
$\alpha$: the weight balancing the two error terms;
Pos: the positive sample set;
Neg: the negative sample set;
$\hat{c}_{i}^{p}$: the confidence that the i-th prior frame belongs to category p, obtained by softmax normalisation:
$$\hat{c}_{i}^{p}=\frac{\exp(c_{i}^{p})}{\sum_{p}\exp(c_{i}^{p})}$$
(cx, cy, w, h): the centre coordinates, width and height of a prediction frame;
$l_{i}^{e}$: the predicted value of coordinate e for the detection target predicted by the i-th prior frame in the image;
$\hat{g}_{j}^{e}$: the encoded position of the j-th real detection target relative to the i-th prior frame (whose centre coordinates, width and height are $d_{i}^{cx}$, $d_{i}^{cy}$, $d_{i}^{w}$, $d_{i}^{h}$), calculated as:
$$\hat{g}_{j}^{cx}=\frac{g_{j}^{cx}-d_{i}^{cx}}{d_{i}^{w}},\quad \hat{g}_{j}^{cy}=\frac{g_{j}^{cy}-d_{i}^{cy}}{d_{i}^{h}},\quad \hat{g}_{j}^{w}=\log\frac{g_{j}^{w}}{d_{i}^{w}},\quad \hat{g}_{j}^{h}=\log\frac{g_{j}^{h}}{d_{i}^{h}}$$
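The two matching criteria described above can be sketched as follows (a simplified, non-authoritative illustration; the 0.5 IOU threshold is an assumed value, since the patent only speaks of "a set threshold"):

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as [x_min, y_min, x_max, y_max]."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match_prior_frames(priors, gt_boxes, iou_threshold=0.5):
    """Return a dict {prior_index: gt_index} of positive matches.
    Criterion 1: each real target claims the prior frame with the largest IOU.
    Criterion 2: any remaining prior frame whose IOU with a real target exceeds
    the threshold is also treated as a positive match."""
    ious = np.array([[iou(p, g) for g in gt_boxes] for p in priors])
    matches = {int(ious[:, j].argmax()): j for j in range(len(gt_boxes))}
    for i in range(len(priors)):
        if i in matches:
            continue
        j = int(ious[i].argmax())
        if ious[i, j] > iou_threshold:
            matches[i] = j
    return matches
```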
(3) Detecting a moving target of the video frame through a moving target detection algorithm, wherein in the moving target detection, the gray value of each pixel point is represented by a plurality of Gaussian distribution functions, and each Gaussian distribution function has different weights; if a pixel in the current video frame meets the established Gaussian model, the pixel is considered to be background, otherwise the pixel is considered to be foreground; then, updating parameters of the Gaussian model, sorting different Gaussian distributions according to priority, and selecting the Gaussian distribution which accords with the priority through a set threshold value to serve as a background model;
in the step (3), moving object detection is carried out on the video frame through a moving object detection algorithm: K Gaussian distribution functions are selected to represent the gray value of each pixel point in the image, and M models that describe the background are selected from the K Gaussian distribution functions. Different Gaussian distributions are given different weights $\omega_{a,t}$, where a denotes the a-th Gaussian distribution, so $a\le K$. Suitable weights and a threshold are selected; the pixels matching the Gaussian distributions whose weights satisfy the threshold are considered background, and the remaining pixels are considered foreground. Let the gray value of the pixel at a certain time t be $X_{t}$; its probability density function is represented by a combination of the K Gaussian distribution functions:
$$P(X_{t})=\sum_{a=1}^{K}\omega_{a,t}\,\eta\bigl(X_{t},\mu_{a,t},\Sigma_{a,t}\bigr)$$
wherein:
$\omega_{a,t}$ represents the weight of the a-th Gaussian model at time t, and the sum of all the weights is 1;
$\mu_{a,t}$ represents the mean of the pixel gray values of the a-th Gaussian model at time t;
$\Sigma_{a,t}$ represents the covariance matrix of the a-th Gaussian model at time t;
$X_{t}$ represents the pixel gray value at time t;
$\eta$ denotes the Gaussian probability density function.
The K Gaussian distribution functions are arranged in descending order of priority, and the first M Gaussian distributions serving as the background are selected according to a preset threshold. When a new image is processed, each pixel point of the image is compared and matched against the established Gaussian mixture model; if a pixel point and the a-th Gaussian distribution in the mixture model satisfy
$$\bigl|X_{t}-\mu_{a,t-1}\bigr|\le 2.5\,\sigma_{a,t-1}$$
wherein $X_{t}$ is the gray value of the pixel at time t and $\sigma_{a,t-1}$ is the standard deviation of the pixel gray values of the a-th Gaussian model at time t−1, then the point is considered to match the a-th Gaussian distribution, and the successfully matched function is updated with the following parameters:
$$\omega_{a,t}=(1-\alpha)\,\omega_{a,t-1}+\alpha$$
$$\mu_{a,t}=(1-\alpha)\,\mu_{a,t-1}+\alpha X_{t}$$
$$\sigma_{a,t}^{2}=(1-\alpha)\,\sigma_{a,t-1}^{2}+\alpha\bigl(X_{t}-\mu_{a,t}\bigr)^{2}$$
wherein $\alpha$ represents the learning rate, $0<\alpha<1$; the larger its value, the more frequently the background in the video is updated; $\sigma_{a,t}^{2}$ represents the variance of the pixel gray values of the a-th Gaussian model at time t.
If a Gaussian distribution function is not matched by the pixel, its distribution parameters are left unchanged and only the corresponding weight is updated, according to
$$\omega_{a,t}=(1-\alpha)\,\omega_{a,t-1}$$
If the pixel does not match any of its corresponding Gaussian distribution functions, the pixel point is judged to be foreground, and the Gaussian model with the smallest weight in the established model is replaced; the mean of the replacing new Gaussian function is the gray value of the current pixel. The weights of the updated background model are normalised, and the Gaussian distribution functions are arranged in descending order of
$$\omega_{a,t}/\sigma_{a,t}$$
The foreground is then screened according to the set threshold T: the first M Gaussian distributions satisfying the condition are set as the background, and the rest are set as the foreground.
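A minimal per-pixel sketch of the mixture-of-Gaussians update described above is given below; the values of K, the learning rate, the background threshold T and the initial variance are illustrative assumptions rather than values from the patent (in practice an off-the-shelf implementation such as OpenCV's cv2.createBackgroundSubtractorMOG2 provides equivalent behaviour):

```python
import numpy as np

class PixelMOG:
    """Mixture-of-Gaussians background model for a single pixel: a simplified
    sketch of the update rules above. K, alpha, T and init_var are
    illustrative assumptions, not values taken from the patent."""

    def __init__(self, K=3, alpha=0.01, T=0.7, init_var=36.0):
        self.alpha, self.T, self.init_var = alpha, T, init_var
        self.w = np.full(K, 1.0 / K)      # weights of the K Gaussians (sum to 1)
        self.mu = np.zeros(K)             # means of the pixel gray value
        self.var = np.full(K, init_var)   # variances

    def update(self, x):
        """Feed one gray value x; return True if the pixel is foreground."""
        matched = np.abs(x - self.mu) <= 2.5 * np.sqrt(self.var)   # match test
        if matched.any():
            a = int(np.argmax(matched))            # first matching Gaussian
            self.w = (1 - self.alpha) * self.w     # all weights decay ...
            self.w[a] += self.alpha                # ... matched weight reinforced
            self.mu[a] += self.alpha * (x - self.mu[a])
            self.var[a] = (1 - self.alpha) * self.var[a] + self.alpha * (x - self.mu[a]) ** 2
        else:
            a = int(np.argmin(self.w))             # replace the weakest Gaussian
            self.mu[a], self.var[a] = float(x), self.init_var
        self.w /= self.w.sum()                     # renormalise weights
        # Rank Gaussians by w / sigma and keep the first M whose cumulative
        # weight exceeds T as the background model.
        order = np.argsort(-(self.w / np.sqrt(self.var)))
        background, cum = set(), 0.0
        for idx in order:
            background.add(int(idx))
            cum += self.w[idx]
            if cum > self.T:
                break
        return (not matched.any()) or (a not in background)
```

Applied independently to every pixel of a frame, the foreground mask produced this way would serve as the moving-target detection result consumed by the subsequent behavior analysis.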
(4) Based on space grid positioning, and combining with grid surrounding conditions, analyzing the behavior and action of a target in a video frame through target detection and motion detection;
the combat zone corresponds to the range of one 1:1,000,000 map sheet and consists of 144 sheets of 1:100,000 maps; the 1:1,000,000 sheet is marked as the combat zone and each 1:100,000 sheet as a combat basic zone, i.e. one combat zone comprises 144 combat basic zones, and the boundary lines of the combat basic zones are connected to form attack-defence lines and coordination lines; the behavior of the target is judged according to the positions of the combat zone and combat basic zone to which the target belongs, in combination with analysis of the three-dimensional sea, land and air environment around the grid; the motion recognition schematic diagram is shown in fig. 3: if a target is detected at the same position in the previous frame and the current frame and no moving target is detected at that position, the detected target is in a stationary state; if a target is detected in the previous frame, a target is detected near the same position in the current frame, and a moving target is detected in that region at the same time, the detected target is in a moving state; if the target detected in the previous frame is in a normal state, the target detected at the same position in the current frame is in a combat posture, and a moving target is detected at that position, the target is in a combat state; if the target is stationary or moving within Zone I, its behavior is characterized as intrusion; if the target is in a combat state within Zone I, its behavior is characterized as an attack.
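As a non-authoritative sketch of the decision rules just described (the state names, the zone flag and the returned labels are illustrative assumptions), the per-grid-position judgement could look like:

```python
def classify_behavior(prev_state, curr_state, motion_detected, in_zone_i):
    """Rule-based behavior judgement for one grid position, following the
    frame-to-frame rules above.  prev_state / curr_state are "normal",
    "combat" or None (no target detected); motion_detected says whether the
    moving-target detector fired at this position; in_zone_i marks Zone I."""
    if prev_state is None or curr_state is None:
        return None                      # no target present in both frames
    if not motion_detected:
        action = "stationary"            # target present, no motion detected
    elif curr_state == "combat":
        action = "combat"                # combat posture plus detected motion
    else:
        action = "moving"
    if in_zone_i:                        # behavior depends on the grid zone
        return "attack" if action == "combat" else "intrusion"
    return action

# Example: a target that was "normal" in the previous frame, shows a combat
# posture now, is moving, and sits inside Zone I is judged to be attacking.
print(classify_behavior("normal", "combat", True, True))  # -> "attack"
```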

Claims (7)

1. The method for identifying the video target behavior based on the space grid is characterized by comprising the following steps of:
(1) Establishing a data set, wherein the data set contains the type and the state of the target;
(2) The type and state of the object in the video frame are identified by the object identification algorithm,
(3) Detecting a moving target of the video frame through a moving target detection algorithm;
(4) Based on space grid positioning, and combining with grid surrounding conditions, analyzing the behavior and action of a target in a video frame through target detection and motion detection;
in the step (4), the combat zone corresponds to the range of a single map sheet at the coarser scale and is composed of R×C map sheets at the finer scale, marked respectively as the combat zone and the combat basic zones, i.e. one combat zone comprises R×C combat basic zones, and the boundary lines of the combat basic zones are connected to form attack-defence lines and coordination lines; the behavior of the target is judged according to the positions of the combat zone and combat basic zone to which the target belongs, in combination with analysis of the three-dimensional sea, land and air environment around the grid; if a target is detected at the same position in the previous frame and the current frame and no moving target is detected at that position, the detected target is in a stationary state; if a target is detected in the previous frame, a target is detected near the same position in the current frame, and a moving target is detected in that region at the same time, the detected target is in a moving state; if the target detected in the previous frame is in a normal state, the target detected at the same position in the current frame is in a combat posture, and a moving target is detected at that position, the target is in a combat state; if the target is stationary or moving within Zone I, its behavior is characterized as intrusion; if the target is in a combat state within Zone I, its behavior is characterized as an attack.
2. The method for identifying video object behaviors based on spatial grid according to claim 1, wherein: in the step (1), the data set includes the type and state of the target, and in the process of making the data set, the data set includes the type of the target to be identified, and if the target is in a combat attitude, the target is noted in the tag to be in the combat state.
3. The method for identifying video object behaviors based on spatial grid according to claim 2, wherein: the step (2) specifically comprises the following steps:
(2.1) detecting by a multi-scale feature map,
(2.2) classifying and regressing the features extracted by the convolution calculation of the feature graphs with different sizes directly through a convolution network;
(2.3) network training employs prior frames.
4. A method for identifying video object behaviors based on a spatial grid according to claim 3, wherein: the specific steps of the detection through the multi-scale feature maps in the step (2.1) are as follows: the neural network structure used for calculation is divided into six layers of feature maps for classifying and regressing the picture; the feature maps of each layer differ in size, with the feature map at the front end of the network being larger and becoming smaller after each pooling layer; the larger-scale feature maps are used to process smaller targets, and the smaller-scale feature maps are used to process larger targets.
5. The method for identifying video object behaviors based on spatial grid as recited in claim 4, wherein: the specific steps of adopting a priori frame for network training in the step (2.3) are as follows:
setting boxes with different sizes and length-width ratios by taking pixels of the feature map as centers, wherein each pixel is provided with a plurality of prior boxes with different sizes and length-width ratios for detecting targets with different sizes and length-width ratios; the detection target in the picture can use the prior frame most suitable for the detection target to train the network model; the size of the prior frame linearly increases, satisfying the following formula:
$$s_{k}=s_{\min}+\frac{s_{\max}-s_{\min}}{m-1}(k-1),\quad k\in[1,m]$$
wherein m is the number of feature maps, and m has a value of 5; $s_{k}$ represents the proportion of the size of the k-th prior frame to the picture size; $s_{\min}$ and $s_{\max}$ respectively represent the minimum and maximum values of $s_{k}$;
the generated prior frames are matched with the real detection targets according to two criteria: the first criterion is to find, for each real detection target in the picture, the prior frame in the feature maps with the largest degree of overlap with it, measured by the intersection-over-union (IOU), and to match the prior frame with the largest IOU value to that real target; the second criterion, which avoids an excessive imbalance between the numbers of positive and negative samples, is that any remaining prior frame whose IOU with a real target exceeds a set threshold is also considered matched to that real target; the network finally outputs the category confidence and position coordinate information of the predicted targets, so the loss function is a weighted sum of the category confidence error and the predicted position error:
$$L(x,c,l,g)=\frac{1}{N}\bigl(L_{conf}(x,c)+\alpha L_{loc}(x,l,g)\bigr)$$
wherein
$$L_{conf}(x,c)=-\sum_{i\in Pos}x_{ij}^{p}\log\hat{c}_{i}^{p}-\sum_{i\in Neg}\log\hat{c}_{i}^{0}$$
$$L_{loc}(x,l,g)=\sum_{i\in Pos}\sum_{e\in\{cx,cy,w,h\}}x_{ij}^{p}\,\mathrm{smooth}_{L1}\bigl(l_{i}^{e}-\hat{g}_{j}^{e}\bigr)$$
N: the number of positive samples among the prior frames;
$x_{ij}^{p}$: takes only the value 0 or 1; a value of 1 indicates that the j-th real target in the picture is matched with the i-th prior frame and that the real target is of category p;
c: the confidence of the target categories;
l: the predicted values for the real targets;
g: the position information of the real targets;
$L_{conf}$: the category confidence error;
$L_{loc}$: the predicted position error;
L: the loss function;
$\alpha$: the weight balancing the two error terms;
Pos: the positive sample set;
Neg: the negative sample set;
$\hat{c}_{i}^{p}$: the confidence that the i-th prior frame belongs to category p, obtained by softmax normalisation:
$$\hat{c}_{i}^{p}=\frac{\exp(c_{i}^{p})}{\sum_{p}\exp(c_{i}^{p})}$$
(cx, cy, w, h): the centre coordinates, width and height of a prediction frame;
$l_{i}^{e}$: the predicted value of coordinate e for the detection target predicted by the i-th prior frame in the image;
$\hat{g}_{j}^{e}$: the encoded position of the j-th real detection target relative to the i-th prior frame (whose centre coordinates, width and height are $d_{i}^{cx}$, $d_{i}^{cy}$, $d_{i}^{w}$, $d_{i}^{h}$), calculated as:
$$\hat{g}_{j}^{cx}=\frac{g_{j}^{cx}-d_{i}^{cx}}{d_{i}^{w}},\quad \hat{g}_{j}^{cy}=\frac{g_{j}^{cy}-d_{i}^{cy}}{d_{i}^{h}},\quad \hat{g}_{j}^{w}=\log\frac{g_{j}^{w}}{d_{i}^{w}},\quad \hat{g}_{j}^{h}=\log\frac{g_{j}^{h}}{d_{i}^{h}}$$
6. The method for identifying video object behaviors based on spatial grid according to claim 5, wherein: in the moving object detection in the step (3), the gray value of each pixel point is represented by a plurality of Gaussian distribution functions, each with a different weight; if a pixel in the current video frame matches the established Gaussian model, the pixel is considered background, otherwise it is considered foreground; the parameters of the Gaussian model are then updated, the different Gaussian distributions are sorted by priority, and the Gaussian distributions meeting a set threshold are selected as the background model.
7. The method for identifying video object behaviors based on spatial grid as claimed in claim 6, wherein: in the step (3), moving object detection is carried out on the video frame through a moving object detection algorithm: K Gaussian distribution functions are selected to represent the gray value of each pixel point in the image, and M models that describe the background are selected from the K Gaussian distribution functions; different Gaussian distributions are given different weights $\omega_{a,t}$, where a denotes the a-th Gaussian distribution, so $a\le K$; suitable weights and a threshold are selected, the pixels matching the Gaussian distributions whose weights satisfy the threshold are considered background, and the remaining pixels are considered foreground; let the gray value of the pixel at a certain time t be $X_{t}$; its probability density function is represented by a combination of the K Gaussian distribution functions:
$$P(X_{t})=\sum_{a=1}^{K}\omega_{a,t}\,\eta\bigl(X_{t},\mu_{a,t},\Sigma_{a,t}\bigr)$$
wherein:
$\omega_{a,t}$ represents the weight of the a-th Gaussian model at time t, and the sum of all the weights is 1;
$\mu_{a,t}$ represents the mean of the pixel gray values of the a-th Gaussian model at time t;
$\Sigma_{a,t}$ represents the covariance matrix of the a-th Gaussian model at time t;
$X_{t}$ represents the pixel gray value at time t;
$\eta$ denotes the Gaussian probability density function;
the K Gaussian distribution functions are arranged in descending order of priority, and the first M Gaussian distributions serving as the background are selected according to a preset threshold; when a new image is processed, each pixel point of the image is compared and matched against the established Gaussian mixture model; if a pixel point and the a-th Gaussian distribution in the mixture model satisfy
$$\bigl|X_{t}-\mu_{a,t-1}\bigr|\le 2.5\,\sigma_{a,t-1}$$
wherein $X_{t}$ is the gray value of the pixel at time t and $\sigma_{a,t-1}$ is the standard deviation of the pixel gray values of the a-th Gaussian model at time t−1, then the point is considered to match the a-th Gaussian distribution, and the successfully matched function is updated with the following parameters:
$$\omega_{a,t}=(1-\alpha)\,\omega_{a,t-1}+\alpha$$
$$\mu_{a,t}=(1-\alpha)\,\mu_{a,t-1}+\alpha X_{t}$$
$$\sigma_{a,t}^{2}=(1-\alpha)\,\sigma_{a,t-1}^{2}+\alpha\bigl(X_{t}-\mu_{a,t}\bigr)^{2}$$
wherein $\alpha$ represents the learning rate, $0<\alpha<1$; the larger its value, the more frequently the background in the video is updated; $\sigma_{a,t}^{2}$ represents the variance of the pixel gray values of the a-th Gaussian model at time t;
if a Gaussian distribution function is not matched by the pixel, its distribution parameters are left unchanged and only the corresponding weight is updated, according to
$$\omega_{a,t}=(1-\alpha)\,\omega_{a,t-1}$$
if the pixel does not match any of its corresponding Gaussian distribution functions, the pixel point is judged to be foreground, and the Gaussian model with the smallest weight in the established model is replaced; the mean of the replacing new Gaussian function is the gray value of the current pixel; the weights of the updated background model are normalised, and the Gaussian distribution functions are arranged in descending order of
$$\omega_{a,t}/\sigma_{a,t}$$
the foreground is then screened according to the set threshold T: the first M Gaussian distributions satisfying the condition are set as the background, and the rest are set as the foreground.
CN202310047339.6A 2023-01-31 2023-01-31 Video target behavior recognition method based on space grid Active CN115830515B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310047339.6A CN115830515B (en) 2023-01-31 2023-01-31 Video target behavior recognition method based on space grid

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310047339.6A CN115830515B (en) 2023-01-31 2023-01-31 Video target behavior recognition method based on space grid

Publications (2)

Publication Number Publication Date
CN115830515A CN115830515A (en) 2023-03-21
CN115830515B true CN115830515B (en) 2023-05-02

Family

ID=85520637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310047339.6A Active CN115830515B (en) 2023-01-31 2023-01-31 Video target behavior recognition method based on space grid

Country Status (1)

Country Link
CN (1) CN115830515B (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103258332B (en) * 2013-05-24 2016-06-29 浙江工商大学 Moving target detection method robust to illumination variation
CN111477034B (en) * 2020-03-16 2021-01-29 中国电子科技集团公司第二十八研究所 Large-scale airspace use plan conflict detection and release method based on grid model
CN112070035A (en) * 2020-09-11 2020-12-11 联通物联网有限责任公司 Target tracking method and device based on video stream and storage medium
CN115098993A (en) * 2022-05-16 2022-09-23 南京航空航天大学 Unmanned aerial vehicle conflict detection method and device for airspace digital grid and storage medium
CN115493591A (en) * 2022-06-13 2022-12-20 中国人民解放军海军航空大学 Multi-route planning method
CN115578668A (en) * 2022-09-15 2023-01-06 浙江大华技术股份有限公司 Target behavior recognition method, electronic device, and storage medium

Also Published As

Publication number Publication date
CN115830515A (en) 2023-03-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant