CN108596032B - Detection method, device, equipment and medium for fighting behavior in video


Publication number
CN108596032B
Authority
CN
China
Prior art keywords
feature point, pair, target, determining, images
Legal status
Active
Application number
CN201810234688.8A
Other languages
Chinese (zh)
Other versions
CN108596032A
Inventor
张凯
卢维
殷俊
穆方波
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN201810234688.8A
Publication of CN108596032A
Application granted
Publication of CN108596032B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Abstract

The invention discloses a method, a device, equipment and a medium for detecting fighting behavior in a video, which are used for solving the problems in the prior art that real three-dimensional motion is difficult to analyze accurately and that the accuracy of the detection result is low. The detection method comprises: performing three-dimensional motion analysis on each feature point in a first pair of images and a second pair of images which are consecutive in a binocular video; determining a motion amplitude mean value, a motion direction entropy value and an influence area; judging whether the first pair of images is a target image; determining each duration according to the consecutive target images; and determining, according to the target motion amplitude mean value, the target motion direction entropy value, the target influence area, the duration and a pre-trained fighting detection model, whether fighting behavior occurs in the video corresponding to the duration. In this way the real three-dimensional motion can be accurately analyzed, whether fighting behavior occurs can be accurately judged, and the accuracy of fighting behavior detection is improved.

Description

Detection method, device, equipment and medium for fighting behavior in video
Technical Field
The invention relates to the technical field of computer vision, in particular to a method, a device, equipment and a medium for detecting a fighting behavior in a video.
Background
Because the specific course of fighting behavior is very complicated, it is difficult to establish an accurate model for fighting behavior detection; even if the limb information of a person can be detected in certain specific scenes, it is difficult to track that information during a fight, let alone define specific fighting actions based on it. Therefore, detection of fighting is generally based on abnormal motion detection. In traditional behavior analysis of two-dimensional images, the optical flow field is taken as an approximation of a two-dimensional velocity vector field, so motion information can be approximately described through the optical flow field; abnormal motion is defined as violent and disordered motion, and the corresponding abnormal optical flow has a large optical flow amplitude and a disordered optical flow direction.
Existing crowd abnormality analysis methods mainly judge the intensity and the degree of disorder of the motion based on the moving range of the foreground between the preceding and following frames of a monocular camera video and on the magnitude and direction of the optical flow field between image sequences; or acquire depth information with a binocular camera, perform background modeling using the depth information, and then judge fighting behavior according to the degree of disorder and the speed index of the two-dimensional optical flow field; or formulate an evaluation strategy for motion intensity using the depth information, applying thresholds to the corresponding two-dimensional optical flow analyzed according to the information at different depths, so that fighting behavior is still analyzed from the two-dimensional optical flow vector field of the image.
Because a two-dimensional optical flow field can only accurately describe motion perpendicular to the optical axis of the camera and cannot describe motion parallel to the optical axis, traditional fighting detection algorithms based on two-dimensional optical flow analysis have some obvious defects. First, for the same three-dimensional motion, the modulus of the corresponding two-dimensional optical flow differs at different depths of field. Second, for the same object, the size of the corresponding image area also differs obviously at different depths of field, so the same motion does not yield the same optical flow result. Third, when the same target point moves by the same amount in different directions, the moduli of the resulting two-dimensional optical flow are not equal. Since the two-dimensional optical flow field is not equivalent to the motion vector field, the real motion in a three-dimensional scene described by the two-dimensional optical flow field is inaccurate and contains large errors; therefore, it is difficult in the prior art to accurately analyze real three-dimensional motion, and the accuracy of the detection result is low when abnormal behaviors such as fighting are detected.
Disclosure of Invention
The invention provides a method, a device, equipment and a medium for detecting a fighting behavior in a video, which are used for solving the problems that the real three-dimensional motion is difficult to accurately analyze and the accuracy of a detection result is low in the prior art.
The embodiment of the invention provides a detection method of a fighting behavior in a video, which comprises the following steps:
the following processing is carried out on a first pair of images and a second pair of images which are continuous in the binocular video: for each first feature point in the first pair of images and the second pair of images, determining a first motion vector corresponding to each first feature point according to a coordinate pair of each first feature point in the first pair of images and the second pair of images and a first calibration parameter corresponding to the binocular video, and determining a first motion vector direction of each first feature point and a first motion amplitude mean value of the first pair of images; projecting the first motion vector direction of each first feature point to a grid corresponding to a preset grid space, and determining a first motion direction entropy value of the first pair of images; determining a first influence area of the first pair of images on the ground according to the number of ground grids corresponding to the ground projected by the three-dimensional coordinates of each first feature point in the world coordinate system;
for each first pair of images, judging whether a first motion amplitude mean value, a first motion direction entropy value and a first influence area of the first pair of images are all larger than a corresponding preset first threshold value, and if so, marking the first pair of images as a first target image; determining each first time length according to the marked continuous first target images;
for each first time length, determining a first target motion amplitude mean value, a first target motion direction entropy value and a first target influence area according to each first target image corresponding to the first time length; and determining whether fighting behavior occurs in the video corresponding to the first time length according to the pre-trained fighting detection model, the first target motion amplitude mean value, the first target motion direction entropy value, the first target influence area and the first time length.
Further, the training process of the fighting detection model comprises the following steps:
performing the following processing on each continuous first pair of sample images and second pair of sample images in the sample binocular video: for each second feature point in the first pair of sample images and the second pair of sample images, determining a second motion vector corresponding to each second feature point according to a coordinate pair of each second feature point in the first pair of sample images and the second pair of sample images and a second calibration parameter corresponding to the sample binocular video, and determining a second motion vector direction of each second feature point and a second motion amplitude mean value of the first pair of sample images; projecting the second motion vector direction of each second feature point to a grid corresponding to a preset grid space, and determining a second motion direction entropy value of the first pair of sample images; determining a second influence area of the first pair of sample images on the ground according to the number of ground grids corresponding to the projection of the three-dimensional coordinates of each second feature point in the world coordinate system to the ground;
for each first pair of sample images, judging whether a second motion amplitude value mean value, a second motion direction entropy value and a second influence area of the first pair of sample images are all larger than a corresponding preset second threshold value, and if so, marking the first pair of sample images as a second target image; determining each second time length according to the marked continuous second target images;
and aiming at each second duration, determining a second target motion amplitude mean value, a second target motion direction entropy value and a second target influence area according to each second target image corresponding to the second duration, and training the fighting detection model according to the second target motion amplitude mean value, the second target motion direction entropy value, the second target influence area, the second duration and identification information of whether each frame of image in the sample binocular video has fighting behavior.
Further, before determining the coordinate pair of each second feature point in the first pair of sample images and the second pair of sample images, the method further comprises:
acquiring a disparity map corresponding to the first pair of sample images according to the first left image and the first right image of the first pair of sample images;
for each second feature point in the second feature point set, determining each first candidate feature point adjacent to the second feature point in the first left image; adding each first candidate feature point to a candidate set; for each first candidate feature point in the candidate set, according to the disparity map, determining a first pixel point corresponding to the first candidate feature point in a first right map, and determining a second pixel point and a third pixel point corresponding to the first candidate feature point in a second left map and a second right map of a second pair of sample images; determining a first neighborhood containing the second pixel point in a second left image, determining a second neighborhood containing a third pixel point in a second right image, and determining a first target pixel point and a second target pixel point which are matched with the first candidate feature point and the first pixel point in the first neighborhood and the second neighborhood respectively; and judging whether the correlation between the first candidate feature point, the first pixel point, the first target pixel point and the second target pixel point is greater than a set threshold value or not, if so, determining that the first candidate feature point, the first pixel point, the first target pixel point and the second target pixel point are second target feature points, moving the second feature points out of the second feature point set, updating the second target feature points into second feature points and adding the second feature points into the second feature point set.
Further, the determining the second motion magnitude mean of the first pair of sample images comprises:
acquiring a second motion amplitude corresponding to each second feature point according to the second motion vector corresponding to each second feature point;
and extracting each second motion amplitude larger than a preset second threshold value, and determining a second motion amplitude mean value of the first pair of sample images according to each extracted second motion amplitude.
Further, the projecting the second motion vector direction of each second feature point to a preset grid space, and the determining the second motion direction entropy of the first pair of sample images includes:
acquiring, according to the second motion amplitude corresponding to each second feature point, each target second feature point whose corresponding second motion amplitude is larger than a preset third threshold; and
projecting the second motion vector direction of each target second feature point to a preset grid space, and determining a second motion direction entropy value of the first pair of sample images according to the corresponding grid.
Further, determining a second area of influence of the first pair of sample images on the ground according to the number of ground grids corresponding to the ground projected by the coordinates of each second feature point in the world coordinate system includes:
and respectively acquiring a third three-dimensional coordinate and a fourth three-dimensional coordinate of each target second characteristic point in a world coordinate system according to each target second characteristic point, projecting the third three-dimensional coordinate and the fourth three-dimensional coordinate of each target second characteristic point to the ground, and determining a second influence area of the first pair of sample images on the ground according to the number of ground grids corresponding to each third three-dimensional coordinate and each fourth three-dimensional coordinate.
Further, before determining the coordinate pair of each first feature point in the first pair of images and the second pair of images, the method further comprises:
acquiring a disparity map corresponding to the first pair of images according to the third left image and the third right image of the first pair of images;
for each first feature point in the first feature point set, determining each second candidate feature point adjacent to the first feature point in the third left image; adding each second candidate feature point to the candidate set; for each second candidate feature point in the candidate set, according to the disparity map, determining a fourth pixel point corresponding to the second candidate feature point in a third right image, and determining a fifth pixel point and a sixth pixel point corresponding to the second candidate feature point in a fourth left image and a fourth right image of the second pair of images; determining a third neighborhood containing the fifth pixel point in a fourth left image, determining a fourth neighborhood containing a sixth pixel point in a fourth right image, and determining a third target pixel point and a fourth target pixel point which are matched with the second candidate feature point and the fourth pixel point in the third neighborhood and the fourth neighborhood respectively; and determining the correlation among the second candidate feature point, the fourth pixel point, the third target pixel point and the fourth target pixel point, judging whether the correlation is greater than a set threshold value, if so, determining the second candidate feature point, the fourth pixel point, the third target pixel point and the fourth target pixel point as first target feature points, moving the first feature points out of the first feature point set, updating the first target feature points into the first feature points and adding the first feature points into the first feature point set.
Further, said determining the first motion magnitude mean of the first pair of images comprises:
and according to the first motion vector corresponding to each first feature point, acquiring a first motion amplitude corresponding to each first feature point, extracting a first motion amplitude larger than a preset second threshold value, and according to each extracted first motion amplitude, determining a first motion amplitude mean value of the first pair of images.
Further, the projecting the first motion vector direction according to each first feature point to a preset grid space, and the determining the first motion direction entropy of the first pair of images includes:
acquiring, according to the first motion amplitude corresponding to each first feature point, each target first feature point whose corresponding first motion amplitude is greater than a preset third threshold; and
projecting the first motion vector direction of each target first feature point to a preset grid space, and determining a first motion direction entropy value of the first pair of images according to the corresponding grid.
Further, determining the first area of influence of the first pair of images on the ground according to the number of ground grids corresponding to the ground projected by the coordinates of each first feature point in the world coordinate system comprises:
and respectively acquiring a seventh three-dimensional coordinate and an eighth three-dimensional coordinate of each target first characteristic point in a world coordinate system according to each target first characteristic point, projecting the seventh three-dimensional coordinate and the eighth three-dimensional coordinate of each target first characteristic point to the ground, and determining a first influence area of the first pair of images on the ground according to the number of ground grids corresponding to each seventh three-dimensional coordinate and eighth three-dimensional coordinate.
The embodiment of the invention provides a detection device for fighting behaviors in videos, which comprises:
the acquisition module is used for carrying out the following processing on a first pair of continuous images and a second pair of continuous images in the binocular video: for each first feature point in the first pair of images and the second pair of images, determining a first motion vector corresponding to each first feature point according to a coordinate pair of each first feature point in the first pair of images and the second pair of images and a first calibration parameter corresponding to the binocular video, and determining a first motion vector direction of each first feature point and a first motion amplitude mean value of the first pair of images; projecting the first motion vector direction of each first feature point to a grid corresponding to a preset grid space, and determining a first motion direction entropy value of the first pair of images; determining a first influence area of the first pair of images on the ground according to the number of ground grids corresponding to the ground projected by the coordinates of each first feature point in a world coordinate system;
the duration determining module is used for judging whether a first motion amplitude mean value, a first motion direction entropy value and a first influence area of each first pair of images are all larger than a corresponding preset first threshold value or not, and if so, marking the first pair of images as a first target image; determining each first time length according to the marked continuous first target images;
the fighting behavior determining module is used for determining, for each first time length, a first target motion amplitude mean value, a first target motion direction entropy value and a first target influence area according to each first target image corresponding to the first time length; and determining whether fighting behavior occurs in the video corresponding to the first time length according to the pre-trained fighting detection model, the first target motion amplitude mean value, the first target motion direction entropy value, the first target influence area and the first time length.
Further, the fighting behavior detection device further comprises:
the training module is used for carrying out the following processing on each continuous first pair of sample images and second pair of sample images in the sample binocular video: for each second feature point in the first pair of sample images and the second pair of sample images, determining a second motion vector corresponding to each second feature point according to a coordinate pair of each second feature point in the first pair of sample images and the second pair of sample images and a second calibration parameter corresponding to the sample binocular video, and determining a second motion vector direction of each second feature point and a second motion amplitude mean value of the first pair of sample images; projecting the second motion vector direction of each second feature point to a grid corresponding to a preset grid space, and determining a second motion direction entropy value of the first pair of sample images; respectively acquiring the number of ground grids corresponding to the projection of the coordinates of each second feature point in the world coordinate system to the ground, and determining a second influence area of the first pair of sample images on the ground; for each first pair of sample images, judging whether a second motion amplitude value mean value, a second motion direction entropy value and a second influence area of the first pair of sample images are all larger than a corresponding preset second threshold value, and if so, marking the first pair of sample images as a second target image; determining each second time length according to the marked continuous second target images; and aiming at each second duration, determining a second target motion amplitude mean value, a second target motion direction entropy value and a second target influence area according to each second target image corresponding to the second duration, and training the fighting detection model according to the second target motion amplitude mean value, the second target motion direction entropy value, the second target influence area, the second duration and identification information of whether each frame of image in the sample binocular video has fighting behavior.
Further, the training module is further configured to, before determining the coordinate pair of each second feature point in the first pair of sample images and the second pair of sample images, obtain a disparity map corresponding to the first pair of sample images according to a first left image and a first right image of the first pair of sample images; for each second feature point in the second feature point set, determining each first candidate feature point adjacent to the second feature point in the first left image according to the second feature point of the first left image in the first pair of sample images; adding each first candidate feature point to a candidate set; for each first candidate feature point in the candidate set, determining a first pixel point corresponding to the first candidate feature point in a first right image according to the disparity map, and determining a second pixel point and a third pixel point corresponding to the first candidate feature point in a second left image and a second right image of a second pair of sample images according to the first candidate feature point and the first pixel point respectively; determining a first neighborhood containing the second pixel point in a second left image, determining a second neighborhood containing a third pixel point in a second right image, and determining a first target pixel point and a second target pixel point which are matched with the first candidate feature point and the first pixel point in the first neighborhood and the second neighborhood respectively; and judging whether the correlation between the first candidate feature point, the first pixel point, the first target pixel point and the second target pixel point is greater than a set threshold value or not, if so, determining that the first candidate feature point, the first pixel point, the first target pixel point and the second target pixel point are second target feature points, moving the second feature point out of the second feature point set, updating the second target feature points into second feature points and adding the second feature points into the second feature point set.
Further, the training module is specifically configured to obtain a second motion amplitude corresponding to each second feature point according to a second motion vector corresponding to each second feature point; and extracting each second motion amplitude larger than a preset second threshold value, and determining a second motion amplitude mean value of the first pair of sample images according to each extracted second motion amplitude.
Further, the training module is specifically configured to obtain each target second feature point, where a corresponding second motion amplitude is greater than a preset third threshold, according to a second motion amplitude corresponding to each second feature point; and projecting the second motion vector direction of each target second feature point to a preset grid space, and determining the second motion direction entropy of the first pair of sample images according to the corresponding grids.
Further, the training module is specifically configured to obtain a third three-dimensional coordinate and a fourth three-dimensional coordinate of each target second feature point in a world coordinate system according to each target second feature point, project the third three-dimensional coordinate and the fourth three-dimensional coordinate of each target second feature point to the ground, and determine a second influence area of the first pair of sample images on the ground according to the number of ground grids corresponding to each third three-dimensional coordinate and each fourth three-dimensional coordinate.
Further, the obtaining module is further configured to obtain a disparity map corresponding to the first pair of images according to a third left image and a third right image of the first pair of images before determining a coordinate pair of each first feature point in the first pair of images and the second pair of images; for each first feature point in the first feature point set, determining each second candidate feature point adjacent to the first feature point in a third left image in the first pair of images according to the first feature point of the third left image; adding each second candidate feature point to the candidate set; for each second candidate feature point in the candidate set, determining a fourth pixel point corresponding to the second candidate feature point in a third right image according to the disparity map, and determining a fifth pixel point and a sixth pixel point corresponding to the second candidate feature point in a fourth left image and a fourth right image of the second pair of images according to the second candidate feature point and the fourth pixel point respectively; determining a third neighborhood containing the fifth pixel point in a fourth left image, determining a fourth neighborhood containing a sixth pixel point in a fourth right image, and determining a third target pixel point and a fourth target pixel point which are matched with the second candidate feature point and the fourth pixel point in the third neighborhood and the fourth neighborhood respectively; and judging whether the correlation between the second candidate feature point, the fourth pixel point, the third target pixel point and the fourth target pixel point is greater than a set threshold value or not, if so, determining that the second candidate feature point, the fourth pixel point, the third target pixel point and the fourth target pixel point are the first target feature point, moving the first feature point out of the first feature point set, updating the first target feature point into the first feature point and adding the first feature point into the first feature point set.
Further, the obtaining module is specifically configured to obtain a first motion amplitude corresponding to each first feature point according to the first motion vector corresponding to each first feature point, extract a first motion amplitude greater than a preset second threshold, and determine a first motion amplitude mean value of the first pair of images according to each extracted first motion amplitude.
Further, the obtaining module is specifically configured to obtain, according to the first motion amplitude corresponding to each first feature point, each target first feature point of which the corresponding first motion amplitude is greater than a preset third threshold; and projecting the first motion vector direction of each target first feature point to a preset grid space, and determining a first motion direction entropy value of the first pair of images according to the corresponding grids.
Further, the obtaining module is specifically configured to obtain a seventh three-dimensional coordinate and an eighth three-dimensional coordinate of each target first feature point in a world coordinate system according to each target first feature point, project the seventh three-dimensional coordinate and the eighth three-dimensional coordinate of each target first feature point to the ground, and determine a first area of influence of the first pair of images on the ground according to the number of ground grids corresponding to each seventh three-dimensional coordinate and each eighth three-dimensional coordinate.
The embodiment of the invention provides electronic equipment, which comprises a processor, a communication interface, a memory and a communication bus, wherein the processor and the communication interface communicate with each other through the communication bus;
a memory for storing a computer program;
and a processor for implementing the steps of the above method for detecting fighting behavior when executing the program stored in the memory.
The embodiment of the invention provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the steps of the above method for detecting fighting behavior are implemented.
The embodiment of the invention provides a method, a device, equipment and a medium for detecting fighting behavior in a video, wherein the detection method comprises the following steps: for each first feature point in the first pair of images and the second pair of images, determining a first motion vector corresponding to each first feature point according to a coordinate pair of each first feature point in the first pair of images and the second pair of images and a first calibration parameter corresponding to the binocular video, and determining a first motion vector direction of each first feature point and a first motion amplitude mean value of the first pair of images; projecting the first motion vector direction of each first feature point to a grid corresponding to a preset grid space, and determining a first motion direction entropy value of the first pair of images; respectively acquiring the number of ground grids corresponding to the projection of the coordinates of each first feature point in a world coordinate system to the ground, and determining a first influence area of the first pair of images on the ground; for each first pair of images, if the first motion amplitude mean value, the first motion direction entropy value and the first influence area of the first pair of images are all larger than the corresponding preset first thresholds, marking the first pair of images as a first target image; determining each first duration according to the marked consecutive first target images; for each first duration, determining a first target motion amplitude mean value, a first target motion direction entropy value and a first target influence area according to each first target image corresponding to the first duration; and determining whether fighting behavior occurs in the video corresponding to the first duration according to the pre-trained fighting detection model, the first target motion amplitude mean value, the first target motion direction entropy value, the first target influence area and the first duration. In the embodiment of the invention, by determining the first motion amplitude mean value, the first motion direction entropy value and the first influence area for each first feature point in two consecutive pairs of images in the binocular video, judging for each first pair of images whether it is a first target image, determining each first duration according to the consecutive first target images, and further determining, according to the first target motion amplitude mean value, the first target motion direction entropy value, the first target influence area, the first duration and the pre-trained fighting detection model, whether fighting behavior occurs in the video corresponding to the first duration,
the three-dimensional motion of each first feature point in the three-dimensional scene can be accurately obtained, and the real three-dimensional motion can be accurately recovered according to its actual physical size; by determining the first target motion amplitude mean value, the first target motion direction entropy value, the first target influence area and the first duration corresponding to each first duration, the fighting detection model can analyze the three-dimensional motion of the target based on the three-dimensional motion vector field (namely, the scene flow), so as to accurately judge whether fighting behavior occurs, thereby improving the accuracy of fighting behavior detection.
Drawings
FIG. 1 is a flowchart of a training process of a fighting detection model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a method for growing a seed point according to an embodiment of the present invention;
FIG. 3A is a schematic diagram of a grid space according to an embodiment of the present invention;
FIG. 3B is a schematic diagram of a grid space according to an embodiment of the present invention;
fig. 4 is a flowchart of a method for detecting fighting behavior according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a device for detecting a fighting behavior in a video according to an embodiment of the present invention;
fig. 6 is a schematic diagram of a computer device according to an embodiment of the present invention.
Detailed Description
In order to accurately detect the fighting behavior in the video, the embodiment of the invention provides a method, a device, equipment and a medium for detecting the fighting behavior in the video.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1:
fig. 1 is a flowchart of training a fighting detection model according to an embodiment of the present invention, and the specific processing procedure is as follows:
s101: performing the following processing on each continuous first pair of sample images and second pair of sample images in the sample binocular video: and for each second feature point in the first pair of sample images and the second pair of sample images, determining a second motion vector corresponding to each second feature point according to a coordinate pair of each second feature point in the first pair of sample images and the second pair of sample images and a second calibration parameter corresponding to the sample binocular video, and determining a second motion vector direction of each second feature point and a second motion amplitude mean value of the first pair of sample images.
In the embodiment of the invention, the training method of the fighting detection model is applied to electronic equipment, and the electronic equipment can be a PC (Personal computer), a tablet Personal computer, a smart phone, a Personal Digital Assistant (PDA), image acquisition equipment, image processing equipment and the like.
The electronic device can train the fighting detection model according to the sample binocular video, the sample binocular video comprises a left image and a right image at each time point, and therefore a pair of images, namely the left image and the right image, is provided for each time point. The first pair of sample images and the second pair of sample images may be any two consecutive frames of images in the sample binocular video, and at this time, the first pair of sample images is a previous frame of image of the second pair of sample images. Wherein the first pair of sample images may include a first left image and a first right image, and the second pair of sample images may include a second left image and a second right image.
After the first pair of sample images and the second pair of sample images are determined, coordinates of each feature point in the first pair of sample images and the second pair of sample images are determined through sparse feature point matching, specifically, for each feature point, a coordinate of the feature point in a first left image and a coordinate of the feature point in a first right image in the first pair of sample images are determined, that is, a first coordinate pair of the feature point in the first pair of sample images is determined, and similarly, a second coordinate pair of the feature point in the second pair of sample images is determined.
The process of determining the feature points in the image by sparse feature point matching belongs to the prior art, and is not repeated in the implementation of the invention.
For example, suppose the coordinates, in the image, of the pixel points corresponding to a certain second feature point in the first left image, the first right image, the second left image and the second right image are respectively (u_1^l, v_1^l), (u_1^r, v_1^r), (u_2^l, v_2^l) and (u_2^r, v_2^r). For this feature point, the vertical coordinates of the corresponding pixel points in the first left image and the first right image are the same, and the vertical coordinates of the corresponding pixel points in the second left image and the second right image are also the same, that is, v_1^l = v_1^r and v_2^l = v_2^r. For convenience of presentation, let v_1 = v_1^l = v_1^r and v_2 = v_2^l = v_2^r. At this time, the first coordinate pair may be expressed as (u_1^l, u_1^r, v_1) and the second coordinate pair may be expressed as (u_2^l, u_2^r, v_2).
Specifically, in the embodiment of the present invention, the second calibration parameter of the sample binocular video is stored in advance. And the second calibration parameter is an internal and external calibration parameter of a binocular camera corresponding to the sample binocular video. The process of calibrating the binocular camera and acquiring the second calibration parameter belongs to the prior art, and is not described in detail in the embodiment of the invention.
According to the three-dimensional reconstruction principle of the binocular camera and the second calibration parameters of the binocular camera corresponding to the sample binocular video, the first coordinate pair of a second feature point can be used to determine the first three-dimensional coordinate of that coordinate pair in the camera coordinate system, and the second coordinate pair of the second feature point can be used to determine the second three-dimensional coordinate of that coordinate pair in the camera coordinate system.
Specifically, given the second calibration parameters, the relationship between a coordinate pair in the image coordinate system and the corresponding three-dimensional coordinate in the camera coordinate system satisfies the following formulas. Suppose a second feature point is a point p whose first coordinate pair is (u_1^l, u_1^r, v_1) and whose second coordinate pair is (u_2^l, u_2^r, v_2). The first three-dimensional coordinate (X_1, Y_1, Z_1) corresponding to the first coordinate pair of the second feature point is:
X_1 = B(u_1^l - u_0) / (u_1^l - u_1^r)
Y_1 = B(v_1 - v_0) / (u_1^l - u_1^r)
Z_1 = B·f / (u_1^l - u_1^r)
Similarly, the second three-dimensional coordinate (X_2, Y_2, Z_2) corresponding to the second coordinate pair of the second feature point is:
X_2 = B(u_2^l - u_0) / (u_2^l - u_2^r)
Y_2 = B(v_2 - v_0) / (u_2^l - u_2^r)
Z_2 = B·f / (u_2^l - u_2^r)
wherein B, f, u_0 and v_0 are parameters among the second calibration parameters of the binocular camera; specifically, B is the center distance (baseline) between the two cameras of the binocular camera, f is the focal length of the binocular camera, and u_0 and v_0 are respectively the abscissa and the ordinate of the origin of the image coordinate system in the pixel coordinate system.
Since the first pair of sample images and the second pair of sample images are two consecutive frames of images in the sample binocular video, for convenience of expression, the first pair of sample images is assumed to correspond to time t1 in the sample binocular video and the second pair of sample images to time t2; the first three-dimensional coordinate and the second three-dimensional coordinate of each second feature point are then the three-dimensional coordinates of that second feature point at time t1 and time t2, respectively. The second motion vector corresponding to each second feature point may be determined from its first three-dimensional coordinate and second three-dimensional coordinate as follows: for each second feature point, the second motion vector is obtained by taking the first three-dimensional coordinate of the second feature point as the starting point and the second three-dimensional coordinate as the end point; the second motion vector direction of each second feature point is thus the direction in which the corresponding first three-dimensional coordinate points to the second three-dimensional coordinate.
Specifically, suppose there are n second feature points in total in the first pair of sample images and the second pair of sample images, and the i-th second feature point p_i has a first three-dimensional coordinate (X_1, Y_1, Z_1) and a second three-dimensional coordinate (X_2, Y_2, Z_2); the second motion vector of this second feature point is (dX_i, dY_i, dZ_i), wherein dX_i = X_2 - X_1, dY_i = Y_2 - Y_1 and dZ_i = Z_2 - Z_1. At this time, the second motion amplitude f_i of the second feature point p_i can be calculated by the following formula:
f_i = sqrt(dX_i^2 + dY_i^2 + dZ_i^2)
For the n second feature points, after the second motion amplitude of each second feature point is determined, the second motion amplitude mean value F of the first pair of sample images may be calculated by the following formula:
F = (1/n) · Σ_{i=1}^{n} f_i
wherein n and i are positive integers, and i is less than or equal to n.
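For illustration only, the triangulation and motion-amplitude computation described in S101 and the formulas above can be sketched in code. The following is a minimal sketch, not the patented implementation: it assumes rectified image pairs, and the names triangulate, motion_vectors_and_mean, uv1 and uv2 (arrays holding the coordinate pairs (u^l, u^r, v) of the matched feature points in the first and second pair of sample images) are hypothetical.

```python
import numpy as np

def triangulate(uv, B, f, u0, v0):
    """Convert coordinate pairs (u_l, u_r, v) from a rectified stereo pair into
    3D points (X, Y, Z) in the camera coordinate system (see the formulas above)."""
    u_l, u_r, v = uv[:, 0], uv[:, 1], uv[:, 2]
    d = u_l - u_r                              # disparity of each feature point
    X = B * (u_l - u0) / d
    Y = B * (v - v0) / d
    Z = B * f / d
    return np.stack([X, Y, Z], axis=1)

def motion_vectors_and_mean(uv1, uv2, B, f, u0, v0):
    """Second motion vectors of the feature points and the motion amplitude mean F."""
    P1 = triangulate(uv1, B, f, u0, v0)        # 3D positions at time t1
    P2 = triangulate(uv2, B, f, u0, v0)        # 3D positions at time t2
    V = P2 - P1                                # (dX_i, dY_i, dZ_i) per feature point
    amplitudes = np.linalg.norm(V, axis=1)     # f_i = sqrt(dX_i^2 + dY_i^2 + dZ_i^2)
    return V, amplitudes, float(amplitudes.mean())
```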
S102: and determining a second motion direction entropy value of the first pair of sample images according to the projection of the second motion vector direction of each second feature point to a grid corresponding to a preset grid space.
In the embodiment of the present invention, a cube grid space is preset, and each face of the cube grid space is divided into the same number of grids. A second solid angle range corresponding to each grid may be predetermined and, for each face, a first solid angle range corresponding to the face may also be predetermined, so that for each grid on a face, an angle proportion value corresponding to the grid may be determined according to the ratio of the second difference value corresponding to the second solid angle range of the grid to the first difference value corresponding to the first solid angle range of the face.
After the second motion vector direction of each second feature point is determined, each second feature point is placed at the center of the cube, and for each second feature point it is determined which grid of the cube grid space the second motion vector direction of that feature point is projected onto; that is, a straight line is determined with the center of the cube as its starting point and the second motion vector direction as its direction, and the grid containing the intersection point of this straight line with the cube grid space is taken as the grid onto which the second motion vector direction is projected in the preset grid space.
After the second motion vector direction of each second feature point has been projected onto the preset grid space and the corresponding grid has been determined, the second motion direction entropy value of the first pair of sample images can be determined according to the angle proportion values corresponding to these grids; for example, the sum of the angle proportion values corresponding to the grids onto which second motion vector directions are projected may be used as the second motion direction entropy value of the first pair of sample images.
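The cube-grid projection can likewise be sketched as follows. This is only an illustrative sketch under stated assumptions: the cube is a unit cube centred at the origin with each face divided into g × g grids, and the per-grid angle proportion values are approximated here by a uniform weight, whereas the patent derives them from the predetermined solid angle ranges.

```python
import numpy as np

def cube_grid(direction, g=8):
    """Return (face, row, col) of the cube-grid cell hit by a motion vector direction
    cast from the cube centre; faces 0..5 correspond to +X, -X, +Y, -Y, +Z, -Z."""
    d = np.asarray(direction, dtype=float)
    axis = int(np.argmax(np.abs(d)))                 # dominant axis selects the face
    face = axis * 2 + (0 if d[axis] > 0 else 1)
    p = d / np.abs(d[axis])                          # intersection with the face plane
    other = [i for i in range(3) if i != axis]
    row = min(int((p[other[0]] + 1.0) / 2.0 * g), g - 1)
    col = min(int((p[other[1]] + 1.0) / 2.0 * g), g - 1)
    return face, row, col

def direction_entropy(directions, g=8):
    """Sum the weights of the distinct grids hit by the motion vector directions
    (uniform weights stand in for the angle proportion values of the patent)."""
    weight = 1.0 / (6 * g * g)
    hit = {cube_grid(d, g) for d in directions if np.linalg.norm(d) > 0}
    return len(hit) * weight
```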
S103: and determining a second influence area of the first pair of sample images on the ground according to the number of ground grids corresponding to the ground projected by the coordinates of each second feature point in the world coordinate system.
Since the three-dimensional coordinates of each feature point in the camera coordinate system have already been acquired, and the relationship between the camera coordinate system and the world coordinate system can be determined in advance, the third three-dimensional coordinate and the fourth three-dimensional coordinate corresponding respectively to the first three-dimensional coordinate and the second three-dimensional coordinate of each second feature point can be acquired in the world coordinate system.
The ground can be divided into a plurality of grids of equal area; the divided ground grids may be rectangular, square, rhombic or of other shapes, and preferably, to facilitate subsequent operations, the ground is divided into square grids of equal area in the embodiment of the invention. Specifically, the area of the divided ground grids may be set according to an empirical value; in the embodiment of the present invention, the area of each ground grid may be set between 4 square centimeters and 25 square centimeters.
The acquired third three-dimensional coordinate and fourth three-dimensional coordinate of each second feature point in the world coordinate system are projected onto the ground, and the ground grids onto which they are projected are marked. For each feature point, the number of marked ground grids onto which the point is projected is counted, and the projection area corresponding to the feature point is determined according to the counted number of ground grids and the area of each ground grid; the second influence area of the first pair of sample images on the ground is then determined according to the sum of the projection areas corresponding to the feature points in the first pair of sample images.
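A minimal sketch of the ground-grid projection in S103, assuming the third and fourth three-dimensional coordinates have already been expressed in a world coordinate system whose X-Y plane is the ground; the function name and the cell edge length cell (in metres, corresponding to the 4 to 25 square centimetre grid areas mentioned above) are illustrative assumptions.

```python
import numpy as np

def influence_area(points_world, cell=0.05):
    """Project world coordinates onto the ground plane, mark the ground grids hit,
    and return the influence area as (number of marked grids) x (grid area)."""
    pts = np.asarray(points_world, dtype=float)    # shape (m, 3): third/fourth 3D coords
    marked = {(int(np.floor(x / cell)), int(np.floor(y / cell)))
              for x, y, _ in pts}                  # drop the height, keep grid indices
    return len(marked) * cell * cell
```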
S104: for each first pair of sample images, judging whether a second motion amplitude value mean value, a second motion direction entropy value and a second influence area of the first pair of sample images are all larger than a corresponding preset second threshold value, and if so, marking the first pair of sample images as a second target image; each second time duration is determined based on the marked consecutive second target images.
A motion amplitude mean value threshold, a motion direction entropy value threshold and an influence area threshold are pre-stored in the electronic equipment. For each first pair of sample images in the sample binocular video, after the second motion amplitude mean value F, the second motion direction entropy value E and the second influence area S of the first pair of sample images are obtained, it is first judged whether F, E and S are respectively larger than the motion amplitude mean value threshold, the motion direction entropy value threshold and the influence area threshold.
If, for a certain first pair of sample images, the second motion amplitude mean value F is larger than the motion amplitude mean value threshold, the second motion direction entropy value E is larger than the motion direction entropy value threshold, and the second influence area S is larger than the influence area threshold, the first pair of sample images is marked as a second target image; otherwise, the first pair of sample images is not marked.
Whether each first pair of sample images is marked can be determined by the above method. After the second target images are determined, there may be a plurality of second target images in the sample binocular video, and some of them may be consecutive; therefore, image groups can be determined, each image group comprising at least one frame of second target image, and the second duration corresponding to each image group can be determined according to the number of second target images contained in that image group.
For example, if the images marked as second target images in the sample binocular video are the first, second, third, fifth, seventh and eighth frame images, and so on, then the first, second and third frame images may be determined as one image group with its own second duration, the fifth frame image forms another image group with its own second duration, and the seventh and eighth frame images form a further image group with another second duration.
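The grouping of marked frames into image groups and second durations amounts to finding runs of consecutive marked frame indices, as in the example above. A minimal sketch, where marked is a hypothetical list of frame indices marked as second target images and fps is the assumed frame rate of the sample binocular video:

```python
def durations_from_marked_frames(marked, fps=25.0):
    """Split consecutive marked frame indices into image groups and return each
    group together with its duration in seconds (group length / frame rate)."""
    groups, current = [], []
    for idx in sorted(marked):
        if current and idx != current[-1] + 1:     # a gap closes the current group
            groups.append(current)
            current = []
        current.append(idx)
    if current:
        groups.append(current)
    return [(group, len(group) / fps) for group in groups]

# For the example above: frames 1, 2, 3 / 5 / 7, 8 form three image groups.
print(durations_from_marked_frames([1, 2, 3, 5, 7, 8]))
```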
S105: and aiming at each second duration, determining a second target motion amplitude mean value, a second target motion direction entropy value and a second target influence area according to each second target image corresponding to the second duration, and training the fighting detection model according to the second target motion amplitude mean value, the second target motion direction entropy value, the second target influence area, the second duration and identification information of whether each frame of image in the sample binocular video has fighting behavior.
For each second duration, when the second target motion amplitude mean value, the second target motion direction entropy value and the second target influence area are determined according to each second target image corresponding to the second duration, multiple implementation manners can be provided. For example: for each second duration, according to the corresponding second motion amplitude mean value, second motion direction entropy value and second influence area of each second target image, respectively taking the corresponding maximum value of the second motion amplitude mean value, the second motion direction entropy value and the second influence area as the second target motion amplitude mean value, the second target motion direction entropy value and the second target influence area; or respectively calculating the average value of the second motion amplitude mean value, the average value of the second motion direction entropy value and the average value of the second influence area according to the corresponding second motion amplitude mean value, the corresponding second motion direction entropy value and the corresponding second influence area of each second target image, and respectively taking the average values as the second target motion amplitude mean value, the corresponding second target motion direction entropy value and the corresponding second influence area.
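Either aggregation strategy described above reduces to a simple reduction over the frames of one second duration. A small sketch with hypothetical names, where features is a list of (second motion amplitude mean value, second motion direction entropy value, second influence area) tuples of the second target images belonging to that duration:

```python
import numpy as np

def aggregate_duration_features(features, mode="max"):
    """Collapse the per-frame values into the per-duration target values,
    either by taking the maximum or the average over the frames."""
    arr = np.asarray(features, dtype=float)        # shape (k, 3)
    return arr.max(axis=0) if mode == "max" else arr.mean(axis=0)
```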
In order to achieve training of the model, identification information of whether fighting behavior occurs in each frame of image of the sample binocular video is preset; for example, if a certain frame of image contains fighting behavior, the identification information of the frame of image can be set to 1, and if the frame of image does not contain fighting behavior, the identification information of the frame of image can be set to 0.
When the fighting detection model is trained, the second target motion amplitude mean value, the second target motion direction entropy value, the second target influence area and the second duration are input into the fighting detection model, and the fighting detection model outputs the probability that fighting behavior occurs in the corresponding consecutive second target images. When the probability is greater than a preset fighting probability threshold, it is determined that fighting behavior occurs in the consecutive second target images currently input into the model, and the fighting detection model is trained according to the identification information, in the sample, of whether fighting behavior occurs in the consecutive second target images currently input into the model.
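The patent does not fix a particular form for the fighting detection model. Purely as an illustrative assumption, the sketch below uses a logistic regression over the four per-duration inputs (second target motion amplitude mean value, second target motion direction entropy value, second target influence area, second duration) and applies a probability threshold as described above; scikit-learn is used only for brevity.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_fight_model(X, y):
    """X: (m, 4) array of [amplitude mean, direction entropy, influence area, duration]
    per labelled duration; y: 1 if the corresponding frames contain fighting, else 0."""
    model = LogisticRegression()
    model.fit(np.asarray(X, dtype=float), np.asarray(y))
    return model

def detect_fighting(model, feats, prob_threshold=0.5):
    """Return True when the predicted fighting probability exceeds the threshold."""
    prob = model.predict_proba(np.asarray(feats, dtype=float).reshape(1, -1))[0, 1]
    return prob > prob_threshold
```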
In the embodiment of the invention, for each second feature point in two continuous pairs of sample images in a sample binocular video, the first three-dimensional coordinate and the second three-dimensional coordinate of the feature point in the camera coordinate system, and the third three-dimensional coordinate and the fourth three-dimensional coordinate of the feature point in the world coordinate system, are determined according to the first coordinate pair and the second coordinate pair of the feature point in the sample images, so as to further determine the second motion amplitude mean value, the second motion direction entropy value and the second influence area. For each first pair of sample images, whether the first pair of sample images is a second target image is judged, and each second duration is determined according to the continuous second target images; the fighting detection model is then trained according to the second target motion amplitude mean value, the second target motion direction entropy value, the second target influence area and the second duration corresponding to each second duration, together with the identification information of whether fighting behavior occurs in each frame of image in the sample binocular video. The three-dimensional motion condition of each second feature point in the three-dimensional scene can thus be accurately obtained by the fighting detection model, and the real three-dimensional motion can be accurately obtained according to the actual physical size; by determining the second target motion amplitude mean value, the second target motion direction entropy value, the second target influence area and the second duration corresponding to each second duration, the fighting detection model can analyze the three-dimensional motion condition of the target based on the three-dimensional motion vector field (namely, the scene flow), so as to accurately judge whether fighting behavior occurs and improve the accuracy of fighting behavior detection.
Example 2:
in order to make the determined fighting detection model more accurate, on the basis of the above embodiment, before determining the coordinate pair of each second feature point in the first pair of sample images and the second pair of sample images, the method further includes:
acquiring a disparity map corresponding to the first pair of sample images according to the first left image and the first right image of the first pair of sample images;
for each second feature point in the second feature point set, determining each first candidate feature point adjacent to the second feature point in the first left image according to the second feature point of the first left image in the first pair of sample images; adding each first candidate feature point to a candidate set; aiming at each first candidate feature point in the candidate set, determining a first pixel point corresponding to the first candidate feature point in a first right image according to the disparity map, and determining a second pixel point and a third pixel point corresponding to the first candidate feature point in a second left image and a second right image according to the first candidate feature point and the first pixel point respectively; determining a first neighborhood containing the second pixel point in a second left image, determining a second neighborhood containing a third pixel point in a second right image, and determining a first target pixel point and a second target pixel point which are matched with the first candidate feature point and the first pixel point in the first neighborhood and the second neighborhood respectively; and judging whether the correlation between the first candidate feature point, the first pixel point, the first target pixel point and the second target pixel point is greater than a set threshold value or not, if so, determining that the first candidate feature point, the first pixel point, the first target pixel point and the second target pixel point are second target feature points, moving the second feature points out of the second feature point set, updating the second target feature points into second feature points and adding the second feature points into the second feature point set.
After corresponding feature points in the first pair of sample images and the second pair of sample images are determined through sparse feature point matching, the feature points can be used as seed points and can grow around the seed points to determine more seed points, and therefore training of the model is facilitated.
When growing around a seed point, the specific growth strategy is to traverse the four-neighborhood pixel points of the seed point and, for each neighborhood pixel point, calculate the most relevant candidate points corresponding to it on the other three images. Before growing from the seed points, for convenience of subsequent operations, a disparity map corresponding to the first pair of sample images is acquired from the first left image and the first right image of the first pair of sample images.
Specifically, in order to suppress the influence of illumination variation, before determining the disparity maps corresponding to the first pair of sample images, non-parametric transformation (census transformation) may be performed on the first pair of sample images, and a classical semi-global stereo matching algorithm is applied to the first pair of sample images after the census transformation to obtain the disparity maps corresponding to the first pair of sample images. Specifically, the process of obtaining the disparity map is the prior art, and the embodiment of the present invention is not described herein again.
When growing around the seed point, the seed point is stored in the second feature point set, and whether the seed point can be grown from each second feature point in the second feature point set is determined.
A second feature point is removed from the second feature point set, and, for the position of the second feature point in the first left image, each first candidate feature point adjacent to the second feature point in the first left image is determined; specifically, the four-neighborhood pixel points centered on the second feature point are taken as the first candidate feature points adjacent to it, that is, the first candidate feature points are the pixel points located on the upper, lower, left and right sides of the second feature point in the first left image. Each determined first candidate feature point is added into a candidate set, which is an intermediate set for seed point growth. A first candidate feature point is then removed from the candidate set; for the removed first candidate feature point, its first pixel point in the first right image is determined according to the determined disparity map, and its second pixel point and third pixel point in the second left image and the second right image are determined according to the first candidate feature point and the first pixel point respectively. See in particular fig. 2.
Respectively determining a first neighborhood containing the second pixel point and a second neighborhood containing a third pixel point in a second left image and a second right image, wherein the first neighborhood can be a set consisting of the second pixel point in the second left image and four neighborhood pixel points of the second pixel point; the second neighborhood can be a set consisting of the third pixel point in the second right image and four neighborhood pixel points of the third pixel point; the four adjacent domain pixels of the second pixel and the third pixel can be pixels located on the upper side, the lower side, the left side and the right side of the second pixel and the third pixel respectively. Determining a first target pixel point and a second target pixel point which are matched with the first candidate feature point and the first pixel point in the first neighborhood and the second neighborhood respectively, wherein the determination can be performed by adopting a method in the prior art when determining the pixel points which are matched with the first candidate feature point and the first pixel point in the first neighborhood and the second neighborhood, and the embodiment of the invention is not repeated herein.
When the correlation is greater than the set threshold, the first candidate feature point, the first pixel point, the first target pixel point and the second target pixel point are determined to be second target feature points, and the second target feature points are updated into second feature points and added into the second feature point set, so that whether a seed point can later be grown from them is also determined. The confidence of the seed points can be adjusted by the threshold; the value of the threshold can be set according to actual requirements, and in the embodiment of the invention the threshold can be set to 0.8.
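A simplified Python sketch of this growth loop is given below. The helper callables for enumerating four-neighborhood pixels, matching a candidate across the other three images via the disparity map, and computing the combined correlation (the MNCC-based criterion described next) are hypothetical placeholders; only the loop structure and the 0.8 acceptance threshold come from the description above.

```python
from collections import deque

CORRELATION_THRESHOLD = 0.8  # confidence threshold for accepting a grown seed

def grow_seeds(seed_points, four_neighbours, match_across_views, correlation):
    """Grow additional feature-point quadruples around the initial seed points.

    seed_points: iterable of quadruples (p_left1, p_right1, p_left2, p_right2).
    four_neighbours(p): returns the 4-neighborhood pixels of p in the first left image.
    match_across_views(candidate): returns the matched pixels in the other three
        images (via the disparity map and local search) or None if matching fails.
    correlation(quadruple): returns the combined MNCC-based correlation score.
    """
    seeds = deque(seed_points)            # the feature point set awaiting growth
    accepted = list(seed_points)
    while seeds:
        seed = seeds.popleft()            # remove the seed from the feature point set
        candidates = deque(four_neighbours(seed[0]))   # candidate set from the left image
        while candidates:
            cand = candidates.popleft()
            matched = match_across_views(cand)
            if matched is None:
                continue
            quadruple = (cand,) + matched
            if correlation(quadruple) > CORRELATION_THRESHOLD:
                accepted.append(quadruple)   # becomes a new feature point / seed
                seeds.append(quadruple)
    return accepted
```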
Specifically, the correlation S_i between the first candidate feature point, the first pixel point, the first target pixel point and the second target pixel point may be determined by combining the following three pairwise correlations:

C_1, the correlation between the first candidate feature point in the first left image and the first target pixel point in the second left image;

C_2, the correlation between the first pixel point in the first right image and the second target pixel point in the second right image;

C_3, the correlation between the first target pixel point in the second left image and the second target pixel point in the second right image.
Specifically, the correlation criterion used is the Moravec normalized cross-correlation (MNCC), which is calculated as follows:

MNCC(I_l, I_r) = 2 · Σ_{(x,y)∈W_p} [I_l(x, y) − Ī_l] · [I_r(x + dx, y + dy) − Ī_r] / { Σ_{(x,y)∈W_p} [I_l(x, y) − Ī_l]² + Σ_{(x,y)∈W_p} [I_r(x + dx, y + dy) − Ī_r]² }

wherein W_p is the support domain of the pixel point, taken as a neighborhood window with the size of 5 × 5 pixels, and (dx, dy) is the offset between the two matched pixel points.

When calculating C_1, I_l(x, y) and I_r(x + dx, y + dy) respectively represent the gray value of the first candidate feature point in the first left image and the gray value of the first target pixel point in the second left image; Ī_l and Ī_r are respectively the average gray value of the pixel points in the support domain of the first candidate feature point and the average gray value of the pixel points in the support domain of the first target pixel point.

When calculating C_2, I_l(x, y) and I_r(x + dx, y + dy) respectively represent the gray value of the first pixel point in the first right image and the gray value of the second target pixel point in the second right image; Ī_l and Ī_r are respectively the average gray value of the pixel points in the support domain of the first pixel point and the average gray value of the pixel points in the support domain of the second target pixel point.

When calculating C_3, I_l(x, y) and I_r(x + dx, y + dy) respectively represent the gray value of the first target pixel point in the second left image and the gray value of the second target pixel point in the second right image; Ī_l and Ī_r are respectively the average gray value of the pixel points in the support domain of the first target pixel point and the average gray value of the pixel points in the support domain of the second target pixel point.
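A minimal NumPy sketch of the MNCC criterion over 5 × 5 support windows follows. The helper names are illustrative, and because the exact formula combining the three pairwise correlations is not reproduced above, taking their minimum in combined_correlation is an assumption rather than the patent's formula.

```python
import numpy as np

def mncc(window_a, window_b):
    """Moravec normalized cross-correlation between two equally sized gray windows."""
    a = np.asarray(window_a, dtype=float)
    b = np.asarray(window_b, dtype=float)
    da, db = a - a.mean(), b - b.mean()
    denom = (da ** 2).sum() + (db ** 2).sum()
    if denom == 0.0:
        return 0.0                      # flat windows carry no correlation information
    return 2.0 * (da * db).sum() / denom

def window(img, x, y, half=2):
    """5x5 support domain centred on pixel (x, y); assumes a 2D gray image
    and in-bounds coordinates."""
    return img[y - half:y + half + 1, x - half:x + half + 1]

def combined_correlation(c1, c2, c3):
    """Combine the three pairwise correlations C_1, C_2, C_3 into a single score S_i.

    Taking the minimum is a conservative placeholder for the combination rule."""
    return min(c1, c2, c3)
```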
And when the second feature point set and the candidate set do not contain any pixel point, the seed point growing process is finished.
In order to improve the efficiency of the fighting behavior detection, in an embodiment of the present invention, before determining the second feature point in the first pair of sample images and the second pair of sample images, the method further includes:
for a first pair of sample images, judging whether there are at least two frames of images before the first pair of sample images; if so, predicting the second feature points in the first pair of sample images and the second pair of sample images according to the feature points in the frame image before the first pair of sample images; otherwise, respectively extracting sparse feature points in the first pair of sample images and the second pair of sample images, and determining the second feature points in the first pair of sample images and the second pair of sample images through sparse feature point matching.
Therefore, in the embodiment of the present invention, the seed points of the second pair of sample images and the next frame image in the sample binocular video can be predicted by using the existing kalman filtering and other methods according to each second feature point obtained after the growth process of the seed points is finished. At this time, when determining the second feature points of the second pair of sample images and the next frame image, the predicted seed points of the second pair of sample images and the next frame image can be used for seed point growth by adopting the same method as the above method, so as to obtain the second feature points of the second pair of sample images and the next frame image.
In addition, in the embodiment of the present invention, after each consecutive first pair of sample images and second pair of sample images in the sample binocular video are obtained, the first pair of sample images and the second pair of sample images may be preprocessed, that is, epipolar line correction may be performed on the first pair of sample images and the second pair of sample images according to binocular calibration parameters corresponding to the sample binocular video, and a first left image, a first right image, a second left image, and a second right image after correction are obtained; and respectively acquiring corresponding gray images of the corrected first left image, the corrected first right image, the corrected second left image and the corrected second right image, namely acquiring a first gray left image, a first gray right image, a second gray left image and a second gray right image.
And respectively extracting each sparse feature point in the first gray left image, the first gray right image, the second gray left image and the second gray right image after epipolar line correction, wherein when the sparse feature points are extracted, a Scale-invariant feature transform (SIFT) algorithm or an ORB algorithm can be adopted to extract the sparse feature points in the image. And performing sparse feature point matching on each sparse feature point extracted from the first gray left image, the first gray right image, the second gray left image and the second gray right image.
And determining each sparse feature point group successfully matched in the first gray left image, the first gray right image, the second gray left image and the second gray right image as each second feature point through sparse feature point matching, and further determining the coordinates of each second feature point in the first gray left image, the first gray right image, the second gray left image and the second gray right image according to the pixel coordinates of the pixel points of each sparse feature point group in the image coordinate system corresponding to the pixel points in the first gray left image, the first gray right image, the second gray left image and the second gray right image.
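As an illustration of this sparse extraction and matching step, the following sketch uses OpenCV's ORB detector and a brute-force Hamming matcher on one pair of gray images; the same pattern would be repeated across the four gray images, and treating every cross-checked match as a seed is a simplification of the matching described above.

```python
import cv2

def match_sparse_features(gray_left, gray_right, max_features=2000):
    """Detect ORB key points in two gray images and match their descriptors."""
    orb = cv2.ORB_create(nfeatures=max_features)
    kp_l, des_l = orb.detectAndCompute(gray_left, None)
    kp_r, des_r = orb.detectAndCompute(gray_right, None)
    if des_l is None or des_r is None:
        return []
    # Hamming distance suits ORB's binary descriptors; crossCheck keeps mutual best matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_l, des_r)
    # Return pixel-coordinate pairs of the matched sparse feature points.
    return [(kp_l[m.queryIdx].pt, kp_r[m.trainIdx].pt) for m in matches]
```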
In the embodiment of the invention, the feature points in the first pair of sample images and the second pair of sample images are taken as seed points, and growth is carried out around the seed points, so that more seed points, and therefore more feature points, are obtained. When large-amplitude motion exists in the scene, the small-displacement assumption of optical flow calculation is violated, so the smoothness and approximation constraint strategies adopted by general optical flow algorithms can produce large errors; the present scheme avoids this problem. In addition, since more feature points are obtained through the growth of the seed points, the three-dimensional motion situation of the target in the scene can be described more truthfully when the fighting detection model analyzes the three-dimensional motion of the target based on the three-dimensional motion vector field (namely the scene flow).
Example 3:
in order to accurately determine whether each first pair of sample images is the second target image, on the basis of the above embodiment, in an embodiment of the present invention, the determining the second motion magnitude mean value of the first pair of sample images includes:
acquiring a second motion amplitude corresponding to each second feature point according to the second motion vector corresponding to each second feature point;
and extracting each second motion amplitude larger than a preset second threshold value, and determining a second motion amplitude mean value of the first pair of sample images according to each extracted second motion amplitude.
The second motion vector corresponding to the second feature point may be the second motion vector determined in the foregoing embodiment, and the second motion amplitude corresponding to each second feature point may be determined according to the second motion vector corresponding to each second feature point.
In the embodiment of the invention, in order to eliminate the interference of feature points with small motion amplitudes on the fighting detection model, a second threshold can be preset; the second motion amplitudes larger than the second threshold are extracted from the second motion amplitudes corresponding to the second feature points, the mean value of the extracted second motion amplitudes is calculated, and this mean value is taken as the second motion amplitude mean value. The second threshold can be set according to actual requirements.
Specifically, in order to make the second motion amplitude mean value representative of the intensity of the motion, the second motion amplitudes corresponding to the second feature points may also be sorted from large to small, the first w sorted second motion amplitudes obtained, and the mean value of these w second motion amplitudes taken as the second motion amplitude mean value, where w is a positive integer. Either screening out the second motion amplitudes larger than the preset second threshold or keeping the w largest amplitudes makes the resulting second motion amplitude mean value representative of the intensity of the motion.
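Both filtering rules, thresholding the amplitudes or keeping the w largest ones, can be written compactly; the function names are illustrative.

```python
def amplitude_mean_above_threshold(amplitudes, threshold):
    """Mean of the motion amplitudes that exceed a preset threshold."""
    kept = [a for a in amplitudes if a > threshold]
    return sum(kept) / len(kept) if kept else 0.0

def amplitude_mean_top_w(amplitudes, w):
    """Mean of the w largest motion amplitudes (w is a positive integer)."""
    kept = sorted(amplitudes, reverse=True)[:w]
    return sum(kept) / len(kept) if kept else 0.0
```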
Example 4:
in order to accurately determine whether each first pair of sample images is the second target image, on the basis of the above embodiment, in an embodiment of the present invention, the projecting the second motion vector direction of each second feature point onto the grid corresponding to a preset grid space and determining the second motion direction entropy of the first pair of sample images includes:
according to the second motion amplitude corresponding to each second feature point, acquiring each target second feature point of which the corresponding second motion amplitude is larger than a preset third threshold; and projecting the second motion vector direction of each target second feature point to a preset grid space, and determining the second motion direction entropy value of the first pair of sample images according to the grid corresponding to the second motion vector direction of each target second feature point.
The second motion direction entropy value may be determined according to each feature point, but in order to determine whether each first pair of sample images is a second target image more accurately, according to a second motion amplitude value corresponding to each second feature point, each target second feature point whose corresponding second motion amplitude value is greater than a preset third threshold value is obtained, and according to a grid corresponding to a second motion vector direction of each target second feature point, the second motion direction entropy value of the first pair of sample images is determined.
Specifically, fig. 3A is a schematic grid space diagram provided by an embodiment of the present invention, where a cube is an inscribed cube of a sphere, each face of the cube is divided into a plurality of square grids with the same number, and the area of each square grid divided in each face is also the same. For each square grid on each face of the cube, an angle proportion value corresponding to the square grid can be determined according to a ratio of a second difference value corresponding to the second solid angle range of the square grid in the face to a first difference value corresponding to the first solid angle range of the face. Each square grid on each face of the cube is projected onto the spherical surface of the external sphere to obtain a grid space shown in fig. 3B, and the angle proportion value corresponding to each square grid in fig. 3A is used as the angle proportion value corresponding to each grid space in fig. 3B. Specifically, the method for calculating the solid angle range is the prior art, and the embodiment of the present invention is not described herein again.
Specifically, when the square grids are divided, a single face in the cube can be divided into square grids of different numbers according to requirements, and when the accuracy of calculating the second motion direction entropy is required to be improved, a larger number of square grids can be divided into the single face in the cube; when it is required to increase the speed of calculating the second direction of motion entropy, a smaller number of square meshes may be divided in a single face in the cube. The technical personnel in the field can set according to the actual requirement.
Since the second motion vector direction corresponding to each obtained target second feature point may be any direction in a three-dimensional space, in order to obtain the motion vector direction distribution of each target second feature point, the second motion vector direction of each target second feature point is projected into the grid space shown in fig. 3B, the grid to which each second motion vector direction is projected is determined, and the second motion direction entropy of the first pair of sample images is determined according to the angle ratio corresponding to each grid to which the second motion vector direction is projected.
When the mesh to which each second motion vector direction is projected is determined, for each target second feature point, a straight line may be determined with the center of sphere as a starting point and the second motion vector direction of the target second feature point as a direction, and the mesh where the intersection point of the straight line and the mesh space in the sphere is located is projected as the corresponding mesh in the mesh space in the second motion vector direction. And for each target second characteristic point, counting the angle proportion value of the grid to which the second motion vector direction of the target second characteristic point is projected. And determining a second motion direction entropy value of the first pair of sample images according to the angle proportion value of each grid projected by the second motion vector direction of each target second feature point.
Specifically, assuming that the number of the determined target second feature points is k, the second motion direction entropy value E of the first pair of sample images may be calculated using the following formula:

E = − Σ_{j=1}^{k} p_j · log(p_j)

wherein k and j are positive integers, j is not more than k, j represents the j-th target second feature point, and p_j represents the angle proportion value of the grid onto which the second motion vector direction of the j-th target second feature point is projected.
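A sketch of the direction-entropy computation is given below. The grid resolution of 4 × 4 cells per cube face, the dominant-axis projection onto the inscribed cube and the entropy form reconstructed above are illustrative assumptions; the solid-angle helper follows the standard formula for an axis-aligned rectangle viewed from the cube center.

```python
import math

GRID_PER_FACE = 4  # assumed: each cube face divided into GRID_PER_FACE x GRID_PER_FACE squares

def rect_solid_angle(x1, x2, y1, y2, d=1.0):
    """Solid angle subtended at the origin by the axis-aligned rectangle
    [x1, x2] x [y1, y2] lying on the plane at distance d."""
    def corner(x, y):
        return math.atan2(x * y, d * math.sqrt(x * x + y * y + d * d))
    return corner(x2, y2) - corner(x1, y2) - corner(x2, y1) + corner(x1, y1)

FACE_SOLID_ANGLE = rect_solid_angle(-1.0, 1.0, -1.0, 1.0)   # 2*pi/3 per cube face

def angle_proportion(direction):
    """Project a non-zero 3D motion vector direction onto the cube grid and
    return the angle proportion value of the grid cell it falls into."""
    comps = list(direction)
    axis = max(range(3), key=lambda i: abs(comps[i]))        # dominant axis -> cube face
    u, v = [comps[i] / abs(comps[axis]) for i in range(3) if i != axis]
    step = 2.0 / GRID_PER_FACE
    iu = min(int((u + 1.0) / step), GRID_PER_FACE - 1)
    iv = min(int((v + 1.0) / step), GRID_PER_FACE - 1)
    cell = rect_solid_angle(-1.0 + iu * step, -1.0 + (iu + 1) * step,
                            -1.0 + iv * step, -1.0 + (iv + 1) * step)
    return cell / FACE_SOLID_ANGLE

def direction_entropy(directions):
    """Entropy-style score over the angle proportion values of the grids hit
    by the motion vector directions of the target feature points."""
    return -sum(p * math.log(p) for p in (angle_proportion(d) for d in directions))
```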
Correspondingly, when determining the second influence area of the first pair of sample images on the ground, respectively obtaining the third three-dimensional coordinate and the fourth three-dimensional coordinate of each second feature point in a world coordinate system, projecting the third three-dimensional coordinate and the fourth three-dimensional coordinate of each second feature point to the ground, and determining the second influence area of the first pair of sample images on the ground according to the number of ground grids corresponding to each third three-dimensional coordinate and each fourth three-dimensional coordinate includes:
and respectively acquiring a third three-dimensional coordinate and a fourth three-dimensional coordinate of each target second characteristic point in a world coordinate system according to each target second characteristic point, projecting the third three-dimensional coordinate and the fourth three-dimensional coordinate of each target second characteristic point to the ground, and determining a second influence area of the first pair of sample images on the ground according to the number of ground grids corresponding to each third three-dimensional coordinate and each fourth three-dimensional coordinate.
In order to more accurately determine whether each first pair of sample images is a second target image, when a second influence area of the first pair of sample images on the ground is obtained, the second influence area of the first pair of sample images on the ground is determined according to the number of corresponding ground grids projected by the screened third three-dimensional coordinates and the screened fourth three-dimensional coordinates of each target second feature point in the world coordinate system.
In dividing the ground grid, the ground grid may be divided into square grids of equal area, and the area of each square grid may be between 4 square centimeters and 25 square centimeters. Specifically, when determining the second area of influence of the first pair of sample images on the ground, the third three-dimensional coordinate and the fourth three-dimensional coordinate of each target second feature point may be projected onto a corresponding ground grid on the ground to be marked, the maximum connected domain in the marked ground grid is determined, and then the product of the number of ground grids included in the maximum connected domain and the area of each ground grid is used as the second area of influence of the first pair of sample images on the ground.
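The influence-area computation can be sketched as follows; the 5 cm cell size is one value inside the 4 to 25 square centimetre range mentioned above, and the 4-connected flood fill used to find the maximum connected domain is an implementation choice. The input would be the ground projections of the third and fourth three-dimensional coordinates of each target second feature point.

```python
import math

GROUND_CELL_SIZE_M = 0.05   # assumed 5 cm x 5 cm ground grid cells (25 square centimetres)

def influence_area(points_xy, cell=GROUND_CELL_SIZE_M):
    """Mark the ground grid cells hit by the projected points and return the
    area of the largest 4-connected component of marked cells."""
    cells = {(math.floor(x / cell), math.floor(y / cell)) for x, y in points_xy}
    best, seen = 0, set()
    for start in cells:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            cx, cy = stack.pop()
            size += 1
            for nxt in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nxt in cells and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best * cell * cell   # number of cells in the largest connected domain times cell area
```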
Example 5:
fig. 4 is a flowchart of a method for detecting a fighting behavior according to an embodiment of the present invention, where the method includes the following steps:
S401: the following processing is carried out on a first pair of images and a second pair of images which are continuous in the binocular video: for each first feature point in the first pair of images and the second pair of images, determining a first motion vector corresponding to the first feature point according to the coordinate pairs of the first feature point in the first pair of images and the second pair of images and a first calibration parameter corresponding to the binocular video, and determining the first motion vector direction of each first feature point and the first motion amplitude mean value of the first pair of images.
Each frame of image of each time point corresponding to the binocular video comprises a left image and a right image, so that a pair of images, namely the left image and the right image, is provided for each time point, and the following processing is performed on two continuous frames of images in the binocular video in the embodiment of the invention. Specifically, the first pair of images and the second pair of images may be any two consecutive frames of images in the binocular video, and the first pair of images is a frame of image before the second pair of images. Wherein the first pair of images may comprise a third left image and a third right image, and the second pair of images may comprise a fourth left image and a fourth right image.
After the first pair of images and the second pair of images are determined, coordinates of each feature point in the first pair of images and the second pair of images are determined through sparse feature point matching, specifically, for each feature point, a coordinate of the feature point in a third left image and a coordinate of a third right image in the first pair of images are determined, namely, a third coordinate pair of the feature point in the first pair of images is determined, and similarly, a fourth coordinate pair of the feature point in the second pair of images is determined.
The process of determining the feature points in the image by sparse feature point matching belongs to the prior art, and is not repeated in the implementation of the invention.
For example, the coordinates, in the image coordinate system, of the pixel points corresponding to a certain first feature point in the third left image, the third right image, the fourth left image and the fourth right image are respectively (u_l, v_l), (u_r, v_r), (u_l', v_l') and (u_r', v_r'). Because the images have been subjected to epipolar line correction, for the same feature point the vertical coordinates of the corresponding pixel points in the third left image and the third right image are the same, and the vertical coordinates of the corresponding pixel points in the fourth left image and the fourth right image are also the same, that is, v_l = v_r and v_l' = v_r'. For convenience of presentation, let v = v_l = v_r and v' = v_l' = v_r'. At this time, the third coordinate pair may be expressed as (u_l, u_r, v), and the fourth coordinate pair may be expressed as (u_l', u_r', v').
In the embodiment of the invention, first calibration parameters corresponding to different binocular videos are pre-stored aiming at the different binocular videos. The first calibration parameter is an internal calibration parameter and an external calibration parameter of a binocular camera corresponding to the binocular video. The process of calibrating the binocular camera and acquiring the first calibration parameter belongs to the prior art, and is not described in detail in the embodiment of the invention.
According to the three-dimensional reconstruction principle of the binocular camera and the first calibration parameters of the binocular camera corresponding to the binocular video, the third coordinate pair of the first feature point can be utilized to determine the fifth three-dimensional coordinate of the third coordinate pair under the camera coordinate system, and the fourth coordinate pair of the first feature point is utilized to determine the sixth three-dimensional coordinate of the fourth coordinate pair under the camera coordinate system.
Specifically, with the first calibration parameters known, the relationship between a coordinate pair in the image coordinate system and the corresponding three-dimensional coordinate in the camera coordinate system satisfies the following formulas. Suppose a first feature point is point q, its third coordinate pair is (u_l, u_r, v), and its fourth coordinate pair is (u_l', u_r', v'). The fifth three-dimensional coordinate (X_5, Y_5, Z_5) corresponding to the third coordinate pair of the first feature point is:

X_5 = B · (u_l − u_0) / (u_l − u_r)

Y_5 = B · (v − v_0) / (u_l − u_r)

Z_5 = B · f / (u_l − u_r)

Similarly, the sixth three-dimensional coordinate (X_6, Y_6, Z_6) corresponding to the fourth coordinate pair of the first feature point is:

X_6 = B · (u_l' − u_0) / (u_l' − u_r')

Y_6 = B · (v' − v_0) / (u_l' − u_r')

Z_6 = B · f / (u_l' − u_r')

wherein B, f, u_0 and v_0 are specific parameters in the first calibration parameters of the binocular video; specifically, B is the center distance (baseline) of the two cameras of the binocular video, f is the focal length, and u_0 and v_0 are respectively the abscissa and the ordinate, in the pixel coordinate system, of the origin of the image coordinate system.
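A direct transcription of these reconstruction formulas, with invented example values standing in for the calibration parameters, might look like this:

```python
def reconstruct_camera_point(u_l, u_r, v, B, f, u0, v0):
    """Recover the camera-coordinate 3D point from a rectified coordinate pair
    (u_l, u_r, v), given baseline B, focal length f and principal point (u0, v0)."""
    d = u_l - u_r                      # horizontal disparity between left and right views
    X = B * (u_l - u0) / d
    Y = B * (v - v0) / d
    Z = B * f / d
    return X, Y, Z

# Illustrative values only; they are not calibration data from the patent.
print(reconstruct_camera_point(u_l=640.0, u_r=600.0, v=360.0,
                               B=0.12, f=800.0, u0=632.0, v0=356.0))
```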
Since the first pair of images and the second pair of images are two continuous frames of images in the binocular video, for convenience of presentation it is assumed that the first pair of images corresponds to time t3 in the binocular video and the second pair of images corresponds to time t4; the fifth three-dimensional coordinate and the sixth three-dimensional coordinate of each first feature point are then the three-dimensional coordinates of the first feature point at time t3 and time t4 respectively. The first motion vector corresponding to each first feature point may be determined from the fifth and sixth three-dimensional coordinates as follows: for each first feature point, the fifth three-dimensional coordinate is taken as a starting point and the sixth three-dimensional coordinate as an end point, giving the first motion vector of the first feature point; the first motion vector direction of each first feature point is thus the direction in which the corresponding fifth three-dimensional coordinate points to the sixth three-dimensional coordinate.
Specifically, assume that there are n first feature points in the first pair of images and the second pair of images, where the i-th first feature point q_i has a fifth three-dimensional coordinate (X_5, Y_5, Z_5) and a sixth three-dimensional coordinate (X_6, Y_6, Z_6). The first motion vector of the first feature point is (dX_i, dY_i, dZ_i), wherein dX_i = X_6 − X_5, dY_i = Y_6 − Y_5 and dZ_i = Z_6 − Z_5. At this time, the first motion amplitude f_i of the first feature point q_i can be calculated by the following formula:

f_i = sqrt(dX_i² + dY_i² + dZ_i²)

For the n first feature points, after the first motion amplitude of each first feature point is determined, the first motion amplitude mean value F of the first pair of images may be calculated by the following formula:

F = (1 / n) · Σ_{i=1}^{n} f_i

wherein n and i are positive integers, and i is less than or equal to n.
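The motion vector, amplitude and amplitude mean computations above reduce to a few lines; the container types are assumptions made for illustration.

```python
import math

def first_motion_amplitude_mean(fifth_coords, sixth_coords):
    """Average length of the first motion vectors, each running from a feature
    point's fifth three-dimensional coordinate to its sixth three-dimensional coordinate."""
    amplitudes = []
    for (x5, y5, z5), (x6, y6, z6) in zip(fifth_coords, sixth_coords):
        dx, dy, dz = x6 - x5, y6 - y5, z6 - z5     # first motion vector of this feature point
        amplitudes.append(math.sqrt(dx * dx + dy * dy + dz * dz))
    return sum(amplitudes) / len(amplitudes) if amplitudes else 0.0
```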
S402: and determining a first motion direction entropy value of the first pair of images according to the projection of the first motion vector direction of each first feature point to a grid corresponding to a preset grid space.
In the embodiment of the present invention, a cube grid space is preset, each face of the cube grid space includes grids with the same number, a second solid angle range corresponding to each grid may be predetermined, and for each face, a first solid angle range corresponding to the face may also be predetermined, so that for each grid on the face, an angle proportion value corresponding to the grid may be determined according to a ratio of a second difference value corresponding to the second solid angle range corresponding to the grid to a first difference value corresponding to the first solid angle range.
After the first motion vector direction of each feature point is determined, each feature point is arranged at the center of a cube, and for each feature point, according to the first motion vector direction of the feature point, which grid the first motion vector direction of the feature point is projected to a grid space of the cube corresponds to is determined, that is, a straight line is determined by taking the center of the cube as a starting point and the first motion vector direction as a direction, and the grid where the intersection point of the straight line and the grid space of the cube is located is taken as the grid corresponding to the preset grid space to which the first motion vector direction is projected.
The first motion vector direction of the first feature point is projected to a preset grid space, after a grid corresponding to the first motion vector direction of each first feature point is determined, the first motion direction entropy of the first pair of images may be determined according to the angle proportional value corresponding to each grid, for example, the sum of the angle proportional values corresponding to each grid on which the first motion vector direction is projected may be used as the first motion direction entropy of the first pair of images.
S403: and determining a first influence area of the first pair of images on the ground according to the number of ground grids corresponding to the ground by projecting the three-dimensional coordinates of each first characteristic point in the world coordinate system.
Since the three-dimensional coordinates of each feature point in the camera coordinate system have already been acquired, the relationship between the camera coordinate system and the world coordinate system can be predicted, and therefore, from the fifth three-dimensional coordinates and the sixth three-dimensional coordinates of each first feature point in the camera coordinate system, the seventh three-dimensional coordinates and the eighth three-dimensional coordinates corresponding to the fifth three-dimensional coordinates and the sixth three-dimensional coordinates, respectively, can be acquired in the world coordinate system.
The ground can be divided into a plurality of grids with equal areas, the shape of the divided ground grid can be rectangular, square or rhombic, and other shapes, and preferably, in order to facilitate subsequent operations, the ground grid is divided into square grids with equal areas in the embodiment of the invention. Specifically, the area of the divided ground grids may be set according to an empirical value, and in the embodiment of the present invention, the area of each ground grid may be determined to be between 4 square centimeters and 25 square centimeters.
Respectively projecting the seventh three-dimensional coordinate and the eighth three-dimensional coordinate of each acquired first feature point in a world coordinate system onto the ground, marking the projected ground grids, counting the number of marked ground grids projected onto the ground by aiming at each feature point, determining the projection area corresponding to the feature point according to the number of the ground grids obtained by counting and the area of each ground grid, and determining the first influence area of the first pair of images on the ground according to the sum of the projection areas corresponding to each feature point in the first pair of images.
S404: for each first pair of images, judging whether a first motion amplitude value mean value, a first motion direction entropy value and a first influence area of each first pair of images are all larger than a corresponding preset first threshold value, and if so, marking the first pair of images as first target images; each first duration is determined based on the marked consecutive first target images.
A motion amplitude mean value threshold, a motion direction entropy value threshold and an influence area threshold are pre-stored in the electronic equipment. For each first pair of images in the binocular video, after the first motion amplitude mean value F, the first motion direction entropy value E and the first influence area S of the first pair of images are obtained, it is first judged whether F, E and S are respectively larger than the motion amplitude mean value threshold, the motion direction entropy value threshold and the influence area threshold.

If, for a certain first pair of images, the first motion amplitude mean value F is larger than the motion amplitude mean value threshold, the first motion direction entropy value E is larger than the motion direction entropy value threshold, and the first influence area S is larger than the influence area threshold, the first pair of images is marked as a first target image; otherwise, the first pair of images is not marked.
Whether each first pair of images is marked can be determined by the above method. After each first target image is determined, there may be a plurality of first target images among the continuous frames of the binocular video, and some of these first target images may be consecutive; therefore each image group can be determined, and the first duration corresponding to each image group can be determined according to the number of first target images contained in the image group, wherein each image group comprises at least one frame of first target image.
For example, if in the binocular video the images marked as first target images are the second frame image, the third frame image, the fourth frame image, the sixth frame image, the eighth frame image and the ninth frame image, then the second frame image, the third frame image and the fourth frame image may be determined as one image group and a corresponding first duration may be determined; the sixth frame image forms one image group and another first duration may be determined; and the eighth frame image and the ninth frame image form another image group, for which another first duration may be determined.
S405: for each first time length, determining a first target motion amplitude mean value, a first target motion direction entropy value and a first target influence area according to each first target image corresponding to the first time length; and determining whether the frame-fighting behavior of the video corresponding to the first time length occurs according to the pre-trained frame-fighting detection model, the first target motion amplitude mean value, the first target motion direction entropy value, the first target influence area and the first time length.
For each first duration, when determining the first target motion amplitude mean value, the first target motion direction entropy value and the first target influence area according to each first target image corresponding to the first duration, there may be a plurality of implementation manners. For example: for each first duration, respectively taking the maximum value of the first motion amplitude mean value, the first motion direction entropy value and the first influence area corresponding to each first target image as the first target motion amplitude mean value, the first target motion direction entropy value and the first target influence area according to the corresponding first motion amplitude mean value, the first motion direction entropy value and the first influence area of each first target image; or respectively calculating the average value of the first motion amplitude mean value, the average value of the first motion direction entropy value and the average value of the first influence area according to the corresponding first motion amplitude mean value, the first motion direction entropy value and the first influence area of each first target image, and respectively taking the average values as the first target motion amplitude mean value, the first target motion direction entropy value and the first target influence area.
When determining whether fighting behavior occurs in the video corresponding to the first duration, the first target motion amplitude mean value, the first target motion direction entropy value, the first target influence area and the first duration are input into the pre-trained fighting detection model, the fighting detection model outputs the probability that fighting behavior occurs in the continuous first target images, and when the probability is greater than a preset fighting probability threshold, it is determined that fighting behavior occurs in the video corresponding to the continuous first target images currently input into the model.
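Reusing the hypothetical classifier from the training sketch earlier, the detection step reduces to a probability lookup and a threshold comparison; the 0.5 default threshold is again an assumption.

```python
def detect_fighting(model, amplitude_mean, direction_entropy, influence_area,
                    duration, prob_threshold=0.5):
    """Return True when the trained model assigns the duration's feature vector
    a fighting probability above the preset probability threshold."""
    features = [[amplitude_mean, direction_entropy, influence_area, duration]]
    probability = model.predict_proba(features)[0, 1]
    return probability > prob_threshold
```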
In the embodiment of the invention, for each first feature point in two continuous pairs of images in a binocular video, the fifth three-dimensional coordinate and the sixth three-dimensional coordinate of the feature point in the camera coordinate system, and the seventh three-dimensional coordinate and the eighth three-dimensional coordinate of the feature point in the world coordinate system, are determined according to the third coordinate pair and the fourth coordinate pair of the feature point in the images, so as to further determine the first motion amplitude mean value, the first motion direction entropy value and the first influence area. For each first pair of images, whether the first pair of images is a first target image is judged, and each first duration is determined from the continuous first target images; the first target motion amplitude mean value, the first target motion direction entropy value, the first target influence area and the first duration corresponding to each first duration are then input into the pre-trained fighting detection model, so as to determine whether fighting behavior occurs in the video corresponding to the first duration. The three-dimensional motion condition of each first feature point in the three-dimensional scene can be accurately obtained by the fighting detection model, and the real three-dimensional motion can be accurately obtained according to the actual physical size; by determining the first target motion amplitude mean value, the first target motion direction entropy value, the first target influence area and the first duration corresponding to each first duration, the fighting detection model can analyze the three-dimensional motion condition of the target based on the three-dimensional motion vector field (namely, the scene flow), so as to accurately judge whether fighting behavior occurs and improve the accuracy of fighting behavior detection.
Example 6:
in order to enable the fighting detection model to judge the fighting behavior more accurately, on the basis of the above embodiment, in an embodiment of the present invention, before determining the coordinate pair of each first feature point in the first pair of images and the second pair of images, the method further includes:
acquiring a disparity map corresponding to the first pair of images according to the third left image and the third right image of the first pair of images;
for each first feature point in the first feature point set, determining each second candidate feature point adjacent to the first feature point in a third left image in the first pair of images according to the first feature point of the third left image; adding each second candidate feature point to the candidate set; for each second candidate feature point in the candidate set, determining a fourth pixel point corresponding to the second candidate feature point in a third right image according to the disparity map, and determining a fifth pixel point and a sixth pixel point corresponding to the second candidate feature point in a fourth left image and a fourth right image according to the second candidate feature point and the fourth pixel point respectively; determining a third neighborhood containing the fifth pixel point in a fourth left image, determining a fourth neighborhood containing a sixth pixel point in a fourth right image, and determining a third target pixel point and a fourth target pixel point which are matched with the second candidate feature point and the fourth pixel point in the third neighborhood and the fourth neighborhood respectively; and determining the correlation among the second candidate feature point, the fourth pixel point, the third target pixel point and the fourth target pixel point, judging whether the correlation is greater than a set threshold value, if so, determining the second candidate feature point, the fourth pixel point, the third target pixel point and the fourth target pixel point as first target feature points, moving the first feature points out of the first feature point set, updating the first target feature points into the first feature points and adding the first feature points into the first feature point set.
After the corresponding feature points in the first pair of images and the second pair of images are determined through sparse feature point matching, the feature points can be used as seed points, and growth can be carried out around the seed points to determine more seed points, so that the fighting detection model can more accurately detect whether fighting behavior occurs.
When growing around a seed point, the specific growth strategy is to traverse the four-neighborhood pixel points of the seed point and, for each neighborhood pixel point, calculate the most relevant candidate points corresponding to it on the other three images. Before growing from the seed points, for convenience of subsequent operations, a disparity map corresponding to the first pair of images is acquired from the third left image and the third right image of the first pair of images.
Specifically, in order to suppress the influence of illumination variation, before determining the disparity maps corresponding to the first pair of images, non-parametric transformation (census transformation) may be performed on the first pair of images, and a classical semi-global stereo matching algorithm may be used to obtain the disparity maps corresponding to the first pair of images after census transformation. Specifically, the process of obtaining the disparity map is the prior art, and the embodiment of the present invention is not described herein again.
When growing around the seed point, the seed point is stored in the first feature point set, and whether the first feature point can grow out the seed point is determined for each first feature point in the first feature point set.
The first feature point is removed from the first feature point set, and for the position of the first feature point in the third left image, each second candidate feature point adjacent to the first feature point in the third left image is determined, specifically, a four-neighborhood pixel point with the first feature point as the center is taken as each second candidate feature point adjacent to the first feature point, that is, the second candidate feature point is a pixel point located on the upper side, the lower side, the left side and the right side of the first feature point in the third left image. Adding each determined second candidate feature point into a candidate set, wherein the candidate set is an intermediate set for seed point growth, removing the second candidate feature points from the candidate set, determining fourth pixel points of the second candidate feature points in a third right image according to the determined disparity map aiming at the removed second candidate feature points, and determining fifth pixel points and sixth pixel points of the second candidate feature points in a fourth left image and a fourth right image according to the second candidate feature points and the fourth pixel points respectively. See in particular fig. 2.
Determining a third neighborhood containing the fifth pixel point and a fourth neighborhood containing the sixth pixel point in the fourth left image and the fourth right image respectively, wherein the third neighborhood can be a set formed by the fifth pixel point in the fourth left image and the four neighborhood pixel points of the fifth pixel point; the fourth neighborhood can be a set composed of the sixth pixel point in the fourth right image and the four neighborhood pixel points of the sixth pixel point; the four neighborhood pixel points of the fifth pixel point and the sixth pixel point can be the pixel points located on the upper, lower, left and right sides of the fifth pixel point and the sixth pixel point respectively. A third target pixel point and a fourth target pixel point which are matched with the second candidate feature point and the fourth pixel point are then determined in the third neighborhood and the fourth neighborhood respectively; a method in the prior art can be adopted when determining the matched pixel points in the third neighborhood and the fourth neighborhood, which is not repeated herein in the embodiment of the invention.
And when the relevance is greater than a set threshold value, determining the second candidate feature point, the fourth pixel point, the third target pixel point and the fourth target pixel point as first target feature points, updating the first target feature points as first feature points and adding the first target feature points into the first feature point set. The first target feature point is updated into a first set of feature points to determine whether the first target feature point is capable of growing a seed point. The confidence of the seed point can be adjusted by the threshold, the value of the threshold can be set according to actual requirements, and the threshold can be set to be 0.8 in the embodiment of the invention.
Specifically, the correlation S_i between the second candidate feature point, the fourth pixel point, the third target pixel point and the fourth target pixel point may be determined by combining the following three pairwise correlations:

C_1, the correlation between the second candidate feature point in the third left image and the third target pixel point in the fourth left image;

C_2, the correlation between the fourth pixel point in the third right image and the fourth target pixel point in the fourth right image;

C_3, the correlation between the third target pixel point in the fourth left image and the fourth target pixel point in the fourth right image.
Specifically, the correlation criterion used is moravec non-normalized cross correlation (MNCC), which is calculated as follows:
Figure GDA0002528421470000355
wherein, WpThe support domain of the pixel point is taken as a neighborhood window with the size of 5 × 5 pixels;
When calculating the correlation between the second candidate feature point in the third left image and the third target pixel point in the fourth left image, I_l(x, y) and I_r(x+dx, y+dy) represent the gray value of the second candidate feature point in the third left image and the gray value of the third target pixel point in the fourth left image respectively, and Ī_l and Ī_r are the mean gray values of the pixel points in the support domain of the second candidate feature point and in the support domain of the third target pixel point respectively.

When calculating the correlation between the fourth pixel point in the third right image and the fourth target pixel point in the fourth right image, I_l(x, y) and I_r(x+dx, y+dy) represent the gray value of the fourth pixel point in the third right image and the gray value of the fourth target pixel point in the fourth right image respectively, and Ī_l and Ī_r are the mean gray values of the pixel points in the support domain of the fourth pixel point and in the support domain of the fourth target pixel point respectively.

When calculating the correlation between the third target pixel point in the fourth left image and the fourth target pixel point in the fourth right image, I_l(x, y) and I_r(x+dx, y+dy) represent the gray value of the third target pixel point in the fourth left image and the gray value of the fourth target pixel point in the fourth right image respectively, and Ī_l and Ī_r are the mean gray values of the pixel points in the support domain of the third target pixel point and in the support domain of the fourth target pixel point respectively.
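As an illustration of the MNCC criterion and the joint correlation described above, a minimal numpy sketch follows. The helper names, the border handling, and the use of a simple product to combine the three pairwise correlations are assumptions made for illustration; the patent itself only requires the joint correlation to be compared against a threshold such as 0.8.

import numpy as np

def mncc(patch_a, patch_b, eps=1e-8):
    # Moravec's normalized cross-correlation of two equally sized gray-value windows
    a = patch_a.astype(np.float64) - patch_a.mean()
    b = patch_b.astype(np.float64) - patch_b.mean()
    # 2*cov(a, b) / (var(a) + var(b)), computed from the centered windows
    return float(2.0 * (a * b).sum() / ((a * a).sum() + (b * b).sum() + eps))

def window(img, x, y, half=2):
    # 5x5 support domain W_p centered on pixel (x, y); assumes the point is not on the border
    return img[y - half:y + half + 1, x - half:x + half + 1]

def joint_correlation(gray_l3, gray_r3, gray_l4, gray_r4, pt_l3, pt_r3, pt_l4, pt_r4):
    # pt_l3: second candidate feature point, pt_r3: fourth pixel point,
    # pt_l4: third target pixel point,     pt_r4: fourth target pixel point
    c_left = mncc(window(gray_l3, *pt_l3), window(gray_l4, *pt_l4))    # third left vs fourth left
    c_right = mncc(window(gray_r3, *pt_r3), window(gray_r4, *pt_r4))   # third right vs fourth right
    c_stereo = mncc(window(gray_l4, *pt_l4), window(gray_r4, *pt_r4))  # fourth left vs fourth right
    # combining by product is an assumption; the patent does not fix the combination rule
    return c_left * c_right * c_stereo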
The seed point growing process ends when neither the first feature point set nor the candidate set contains any pixel point.
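A schematic sketch of the seed point growing loop described in this embodiment is given below. The helper callables and data layout are placeholders supplied for illustration only and are not defined by the patent.

def grow_seed_points(seeds, neighbors_of, match_candidate, score, threshold=0.8):
    # neighbors_of(p)    -> candidate feature points adjacent to seed p in the third left image
    # match_candidate(c) -> the matched points in the other three images, or None
    # score(c, m)        -> joint correlation of candidate c and its matches m
    grown = []
    seed_set = list(seeds)
    while seed_set:
        seed = seed_set.pop()                  # move the processed seed out of the set
        grown.append(seed)
        candidates = list(neighbors_of(seed))  # candidate set for this seed
        while candidates:
            c = candidates.pop()
            m = match_candidate(c)
            if m is not None and score(c, m) > threshold:
                seed_set.append((c, m))        # the grown point becomes a new seed
    return grown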
In order to improve the efficiency of the fighting behavior detection, in an embodiment of the present invention, before determining the first feature point in the first pair of images and the second pair of images, the method further includes:
for a first pair of images, judging whether there are at least two frames of images before the first pair of images; if so, predicting the first feature points in the first pair of images and the second pair of images according to the feature points in the previous frame of images of the first pair of images; otherwise, respectively extracting sparse feature points in the first pair of images and the second pair of images, and determining the first feature points in the first pair of images and the second pair of images through sparse feature point matching.
Therefore, in the embodiment of the present invention, the seed points of the second pair of images and of the next frame of images in the binocular video can be predicted, using existing methods such as Kalman filtering, from each first feature point obtained after the seed point growing process ends. When the first feature points of the second pair of images and of the next frame of images are then determined, the predicted seed points can be grown with the same method as described above, so as to obtain the first feature points of the second pair of images and of the next frame of images.
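A minimal sketch of predicting the seed point of one feature point in the next pair of images with OpenCV's Kalman filter is shown below. The constant-velocity state model and the noise settings are assumptions; the patent only states that Kalman filtering or other existing methods may be used.

import cv2
import numpy as np

def make_point_filter(x0, y0):
    # constant-velocity model: state (x, y, vx, vy), measurement (x, y)
    kf = cv2.KalmanFilter(4, 2)
    kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                    [0, 1, 0, 1],
                                    [0, 0, 1, 0],
                                    [0, 0, 0, 1]], np.float32)
    kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                     [0, 1, 0, 0]], np.float32)
    kf.processNoiseCov = 1e-2 * np.eye(4, dtype=np.float32)
    kf.measurementNoiseCov = 1e-1 * np.eye(2, dtype=np.float32)
    kf.statePost = np.array([[x0], [y0], [0], [0]], np.float32)
    return kf

def predict_next_seed(kf, measured_xy):
    # correct with the position observed in the current pair, predict the seed for the next pair
    kf.correct(np.array(measured_xy, np.float32).reshape(2, 1))
    pred = kf.predict()
    return float(pred[0]), float(pred[1])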
In addition, in the embodiment of the present invention, each consecutive first pair of images and second pair of images in the binocular video may be preprocessed. That is, the first pair of images and the second pair of images are subjected to epipolar line correction according to the binocular calibration parameters corresponding to the binocular video, so that a corrected third left image, third right image, fourth left image and fourth right image are obtained; the corresponding gray-scale images of the corrected third left image, third right image, fourth left image and fourth right image are then acquired, namely a third gray-scale left image, a third gray-scale right image, a fourth gray-scale left image and a fourth gray-scale right image.
Each sparse feature point in the epipolar-corrected third gray-scale left image, third gray-scale right image, fourth gray-scale left image and fourth gray-scale right image is extracted respectively; when extracting the sparse feature points, a Scale-Invariant Feature Transform (SIFT) algorithm or an ORB algorithm may be adopted. Sparse feature point matching is then performed on the sparse feature points extracted from the third gray-scale left image, the third gray-scale right image, the fourth gray-scale left image and the fourth gray-scale right image.
Through sparse feature point matching, each group of successfully matched sparse feature points in the third gray-scale left image, the third gray-scale right image, the fourth gray-scale left image and the fourth gray-scale right image is determined as a first feature point; for each such sparse feature point group, the coordinates of the first feature point in the third gray-scale left image, the third gray-scale right image, the fourth gray-scale left image and the fourth gray-scale right image are then determined according to the pixel coordinates of the corresponding pixel points in the image coordinate system.
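A minimal OpenCV sketch of this preprocessing and sparse seed matching is given below. The use of ORB, the specific OpenCV calls, and all parameter values are illustrative assumptions rather than the patent's prescribed implementation; the patent allows SIFT or ORB and any standard rectification.

import cv2

def rectify_and_gray(img, camera_matrix, dist_coeffs, R, P, size):
    # epipolar rectification of one view followed by gray-scale conversion;
    # the calibration inputs are assumed to come from the binocular calibration parameters
    map1, map2 = cv2.initUndistortRectifyMap(camera_matrix, dist_coeffs, R, P, size, cv2.CV_32FC1)
    rectified = cv2.remap(img, map1, map2, cv2.INTER_LINEAR)
    return cv2.cvtColor(rectified, cv2.COLOR_BGR2GRAY)

def sparse_seed_matches(gray_left, gray_right, max_points=500):
    # ORB keypoints matched between the gray left and right images;
    # the matched pairs play the role of the initial sparse seed points
    orb = cv2.ORB_create(nfeatures=max_points)
    kp_l, des_l = orb.detectAndCompute(gray_left, None)
    kp_r, des_r = orb.detectAndCompute(gray_right, None)
    if des_l is None or des_r is None:
        return []
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_l, des_r)
    # keep the coordinate pair of each successfully matched feature point
    return [(kp_l[m.queryIdx].pt, kp_r[m.trainIdx].pt) for m in matches]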
In the embodiment of the invention, the feature points in the first pair of images and the second pair of images are taken as seed points, and new feature points are grown around the seed points, so that more seed points and thus more feature points are obtained. When large-amplitude motion exists in the scene, the small-displacement assumption of optical flow calculation is violated, and the smoothness and approximation constraint strategies adopted by general optical flow algorithms produce large errors; growing feature points from seed points avoids this problem. In addition, the larger number of feature points obtained through seed point growth allows the fighting detection model, which analyzes the three-dimensional motion of the target based on the three-dimensional motion vector field (namely the scene flow), to describe the three-dimensional motion situation of the target in the scene more truthfully.
Example 7:
in order to accurately determine whether each first pair of images is the first target image, on the basis of the above embodiment, in an embodiment of the present invention, the determining the first motion amplitude mean value of the first pair of images includes:
and according to the first motion vector corresponding to each first feature point, acquiring a first motion amplitude corresponding to each first feature point, extracting a first motion amplitude larger than a preset second threshold value, and according to each extracted first motion amplitude, determining a first motion amplitude mean value of the first pair of images.
In the embodiment of the invention, in order to eliminate the interference of feature points with small motion amplitudes on the fighting detection model, a second threshold may be preset; according to the first motion amplitude corresponding to each first feature point, the first motion amplitudes greater than the second threshold are extracted, the mean value of the extracted first motion amplitudes is calculated, and this mean value is taken as the first motion amplitude mean value. The second threshold can be set according to actual requirements.
Specifically, in order to make the first motion amplitude mean value representative of the motion intensity, the first motion amplitudes corresponding to the first feature points may also be sorted from large to small, the first w sorted first motion amplitudes may be taken, and the mean value of these w first motion amplitudes may be used as the first motion amplitude mean value, where w is a positive integer. Alternatively, the first motion amplitudes greater than the preset second threshold are screened out and their mean value is used as the first motion amplitude mean value, which likewise makes the first motion amplitude mean value representative of the motion intensity.
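A minimal numpy sketch of the two screening strategies described above (thresholding the amplitudes, or keeping the w largest ones) is shown below; the function and parameter names are illustrative assumptions.

import numpy as np

def motion_amplitude_mean(motion_vectors, second_threshold=0.0, top_w=None):
    # one first motion amplitude per feature point, from its first motion vector
    amplitudes = np.linalg.norm(np.asarray(motion_vectors, dtype=np.float64), axis=1)
    kept = amplitudes[amplitudes > second_threshold]   # drop small-amplitude feature points
    if top_w is not None:
        kept = np.sort(kept)[::-1][:top_w]             # or keep only the w largest amplitudes
    return float(kept.mean()) if kept.size else 0.0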
Example 8:
in order to accurately determine whether each first pair of images is the first target image, on the basis of the foregoing embodiment, in an embodiment of the present invention, the projecting the first motion vector direction of each first feature point into a preset grid space, and determining the first motion direction entropy of the first pair of images according to the grid corresponding to the first motion vector direction of the first feature point includes:
acquiring each target first feature point whose corresponding first motion amplitude is greater than a preset third threshold, according to the first motion amplitude corresponding to each first feature point; and
projecting the first motion vector direction of each target first feature point to a preset grid space, and determining the first motion direction entropy value of the first pair of images according to the grid corresponding to the first motion vector direction of each target first feature point.
The first motion direction entropy value may be determined from all the feature points; however, in order to determine more accurately whether each first pair of images is a first target image, the target first feature points whose corresponding first motion amplitudes are greater than a preset third threshold are obtained according to the first motion amplitude corresponding to each first feature point, and the first motion direction entropy value of the first pair of images is determined according to the grids corresponding to the first motion vector directions of these target first feature points.
Specifically, referring to fig. 3A, the cube is the inscribed cube of a sphere; each face of the cube is divided into the same number of square grids, and every square grid on a face has the same area. For each square grid on each face of the cube, an angle proportion value corresponding to the square grid can be determined as the ratio of the second difference value corresponding to the second solid angle range of the square grid within the face to the first difference value corresponding to the first solid angle range of the face. Each square grid on each face of the cube is then projected onto the spherical surface of the circumscribed sphere, giving the grid space shown in fig. 3B, and the angle proportion value of each square grid in fig. 3A is used as the angle proportion value of the corresponding grid in fig. 3B. Methods for calculating a solid angle range are known in the prior art and are not described again here.
Specifically, when dividing the square grids, a single face of the cube can be divided into different numbers of square grids as required: when the accuracy of calculating the first motion direction entropy value needs to be improved, a larger number of square grids can be divided on a single face of the cube; when the speed of calculating the first motion direction entropy value needs to be increased, a smaller number of square grids can be divided on a single face of the cube. Those skilled in the art can set this according to actual requirements.
Since the first motion vector direction corresponding to each obtained target first feature point may be any direction in three-dimensional space, in order to obtain the distribution of the motion vector directions of the target first feature points, the first motion vector direction of each target first feature point is projected into the grid space shown in fig. 3B, the grid to which each first motion vector direction is projected is determined, and the first motion direction entropy value of the first pair of images is determined according to the angle proportion value corresponding to each grid to which a first motion vector direction is projected.
Specifically, when determining the grid to which each first motion vector direction is projected, a straight line may be determined for each target first feature point, with the center of the sphere as the starting point and the first motion vector direction of the target first feature point as the direction; the grid in which the intersection point of this straight line and the grid space on the sphere lies is taken as the grid to which the first motion vector direction is projected. For each target first feature point, the angle proportion value of the grid to which its first motion vector direction is projected is recorded, and the first motion direction entropy value of the first pair of images is determined according to the angle proportion values of the grids to which the first motion vector directions of all the target first feature points are projected.
Specifically, assuming that the number of the determined target first feature points is m, the first motion direction entropy value E of the first pair of images is calculated from the angle proportion values p_h, wherein m and h are positive integers, h is less than or equal to m, h denotes the h-th target first feature point, and p_h represents the angle proportion value of the grid to which the first motion vector direction of the h-th target first feature point is projected.
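A minimal numpy sketch of projecting motion vector directions onto the cube grid space and accumulating the angle proportion values follows. The grid resolution n, the sampling-based estimate of the solid-angle proportions, and the use of −Σ p_h·log(p_h) to combine the proportions are assumptions made for illustration; the patent's own formula for E is the authoritative definition.

import numpy as np

def cube_grid_index(direction, n=4):
    # map a 3-D direction to one of the 6*n*n grids obtained by dividing each cube face into n x n squares
    d = np.asarray(direction, dtype=np.float64)
    d = d / np.linalg.norm(d)
    face = int(np.argmax(np.abs(d)))            # dominant axis picks the cube face
    sign = 0 if d[face] > 0 else 1
    u, v = np.delete(d, face) / np.abs(d[face]) # remaining coordinates rescaled onto the face, in [-1, 1]
    iu = min(int((u + 1.0) / 2.0 * n), n - 1)
    iv = min(int((v + 1.0) / 2.0 * n), n - 1)
    return (face * 2 + sign) * n * n + iu * n + iv

def grid_angle_proportions(n=4, samples=100_000, seed=0):
    # solid-angle proportion of each grid, estimated by sampling random unit directions
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(samples, 3))
    counts = np.bincount([cube_grid_index(d, n) for d in dirs], minlength=6 * n * n)
    return counts / counts.sum()

def direction_entropy(directions, proportions, n=4):
    # p_h is the angle proportion of the grid each target feature point's direction falls in
    p = np.array([proportions[cube_grid_index(d, n)] for d in directions])
    p = np.clip(p, 1e-12, None)
    return float(-np.sum(p * np.log(p)))        # combining rule is an assumption, see lead-in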
Correspondingly, when determining the first influence area of the first pair of images on the ground, the obtaining a seventh three-dimensional coordinate and an eighth three-dimensional coordinate of each first feature point in a world coordinate system, projecting the seventh three-dimensional coordinate and the eighth three-dimensional coordinate of each first feature point to the ground, and determining the first influence area of the first pair of images on the ground according to the number of ground grids corresponding to each seventh three-dimensional coordinate and the eighth three-dimensional coordinate includes:
and respectively acquiring a seventh three-dimensional coordinate and an eighth three-dimensional coordinate of each target first characteristic point in a world coordinate system according to each target first characteristic point, projecting the seventh three-dimensional coordinate and the eighth three-dimensional coordinate of each target first characteristic point to the ground, and determining a first influence area of the first pair of images on the ground according to the number of ground grids corresponding to each seventh three-dimensional coordinate and eighth three-dimensional coordinate.
In order to determine more accurately whether each first pair of images is a first target image, when the first influence area of the first pair of images on the ground is obtained, it is determined according to the number of ground grids onto which the seventh three-dimensional coordinate and the eighth three-dimensional coordinate of each target first feature point in the world coordinate system are projected.
In dividing the ground grid, the ground grid may be divided into square grids of equal area, and the area of each square grid may be between 4 square centimeters and 25 square centimeters. Specifically, when determining the first area of influence of the first pair of images on the ground, the seventh three-dimensional coordinate and the eighth three-dimensional coordinate of each target first feature point may be projected to a corresponding ground grid on the ground to be marked, the maximum connected domain in the marked ground grid is determined, and then the product of the number of ground grids included in the maximum connected domain and the area of each ground grid is used as the first area of influence of the first pair of images on the ground.
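A minimal sketch of the ground-grid marking and largest-connected-region computation described above is shown below, using scipy.ndimage for the connected components. The 5 cm cell size (25 cm², one choice within the 4-25 cm² range above) and all names are assumptions.

import numpy as np
from scipy import ndimage

def influence_area(ground_points_xy, cell_size=0.05):
    # mark the ground grid cells hit by the projected world coordinates
    pts = np.asarray(ground_points_xy, dtype=np.float64)
    cols = np.floor(pts[:, 0] / cell_size).astype(int)
    rows = np.floor(pts[:, 1] / cell_size).astype(int)
    cols -= cols.min()
    rows -= rows.min()
    occupancy = np.zeros((rows.max() + 1, cols.max() + 1), dtype=bool)
    occupancy[rows, cols] = True
    labels, num = ndimage.label(occupancy)          # connected regions of marked ground grids
    if num == 0:
        return 0.0
    sizes = ndimage.sum(occupancy, labels, index=range(1, num + 1))
    # cells in the largest connected region times the area of one ground grid
    return float(sizes.max()) * cell_size * cell_size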
Example 9:
fig. 5 is a schematic structural diagram of a device for detecting a fighting behavior in a video according to an embodiment of the present invention, where the device for detecting a fighting behavior includes:
an obtaining module 51, configured to perform the following processing on a first pair of images and a second pair of images that are consecutive in a binocular video: for each first feature point in the first pair of images and the second pair of images, determining a first motion vector corresponding to each first feature point according to a coordinate pair of each first feature point in the first pair of images and the second pair of images and a first calibration parameter corresponding to the binocular video, and determining a first motion vector direction of each first feature point and a first motion amplitude mean value of the first pair of images; projecting the first motion vector direction of each first feature point to a grid corresponding to a preset grid space, and determining a first motion direction entropy value of the first pair of images; determining a first influence area of the first pair of images on the ground according to the number of ground grids corresponding to the ground projected by the three-dimensional coordinates of each first feature point in the world coordinate system;
a duration determining module 52, configured to determine, for each first pair of images, whether a first motion amplitude mean value, a first motion direction entropy value, and a first influence area of the first pair of images are greater than corresponding preset first thresholds, and if so, mark the first pair of images as a first target image; determining each first time length according to the marked continuous first target images;
the framing behavior determining module 53 is configured to determine, for each first duration, a first target motion amplitude mean value, a first target motion direction entropy value, and a first target influence area according to each first target image corresponding to the first duration; and determining whether the frame-fighting behavior of the video corresponding to the first time length occurs according to the pre-trained frame-fighting detection model, the first target motion amplitude mean value, the first target motion direction entropy value, the first target influence area and the first time length.
Specifically, the fighting behavior detection device further comprises:
a training module 54, configured to perform the following processing on each consecutive first pair of sample images and second pair of sample images in the sample binocular video: for each second feature point in the first pair of sample images and the second pair of sample images, determining a second motion vector corresponding to each second feature point according to a coordinate pair of each second feature point in the first pair of sample images and the second pair of sample images and a second calibration parameter corresponding to the sample binocular video, and determining a second motion vector direction of each second feature point and a second motion amplitude mean value of the first pair of sample images; projecting the second motion vector direction of each second feature point to a grid corresponding to a preset grid space, and determining a second motion direction entropy value of the first pair of sample images; respectively acquiring the number of ground grids corresponding to the projection of the three-dimensional coordinates of each second feature point in the world coordinate system to the ground, and determining a second influence area of the first pair of sample images on the ground; for each first pair of sample images, judging whether a second motion amplitude value mean value, a second motion direction entropy value and a second influence area of the first pair of sample images are all larger than a corresponding preset second threshold value, and if so, marking the first pair of sample images as a second target image; determining each second time length according to the marked continuous second target images; and aiming at each second duration, determining a second target motion amplitude mean value, a second target motion direction entropy value and a second target influence area according to each second target image corresponding to the second duration, and training the fighting detection model according to the second target motion amplitude mean value, the second target motion direction entropy value, the second target influence area, the second duration and identification information of whether each frame of image in the sample binocular video has fighting behavior.
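As one possible reading of the training step, a minimal sketch using an SVM classifier is shown below; the patent does not fix the model type, so the choice of sklearn's SVC and its parameters is purely an assumption.

import numpy as np
from sklearn.svm import SVC

def train_fight_detector(samples, labels):
    # each sample: (second target motion amplitude mean, second target motion direction entropy,
    #               second target influence area, second duration); labels: fighting / not fighting
    X = np.asarray(samples, dtype=np.float64)
    y = np.asarray(labels)
    model = SVC(kernel="rbf", probability=True)
    model.fit(X, y)
    return model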
Specifically, the training module 54 is further configured to, before determining the coordinate pair of each second feature point in the first pair of sample images and the second pair of sample images, obtain a disparity map corresponding to the first pair of sample images according to a first left image and a first right image of the first pair of sample images; for each second feature point in the second feature point set, determining each first candidate feature point adjacent to the second feature point in the first left image according to the second feature point of the first left image in the first pair of sample images; adding each first candidate feature point to a candidate set; for each first candidate feature point in the candidate set, determining a first pixel point corresponding to the first candidate feature point in a first right image according to the disparity map, and determining a second pixel point and a third pixel point corresponding to the first candidate feature point in a second left image and a second right image of a second pair of sample images according to the first candidate feature point and the first pixel point respectively; determining a first neighborhood containing the second pixel point in a second left image, determining a second neighborhood containing a third pixel point in a second right image, and determining a first target pixel point and a second target pixel point which are matched with the first candidate feature point and the first pixel point in the first neighborhood and the second neighborhood respectively; and judging whether the correlation between the first candidate feature point, the first pixel point, the first target pixel point and the second target pixel point is greater than a set threshold value or not, if so, determining that the first candidate feature point, the first pixel point, the first target pixel point and the second target pixel point are second target feature points, moving the second feature points out of the second feature point set, updating the second target feature points into second feature points and adding the second feature points into the second feature point set.
Specifically, the training module 54 is specifically configured to obtain a second motion amplitude corresponding to each second feature point according to a second motion vector corresponding to each second feature point; and extracting each second motion amplitude larger than a preset second threshold value, and determining a second motion amplitude mean value of the first pair of sample images according to each extracted second motion amplitude.
Specifically, the training module 54 is specifically configured to obtain each target second feature point, where a corresponding second motion amplitude is greater than a preset third threshold, according to a second motion amplitude corresponding to each second feature point; and projecting the second motion vector direction of each target second feature point to a preset grid space, and determining the second motion direction entropy of the first pair of sample images according to the corresponding grids.
Specifically, the training module 54 is specifically configured to obtain a third three-dimensional coordinate and a fourth three-dimensional coordinate of each target second feature point in a world coordinate system according to each target second feature point, project the third three-dimensional coordinate and the fourth three-dimensional coordinate of each target second feature point to the ground, and determine a second area of influence of the first pair of sample images on the ground according to the number of ground grids corresponding to each third three-dimensional coordinate and each fourth three-dimensional coordinate.
Specifically, the obtaining module 51 is further configured to, before determining the coordinate pair of each first feature point in the first pair of images and the second pair of images, obtain a disparity map corresponding to the first pair of images according to a third left image and a third right image of the first pair of images; for each first feature point in the first feature point set, determining each second candidate feature point adjacent to the first feature point in a third left image in the first pair of images according to the first feature point of the third left image; adding each second candidate feature point to the candidate set; for each second candidate feature point in the candidate set, determining a fourth pixel point corresponding to the second candidate feature point in a third right image according to the disparity map, and determining a fifth pixel point and a sixth pixel point corresponding to the second candidate feature point in a fourth left image and a fourth right image of the second pair of images according to the second candidate feature point and the fourth pixel point respectively; determining a third neighborhood containing the fifth pixel point in a fourth left image, determining a fourth neighborhood containing a sixth pixel point in a fourth right image, and determining a third target pixel point and a fourth target pixel point which are matched with the second candidate feature point and the fourth pixel point in the third neighborhood and the fourth neighborhood respectively; and judging whether the correlation between the second candidate feature point, the fourth pixel point, the third target pixel point and the fourth target pixel point is greater than a set threshold value or not, if so, determining that the second candidate feature point, the fourth pixel point, the third target pixel point and the fourth target pixel point are the first target feature point, moving the first feature point out of the first feature point set, updating the first target feature point into the first feature point and adding the first feature point into the first feature point set.
Specifically, the obtaining module 51 is specifically configured to obtain a first motion amplitude corresponding to each first feature point according to a first motion vector corresponding to each first feature point, extract a first motion amplitude greater than a preset second threshold, and determine a first motion amplitude mean value of the first pair of images according to each extracted first motion amplitude.
Specifically, the obtaining module 51 is specifically configured to obtain, according to the first motion amplitude corresponding to each first feature point, each target first feature point of which the corresponding first motion amplitude is greater than a preset third threshold; and projecting the first motion vector direction of each target first feature point to a preset grid space, and determining a first motion direction entropy value of the first pair of images according to the corresponding grids.
Specifically, the obtaining module 51 is specifically configured to obtain a seventh three-dimensional coordinate and an eighth three-dimensional coordinate of each target first feature point in a world coordinate system according to each target first feature point, project the seventh three-dimensional coordinate and the eighth three-dimensional coordinate of each target first feature point to the ground, and determine a first area of influence of the first pair of images on the ground according to the number of ground grids corresponding to each seventh three-dimensional coordinate and each eighth three-dimensional coordinate.
In the embodiment of the invention, for each first feature point in two continuous pairs of images in a binocular video, a third coordinate pair and a fourth coordinate pair of the first feature point, the corresponding fifth three-dimensional coordinate and sixth three-dimensional coordinate in the camera coordinate system, and the seventh three-dimensional coordinate and eighth three-dimensional coordinate of the first feature point in the world coordinate system are obtained, so as to determine the first motion amplitude mean value, the first motion direction entropy value and the first influence area; for each first pair of images it is judged whether the first pair of images is a first target image, each first time length is determined according to the continuous first target images, and for each first time length it is determined, according to the first target motion amplitude mean value, the first target motion direction entropy value, the first target influence area, the first time length and the pre-trained fighting detection model, whether a fighting behavior occurs in the video corresponding to the first time length. In this way, the three-dimensional motion condition of each first feature point in the three-dimensional scene can be accurately obtained, and the real three-dimensional motion can be accurately analyzed according to the actual physical size; by determining the first target motion amplitude mean value, the first target motion direction entropy value, the first target influence area and the first time length corresponding to each first time length, the fighting detection model can analyze the three-dimensional motion condition of the target based on the three-dimensional motion vector field (namely the scene flow), so as to accurately judge whether a fighting behavior occurs and improve the accuracy of fighting behavior detection.
Example 10:
on the basis of the foregoing embodiments, an embodiment of the present invention further provides an electronic device, as shown in fig. 6, including: the system comprises a processor 61, a communication interface 62, a memory 63 and a communication bus 64, wherein the processor 61, the communication interface 62 and the memory 63 complete mutual communication through the communication bus 64;
the memory 63 has stored therein a computer program which, when executed by the processor 61, causes the processor 61 to perform the steps of:
the following processing is carried out on a first pair of images and a second pair of images which are continuous in the binocular video: for each first feature point in the first pair of images and the second pair of images, determining a first motion vector corresponding to each first feature point according to a coordinate pair of each first feature point in the first pair of images and the second pair of images and a first calibration parameter corresponding to the binocular video, and determining a first motion vector direction of each first feature point and a first motion amplitude mean value of the first pair of images; projecting the first motion vector direction of each first feature point to a grid corresponding to a preset grid space, and determining a first motion direction entropy value of the first pair of images; determining a first influence area of the first pair of images on the ground according to the number of ground grids corresponding to the ground projected by the coordinates of each first feature point in a world coordinate system;
for each first pair of images, judging whether a first motion amplitude mean value, a first motion direction entropy value and a first influence area of the first pair of images are all larger than a corresponding preset first threshold value, and if so, marking the first pair of images as a first target image; determining each first time length according to the marked continuous first target images;
for each first time length, determining a first target motion amplitude mean value, a first target motion direction entropy value and a first target influence area according to each first target image corresponding to the first time length; and determining whether the frame-fighting behavior of the video corresponding to the first time length occurs according to the pre-trained frame-fighting detection model, the first target motion amplitude mean value, the first target motion direction entropy value, the first target influence area and the first time length.

Based on the same inventive concept, the embodiment of the invention also provides an electronic device; as the principle by which the electronic device solves the problem is similar to that of the method for detecting the fighting behavior, the implementation of the electronic device can refer to the implementation of the method, and repeated parts are not described again.
The electronic device provided by the embodiment of the invention can be a desktop computer, a portable computer, a smart phone, a tablet computer, a Personal Digital Assistant (PDA), an image acquisition device, an image processing device and the like.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface 62 is used for communication between the above-described electronic apparatus and other apparatuses.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Alternatively, the memory may be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a central processing unit, a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an application specific integrated circuit, a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like.
When the processor executes the program stored in the memory in the embodiment of the invention, the following processing is carried out on the first pair of images and the second pair of images which are continuous in the binocular video: for each first feature point in the first pair of images and the second pair of images, determining a first motion vector corresponding to each first feature point according to a coordinate pair of each first feature point in the first pair of images and the second pair of images and a first calibration parameter corresponding to the binocular video, and determining a first motion vector direction of each first feature point and a first motion amplitude mean value of the first pair of images; projecting the first motion vector direction of each first feature point to a grid corresponding to a preset grid space, and determining a first motion direction entropy value of the first pair of images; determining a first influence area of the first pair of images on the ground according to the number of ground grids corresponding to the ground projected by the coordinates of each first feature point in a world coordinate system; for each first pair of images, judging whether a first motion amplitude mean value, a first motion direction entropy value and a first influence area of the first pair of images are all larger than a corresponding preset first threshold value, and if so, marking the first pair of images as a first target image; determining each first time length according to the marked continuous first target images; for each first time length, determining a first target motion amplitude mean value, a first target motion direction entropy value and a first target influence area according to each first target image corresponding to the first time length; and determining whether the frame-fighting behavior of the video corresponding to the first time length occurs according to the pre-trained frame-fighting detection model, the first target motion amplitude mean value, the first target motion direction entropy value, the first target influence area and the first time length.
In the embodiment of the invention, for each first feature point in two continuous pairs of images in a binocular video, a third coordinate pair and a fourth coordinate pair of the first feature point, the corresponding fifth three-dimensional coordinate and sixth three-dimensional coordinate in the camera coordinate system, and the seventh three-dimensional coordinate and eighth three-dimensional coordinate of the first feature point in the world coordinate system are obtained, so as to determine the first motion amplitude mean value, the first motion direction entropy value and the first influence area; for each first pair of images it is judged whether the first pair of images is a first target image, each first time length is determined according to the continuous first target images, and for each first time length it is determined, according to the first target motion amplitude mean value, the first target motion direction entropy value, the first target influence area, the first time length and the pre-trained fighting detection model, whether a fighting behavior occurs in the video corresponding to the first time length. In this way, the three-dimensional motion condition of each first feature point in the three-dimensional scene can be accurately obtained, and the real three-dimensional motion can be accurately analyzed according to the actual physical size; by determining the first target motion amplitude mean value, the first target motion direction entropy value, the first target influence area and the first time length corresponding to each first time length, the fighting detection model can analyze the three-dimensional motion condition of the target based on the three-dimensional motion vector field (namely the scene flow), so as to accurately judge whether a fighting behavior occurs and improve the accuracy of fighting behavior detection.
Example 11:
on the basis of the foregoing embodiments, an embodiment of the present invention further provides a computer-readable storage medium, in which a computer program executable by an electronic device is stored, and when the program is run on the electronic device, the electronic device is caused to execute the following steps:
the following processing is carried out on a first pair of images and a second pair of images which are continuous in the binocular video: for each first feature point in the first pair of images and the second pair of images, determining a first motion vector corresponding to each first feature point according to a coordinate pair of each first feature point in the first pair of images and the second pair of images and a first calibration parameter corresponding to the binocular video, and determining a first motion vector direction of each first feature point and a first motion amplitude mean value of the first pair of images; projecting the first motion vector direction of each first feature point to a grid corresponding to a preset grid space, and determining a first motion direction entropy value of the first pair of images; determining a first influence area of the first pair of images on the ground according to the number of ground grids corresponding to the ground projected by the coordinates of each first feature point in a world coordinate system;
for each first pair of images, judging whether a first motion amplitude mean value, a first motion direction entropy value and a first influence area of the first pair of images are all larger than a corresponding preset first threshold value, and if so, marking the first pair of images as a first target image; determining each first time length according to the marked continuous first target images;
for each first time length, determining a first target motion amplitude mean value, a first target motion direction entropy value and a first target influence area according to each first target image corresponding to the first time length; and determining whether the frame-fighting behavior of the video corresponding to the first time length occurs according to the pre-trained frame-fighting detection model, the first target motion amplitude mean value, the first target motion direction entropy value, the first target influence area and the first time length.
Based on the same inventive concept, embodiments of the present invention further provide a computer-readable storage medium, and since a principle of solving a problem when a processor executes a computer program stored in the computer-readable storage medium is similar to that of a fighting behavior detection method, implementation of the computer program stored in the computer-readable storage medium by the processor may refer to implementation of the method, and repeated details are not repeated.
The computer readable storage medium may be any available medium or data storage device that can be accessed by a processor in an electronic device, including but not limited to magnetic memory such as floppy disks, hard disks, magnetic tape, magneto-optical disks (MO), etc., optical memory such as CDs, DVDs, BDs, HVDs, etc., and semiconductor memory such as ROMs, EPROMs, EEPROMs, nonvolatile memories (NANDFLASH), Solid State Disks (SSDs), etc.
In the computer-readable storage medium provided in the embodiment of the present invention, a computer program is stored, and when executed by a processor, the computer program implements the following processing for a first pair of images and a second pair of images that are consecutive in a binocular video: for each first feature point in the first pair of images and the second pair of images, determining a first motion vector corresponding to each first feature point according to a coordinate pair of each first feature point in the first pair of images and the second pair of images and a first calibration parameter corresponding to the binocular video, and determining a first motion vector direction of each first feature point and a first motion amplitude mean value of the first pair of images; projecting the first motion vector direction of each first feature point to a grid corresponding to a preset grid space, and determining a first motion direction entropy value of the first pair of images; determining a first influence area of the first pair of images on the ground according to the number of ground grids corresponding to the ground projected by the coordinates of each first feature point in a world coordinate system; for each first pair of images, judging whether a first motion amplitude mean value, a first motion direction entropy value and a first influence area of the first pair of images are all larger than a corresponding preset first threshold value, and if so, marking the first pair of images as a first target image; determining each first time length according to the marked continuous first target images; for each first time length, determining a first target motion amplitude mean value, a first target motion direction entropy value and a first target influence area according to each first target image corresponding to the first time length; and determining whether the frame-fighting behavior of the video corresponding to the first time length occurs according to the pre-trained frame-fighting detection model, the first target motion amplitude mean value, the first target motion direction entropy value, the first target influence area and the first time length.
In the embodiment of the invention, for each first feature point in two continuous pairs of images in a binocular video, a third coordinate pair and a fourth coordinate pair of the first feature point, the corresponding fifth three-dimensional coordinate and sixth three-dimensional coordinate in the camera coordinate system, and the seventh three-dimensional coordinate and eighth three-dimensional coordinate of the first feature point in the world coordinate system are obtained, so as to determine the first motion amplitude mean value, the first motion direction entropy value and the first influence area; for each first pair of images it is judged whether the first pair of images is a first target image, each first time length is determined according to the continuous first target images, and for each first time length it is determined, according to the first target motion amplitude mean value, the first target motion direction entropy value, the first target influence area, the first time length and the pre-trained fighting detection model, whether a fighting behavior occurs in the video corresponding to the first time length. In this way, the three-dimensional motion condition of each first feature point in the three-dimensional scene can be accurately obtained, and the real three-dimensional motion can be accurately analyzed according to the actual physical size; by determining the first target motion amplitude mean value, the first target motion direction entropy value, the first target influence area and the first time length corresponding to each first time length, the fighting detection model can analyze the three-dimensional motion condition of the target based on the three-dimensional motion vector field (namely the scene flow), so as to accurately judge whether a fighting behavior occurs and improve the accuracy of fighting behavior detection.
It is to be noted that, in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or operation from another entity or operation without necessarily requiring or implying any actual such relationship or order between such entities or operations.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (22)

1. A method for detecting a fighting behavior in a video is characterized by comprising the following steps:
the following processing is carried out on a first pair of images and a second pair of images which are continuous in the binocular video: for each first feature point in the first pair of images and the second pair of images, determining a first motion vector corresponding to each first feature point according to a coordinate pair of each first feature point in the first pair of images and the second pair of images and a first calibration parameter corresponding to the binocular video, and determining a first motion vector direction of each first feature point and a first motion amplitude mean value of the first pair of images; projecting the first motion vector direction of each first feature point to a grid corresponding to a preset grid space, and determining a first motion direction entropy value of the first pair of images; after the first motion vector direction of each feature point is determined, setting each feature point at the center of a cube, and determining, for each feature point, which grid the first motion vector direction of the feature point is projected to a cube grid space corresponds to according to the first motion vector direction of the feature point, that is, taking the center of the cube as a starting point and the first motion vector direction as a direction, determining a straight line, and taking a grid where an intersection point of the straight line and the cube grid space is located as a grid corresponding to the preset grid space to which the first motion vector direction is projected; determining a first motion direction entropy value of the first pair of images according to the angle proportion value corresponding to each grid; determining a first influence area of the first pair of images on the ground according to the number of ground grids corresponding to the ground projected by the three-dimensional coordinates of each first feature point in the world coordinate system;
for each first pair of images, judging whether a first motion amplitude mean value, a first motion direction entropy value and a first influence area of the first pair of images are all larger than a corresponding preset first threshold value, and if so, marking the first pair of images as a first target image; determining each first time length according to the marked continuous first target images;
for each first time length, determining a first target motion amplitude mean value, a first target motion direction entropy value and a first target influence area according to each first target image corresponding to the first time length; and determining whether the frame-fighting behavior of the video corresponding to the first time length occurs according to the pre-trained frame-fighting detection model, the first target motion amplitude mean value, the first target motion direction entropy value, the first target influence area and the first time length.
2. The method of claim 1, wherein the training process of the fighting detection model comprises:
performing the following processing on each continuous first pair of sample images and second pair of sample images in the sample binocular video: for each second feature point in the first pair of sample images and the second pair of sample images, determining a second motion vector corresponding to each second feature point according to a coordinate pair of each second feature point in the first pair of sample images and the second pair of sample images and a second calibration parameter corresponding to the sample binocular video, and determining a second motion vector direction of each second feature point and a second motion amplitude mean value of the first pair of sample images; projecting the second motion vector direction of each second feature point to a grid corresponding to a preset grid space, and determining a second motion direction entropy value of the first pair of sample images; after the second motion vector direction of each feature point is determined, setting each feature point at the center of a cube, and determining, for each feature point, which grid the second motion vector direction of the feature point is projected to a cube grid space corresponds to according to the second motion vector direction of the feature point, that is, taking the center of the cube as a starting point and the second motion vector direction as a direction, determining a straight line, and taking a grid where an intersection point of the straight line and the cube grid space is located as a grid corresponding to the preset grid space to which the second motion vector direction is projected; determining a second motion direction entropy value of the first pair of images according to the angle proportion value corresponding to each grid; determining a second influence area of the first pair of sample images on the ground according to the number of ground grids corresponding to the ground projected by the coordinates of each second feature point in the world coordinate system;
for each first pair of sample images, judging whether a second motion amplitude value mean value, a second motion direction entropy value and a second influence area of the first pair of sample images are all larger than a corresponding preset second threshold value, and if so, marking the first pair of sample images as a second target image; determining each second time length according to the marked continuous second target images;
and aiming at each second duration, determining a second target motion amplitude mean value, a second target motion direction entropy value and a second target influence area according to each second target image corresponding to the second duration, and training the fighting detection model according to the second target motion amplitude mean value, the second target motion direction entropy value, the second target influence area, the second duration and identification information of whether each frame of image in the sample binocular video has fighting behavior.
3. The method of claim 2, wherein determining the coordinate pair of each second feature point in the first pair of sample images and the second pair of sample images further comprises:
acquiring a disparity map corresponding to the first pair of sample images according to the first left image and the first right image of the first pair of sample images;
for each second feature point in the second feature point set, determining each first candidate feature point adjacent to the second feature point in the first left image; adding each first candidate feature point to a candidate set; for each first candidate feature point in the candidate set, according to the disparity map, determining a first pixel point corresponding to the first candidate feature point in a first right map, and determining a second pixel point and a third pixel point corresponding to the first candidate feature point in a second left map and a second right map of a second pair of sample images; determining a first neighborhood containing the second pixel point in a second left image, determining a second neighborhood containing a third pixel point in a second right image, and determining a first target pixel point and a second target pixel point which are matched with the first candidate feature point and the first pixel point in the first neighborhood and the second neighborhood respectively; and judging whether the correlation between the first candidate feature point, the first pixel point, the first target pixel point and the second target pixel point is greater than a set threshold value or not, if so, determining that the first candidate feature point, the first pixel point, the first target pixel point and the second target pixel point are second target feature points, moving the second feature points out of the second feature point set, updating the second target feature points into second feature points and adding the second feature points into the second feature point set.
4. The method of claim 2, wherein determining the second motion amplitude mean value of the first pair of sample images comprises:
acquiring a second motion amplitude corresponding to each second feature point according to the second motion vector corresponding to each second feature point;
and extracting each second motion amplitude larger than a preset second threshold value, and determining a second motion amplitude mean value of the first pair of sample images according to each extracted second motion amplitude.
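A minimal sketch of claim 4, assuming the second motion vectors are available as three-dimensional displacements and using an invented threshold value: only amplitudes above the threshold contribute to the mean.

import numpy as np

def motion_amplitude_mean(motion_vectors, amplitude_threshold=0.05):
    # motion_vectors: (N, 3) array of per-feature-point 3-D motion vectors
    amplitudes = np.linalg.norm(np.asarray(motion_vectors, dtype=float), axis=1)
    selected = amplitudes[amplitudes > amplitude_threshold]
    return float(selected.mean()) if selected.size else 0.0

print(motion_amplitude_mean([[0.2, 0.0, 0.1], [0.01, 0.0, 0.0], [0.0, 0.3, 0.0]]))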
5. The method of claim 2, wherein the determining the second motion direction entropy of the first pair of sample images according to the second motion vector direction of each second feature point projected onto the corresponding grid of the preset grid space comprises:
according to the second motion amplitude corresponding to each second feature point, each target second feature point of which the corresponding second motion amplitude is larger than a preset third threshold is obtained; and are
Projecting the second motion vector direction of each target second feature point to a preset grid space, and determining a second motion direction entropy value of the first pair of sample images according to the corresponding grids; when the grids projected in each second motion vector direction are determined, determining a straight line by taking the center of a cube as a starting point and the second motion vector direction of each target second feature point as a direction for each target second feature point, and taking the grid where the intersection point of the straight line and the cube grid space is located as the corresponding grid projected in the grid space in the second motion vector direction; for each target second characteristic point, counting an angle proportion value of a grid projected by a second motion vector direction of the target second characteristic point; and determining a second motion direction entropy value of the first pair of sample images according to the angle proportion value of each grid projected by the second motion vector direction of each target second feature point.
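Claim 5 can be pictured with the following sketch, which reads the cube grid space as each cube face divided into K x K cells; the 4-cell resolution and the use of per-cell hit proportions as the angle proportion values are assumptions made only for this example. Directions spread over many cells give a high entropy, while directions concentrated in a few cells give a low one.

import numpy as np

def cube_cell(direction, cells_per_face=4):
    # index (face, row, col) of the cube-surface cell hit by the direction ray
    d = np.asarray(direction, dtype=float)
    d = d / (np.abs(d).max() + 1e-12)            # scale the ray so it touches a unit cube face
    axis = int(np.argmax(np.abs(d)))
    face = 2 * axis + (0 if d[axis] > 0 else 1)
    uv = np.delete(d, axis)                      # in-face coordinates in [-1, 1]
    row, col = ((uv + 1.0) / 2.0 * cells_per_face).astype(int).clip(0, cells_per_face - 1)
    return face, int(row), int(col)

def direction_entropy(directions, cells_per_face=4):
    counts = {}
    for d in directions:
        cell = cube_cell(d, cells_per_face)
        counts[cell] = counts.get(cell, 0) + 1
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return float(-(p * np.log(p)).sum())

print(direction_entropy([[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, 0, 1]]))   # about 1.386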
6. The method of claim 5, wherein determining the second influence area of the first pair of sample images on the ground according to the number of ground grids onto which the coordinates of each second feature point in the world coordinate system are projected comprises:
and respectively acquiring a third three-dimensional coordinate and a fourth three-dimensional coordinate of each target second characteristic point in a world coordinate system according to each target second characteristic point, projecting the third three-dimensional coordinate and the fourth three-dimensional coordinate of each target second characteristic point to the ground, and determining a second influence area of the first pair of sample images on the ground according to the number of ground grids corresponding to each third three-dimensional coordinate and each fourth three-dimensional coordinate.
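A small sketch of the influence area computation in claim 6, assuming a horizontal ground plane aligned with the world X and Y axes and an invented 0.2 m grid cell: the area is the number of distinct ground cells touched by the projected feature point coordinates of both frame pairs, multiplied by the cell area.

import numpy as np

def influence_area(points_frame1, points_frame2, cell_size=0.2):
    # points_*: (N, 3) world coordinates of the same target feature points in the two frame pairs
    pts = np.vstack([points_frame1, points_frame2])[:, :2]   # drop the height, keep the ground plane
    cells = {tuple(c) for c in np.floor(pts / cell_size).astype(int)}
    return len(cells) * cell_size ** 2

p1 = [[0.1, 0.1, 1.5], [0.5, 0.4, 1.2]]
p2 = [[0.3, 0.1, 1.5], [0.9, 0.6, 1.1]]
print(influence_area(p1, p2))   # area in square metres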
7. The method of claim 1, wherein before the coordinate pair of each first feature point in the first pair of images and the second pair of images is determined, the method further comprises:
acquiring a disparity map corresponding to the first pair of images according to the third left image and the third right image of the first pair of images;
for each first feature point in the first feature point set, determining each second candidate feature point adjacent to the first feature point in the third left image; adding each second candidate feature point to the candidate set; for each second candidate feature point in the candidate set, according to the disparity map, determining a fourth pixel point corresponding to the second candidate feature point in a third right image, and determining a fifth pixel point and a sixth pixel point corresponding to the second candidate feature point in a fourth left image and a fourth right image of the second pair of images; determining a third neighborhood containing the fifth pixel point in a fourth left image, determining a fourth neighborhood containing a sixth pixel point in a fourth right image, and determining a third target pixel point and a fourth target pixel point which are matched with the second candidate feature point and the fourth pixel point in the third neighborhood and the fourth neighborhood respectively; and determining the correlation among the second candidate feature point, the fourth pixel point, the third target pixel point and the fourth target pixel point, judging whether the correlation is greater than a set threshold value, if so, determining the second candidate feature point, the fourth pixel point, the third target pixel point and the fourth target pixel point as first target feature points, moving the first feature points out of the first feature point set, updating the first target feature points into the first feature points and adding the first feature points into the first feature point set.
8. The method of claim 1, wherein determining the first motion amplitude mean value of the first pair of images comprises:
and according to the first motion vector corresponding to each first feature point, acquiring a first motion amplitude corresponding to each first feature point, extracting a first motion amplitude larger than a preset second threshold value, and according to each extracted first motion amplitude, determining a first motion amplitude mean value of the first pair of images.
9. The method of claim 1, wherein the determining the first motion direction entropy of the first pair of images according to the projection of the first motion vector direction of each first feature point onto the corresponding grid of the preset grid space comprises:
acquiring, according to the first motion amplitude corresponding to each first feature point, each target first feature point whose corresponding first motion amplitude is greater than a preset third threshold; and projecting the first motion vector direction of each target first feature point to a preset grid space, and determining a first motion direction entropy value of the first pair of images according to the corresponding grids; when the grid onto which each first motion vector direction is projected is determined, determining, for each target first feature point, a straight line which takes the center of a cube as its starting point and the first motion vector direction of the target first feature point as its direction, and taking the grid in which the intersection point of the straight line and the cube grid space is located as the grid of the grid space onto which the first motion vector direction is projected; for each target first feature point, counting an angle proportion value of the grid onto which the first motion vector direction of the target first feature point is projected; and determining the first motion direction entropy value of the first pair of images according to the angle proportion values of the grids onto which the first motion vector directions of the target first feature points are projected.
10. The method of claim 9, wherein determining the first influence area of the first pair of images on the ground according to the number of ground grids onto which the coordinates of each first feature point in the world coordinate system are projected comprises:
and respectively acquiring a seventh three-dimensional coordinate and an eighth three-dimensional coordinate of each target first characteristic point in a world coordinate system according to each target first characteristic point, projecting the seventh three-dimensional coordinate and the eighth three-dimensional coordinate of each target first characteristic point to the ground, and determining a first influence area of the first pair of images on the ground according to the number of ground grids corresponding to each seventh three-dimensional coordinate and eighth three-dimensional coordinate.
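Claims 1 to 10 all start from turning a left/right coordinate pair and the calibration parameters into a three-dimensional motion vector. For a rectified stereo rig this reduces to standard triangulation; the sketch below uses an invented focal length, baseline and principal point in place of the calibration parameters, which the claims leave unspecified.

import numpy as np

def triangulate(u_left, v, u_right, f, baseline, cx, cy):
    # 3-D point in the camera frame from a rectified left/right pixel pair
    disparity = max(u_left - u_right, 1e-6)
    z = f * baseline / disparity
    return np.array([(u_left - cx) * z / f, (v - cy) * z / f, z])

def motion_vector(pair_frame1, pair_frame2, f=700.0, baseline=0.12, cx=320.0, cy=240.0):
    # pair_*: ((u_left, v), u_right) for the same feature point in the two consecutive frame pairs
    p1 = triangulate(pair_frame1[0][0], pair_frame1[0][1], pair_frame1[1], f, baseline, cx, cy)
    p2 = triangulate(pair_frame2[0][0], pair_frame2[0][1], pair_frame2[1], f, baseline, cx, cy)
    return p2 - p1          # its direction and norm are the motion vector direction and amplitude

print(motion_vector(((350, 250), 320), ((360, 255), 331)))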
11. A device for detecting fighting behaviors in a video, which is characterized by comprising:
the acquisition module is used for carrying out the following processing on a first pair of continuous images and a second pair of continuous images in the binocular video: for each first feature point in the first pair of images and the second pair of images, determining a first motion vector corresponding to each first feature point according to a coordinate pair of each first feature point in the first pair of images and the second pair of images and a first calibration parameter corresponding to the binocular video, and determining a first motion vector direction of each first feature point and a first motion amplitude mean value of the first pair of images; projecting the first motion vector direction of each first feature point to a grid corresponding to a preset grid space, and determining a first motion direction entropy value of the first pair of images; after the first motion vector direction of each first feature point is determined, placing each first feature point at the center of a cube, and determining, for each first feature point according to its first motion vector direction, the grid of a cube grid space onto which that direction is projected, that is, determining a straight line which takes the center of the cube as its starting point and the first motion vector direction as its direction, and taking the grid in which the intersection point of the straight line and the cube grid space is located as the grid of the preset grid space onto which the first motion vector direction is projected; determining the first motion direction entropy value of the first pair of images according to the angle proportion value corresponding to each grid; determining a first influence area of the first pair of images on the ground according to the number of ground grids onto which the coordinates of each first feature point in a world coordinate system are projected;
the duration determining module is used for judging whether a first motion amplitude mean value, a first motion direction entropy value and a first influence area of each first pair of images are all larger than a corresponding preset first threshold value or not, and if so, marking the first pair of images as a first target image; determining each first time length according to the marked continuous first target images;
the fighting behavior determining module is used for determining a first target motion amplitude mean value, a first target motion direction entropy value and a first target influence area according to each first target image corresponding to each first time length; and determining whether fighting behavior occurs in the video corresponding to the first time length according to the pre-trained fighting detection model, the first target motion amplitude mean value, the first target motion direction entropy value, the first target influence area and the first time length.
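The duration determining module and the fighting behavior determining module of claim 11 can be illustrated together by the following sketch: frame pairs whose three statistics all exceed their thresholds are grouped into consecutive runs, each run is summarised into the four-dimensional feature vector, and a trained model decides whether fighting occurred. The threshold values, the frame rate and the model object (any classifier with a scikit-learn style predict method) are assumptions.

import numpy as np

def detect_fighting(frame_stats, model, thresholds=(0.3, 1.0, 1.0), fps=25.0):
    # frame_stats: one (amplitude mean, direction entropy, influence area) tuple per frame pair,
    # in temporal order; returns (first frame, last frame, is_fighting) per detected run
    flags = [all(v > t for v, t in zip(s, thresholds)) for s in frame_stats]
    results, start = [], None
    for i, flagged in enumerate(flags + [False]):             # sentinel closes the last run
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            segment = np.asarray(frame_stats[start:i], dtype=float)
            duration = (i - start) / fps
            features = np.append(segment.mean(axis=0), duration)
            results.append((start, i - 1, bool(model.predict([features])[0])))
            start = None
    return results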
12. The apparatus of claim 11, wherein the fighting behavior detection apparatus further comprises:
the training module is used for carrying out the following processing on each continuous first pair of sample images and second pair of sample images in the sample binocular video: for each second feature point in the first pair of sample images and the second pair of sample images, determining a second motion vector corresponding to each second feature point according to a coordinate pair of each second feature point in the first pair of sample images and the second pair of sample images and a second calibration parameter corresponding to the sample binocular video, and determining a second motion vector direction of each second feature point and a second motion amplitude mean value of the first pair of sample images; projecting the second motion vector direction of each second feature point to a grid corresponding to a preset grid space, and determining a second motion direction entropy value of the first pair of sample images; after the second motion vector direction of each second feature point is determined, placing each second feature point at the center of a cube, and determining, for each second feature point according to its second motion vector direction, the grid of a cube grid space onto which that direction is projected, that is, determining a straight line which takes the center of the cube as its starting point and the second motion vector direction as its direction, and taking the grid in which the intersection point of the straight line and the cube grid space is located as the grid of the preset grid space onto which the second motion vector direction is projected; determining the second motion direction entropy value of the first pair of sample images according to the angle proportion value corresponding to each grid; acquiring the number of ground grids onto which the coordinates of each second feature point in the world coordinate system are projected, and determining a second influence area of the first pair of sample images on the ground; for each first pair of sample images, judging whether the second motion amplitude mean value, the second motion direction entropy value and the second influence area of the first pair of sample images are all larger than a corresponding preset second threshold value, and if so, marking the first pair of sample images as a second target image; determining each second time length according to the marked continuous second target images; and for each second duration, determining a second target motion amplitude mean value, a second target motion direction entropy value and a second target influence area according to each second target image corresponding to the second duration, and training the fighting detection model according to the second target motion amplitude mean value, the second target motion direction entropy value, the second target influence area, the second duration and label information indicating whether fighting behavior occurs in each frame of the sample binocular video.
13. The apparatus of claim 12, wherein the training module is further configured to, before the coordinate pair of each second feature point in the first pair of sample images and the second pair of sample images is determined, obtain a disparity map corresponding to the first pair of sample images according to a first left image and a first right image of the first pair of sample images; for each second feature point in the second feature point set, determine each first candidate feature point adjacent to the second feature point in the first left image of the first pair of sample images; add each first candidate feature point to a candidate set; for each first candidate feature point in the candidate set, determine a first pixel point corresponding to the first candidate feature point in the first right image according to the disparity map, and determine a second pixel point and a third pixel point corresponding to the first candidate feature point and the first pixel point in a second left image and a second right image of the second pair of sample images respectively; determine a first neighborhood containing the second pixel point in the second left image, determine a second neighborhood containing the third pixel point in the second right image, and determine a first target pixel point and a second target pixel point which are matched with the first candidate feature point and the first pixel point in the first neighborhood and the second neighborhood respectively; and judge whether the correlation between the first candidate feature point, the first pixel point, the first target pixel point and the second target pixel point is greater than a set threshold value, and if so, determine that the first candidate feature point, the first pixel point, the first target pixel point and the second target pixel point are second target feature points, move the second feature point out of the second feature point set, update the second target feature points to be second feature points and add them to the second feature point set.
14. The apparatus according to claim 12, wherein the training module is specifically configured to obtain a second motion amplitude corresponding to each second feature point according to the second motion vector corresponding to each second feature point; and extracting each second motion amplitude larger than a preset second threshold value, and determining a second motion amplitude mean value of the first pair of sample images according to each extracted second motion amplitude.
15. The apparatus according to claim 12, wherein the training module is specifically configured to obtain, according to the second motion amplitude corresponding to each second feature point, each target second feature point whose corresponding second motion amplitude is greater than a preset third threshold; projecting the second motion vector direction of each target second feature point to a preset grid space, and determining a second motion direction entropy value of the first pair of sample images according to the corresponding grids; when the grids projected in each second motion vector direction are determined, determining a straight line by taking the center of a cube as a starting point and the second motion vector direction of each target second feature point as a direction for each target second feature point, and taking the grid where the intersection point of the straight line and the cube grid space is located as the corresponding grid projected in the grid space in the second motion vector direction; for each target second characteristic point, counting an angle proportion value of a grid projected by a second motion vector direction of the target second characteristic point; and determining a second motion direction entropy value of the first pair of sample images according to the angle proportion value of each grid projected by the second motion vector direction of each target second feature point.
16. The apparatus according to claim 15, wherein the training module is specifically configured to obtain a third three-dimensional coordinate and a fourth three-dimensional coordinate of each target second feature point in a world coordinate system according to each target second feature point, project the third three-dimensional coordinate and the fourth three-dimensional coordinate of each target second feature point onto the ground, and determine a second area of influence of the first pair of sample images on the ground according to the number of ground grids corresponding to each third three-dimensional coordinate and each fourth three-dimensional coordinate.
17. The apparatus of claim 11, wherein the obtaining module is further configured to obtain a disparity map corresponding to the first pair of images according to a third left image and a third right image of the first pair of images before the coordinate pair of each first feature point in the first pair of images and the second pair of images is determined; for each first feature point in the first feature point set, determine each second candidate feature point adjacent to the first feature point in the third left image of the first pair of images; add each second candidate feature point to the candidate set; for each second candidate feature point in the candidate set, determine a fourth pixel point corresponding to the second candidate feature point in the third right image according to the disparity map, and determine a fifth pixel point and a sixth pixel point corresponding to the second candidate feature point and the fourth pixel point in a fourth left image and a fourth right image of the second pair of images respectively; determine a third neighborhood containing the fifth pixel point in the fourth left image, determine a fourth neighborhood containing the sixth pixel point in the fourth right image, and determine a third target pixel point and a fourth target pixel point which are matched with the second candidate feature point and the fourth pixel point in the third neighborhood and the fourth neighborhood respectively; and judge whether the correlation between the second candidate feature point, the fourth pixel point, the third target pixel point and the fourth target pixel point is greater than a set threshold value, and if so, determine that the second candidate feature point, the fourth pixel point, the third target pixel point and the fourth target pixel point are first target feature points, move the first feature point out of the first feature point set, update the first target feature points to be first feature points and add them to the first feature point set.
18. The apparatus according to claim 11, wherein the obtaining module is specifically configured to obtain a first motion amplitude corresponding to each first feature point according to the first motion vector corresponding to each first feature point, extract a first motion amplitude larger than a preset second threshold, and determine a first motion amplitude mean of the first pair of images according to each extracted first motion amplitude.
19. The apparatus according to claim 11, wherein the obtaining module is specifically configured to obtain, according to the first motion amplitude corresponding to each first feature point, each target first feature point whose corresponding first motion amplitude is greater than a preset third threshold; projecting the first motion vector direction of each target first characteristic point to a preset grid space, and determining a first motion direction entropy value of the first pair of images according to the corresponding grids; when the grids projected in each first motion vector direction are determined, determining a straight line by taking the center of a cube as a starting point and the first motion vector direction of each target first feature point as a direction for each target first feature point, and taking the grid where the intersection point of the straight line and the cube grid space is located as the corresponding grid projected in the grid space in the first motion vector direction; for each target first characteristic point, counting an angle proportion value of a grid projected by a first motion vector direction of the target first characteristic point; and determining a first motion direction entropy value of the first pair of images according to the angle proportion value of each grid projected by the first motion vector direction of each target first feature point.
20. The apparatus according to claim 19, wherein the obtaining module is specifically configured to obtain a seventh three-dimensional coordinate and an eighth three-dimensional coordinate of each target first feature point in a world coordinate system according to each target first feature point, project the seventh three-dimensional coordinate and the eighth three-dimensional coordinate of each target first feature point onto the ground, and determine the first area of influence of the first pair of images on the ground according to the number of ground grids corresponding to each seventh three-dimensional coordinate and each eighth three-dimensional coordinate.
21. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1 to 10 when executing a program stored in the memory.
22. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1-10.
CN201810234688.8A 2018-03-21 2018-03-21 Detection method, device, equipment and medium for fighting behavior in video Active CN108596032B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810234688.8A CN108596032B (en) 2018-03-21 2018-03-21 Detection method, device, equipment and medium for fighting behavior in video

Publications (2)

Publication Number Publication Date
CN108596032A CN108596032A (en) 2018-09-28
CN108596032B true CN108596032B (en) 2020-09-29

Family

ID=63627045

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810234688.8A Active CN108596032B (en) 2018-03-21 2018-03-21 Detection method, device, equipment and medium for fighting behavior in video

Country Status (1)

Country Link
CN (1) CN108596032B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110059531B (en) * 2018-12-19 2021-06-01 浙江宇视科技有限公司 Frame-fighting behavior detection method and device based on video images
CN111582031B (en) * 2020-04-03 2023-07-14 深圳市艾伯信息科技有限公司 Multi-model collaborative violence detection method and system based on neural network
CN111814602B (en) * 2020-06-23 2022-06-17 成都信息工程大学 Intelligent vehicle environment dynamic target detection method based on vision
CN112614151B (en) * 2021-03-08 2021-08-31 浙江大华技术股份有限公司 Motion event detection method, electronic device and computer-readable storage medium

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012088942A (en) * 2010-10-20 2012-05-10 Mitsubishi Electric Corp Image processing apparatus and method, and image display device and method
CN102880444A (en) * 2012-08-24 2013-01-16 浙江捷尚视觉科技有限公司 Fighting detecting method based on stereoscopic vision motion field analysis
CN105426813A (en) * 2015-10-27 2016-03-23 杭州电子科技大学 Video abnormal behavior detection method
CN106056035A (en) * 2016-04-06 2016-10-26 南京华捷艾米软件科技有限公司 Motion-sensing technology based kindergarten intelligent monitoring method
CN106128022A (en) * 2016-07-18 2016-11-16 四川君逸数码科技股份有限公司 A kind of wisdom gold eyeball identification violent action alarm method and device
CN106875444A (en) * 2017-01-19 2017-06-20 浙江大华技术股份有限公司 A kind of object localization method and device
CN107220607A (en) * 2017-05-22 2017-09-29 西安电子科技大学 Movement locus Activity recognition method based on 3D stationary wavelets
CN107145878A (en) * 2017-06-01 2017-09-08 重庆邮电大学 Old man's anomaly detection method based on deep learning
CN107273835A (en) * 2017-06-07 2017-10-20 南京航空航天大学 Act of violence intelligent detecting method based on video analysis

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
"Abnormal crowd motion analysis";Tian Cao et al;《2009 IEEE International Conference on Robotics and Biomimetics (ROBIO)》;20100225;全文 *
"基于三维模型的行人徘徊行为检测算法";朱梦哲等;《计算机应用与软件》;20170430;第34卷(第4期);全文 *
"基于双目视觉的运动物体三维轨迹重建研究";万婷;《中国优秀硕士学位论文全文数据库·信息科技辑》;20131215;第2013年卷(第12期);全文 *
"基于特征点轨迹的实时异常行为检测";周义;《工业控制计算机》;20160229;第29卷(第2期);全文 *
"基于计算机视觉的鱼类三维行为监测研究及应用";果佳良;《中国优秀硕士学位论文全文数据库·信息科技辑》;20160615;第2016年卷(第6期);全文 *
"智能视频监控中运动人体异常行为的自动检测与识别算法的研究与实现";费凡;《中国优秀硕士学位论文全文数据库·信息科技辑》;20150715;第2015年卷(第7期);全文 *
"运动目标轨迹网格化分析与徘徊行为检测研究";瞿中;《微电子学与计算机》;20140430;第31卷(第4期);全文 *

Also Published As

Publication number Publication date
CN108596032A (en) 2018-09-28

Similar Documents

Publication Publication Date Title
CN109886997B (en) Identification frame determining method and device based on target detection and terminal equipment
CN108596032B (en) Detection method, device, equipment and medium for fighting behavior in video
CN111161349B (en) Object posture estimation method, device and equipment
CN109029381B (en) Tunnel crack detection method and system and terminal equipment
CN111340797A (en) Laser radar and binocular camera data fusion detection method and system
CN109934847B (en) Method and device for estimating posture of weak texture three-dimensional object
CN110378837B (en) Target detection method and device based on fish-eye camera and storage medium
CN111612841A (en) Target positioning method and device, mobile robot and readable storage medium
CN110599489A (en) Target space positioning method
CN110415208A (en) A kind of adaptive targets detection method and its device, equipment, storage medium
CN112036381B (en) Visual tracking method, video monitoring method and terminal equipment
CN111105452B (en) Binocular vision-based high-low resolution fusion stereo matching method
US20210312637A1 (en) Map segmentation method and device, motion estimation method, and device terminal
CN113112542A (en) Visual positioning method and device, electronic equipment and storage medium
CN111915567A (en) Image quality evaluation method, device, equipment and medium
CN111161348B (en) Object pose estimation method, device and equipment based on monocular camera
CN113256683B (en) Target tracking method and related equipment
CN111696147B (en) Depth estimation method based on improved YOLOv3 model
CN111652168B (en) Group detection method, device, equipment and storage medium based on artificial intelligence
CN110633630B (en) Behavior identification method and device and terminal equipment
CN109961092B (en) Binocular vision stereo matching method and system based on parallax anchor point
CN116363583A (en) Human body identification method, device, equipment and medium for top view angle
CN112133100B (en) Vehicle detection method based on R-CNN
CN105513050B (en) A kind of target image extracting method and device
Khryashchev et al. Evaluation of face image quality metrics in person identification problem

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant