CN112183153A - Object behavior detection method and device based on video analysis - Google Patents

Object behavior detection method and device based on video analysis

Info

Publication number
CN112183153A
CN112183153A
Authority
CN
China
Prior art keywords
target object
video
video frame
key point
detecting
Prior art date
Legal status
Pending
Application number
CN201910585625.1A
Other languages
Chinese (zh)
Inventor
汤人杰
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Zhejiang Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Zhejiang Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Zhejiang Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN201910585625.1A priority Critical patent/CN112183153A/en
Publication of CN112183153A publication Critical patent/CN112183153A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for detecting object behaviors based on video analysis, wherein the method comprises the following steps: converting an original video into a video image sequence containing a plurality of video frames, and detecting a target object contained in each video frame in the video image sequence; respectively determining the position information of the target object in each video frame, and determining the motion track of the target object according to the position information; detecting skeleton key point information of a target object contained in each video frame in a video image sequence according to the position information of the target object in each video frame; determining the action type of the target object according to a pre-trained bone recognition model and the bone key point information of the target object contained in each video frame obtained by detection; and detecting whether the behavior of the target object is abnormal or not according to the motion track of the target object and the action type of the target object. Therefore, the method and the device can comprehensively utilize the motion trail and the motion category of the target object to realize accurate prediction of abnormal behaviors.

Description

Object behavior detection method and device based on video analysis
Technical Field
The invention relates to the technical field of computer vision, in particular to a method and a device for detecting object behaviors based on video analysis.
Background
With the rapid development of the economy and of science and technology, surveillance systems have sprung up everywhere, and electronic eyes are deployed on streets large and small in cities across the country. People's awareness of security keeps rising, and safety monitoring through surveillance systems has become a ubiquitous and commonplace practice, which has greatly improved public security. However, current surveillance systems still rely on manual judgment or supervision of abnormal behaviors; with the continuous development of computer vision technology, highly automated and intelligent monitoring will inevitably play an important role in future surveillance systems.
Pedestrian abnormal behavior analysis starts with pedestrian detection, and existing pedestrian detection methods are relatively mature. For example, a background modeling method extracts the foreground moving target, performs feature extraction in the target region, and then uses a classifier to decide whether a pedestrian is present; statistical-learning-based methods train a pedestrian detection classifier on a large amount of data, using grayscale, color, texture, HOG (histogram of oriented gradients) features and the like as the main object features. The classifiers include neural networks, SVMs (support vector machines), deep learning models and the like.
However, the following problems still exist in the prior art:
(1) The inspection workload is large: detection currently still relies mainly on manpower, which consumes a large amount of human and material resources.
(2) The measurement standard is single and the detection accuracy is not high: a detection system using a single model usually yields only a single detection index; detection with a single index has certain efficiency advantages, but its accuracy is reduced.
Disclosure of Invention
In view of the above, the present invention is proposed to provide an object behavior detection method and apparatus based on video analysis that overcomes or at least partially solves the above problems.
According to an aspect of the present invention, there is provided a method for detecting object behaviors based on video analysis, including:
converting an original video into a video image sequence containing a plurality of video frames, and detecting a target object contained in each video frame in the video image sequence;
respectively determining the position information of the target object in each video frame, and determining the motion track of the target object according to the position information;
detecting skeleton key point information of a target object contained in each video frame in a video image sequence according to the position information of the target object in each video frame;
determining the action category of a target object according to a pre-trained bone recognition model and the bone key point information of the target object contained in each video frame obtained through detection;
and detecting whether the behavior of the target object is abnormal or not according to the motion track of the target object and the action type of the target object.
Optionally, detecting the target object contained in each video frame in the sequence of video images comprises:
detecting a candidate object contained in each video frame in the video image sequence;
and screening target objects contained in the video image sequence according to the candidate objects contained in each video frame.
Optionally, detecting, according to the position information of the target object in each video frame, bone keypoint information of the target object contained in each video frame in the video image sequence includes:
detecting bone key point information of candidate objects contained in each video frame in a video image sequence;
and screening the bone key point information of the target object from the bone key point information of the candidate object according to the position information of the target object in each video frame.
Optionally, detecting the bone key point information of the candidate object contained in each video frame in the video image sequence includes: detecting the bone key point information of the candidate object contained in each video frame in the video image sequence by using an OpenPose algorithm.
Optionally, determining the position information of the target object in each video frame includes:
and determining the position information of the target object in each video frame according to the tracking frame of the target object.
Optionally, detecting the target object contained in each video frame in the sequence of video images comprises:
the YOLOV3 algorithm is used to detect a target object contained in each video frame in a sequence of video images.
Optionally, determining the motion trajectory of the target object according to the position information includes:
estimating a predicted tracking result of the target object by using a Kalman filtering algorithm according to the position information of the target object in the earlier of two adjacent video frames;
and judging whether the predicted tracking result of the target object is matched with the actual detection result of the target object or not from the two aspects of the motion matching degree and the apparent matching degree.
according to an aspect of the present invention, there is provided an object behavior detection apparatus based on video analysis, including:
the target object detection module is suitable for converting an original video into a video image sequence containing a plurality of video frames and detecting a target object contained in each video frame in the video image sequence;
the motion track determining module is suitable for respectively determining the position information of the target object in each video frame and determining the motion track of the target object according to the position information;
the skeleton key point information detection module is suitable for detecting the skeleton key point information of the target object contained in each video frame in the video image sequence according to the position information of the target object in each video frame;
the action category determining module is suitable for determining the action category of the target object according to a bone recognition model trained in advance and the bone key point information of the target object contained in each video frame obtained through detection;
and the abnormal behavior judging module is suitable for detecting whether the behavior of the target object is abnormal or not according to the motion track of the target object and the action type of the target object.
Optionally, the target object detection module is adapted to:
detecting a candidate object contained in each video frame in the video image sequence;
and screening target objects contained in the video image sequence according to the candidate objects contained in each video frame.
Optionally, the bone key point information detection module is adapted to:
detecting bone key point information of candidate objects contained in each video frame in a video image sequence;
and screening the bone key point information of the target object from the bone key point information of the candidate object according to the position information of the target object in each video frame.
Optionally, the bone key point information detection module is adapted to:
and detecting the bone key point information of the candidate object contained in each video frame in the video image sequence by utilizing an OpenPose algorithm.
Optionally, the motion trajectory determination module is adapted to:
and determining the position information of the target object in each video frame according to the tracking frame of the target object.
Optionally, the target object detection module is adapted to:
the YOLOV3 algorithm is used to detect a target object contained in each video frame in a sequence of video images.
Optionally, the motion trajectory determination module is adapted to:
and determining the motion track of the target object according to the position information by utilizing a Kalman filtering algorithm and the motion matching degree index and the apparent matching degree index.
Optionally, the motion trajectory determination module is adapted to:
estimating a predicted tracking result of the target object by using a Kalman filtering algorithm according to the position information of the target object in the earlier of two adjacent video frames;
and judging whether the predicted tracking result of the target object is matched with the actual detection result of the target object or not from the two aspects of the motion matching degree and the apparent matching degree.
According to still another aspect of the present invention, there is provided an electronic device, including: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the object behavior detection method based on the video analysis.
According to yet another aspect of the present invention, a computer storage medium is provided, in which at least one executable instruction is stored, and the executable instruction causes a processor to perform operations corresponding to the object behavior detection method based on video analysis described above.
In summary, the invention discloses a method and a device for detecting object behaviors based on video analysis. First, an original video is converted into a video image sequence including a plurality of video frames, and a target object included in each video frame in the video image sequence is detected. And then, respectively determining the position information of the target object in each video frame, and determining the motion track of the target object according to the position information. Then, the bone key point information of the target object contained in each video frame in the video image sequence is detected according to the position information of the target object in each video frame. And then, determining the action type of the target object according to a pre-trained bone recognition model and the detected bone key point information of the target object contained in each video frame. And finally, detecting whether the behavior of the target object is abnormal or not according to the motion track of the target object and the action type of the target object. Therefore, the method and the device can comprehensively utilize the motion trail and the motion category of the target object to realize accurate prediction of abnormal behaviors.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 is a flowchart illustrating an object behavior detection method based on video analysis according to a first embodiment;
fig. 2 is a flowchart illustrating an object behavior detection method based on video analysis according to a second embodiment;
fig. 3 is a block diagram showing an object behavior detection apparatus based on video analysis according to a third embodiment;
FIG. 4 is a schematic diagram of an electronic device according to an embodiment of the invention;
FIG. 5 shows the tracking result of the tracking box of the target object;
FIG. 6 illustrates another tracking result of the tracking box of the target object;
FIG. 7 shows skeletal keypoint information extraction results;
FIG. 8 shows another skeletal keypoint information extraction result;
fig. 9 shows a stop motion category recognition result.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Example one
Fig. 1 shows a flowchart of an object behavior detection method based on video analysis according to an embodiment. As shown in fig. 1, the method comprises the steps of:
step S110: an original video is converted into a video image sequence including a plurality of video frames, and a target object included in each video frame in the video image sequence is detected.
The video frame refers to a single video image extracted from the original video, and the video image sequence refers to a set of a plurality of video frames arranged in sequence. A single video image may be extracted from the original video either frame by frame according to the video frame number, or at intervals of a preset number of video frames.
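As a minimal illustration of this frame-extraction step (assuming OpenCV is available; the function and parameter names are illustrative, not part of the patent):

```python
import cv2

def video_to_frames(video_path, stride=1):
    """Convert an original video into a video image sequence (a list of frames).

    stride=1 extracts frames one by one; stride=N extracts one frame out of
    every N, i.e. extraction at intervals of a preset number of frames.
    """
    capture = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:                     # end of the original video
            break
        if index % stride == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames
```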
Specifically, target objects contained in video frames are detected, wherein the number of the video frames may be multiple, and the number of the target objects contained in a single video frame may be multiple.
Step S120: and respectively determining the position information of the target object in each video frame, and determining the motion track of the target object according to the position information.
And determining the position information of the target object in each video frame according to the tracking frame of the target object. In specific implementation, the position information of the target object in the video frame is determined according to the abscissa x and the ordinate y of the central position of the tracking frame of the target object. It should be noted that, in this embodiment, the determination method of the position information of the target object is not particularly limited, and a person skilled in the art may determine the position information of the target object in other manners.
Specifically, according to the position information of the target object in each video frame, the motion trajectory of the target object in the corresponding time period of the video image sequence can be determined.
Step S130: and detecting the bone key point information of the target object contained in each video frame in the video image sequence according to the position information of the target object in each video frame.
Specifically, firstly, obtaining skeleton key point information of a target object in a video frame; then, the bone key point information corresponding to the target object is determined according to the position information of the target object in the video frame. For example, firstly, bone key point information of a target object a, a target object b and a target object c is acquired; then, the bone key point information corresponding to the target object a is determined according to the position information of the target object a. The skeleton key point information comprises position coordinates of each key point, and the position information comprises center position coordinates of a tracking frame of the target object. And when the distance between the position coordinate of each key point in the skeleton key point information and the center position coordinate of the tracking frame in the position information is within a preset range, judging that the skeleton key point information is successfully matched with the position information of the target object, namely determining the skeleton key point information as the skeleton key point information of the target object.
Step S140: and determining the action type of the target object according to a pre-trained bone recognition model and the detected bone key point information of the target object contained in each video frame.
Specifically, the skeleton key point information of the target object in each video frame is input into a skeleton recognition model, and the skeleton recognition model outputs the specific action category of the target object in each video frame.
Step S150: and detecting whether the behavior of the target object is abnormal or not according to the motion track of the target object and the action type of the target object.
Specifically, the motion speed of the target object is determined according to the motion track of the target object, and whether the motion speed of the target object exceeds a preset motion speed threshold value is judged; and/or judging whether the action type of the target object is an abnormal action type.
In summary, the method detects the motion trajectory and the motion category of the target object in each video frame, and comprehensively utilizes the motion trajectory and the motion category of the target object to realize accurate prediction of abnormal behaviors.
Example two
Fig. 2 shows a flowchart of an object behavior detection method based on video analysis according to a second embodiment. As shown in fig. 2, the method comprises the steps of:
step S200: the bone recognition model is trained in advance.
The bone recognition model is a model for recognizing a specific motion category by using bone key point information. The input to the bone recognition model is the bone key point information and the output of the bone recognition model is the specific action category.
Specifically, a human body structure data set is prepared, an ST-GCN model is adopted as a specific bone recognition model, and the human body structure data set is used as a training data set of the ST-GCN model. The input of the ST-GCN model is a human body structure data set, and the output is an action category label. The human body structure data set refers to skeleton key point coordinate information of specific actions in a video frame, and the action category labels specifically include: stop, walk, run, squat, stand, dance. The embodiment does not limit the specific meaning of the action category label, and those skilled in the art may define the specific meaning of the action category label by other methods. As shown in fig. 9, fig. 9 shows the stop motion category recognition result.
In a specific implementation, the method for training the ST-GCN model with the human body structure data set is as follows. First, a spatial-temporal graph is built on the human body structure data set, and multi-layer spatial-temporal graph convolution operations are performed on the graph to generate higher-level feature maps. Then, the human body structure data set is input into the network, and the data proportion of each node of the ST-GCN model is kept consistent. The ST-GCN model consists of nine spatial-temporal convolution layers: 64 channels in the first three layers, 128 channels in the middle three layers and 256 channels in the last three layers. Each of the nine layers has a temporal convolution kernel; residual connections are used, and Dropout is applied for feature regularization. Pooling is set at the fourth and seventh temporal convolution layers, the final 256 output channels are pooled globally, and classification is then performed with a Softmax classifier. Optimization is performed with stochastic gradient descent, with the learning rate set to 0.1, 100 epochs in total, and the learning rate reduced by 0.01 every 10 epochs.
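The configuration above maps naturally onto a small PyTorch sketch. This is a hedged, heavily simplified stand-in: the block structure, adjacency matrix, 18-key-point assumption and dummy data are illustrative, not the ST-GCN implementation referenced by the patent.

```python
import torch
from torch import nn

class STBlock(nn.Module):
    """One simplified spatial-temporal block: a 1x1 'spatial' convolution mixed
    through a fixed skeleton adjacency matrix, a temporal convolution over the
    frame axis, a residual connection, and Dropout for feature regularization."""
    def __init__(self, in_ch, out_ch, adjacency, t_stride=1):
        super().__init__()
        self.register_buffer("A", adjacency)                        # (V, V) joint graph
        self.spatial = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.temporal = nn.Conv2d(out_ch, out_ch, kernel_size=(9, 1),
                                  stride=(t_stride, 1), padding=(4, 0))
        self.residual = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=(t_stride, 1))
        self.drop, self.relu = nn.Dropout(0.5), nn.ReLU()

    def forward(self, x):                                            # x: (N, C, T, V)
        res = self.residual(x)
        x = torch.einsum("nctv,vw->nctw", self.spatial(x), self.A)   # graph mixing
        return self.relu(self.drop(self.temporal(x)) + res)

class SimpleSTGCN(nn.Module):
    """Nine blocks: 64 channels in the first three, 128 in the middle three,
    256 in the last three; temporal down-sampling at the 4th and 7th blocks;
    global pooling followed by a linear classifier (Softmax via the loss)."""
    def __init__(self, adjacency, in_ch=3, num_classes=6):
        super().__init__()
        channels = [64, 64, 64, 128, 128, 128, 256, 256, 256]
        blocks, prev = [], in_ch
        for i, ch in enumerate(channels):
            blocks.append(STBlock(prev, ch, adjacency, t_stride=2 if i in (3, 6) else 1))
            prev = ch
        self.blocks, self.head = nn.Sequential(*blocks), nn.Linear(256, num_classes)

    def forward(self, x):                                            # x: (N, C, T, V)
        x = self.blocks(x).mean(dim=(2, 3))                          # global pooling
        return self.head(x)

# Illustrative training setup: SGD with learning rate 0.1, reduced by 0.01 every
# 10 epochs, cross-entropy (Softmax) loss; dummy tensors stand in for the dataset.
adjacency = torch.eye(18)                    # placeholder graph over 18 key points
model = SimpleSTGCN(adjacency)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
skeletons = torch.randn(8, 3, 64, 18)        # (batch, channels, frames, key points)
labels = torch.randint(0, 6, (8,))           # stop/walk/run/squat/stand/dance
for epoch in range(100):
    optimizer.zero_grad()
    loss = criterion(model(skeletons), labels)
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 10 == 0:
        for group in optimizer.param_groups:
            group["lr"] -= 0.01              # learning-rate reduction every 10 epochs
```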
Step S210: an original video is converted into a video image sequence including a plurality of video frames, and a target object included in each video frame in the video image sequence is detected.
The video frame refers to a single video image extracted from the original video, and the video image sequence refers to a set of a plurality of video frames arranged in sequence. A single video image may be extracted from the original video either frame by frame according to the video frame number, or at intervals of a preset number of video frames.
Specifically, the YOLOV3 algorithm is used to detect the target object contained in each video frame in the video image sequence. When a target object contained in a video frame is detected, the detected target object is identified with a tracking frame. The tracking frame is rectangular: the parameters describing its size are the height h of the tracking frame and the aspect ratio R_a of the tracking frame; the parameters describing its position in the video frame are the abscissa x and the ordinate y of the center of the tracking frame; and the parameter describing its movement speed in the video image sequence is the speed v. It should be noted that the size of the tracking frame is determined by the size of the target object in the video frame and changes as the size of the target object in the video frame changes. As shown in fig. 5 and 6, fig. 5 shows one tracking result of the tracking frame of the target object, and fig. 6 shows another tracking result of the tracking frame of the target object.
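As an illustration of how a detection box maps to the tracking-frame parameters listed above (center abscissa x, ordinate y, aspect ratio R_a and height h), assuming some YOLOv3 detector returns person boxes as (left, top, width, height); the detector call in the comment is a hypothetical placeholder, not a specific library API:

```python
def box_to_track_state(left, top, width, height):
    """Convert a detector bounding box into the tracking-frame description:
    center coordinates (x, y), aspect ratio R_a and height h."""
    x = left + width / 2.0
    y = top + height / 2.0
    aspect_ratio = width / float(height)
    return x, y, aspect_ratio, height

# Hypothetical usage, where yolo_v3_detect stands for any YOLOv3 inference call:
# for frame in frames:
#     for (left, top, w, h) in yolo_v3_detect(frame):
#         state = box_to_track_state(left, top, w, h)
```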
Step S220: and respectively determining the position information of the target object in each video frame, and determining the motion track of the target object according to the position information.
And determining the position information of the target object in each video frame according to the tracking frame of the target object. In specific implementation, the position information of the target object in the video frame is determined according to the abscissa x and the ordinate y of the central position of the tracking frame of the target object.
Specifically, a Kalman filtering algorithm, a motion matching degree index and an apparent matching degree index are utilized, and the motion track of the target object is determined according to the position information of the target object in each video frame.
In a specific implementation, first, the predicted tracking result of the target object is estimated by using a Kalman filtering algorithm according to the position information of the target object in the earlier of two adjacent video frames. Estimating the predicted tracking result of the target object by using the Kalman filtering algorithm specifically includes: obtaining the abscissa x and the ordinate y of the center position of the tracking frame of the target object i in the earlier of the two adjacent video frames, together with the height h, the aspect ratio R_a and the movement speed v of that tracking frame; and, based on these 5 parameters of the tracking frame of the target object i in the earlier video frame, using the Kalman filtering algorithm to estimate the predicted tracking result (x_i, y_i, h_i, R_ai, v_i) of the tracking frame of the target object i in the later of the two adjacent video frames.
Then, it is determined whether the predicted tracking result of the target object matches the actual detection result of the target object, in terms of both the motion matching degree and the apparent matching degree. Before this judgment, the actual detection result (x_j, y_j, h_j, R_aj, v_j) of the tracking frame of the target object j in the later of the two adjacent video frames is acquired, where there may be multiple target objects j in the later video frame. Judging whether the predicted tracking result of the target object matches the actual detection result specifically includes the following steps. The first step: evaluate the motion matching degree between the actual detection result of the tracking frame of the target object j in the later video frame and the predicted tracking result of the tracking frame of the target object i in the earlier video frame, with the following calculation formula:

d^(1)(i, j) = (d_j - y_i)^T S_i^{-1} (d_j - y_i)

where d_j is the actual detection result (x_j, y_j, h_j, R_aj, v_j) of the tracking frame of the target object j in the later video frame, y_i is the predicted tracking result (x_i, y_i, h_i, R_ai, v_i) of the tracking frame of the target object i in the earlier video frame, S_i is the covariance matrix predicted by the Kalman filtering algorithm, T denotes transposition, and d^(1)(i, j) is the Mahalanobis distance between the actual detection result and the predicted tracking result. The Mahalanobis distance is used to evaluate the motion matching degree between the actual detection result and the predicted tracking result.

A threshold function is defined for the motion matching degree as follows:

b_ij^(1) = T[d^(1)(i, j) ≤ t^(1)]

where t^(1) is a threshold; in particular, the 0.95 quantile of the chi-squared distribution may be used as the threshold t^(1). b_ij^(1) denotes the motion matching result between the actual detection result and the predicted tracking result, and takes the value 1 or 0: a value of 1 indicates that the motion matching between the actual detection result and the predicted tracking result succeeds, and a value of 0 indicates that it fails.
The second step: if the motion matching between the actual detection result and the predicted tracking result succeeds, further evaluate the apparent matching degree between the actual detection result and the predicted tracking result; if the motion matching fails, the apparent matching degree is not evaluated.
Specifically, evaluating the apparent matching degree between the actual detection result and the predicted tracking result includes: computing the appearance descriptor r_j of the tracking frame of the target object j in the later video frame, and computing the appearance descriptors r_k^(i) of the tracking frame of the target object i in 100 video frames, where k denotes the video frame number (for example, k may be 101, ...). The descriptors r_k^(i) from the 100 video frames are stored in the set R_i.

The apparent matching degree between the actual detection result and the predicted tracking result is calculated as follows:

d^(2)(i, j) = min{ 1 - r_j^T r_k^(i) | r_k^(i) ∈ R_i }

where T denotes transposition, d^(2)(i, j) is the minimum cosine distance between the feature vector of the actual detection result and the feature vectors of the predicted tracking results, r_j is the appearance descriptor of the tracking frame of the target object j, r_k^(i) is the descriptor of the tracking frame of the target object i in the video frame numbered k, and R_i is the set of r_k^(i). The minimum cosine distance between the feature vectors of the actual detection result and the predicted tracking result is used to evaluate the apparent matching degree.
Similarly, a threshold function is defined for the apparent matching degree as follows:

b_ij^(2) = T[d^(2)(i, j) ≤ t^(2)]

where t^(2) is a threshold. b_ij^(2) takes the value 1 or 0: a value of 1 indicates that the apparent matching between the actual detection result and the predicted tracking result succeeds, and a value of 0 indicates that it fails.
The third step: compute a weighted combination of the motion matching degree and the apparent matching degree, with the following formula:

c_i,j = λ d^(1)(i, j) + (1 - λ) d^(2)(i, j)

b_ij^(1) directly reflects whether the motion matching between the actual detection result and the predicted tracking result succeeds, and b_ij^(2) directly reflects whether the apparent matching between the actual detection result and the predicted tracking result succeeds.

c_i,j is used to quantify the motion matching degree and the apparent matching degree together: when c_i,j falls within a preset range, the tracking frame of the target object j in the later video frame is successfully matched with the tracking frame of the target object i in the earlier video frame. By repeating the above process, the tracking frame of the target object in each video frame is determined; the position information of the target object in each video frame is then determined from the position information of its tracking frame, and the motion track of the target object is thereby obtained.
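The matching computation described above can be summarized in a short sketch: the Mahalanobis distance d1 for motion matching, the minimum cosine distance d2 over the stored appearance descriptors for apparent matching, and the weighted combination c_ij. The thresholds, the weight λ and the gating order shown here are illustrative assumptions, not values fixed by the patent:

```python
import numpy as np

def motion_distance(detection, prediction, covariance):
    """d1(i, j): squared Mahalanobis distance between the actual detection d_j
    and the Kalman-predicted track state y_i, using the predicted covariance."""
    diff = detection - prediction
    return float(diff.T @ np.linalg.inv(covariance) @ diff)

def appearance_distance(det_descriptor, track_descriptors):
    """d2(i, j): minimum cosine distance min(1 - r_j^T r_k) over the descriptors
    stored for track i (descriptors assumed to be L2-normalised)."""
    return float(min(1.0 - det_descriptor @ r for r in track_descriptors))

def is_match(d1, d2, t1=9.49, t2=0.3, lam=0.5, c_max=5.0):
    """Gate on motion matching (b1), then on appearance matching (b2), then on
    the combined cost c_ij = lam * d1 + (1 - lam) * d2; thresholds are placeholders."""
    if d1 > t1:                      # b1 = 0: motion matching failed
        return False
    if d2 > t2:                      # b2 = 0: apparent matching failed
        return False
    return lam * d1 + (1.0 - lam) * d2 <= c_max
```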
Further, in order to improve matching efficiency, before determining the position information of the target object in each video frame, the target object is screened from candidate objects contained in each video frame. In specific implementation, firstly, candidate objects contained in each video frame in a video image sequence are detected; then, screening target objects included in the video image sequence according to the candidate objects included in each video frame, wherein screening the target objects included in the video image sequence specifically includes: and judging the interval time from the last successful matching of the candidate object to the current moment, if the interval time exceeds a preset threshold, indicating that the motion trail of the candidate object is terminated, and taking the candidate object as a target object.
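A small sketch of the screening rule in the previous paragraph, where a candidate whose interval since its last successful match exceeds a preset threshold is treated as having a terminated trajectory; the dictionary field name and threshold are assumptions:

```python
def screen_target_objects(candidates, current_frame, max_gap=30):
    """Return the candidates whose motion trajectory is considered terminated,
    i.e. whose gap since the last successful match exceeds the preset threshold."""
    return [c for c in candidates
            if current_frame - c["last_matched_frame"] > max_gap]
```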
Step S230: and detecting the bone key point information of the target object contained in each video frame in the video image sequence according to the position information of the target object in each video frame.
The bone key point information refers to information related to the posture of the target object obtained by using the OpenPose algorithm, and specifically includes the following two parts: the confidence S and the body-part affinity L.
Specifically, the first step: detect the bone key point information of the candidate objects contained in each video frame in the video image sequence, which specifically includes: detecting the bone key point information of the candidate objects contained in each video frame in the video image sequence by using the OpenPose algorithm. In a specific implementation, the process of detecting the bone key point information is as follows. Step one: input the video frame into the VGG-19 model, and output a feature map F through the first ten convolutional layers of the VGG-19 model. Step two: the feature map F is fed into two branches respectively, and the first branch is used to evaluate the confidence S. Evaluating the confidence S in the first branch specifically includes: after the feature map F is imported into the first branch, the body part corresponding to a certain key point in the feature map F is predicted; for example, key point 01 in the feature map F is predicted to belong to the left shoulder of the body. After the body part corresponding to the key point is predicted, the probability that the key point corresponds to that body part is evaluated; for example, the probability that key point 01 in the feature map F belongs to the left shoulder of the body is evaluated, and the probability obtained by the evaluation is the confidence S. The calculation formula used by the first branch to evaluate the confidence S is as follows:

S_1 = ρ_1(F)
S_t = ρ_t(F, S_{t-1}, L_{t-1})

where S_t denotes the confidence output at the t-th stage, F denotes the feature map, and L denotes the body-part affinity vector.
Step three: the feature map F is passed into a second branch, which is used to predict the body-part affinity L, which refers to the degree of association between different body parts. For example, the key point 01 in the feature map F belongs to the left shoulder portion of the body, the key point 02 belongs to the neck portion of the body, the left shoulder portion of the key point 01 and the neck portion of the key point 02 are associated with each other by the second branch prediction, and a set of vectors [ (x) representing the trunk between the left shoulder portion and the neck portion is further output1,y1),(x2,y2)]Wherein (x)1,y1) Indicates the position of the starting point of the trunk (i.e., the neck region), (x)2,y2) Indicating the location of the end point of the torso (i.e., the left shoulder region). The vector representing the trunk is the affinity L of the body part, and the vector contains the position information of each key point. Wherein L is1=φ1(F),
Figure BDA0002114475400000121
LtA body-part affinity vector representing the torso t.
A label is defined for the confidence S as follows:

S*_{j,k}(p) = exp(-||p - x_{j,k}||_2^2 / σ^2)   (1)

S*_j(p) = max_k S*_{j,k}(p)   (2)

where p is the current position and x_{j,k} is the coordinate of the jth part of the kth person. To prevent values that are too small from overwhelming the training, a constant σ is set. Formula (1) means that the closer the current position p is to the jth part of the kth person, the higher the score; formula (2) takes the maximum over k, i.e. finds the position with the highest score.

L*_{c,k}(p) = v, if the point p lies on the cth trunk of the kth person; otherwise 0   (3)

Formula (3) indicates that when the current position p is on the cth trunk of the kth person, the label of the current position p takes the unit vector of that trunk; otherwise it takes 0.

v = (x_{j2,k} - x_{j1,k}) / ||x_{j2,k} - x_{j1,k}||_2   (4)

Formula (4) divides the vector by its modulus to obtain the unit vector.

0 ≤ v · (p - x_{j1,k}) ≤ l_{c,k}  and  |v⊥ · (p - x_{j1,k})| ≤ σ_l   (5)

Formula (5) determines whether the point p lies on the cth trunk.
It should be noted that the bone information is extracted and rendered for the objects passing through the video. Motion changes may cause part of the bone information to fail to be acquired; at the positions where acquisition fails, zeros are padded: the coordinate position is set to 0 and the score value is also set to 0. This does not greatly interfere with the recognition of subsequent actions. As shown in fig. 7 and 8, fig. 7 shows one skeletal key point information extraction result, and fig. 8 shows another skeletal key point information extraction result.
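As an illustration of the confidence labels of formulas (1) and (2) above, a hedged NumPy sketch (the array shapes and the value of σ are assumptions, not the patent's code):

```python
import numpy as np

def confidence_label(p, part_coords, sigma=1.0):
    """Formulas (1) and (2): per-person score exp(-||p - x_jk||^2 / sigma^2) for
    one body part j, then the maximum over all persons k.

    p           -- (2,) current position
    part_coords -- (K, 2) coordinates x_jk of part j for each of the K persons
    """
    scores = np.exp(-np.sum((part_coords - p) ** 2, axis=1) / sigma ** 2)
    return float(scores.max())
```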
The second step is that: and screening the bone key point information of the target object from the bone key point information of the candidate object according to the position information of the target object in each video frame. The bone key point information of the screening target object specifically comprises the following steps: the method comprises the following steps: and extracting key point position information in skeleton key point information of the target object, and extracting position information of the target object in each video frame, wherein the skeleton key point information comprises position coordinates of each key point, and the position information comprises center position coordinates of a tracking frame of the target object.
Step two: and when the distance between the position coordinate of each key point in the skeleton key point information and the center position coordinate of the tracking frame in the position information is within a preset range, judging that the skeleton key point information is successfully matched with the position information of the target object. It should be noted that, in a video frame exceeding a preset frame number, if the matching between the key point position information in the skeleton key point information and the position information of the target object is successful, it indicates that the skeleton key point information and the target object are successfully bound, that is, the skeleton key point information is determined as the skeleton key point information of the target object.
Further, considering that a target object at a short distance may completely occlude a target object at a long distance, the extreme values of the key point position information in the bone key point information are compared with the coordinates of the four vertices of the tracking frame of the target object, so as to prevent binding errors. The extreme values of the key point position information refer to the maximum abscissa x_max, the minimum abscissa x_min, the maximum ordinate y_max and the minimum ordinate y_min among the key point positions. If the extreme values of the key point position information exceed the coordinates of the four vertices of the tracking frame of the target object, the bone key point information cannot be bound to that target object.
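A brief sketch of the binding rule described above: the skeleton is bound to a target object only when every key point lies within a preset range of the tracking-frame center, and binding is refused when the key-point extreme values fall outside the four vertices of the tracking frame. Variable names and the distance threshold are assumptions:

```python
def bind_skeleton_to_track(keypoints, center, box, max_dist=150.0):
    """keypoints: list of (x, y) key point coordinates;
    center: (cx, cy) of the tracking frame;
    box: (x_min, y_min, x_max, y_max) vertices of the tracking frame."""
    cx, cy = center
    # every key point must lie within a preset range of the tracking-frame center
    for x, y in keypoints:
        if ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 > max_dist:
            return False
    # occlusion guard: key-point extremes must stay inside the tracking frame
    xs, ys = [x for x, _ in keypoints], [y for _, y in keypoints]
    x_min, y_min, x_max, y_max = box
    if min(xs) < x_min or max(xs) > x_max or min(ys) < y_min or max(ys) > y_max:
        return False
    return True
```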
Step S240: and determining the action type of the target object according to a pre-trained bone recognition model and the detected bone key point information of the target object contained in each video frame.
Specifically, the skeleton key point information of the target object in each video frame is input into a skeleton recognition model, and the skeleton recognition model outputs the specific action category of the target object in each video frame.
Step S250: and detecting whether the behavior of the target object is abnormal or not according to the motion track of the target object and the action type of the target object.
Specifically, the motion speed of the target object is determined according to the motion track of the target object, and whether the motion speed of the target object exceeds a preset motion speed threshold value is judged; and/or judging whether the action type of the target object is an abnormal action type.
The specific abnormal behavior judgment rule may adopt at least one of the following:
Firstly, the action type of the target object in a video frame is detected, and whether the behavior of the target object is abnormal is judged according to the action type of the target object, where there may be multiple video frames and multiple target objects in a single video frame. For example, for a video frame numbered j and a target object i in that frame, a pedestrian suddenly stopping while in motion is set as one of the abnormal action types. If the total number of target objects with an abnormal action type in 20 video frames exceeds a preset abnormal-quantity threshold, it is judged that abnormal behavior occurs. The judgment formula is as follows:

Σ_j Σ_i T[A_{i,j} = walk] > σ

where j is the frame number of the video frame, i denotes a target object in a single video frame, A_{i,j} denotes the detected action type of target object i in video frame j, walk is an abnormal action type label, and σ is the preset abnormal-quantity threshold.
Secondly, the movement speed of the target object is determined according to the motion track of the target object, and whether the behavior of the target object is abnormal is judged according to the movement speed of the target object. For example, the speed V of the target object is determined from the displacement speed of its tracking frame; if the total number of target objects whose speed exceeds the preset speed threshold V_n exceeds a preset abnormal-quantity threshold, it is judged that abnormal behavior occurs. The judgment formula is as follows:

Σ_j Σ_i T[V_{i,j} > V_n] > σ

where j is the frame number of the video frame, i denotes a target object in a single video frame, V_{i,j} denotes the movement speed of target object i in video frame j, V_n is the preset speed threshold, and σ is the preset abnormal-quantity threshold.
Thirdly, the movement speed and the action category of the target object are used in combination; for example, when a preset number of pedestrians exhibit a preset abnormal behavior category, an abnormal behavior alert is raised.
It should be noted that, the specific meaning of the abnormal behavior determination rule is not specifically limited in this embodiment, and those skilled in the art may use other methods to define the specific meaning of the abnormal behavior determination rule.
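The first two judgment rules above amount to simple counting checks; a hedged sketch follows (the thresholds, the abnormal label and the data layout are illustrative assumptions, not values given by the patent):

```python
def abnormal_by_action(actions_per_frame, abnormal_label="walk", sigma=5):
    """Rule 1: count, over the recent video frames, the target objects whose
    action type equals the abnormal label; abnormal if the count exceeds sigma."""
    count = sum(1 for frame in actions_per_frame
                for label in frame if label == abnormal_label)
    return count > sigma

def abnormal_by_speed(speeds_per_frame, v_n=8.0, sigma=5):
    """Rule 2: count the target objects whose movement speed exceeds the preset
    speed threshold V_n; abnormal if the count exceeds sigma."""
    count = sum(1 for frame in speeds_per_frame
                for v in frame if v > v_n)
    return count > sigma
```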
In summary, in this way, firstly, the Kalman filtering algorithm together with the motion matching degree index and the apparent matching degree index is used to determine the tracking frame of the target object in each video frame, the position information of the target object is determined according to the position information of the tracking frame, and the motion track of the target object is thereby determined. Then, the OpenPose algorithm is used to detect the bone key point information of the target object contained in each video frame. Finally, the motion track of the target object is bound to the bone key point information of the target object, and whether the behavior of the target object is abnormal is evaluated. The advantages of this approach are: on one hand, automatic extraction, composition and analysis of the video content are realized, which greatly reduces the manpower and material resources required by the detection system; on the other hand, the motion track and the action category of the target object are comprehensively utilized to achieve accurate prediction of abnormal behaviors.
EXAMPLE III
Fig. 3 is a block diagram of an object behavior detection apparatus based on video analysis according to a third embodiment, where the apparatus includes:
a target object detection module 31 adapted to convert an original video into a video image sequence including a plurality of video frames, and detect a target object included in each video frame in the video image sequence;
a motion track determining module 32, adapted to determine position information of the target object in each video frame, respectively, and determine a motion track of the target object according to the position information;
a bone key point information detection module 33, adapted to detect bone key point information of a target object contained in each video frame in a video image sequence according to position information of the target object in each video frame;
the action category determining module 34 is adapted to determine an action category of the target object according to a bone recognition model trained in advance and detected bone key point information of the target object contained in each video frame;
the abnormal behavior determining module 35 is adapted to detect whether the behavior of the target object is abnormal according to the motion trajectory of the target object and the motion category of the target object.
Optionally, the target object detection module 31 is adapted to:
detecting a candidate object contained in each video frame in the video image sequence;
and screening target objects contained in the video image sequence according to the candidate objects contained in each video frame.
Optionally, the bone key point information detection module 33 is adapted to:
detecting bone key point information of candidate objects contained in each video frame in a video image sequence;
and screening the bone key point information of the target object from the bone key point information of the candidate object according to the position information of the target object in each video frame.
Optionally, the bone key point information detection module 33 is adapted to:
and detecting the bone key point information of the candidate object contained in each video frame in the video image sequence by utilizing an OpenPose algorithm.
Optionally, the motion trajectory determination module 32 is adapted to:
and determining the position information of the target object in each video frame according to the tracking frame of the target object.
Optionally, the target object detection module 31 is adapted to:
the YOLOV3 algorithm is used to detect a target object contained in each video frame in a sequence of video images.
Optionally, the motion trajectory determination module 32 is adapted to:
estimating a predicted tracking result of the target object by using a Kalman filtering algorithm according to the position information of the target object in the earlier of two adjacent video frames;
and judging whether the predicted tracking result of the target object is matched with the actual detection result of the target object or not from the two aspects of the motion matching degree and the apparent matching degree.
The embodiment of the application provides a non-volatile computer storage medium, wherein at least one executable instruction is stored in the computer storage medium, and the computer executable instruction can execute an object behavior detection method based on video analysis in any method embodiment.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the electronic device.
As shown in fig. 4, the electronic device may include: a processor (processor)402, a Communications Interface 404, a memory 406, and a Communications bus 408.
Wherein:
the processor 402, communication interface 404, and memory 406 communicate with each other via a communication bus 408.
A communication interface 404 for communicating with network elements of other devices, such as clients or other servers.
The processor 402 is configured to execute the program 410, and may specifically execute the relevant steps in the above-described embodiments of the object behavior detection method based on video analysis.
In particular, program 410 may include program code comprising computer operating instructions.
The processor 402 may be a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The electronic device comprises one or more processors, which may be the same type of processor, such as one or more CPUs, or different types of processors, such as one or more CPUs and one or more ASICs.
And a memory 406 for storing a program 410. Memory 406 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The program 410 may be specifically configured to cause the processor 402 to perform the operations in the above-described method embodiments.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in a system according to embodiments of the present invention. The present invention may also be embodied as apparatus or system programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several systems, several of these systems may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Claims (10)

1. An object behavior detection method based on video analysis comprises the following steps:
converting an original video into a video image sequence comprising a plurality of video frames, and detecting a target object contained in each video frame in the video image sequence;
respectively determining the position information of the target object in each video frame, and determining the motion track of the target object according to the position information;
detecting skeleton key point information of the target object contained in each video frame in the video image sequence according to the position information of the target object in each video frame;
determining the action category of the target object according to a pre-trained bone recognition model and the detected bone key point information of the target object contained in each video frame;
and detecting whether the behavior of the target object is abnormal or not according to the motion track of the target object and the action category of the target object.
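By way of illustration only, the following Python sketch outlines the pipeline recited in claim 1; it is not code disclosed by the application. The callables detector, pose_estimator and action_model, and the simple is_abnormal rule (with its restricted-action set and restricted zone), are hypothetical placeholders for the detection, skeleton-recognition and judging steps described above.

import cv2


def video_to_frames(path, stride=1):
    """Convert the original video into a sequence of video frames."""
    cap = cv2.VideoCapture(path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames


def is_abnormal(trajectory, action, restricted_actions=("fall", "fight"), restricted_zone=None):
    """Toy judging rule: abnormal if the action category is in a restricted set,
    or the motion track enters an (assumed) restricted zone (x0, y0, x1, y1)."""
    if action in restricted_actions:
        return True
    if restricted_zone is not None:
        x0, y0, x1, y1 = restricted_zone
        return any(x0 <= x <= x1 and y0 <= y <= y1 for x, y in trajectory)
    return False


def detect_behavior(video_path, detector, pose_estimator, action_model):
    """detector(frame) -> list of [x0, y0, x1, y1] boxes; pose_estimator(frame, box) -> key points;
    action_model(keypoint_sequence) -> action category string."""
    frames = video_to_frames(video_path)
    trajectory, keypoint_sequence = [], []
    for frame in frames:
        boxes = detector(frame)                      # detect the target object in this frame
        if not boxes:
            continue
        x0, y0, x1, y1 = boxes[0]                    # single-target case for brevity
        trajectory.append(((x0 + x1) / 2.0, (y0 + y1) / 2.0))   # per-frame position -> motion track
        keypoint_sequence.append(pose_estimator(frame, boxes[0]))
    action = action_model(keypoint_sequence)         # action category from the skeleton model
    return action, is_abnormal(trajectory, action)

The sketch deliberately combines the motion track and the action category in a single rule at the end, mirroring the final judging step of the claim; a real system would substitute trained models for the placeholder callables.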
2. The method of claim 1, wherein said detecting a target object contained in each video frame in the sequence of video images comprises:
detecting a candidate object contained in each video frame in the sequence of video images;
and screening out the target object contained in the video image sequence from the candidate objects contained in the respective video frames.
3. The method according to claim 1 or 2, wherein the detecting of the skeleton key point information of the target object contained in each video frame in the video image sequence according to the position information of the target object in each video frame comprises:
detecting bone key point information of candidate objects contained in each video frame in the video image sequence;
and screening the bone key point information of the target object from the bone key point information of the candidate object according to the position information of the target object in each video frame.
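As a small illustrative sketch of the screening step in claim 3: candidate skeletons whose key points mostly fall inside the target object's bounding box are kept. The 0.5 ratio threshold and the (x, y, score) key point format are assumptions made only for illustration.

def select_target_keypoints(candidate_skeletons, target_box, min_ratio=0.5):
    """Pick the candidate skeleton that best overlaps the target object's box.
    candidate_skeletons: list of skeletons, each a list of (x, y, score) key points."""
    x0, y0, x1, y1 = target_box
    best, best_ratio = None, 0.0
    for skeleton in candidate_skeletons:
        inside = sum(1 for (x, y, s) in skeleton if x0 <= x <= x1 and y0 <= y <= y1)
        ratio = inside / max(len(skeleton), 1)
        if ratio >= min_ratio and ratio > best_ratio:
            best, best_ratio = skeleton, ratio
    return best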
4. The method of claim 3, wherein the detecting of the bone key point information of the candidate objects contained in each video frame in the video image sequence comprises:
and detecting the bone key point information of the candidate object contained in each video frame in the video image sequence by utilizing an OpenPose algorithm.
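One possible realization of claim 4 uses the official OpenPose Python bindings (pyopenpose). The model_folder path is an assumption, and the exact emplaceAndPop signature varies between OpenPose builds, so this is a hedged sketch rather than the applicant's actual implementation.

import pyopenpose as op  # requires a local OpenPose build with its Python API enabled

params = {"model_folder": "openpose/models/"}   # assumed location of the OpenPose models
wrapper = op.WrapperPython()
wrapper.configure(params)
wrapper.start()


def detect_candidate_keypoints(frame):
    """Return pose key points for all candidate objects in one video frame,
    as an array of shape (num_people, num_keypoints, 3): x, y, confidence."""
    datum = op.Datum()
    datum.cvInputData = frame
    wrapper.emplaceAndPop(op.VectorDatum([datum]))  # older builds accept a plain list here
    return datum.poseKeypoints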
5. The method of claim 1, wherein the determining the position information of the target object in each video frame comprises:
and determining the position information of the target object in each video frame according to the tracking frame of the target object.
6. The method of claim 1, wherein said detecting a target object contained in each video frame in the sequence of video images comprises:
the YOLOV3 algorithm is used to detect a target object contained in each video frame in the sequence of video images.
7. The method of claim 1, wherein the determining of the motion track of the target object according to the position information comprises:
estimating a predicted tracking result of the target object by using a Kalman filtering algorithm according to the position information of the target object in the earlier of two adjacent video frames;
and judging, in terms of both a motion matching degree and an appearance matching degree, whether the predicted tracking result of the target object matches the actual detection result of the target object.
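By way of illustration of claim 7, the sketch below predicts the target object's next position with a hand-rolled constant-velocity Kalman prediction step, then combines a motion matching degree (IoU between predicted and detected boxes, used here as a simple stand-in for a Mahalanobis gate) with an appearance matching degree (cosine similarity of appearance features), in the spirit of the cited Deep SORT tracker. The 0.5 weighting and the process-noise value are assumptions.

import numpy as np
from scipy.optimize import linear_sum_assignment

# Constant-velocity model over the target object's centre: state = [cx, cy, vx, vy].
F = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.]])


def kalman_predict(x, P, Q=None):
    """One Kalman prediction step: x' = F x, P' = F P F^T + Q."""
    Q = np.eye(4) * 1e-2 if Q is None else Q
    return F @ x, F @ P @ F.T + Q


def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_tracks(pred_boxes, track_feats, det_boxes, det_feats, w=0.5):
    """Match predicted tracking results to actual detections by combining the
    motion matching degree and the appearance matching degree into one cost."""
    cost = np.zeros((len(pred_boxes), len(det_boxes)))
    for i in range(len(pred_boxes)):
        for j in range(len(det_boxes)):
            motion_cost = 1.0 - iou(pred_boxes[i], det_boxes[j])
            denom = np.linalg.norm(track_feats[i]) * np.linalg.norm(det_feats[j]) + 1e-12
            appearance_cost = 1.0 - float(np.dot(track_feats[i], det_feats[j]) / denom)
            cost[i, j] = w * motion_cost + (1.0 - w) * appearance_cost
    rows, cols = linear_sum_assignment(cost)      # Hungarian assignment over the combined cost
    return list(zip(rows.tolist(), cols.tolist()))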
8. An object behavior detection apparatus based on video analysis, comprising:
the target object detection module is suitable for converting an original video into a video image sequence containing a plurality of video frames and detecting a target object contained in each video frame in the video image sequence;
the motion track determining module is suitable for respectively determining the position information of the target object in each video frame and determining the motion track of the target object according to the position information;
the skeleton key point information detection module is suitable for detecting skeleton key point information of the target object contained in each video frame in the video image sequence according to the position information of the target object in each video frame;
the action category determining module is suitable for determining the action category of the target object according to a bone recognition model trained in advance and the detected bone key point information of the target object contained in each video frame;
and the abnormal behavior judging module is suitable for detecting whether the behavior of the target object is abnormal or not according to the motion track of the target object and the action category of the target object.
9. An electronic device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute operations corresponding to the object behavior detection method based on video analysis according to any one of claims 1-7.
10. A computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the object behavior detection method based on video analysis according to any one of claims 1-7.
CN201910585625.1A 2019-07-01 2019-07-01 Object behavior detection method and device based on video analysis Pending CN112183153A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910585625.1A CN112183153A (en) 2019-07-01 2019-07-01 Object behavior detection method and device based on video analysis


Publications (1)

Publication Number Publication Date
CN112183153A 2021-01-05

Family

ID=73914280

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910585625.1A Pending CN112183153A (en) 2019-07-01 2019-07-01 Object behavior detection method and device based on video analysis

Country Status (1)

Country Link
CN (1) CN112183153A (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718857A (en) * 2016-01-13 2016-06-29 兴唐通信科技有限公司 Human body abnormal behavior detection method and system
CN109460702A (en) * 2018-09-14 2019-03-12 华南理工大学 Passenger's abnormal behaviour recognition methods based on human skeleton sequence

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
NICOLAI WOJKE et al.: "Simple Online and Realtime Tracking with a Deep Association Metric", arXiv:1703.07402v1, pages 1-5 *
SIJIE YAN et al.: "Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition", arXiv:1801.07455v2, pages 1-10 *
ZHE CAO et al.: "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields", arXiv:1611.08050v2, pages 1-9 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766192A (en) * 2021-01-25 2021-05-07 北京市地铁运营有限公司地铁运营技术研发中心 Intelligent train monitoring system
CN112822460A (en) * 2021-02-01 2021-05-18 深圳市瑞驰文体发展有限公司 Billiard game video monitoring method and system
CN112818320A (en) * 2021-02-26 2021-05-18 上海交通大学 Smartphone pattern password conjecture method based on video
CN112926541B (en) * 2021-04-09 2022-11-08 济南博观智能科技有限公司 Sleeping post detection method and device and related equipment
CN112926541A (en) * 2021-04-09 2021-06-08 济南博观智能科技有限公司 Sleeping post detection method and device and related equipment
CN113221704A (en) * 2021-04-30 2021-08-06 陕西科技大学 Animal posture recognition method and system based on deep learning and storage medium
CN113096158A (en) * 2021-05-08 2021-07-09 北京灵汐科技有限公司 Moving object identification method and device, electronic equipment and readable storage medium
CN113473124A (en) * 2021-05-28 2021-10-01 北京达佳互联信息技术有限公司 Information acquisition method and device, electronic equipment and storage medium
CN113473124B (en) * 2021-05-28 2024-02-06 北京达佳互联信息技术有限公司 Information acquisition method, device, electronic equipment and storage medium
CN113378799A (en) * 2021-07-21 2021-09-10 山东大学 Behavior recognition method and system based on target detection and attitude detection framework
CN113763429A (en) * 2021-09-08 2021-12-07 广州市健坤网络科技发展有限公司 Pig behavior recognition system and method based on video
CN114254492A (en) * 2021-12-08 2022-03-29 新国脉文旅科技有限公司 Passenger flow behavior track destination simulation method based on passenger flow portrayal
CN114285960A (en) * 2022-01-29 2022-04-05 北京卡路里信息技术有限公司 Video processing method and device
CN114285960B (en) * 2022-01-29 2024-01-30 北京卡路里信息技术有限公司 Video processing method and device
CN114677625A (en) * 2022-03-18 2022-06-28 北京百度网讯科技有限公司 Object detection method, device, apparatus, storage medium and program product
CN114677625B (en) * 2022-03-18 2023-09-08 北京百度网讯科技有限公司 Object detection method, device, apparatus, storage medium, and program product
CN114757855A (en) * 2022-06-16 2022-07-15 广州三七极耀网络科技有限公司 Method, device, equipment and storage medium for correcting action data
CN115083022A (en) * 2022-08-22 2022-09-20 深圳比特微电子科技有限公司 Pet behavior identification method and device and readable storage medium

Similar Documents

Publication Publication Date Title
CN112183153A (en) Object behavior detection method and device based on video analysis
CN112990432B (en) Target recognition model training method and device and electronic equipment
CN109978893B (en) Training method, device, equipment and storage medium of image semantic segmentation network
CN110633745B (en) Image classification training method and device based on artificial intelligence and storage medium
CN107563372B (en) License plate positioning method based on deep learning SSD frame
CN108230291B (en) Object recognition system training method, object recognition method, device and electronic equipment
CN111814661B (en) Human body behavior recognition method based on residual error-circulating neural network
CN110738101A (en) Behavior recognition method and device and computer readable storage medium
CN107633226B (en) Human body motion tracking feature processing method
CN109002755B (en) Age estimation model construction method and estimation method based on face image
Chen et al. Research on recognition of fly species based on improved RetinaNet and CBAM
CN109658442B (en) Multi-target tracking method, device, equipment and computer readable storage medium
CN104537647A (en) Target detection method and device
Ahmad et al. Overhead view person detection using YOLO
CN109919223B (en) Target detection method and device based on deep neural network
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
Li et al. Robust vehicle detection in high-resolution aerial images with imbalanced data
CN114821014A (en) Multi-mode and counterstudy-based multi-task target detection and identification method and device
CN105303163A (en) Method and detection device for target detection
CN113378675A (en) Face recognition method for simultaneous detection and feature extraction
Sismananda et al. Performance comparison of yolo-lite and yolov3 using raspberry pi and motioneyeos
CN111274964A (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
CN114359787A (en) Target attribute identification method and device, computer equipment and storage medium
CN113870254A (en) Target object detection method and device, electronic equipment and storage medium
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210105)