CN116883453A - Motion direction prediction method, electronic equipment and storage medium

Motion direction prediction method, electronic equipment and storage medium

Info

Publication number
CN116883453A
CN116883453A
Authority
CN
China
Prior art keywords
motion direction
target
sample
image
target object
Prior art date
Legal status
Pending
Application number
CN202310747820.6A
Other languages
Chinese (zh)
Inventor
狄建锴
刘微
曲磊
赵长福
辛洪录
Current Assignee
Hisense Group Holding Co Ltd
Original Assignee
Hisense Group Holding Co Ltd
Priority date
Filing date
Publication date
Application filed by Hisense Group Holding Co Ltd
Priority to CN202310747820.6A
Publication of CN116883453A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30232 Surveillance

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a motion direction prediction method, an electronic device and a storage medium. A motion direction prediction model determines the pose key points of a target object in a detection-box sub-image, determines motion direction probability values for the target object from the sub-image and the pose key points, predicts the target motion direction of the target object from those probability values, and the target object is then tracked along the predicted direction. Compared with schemes that realize target tracking from the target detection results of video collected over a period of time, the application improves tracking efficiency and offers good real-time tracking performance. The scheme provided by the application is characterized by high accuracy and fast inference, and satisfies the trustworthiness characteristics.

Description

Motion direction prediction method, electronic equipment and storage medium
Technical Field
The present application relates to the field of machine vision, and in particular, to a motion direction prediction method, an electronic device, and a storage medium.
Background
The old urban areas of many large cities were built long ago and often contain complex, staggered networks of minor road intersections; these intricate road networks extend in every direction and make target tracking difficult. For target detection and tracking, the following schemes are generally used:
First, inter-frame difference methods detect pixel-change relationships across reasonably placed detection boxes and, exploiting the relatively pronounced difference between the head and the tail of a moving target, perform a secondary analysis of the pixels in the motion region of the frame-difference image. Second, joint tracking and classification of irregular multi-extended targets based on the multi-Bernoulli filter. Third, deep-learning-based computer vision techniques sense traffic-flow parameters such as moving-target speed, target size and flow in a traffic system, monitor real-time traffic conditions, and automatically detect and analyze video of target motion and abnormal behavior.
All three schemes realize target tracking based on target detection results from video collected over a period of time, so their tracking efficiency is low and their real-time performance is poor.
Disclosure of Invention
The application provides a motion direction prediction method, an electronic device and a storage medium to solve the problems of low target tracking efficiency and poor real-time performance in the prior art.
In a first aspect, the present application provides a motion direction prediction method, the method comprising:
determining a detection-box sub-image of a target object to be tracked in a target image;
inputting the sub-image into a pre-trained motion direction prediction model for processing to obtain a motion direction probability value of the target object, where the motion direction prediction model is used to determine pose key points of the target object in the sub-image and to determine the motion direction probability value based on the sub-image and the pose key points;
and predicting the target motion direction of the target object according to the motion direction probability value.
In a second aspect, the present application provides a motion direction prediction apparatus, the apparatus comprising:
a determining module, configured to determine a detection-box sub-image of the target object to be tracked in the target image;
an input module, configured to input the sub-image into a pre-trained motion direction prediction model for processing to obtain a motion direction probability value of the target object, where the motion direction prediction model is used to determine pose key points of the target object in the sub-image and to determine the motion direction probability value based on the sub-image and the pose key points;
and a prediction module, configured to predict the target motion direction of the target object according to the motion direction probability value.
In a third aspect, the present application provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
and the processor is configured to implement the steps of the above method when executing the program stored in the memory.
In a fourth aspect, the present application provides a computer-readable storage medium having a computer program stored therein which, when executed by a processor, implements the above method steps.
The application provides a motion direction prediction method, an electronic device and a storage medium, where the method includes: determining a detection-box sub-image of a target object to be tracked in a target image; inputting the sub-image into a pre-trained motion direction prediction model for processing to obtain a motion direction probability value of the target object, where the motion direction prediction model is used to determine pose key points of the target object in the sub-image and to determine the motion direction probability value based on the sub-image and the pose key points; and predicting the target motion direction of the target object according to the motion direction probability value.
The technical scheme has the following advantages or beneficial effects:
In the application, the electronic device trains a motion direction prediction model in advance and, after determining the detection-box sub-image of the target object to be tracked in the target image, inputs the sub-image into the motion direction prediction model. The model determines the pose key points of the target object in the sub-image, determines motion direction probability values for the target object from the sub-image and the pose key points, and predicts the target motion direction of the target object from those probability values; the target object is then tracked along the predicted direction. Compared with schemes that realize target tracking from the target detection results of video collected over a period of time, the application improves tracking efficiency and offers good real-time tracking performance. The scheme provided by the application is characterized by high accuracy and fast inference, and satisfies the trustworthiness characteristics.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a first motion direction prediction process according to the present application;
FIG. 2 is a schematic diagram of a second motion direction prediction process according to the present application;
FIG. 3 is a schematic diagram of a training process of a motion direction prediction model provided by the application;
FIG. 4 is a schematic diagram of a process for acquiring sample continuous frame video in a training set according to the present application;
FIG. 5 is a schematic diagram of a third motion direction prediction process according to the present application;
FIG. 6 is a schematic diagram of a fourth motion direction prediction process according to the present application;
FIG. 7 is a schematic diagram of a fifth motion direction prediction process according to the present application;
FIG. 8 is a schematic diagram of a sixth motion direction prediction process according to the present application;
FIG. 9 is a diagram of a motion direction prediction framework provided by the present application;
FIG. 10 is a schematic diagram of a training module frame of a motion direction prediction model provided by the application;
FIG. 11 is a diagram of an object detection interface provided by the present application;
FIG. 12 is a schematic diagram of selecting a target object according to the present application;
FIG. 13 is a block diagram of a motion direction prediction application module provided by the present application;
FIG. 14 is a diagram of a frame of a motion direction prediction defense deployment module provided by the application;
FIG. 15 is a schematic diagram of a tripwire configured by the algorithm provided by the present application;
FIG. 16 is a schematic diagram of a tripwire region configured by the algorithm provided by the present application;
FIG. 17 is a schematic diagram of the algorithm's preliminary estimate of the possible motion direction of each detected target according to the present application;
FIG. 18 is a schematic diagram of the algorithm's estimation result for the possible motion direction of each detected target according to the present application;
FIG. 19 is a schematic diagram of the result of moving-target estimation under the tripwire-region configuration provided by the present application;
FIG. 20 is a schematic diagram of a motion direction prediction apparatus according to the present application;
fig. 21 is a schematic structural diagram of an electronic device according to the present application.
Detailed Description
For the purposes of making the objects and embodiments of the present application more apparent, exemplary embodiments of the present application will be described in detail below with reference to the accompanying drawings. It is apparent that the described exemplary embodiments are only some, and not all, of the embodiments of the present application.
It should be noted that the brief description of the terminology in the present application is for the purpose of facilitating understanding of the embodiments described below only and is not intended to limit the embodiments of the present application. Unless otherwise indicated, these terms should be construed in their ordinary and customary meaning.
The terms first, second, third and the like in the description, in the claims and in the above-described figures are used for distinguishing between similar objects or entities and not necessarily for describing a particular sequential or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements explicitly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware or/and software code that is capable of performing the function associated with that element.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.
Fig. 1 is a schematic diagram of a motion direction prediction process provided by the present application, where the process includes the following steps:
s101: and determining a detection frame sub-image of the target object to be tracked in the target image.
S102: inputting the sub-image into a pre-trained motion direction prediction model for processing to obtain a motion direction probability value of the target object; the motion direction prediction model is used for determining gesture key points of the target object in the sub-image, and determining the motion direction probability value based on the sub-image and the gesture key points.
S103: and predicting the target movement direction of the target object according to the movement direction probability value.
The motion direction prediction method provided by the application is applied to an electronic device, which may be a PC, a tablet computer or similar equipment, or a server.
The electronic device determines the detection-box sub-image of the target object to be tracked in the target image. A user may frame-select a detection box around the target object in the target image, and the electronic device determines the sub-image contained in the selected detection box. Alternatively, the target object to be tracked is detected in the target image by a target detection algorithm or a pre-trained target detection model to obtain its detection box, and the electronic device determines the sub-image contained in that detection box.
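As a minimal illustration of this step, the following Python sketch crops the detection-box sub-image from a target image; the helper name, the (left, top, right, bottom) box format and the file name are assumptions for illustration, not part of the patent.

```python
# Minimal sketch (illustrative): cropping the detection-box sub-image of
# the target object. The box may come from a user's frame selection or
# from a detector; the box format and names are assumptions.
from PIL import Image

def crop_detection_box(target_image: Image.Image, box: tuple) -> Image.Image:
    """Return the sub-image contained in the detection box."""
    x1, y1, x2, y2 = box
    # Clamp the box to the image bounds before cropping.
    x1, y1 = max(0, x1), max(0, y1)
    x2 = min(target_image.width, x2)
    y2 = min(target_image.height, y2)
    return target_image.crop((x1, y1, x2, y2))

# Example: a user-drawn or detector-produced box.
image = Image.open("target_frame.jpg")
sub_image = crop_detection_box(image, (120, 80, 260, 400))
```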
The electronic device stores a pre-trained motion direction prediction model. Optionally, the model is trained from sample images in a training set and the labeled motion direction information of the sample objects in those images. During training, the pose key points of the sample object in a sample image are first extracted; the motion direction of the sample object is then estimated from the sample image and its pose key points; a model loss value is computed from the estimated and labeled motion directions; the model parameters are adjusted according to the loss value; and training ends when the loss value meets the requirement. The electronic device inputs the determined detection-box sub-image of the target object into the pre-trained model, which first determines the pose key points of the target object in the sub-image and then determines the motion direction probability value of the target object from the sub-image and the pose key points. Optionally, a probability value is determined for the target object moving in each of several surrounding directions, and the target motion direction of the target object is predicted from these probability values, for example by taking the direction with the largest probability value.
For example, based on the sub-image and the pose key points, the probability values are determined as 80% for the target object moving forward, 30% for moving backward, 50% for moving left, and 40% for moving right. The forward probability is the highest, so the predicted target motion direction of the target object is forward.
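The final selection in this example reduces to taking the argmax over the per-direction probability values; a small sketch, with illustrative direction names:

```python
# Illustrative sketch of the final prediction step: given the per-direction
# probability values from the example above, take the direction with the
# largest value as the predicted target motion direction. The direction
# names and dictionary layout are assumptions for illustration.
direction_probs = {"forward": 0.80, "backward": 0.30, "left": 0.50, "right": 0.40}

target_direction = max(direction_probs, key=direction_probs.get)
print(target_direction, direction_probs[target_direction])  # forward 0.8
```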
In the application, the electronic device trains a motion direction prediction model in advance and, after determining the detection-box sub-image of the target object to be tracked in the target image, inputs the sub-image into the model. The model determines the pose key points of the target object in the sub-image, determines motion direction probability values for the target object from the sub-image and the pose key points, and predicts the target motion direction from those probability values; the target object is then tracked along the predicted direction. Compared with schemes that realize target tracking from the target detection results of video collected over a period of time, the application improves tracking efficiency and offers good real-time tracking performance. The scheme provided by the application is characterized by high accuracy and fast inference, and satisfies the trustworthiness characteristics.
Trustworthiness characteristics:
(1) Real-time performance: the scheme can feed back the inference result for captured frames in real time while sampling one of every 5 video frames, which satisfies the real-time requirement among the trustworthiness characteristics;
(2) Controllability: the opening and closing time periods of moving-target direction prediction can be configured, which satisfies the controllability requirement.
Since the motion of the target object is continuous, in order to predict its motion direction more accurately, the method of the application includes:
acquiring a plurality of images, including the target image, collected within a preset period, and determining the detection-box sub-image of the target object in each image;
inputting the plurality of sub-images into the motion direction prediction model for processing to obtain the motion direction probability value of the target object, where the model is used to determine the pose key points of the target object corresponding to each sub-image and to determine the motion direction probability value based on the sub-images and their corresponding pose key points.
In the application, the electronic device acquires a plurality of images, including the target image, collected within a preset period; preferably, the last frame collected in the period may be used as the target image. The preset period is, for example, 2 or 3 seconds. After the images are acquired, the detection-box sub-image of the target object is determined in each image, and the sub-images are input into the motion direction prediction model. The model first determines the pose key points of the target object corresponding to each sub-image, establishing the correspondence between sub-images and pose key points, and finally determines the motion direction probability value of the target object in the target image from the plurality of sub-images and their corresponding pose key points.
Optionally, during training of the motion direction prediction model, a first sub-model may be trained with the sample continuous-frame images in the training set; this first sub-model learns the correspondence between sub-images and pose key points, using the sample continuous-frame images and the pre-labeled correspondence between sample images and the pose key points of the sample objects. A second sub-model in the motion direction prediction model is trained with the motion directions of the sample objects labeled for the sample continuous-frame images; this second sub-model learns the information of each motion direction of the sample object. At inference time, a plurality of images including the target image are acquired within a preset period, the detection-box sub-images of the target object are determined, and the sub-images are input into the model. Based on the first sub-model, the pose key points of the target object corresponding to the sub-images are determined; based on the second sub-model, the motion direction probability value of the target object in the target image is determined from the sub-images and their corresponding pose key points.
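A hedged sketch of this multi-frame inference flow, where `pose_submodel` and `direction_submodel` are hypothetical stand-ins for the first and second sub-models; their names and signatures are assumptions, not the patent's API.

```python
# Multi-frame inference sketch: the last frame of the preset period is
# used as the target image, and one detection-box sub-image per frame is
# fed through the two sub-models.
def predict_direction_over_period(frames, boxes, pose_submodel, direction_submodel):
    # One detection-box sub-image of the target object per frame.
    sub_images = [frame.crop(box) for frame, box in zip(frames, boxes)]
    # First sub-model: pose key points for each sub-image, keeping the
    # correspondence between sub-image and key points.
    keypoints = [pose_submodel(img) for img in sub_images]
    # Second sub-model: per-direction probability values for the target
    # object in the target image (the last frame of the period).
    return direction_submodel(sub_images, keypoints)
```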
Fig. 2 is a schematic diagram of a motion direction prediction process provided by the present application, including the following steps:
s201: and acquiring a plurality of images including the target image acquired in a preset period, and determining a detection frame sub-image of the target object in the images.
S202: inputting a plurality of sub-images into the motion direction prediction model for processing to obtain a motion direction probability value of the target object; the motion direction prediction model is used for determining gesture key points of the target object corresponding to the sub-images respectively, and determining the motion direction probability value based on the sub-images and the gesture key points corresponding to the sub-images respectively.
S203: and predicting the target movement direction of the target object according to the movement direction probability value.
To make the prediction of the motion direction prediction model more accurate, in the present application the training process of the model includes:
acquiring the sample continuous-frame videos in a training set, inputting the sample images in those videos, the pose key points of the sample objects in the sample images, and the labeled correspondence between sample images and the pose key points of the sample objects into an initial motion direction prediction model, and training the initial model;
inputting the sample images in the sample continuous-frame videos, the pose key points of the sample objects in those images, and the semantic description information corresponding to the videos into the trained initial model, and training it further to obtain the motion direction prediction model.
In the application, the electronic device acquires a large number of sample continuous-frame videos for the training set. For each sample image frame, the correspondence between the frame and the pose key points of the sample object in it is labeled. For each sample continuous-frame video, semantic description information is labeled; this description carries the motion direction information of the sample object, for example: "A man wearing a yellow-green jersey runs from the middle of the square toward the upper right of the picture, then runs back to the middle of the picture, then runs toward the upper left of the picture, then leaves along the center line of the picture and disappears above it." The sample images, the pose key points of the sample objects, and the labeled correspondence between them are input into the initial motion direction prediction model to train it, so that the model learns the correspondence between sample images and the pose key points of the sample objects.
Further, the sample images in the sample continuous-frame videos, the pose key points of the sample objects in those images, and the semantic description information corresponding to the videos are input into the trained initial model, which is trained further to obtain the final motion direction prediction model. In this way, the model learns the information of each motion direction of the sample object.
Fig. 3 is a schematic diagram of a training process of a motion direction prediction model provided by the present application, including the following steps:
s301: and acquiring sample continuous frame videos in a training set, inputting a corresponding relation between a sample image in the sample continuous frame videos, a gesture key point of a sample object in the sample image and the gesture key point of the marked sample image and the sample object into an initial motion direction prediction model, and training the initial motion direction prediction model.
S302: inputting a sample image in the sample continuous frame video, gesture key points of a sample object in the sample image and semantic description information corresponding to the sample continuous frame video into a trained initial motion direction prediction model, and training the trained initial motion direction prediction model to obtain the motion direction prediction model.
Because the database contains relatively few sample continuous-frame videos, and in order to enrich the training set and obtain a more accurate motion direction prediction model, in the application each sample continuous-frame video stored in the database is called a first sample continuous-frame video, and each has corresponding semantic description information. The electronic device acquires the first sample continuous-frame videos stored in the database together with their semantic description information. A human pose transformation algorithm is then used to transform the human poses in the first sample images of a first sample continuous-frame video to obtain a second sample continuous-frame video. For example, if the sample object in a first sample image faces the stage, the pose transformation algorithm can turn it to face away from the stage. Applying a continuous pose transformation to the sample objects in the first sample continuous-frame video yields the second sample continuous-frame video, and semantic description information for the second video is generated from the pose information of its sample objects.
Finally, the first and second sample continuous-frame videos are together used as the sample continuous-frame videos in the training set. This expands the number of sample videos in the training set and improves the accuracy of the motion direction prediction model.
Fig. 4 is a schematic diagram of a process for acquiring a sample continuous frame video in a training set according to the present application, including the following steps:
s401: and acquiring the first sample continuous frame video stored in the database and semantic description information corresponding to the first sample continuous frame video.
S402: and transforming the human body gestures in the first sample images in the first sample continuous frame video by using a human body gesture transformation algorithm to obtain a second sample continuous frame video, and generating semantic description information corresponding to the second sample continuous frame video.
S403: and taking the first sample continuous frame video and the second sample continuous frame video as sample continuous frame videos in a training set.
In the application, a Stable Diffusion (SD) model may be used to transform the human poses in the first sample images of the first sample continuous-frame videos to obtain the second sample continuous-frame videos. SD generates good scene pictures for single-frame images, but because of the randomness and uncertainty of SD scene generation, its ability to generate continuous-frame event video is limited. The application therefore retrains the original Stable Diffusion model: during retraining, the training data set is supplemented with a large number of continuous event-video frames together with the optical flow and semantic information of each continuous frame, yielding a retrained continuous-frame diffusion model (Continuous-Frame Diffusion Models with Pose Conditions, CFDM-PC). The CFDM-PC is used to transform the human poses in the first sample images of the first sample continuous-frame videos to obtain the second sample continuous-frame videos.
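The training-set expansion can be pictured as the following sketch, in which `transform_poses` is a hypothetical black box standing in for the CFDM-PC pose transformation and `describe` for the regeneration of semantic descriptions; both names and the data layout are assumptions.

```python
# Sketch of the training-set expansion step under stated assumptions.
def expand_training_set(first_sample_videos, transform_poses, describe):
    training_set = list(first_sample_videos)
    for video in first_sample_videos:
        # Apply a continuous pose transformation to every frame, e.g.
        # turning a pose facing one way into a pose facing away.
        second_frames = [transform_poses(frame) for frame in video["frames"]]
        training_set.append({
            "frames": second_frames,
            # Semantic description regenerated from the new pose information.
            "description": describe(second_frames),
        })
    return training_set
```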
To determine the motion direction probability value of the target object more accurately, in the present application determining the motion direction probability value of the target object includes:
determining the motion direction probability values of the target object in a plurality of directions, and, for the tripwire direction toward a preset tripwire region, enhancing the motion direction probability value of the tripwire direction according to a preset probability value.
In the application, a tripwire region, i.e. a region of particular concern, is preset. After the motion direction probability values of the target object in the plurality of directions are determined, the tripwire direction is identified: for example, if the tripwire region is on the left of the target object, the leftward direction is the tripwire direction toward the region; likewise, if the region is on the right, the rightward direction is the tripwire direction. The motion direction probability value of the tripwire direction is then enhanced according to the preset probability value; optionally, the preset probability value is added to the probability value of the tripwire direction to obtain its final value. For example, if the rightward direction of the target object is the tripwire direction, the model determines the probability value of the rightward direction to be 50%, and the preset probability value is 20%, then the enhanced probability value of the tripwire direction is 50% + 20% = 70%.
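The worked example above corresponds to a one-line enhancement; a sketch with illustrative names:

```python
# Sketch of the probability enhancement: the preset probability value is
# added to the motion direction probability of the tripwire direction.
# Direction names and the cap at 1.0 are assumptions for illustration.
def enhance_tripwire_probability(direction_probs, tripwire_direction, preset=0.20):
    enhanced = dict(direction_probs)
    enhanced[tripwire_direction] = min(1.0, enhanced[tripwire_direction] + preset)
    return enhanced

probs = {"forward": 0.30, "backward": 0.10, "left": 0.25, "right": 0.50}
print(enhance_tripwire_probability(probs, "right"))  # right: 0.50 + 0.20 = 0.70
```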
Fig. 5 is a schematic diagram of a motion direction prediction process provided by the present application, including the following steps:
s501: and determining a detection frame sub-image of the target object to be tracked in the target image.
S502: and inputting the sub-image into a pre-trained motion direction prediction model to process, so as to obtain motion direction probability values of the target object in multiple directions, and enhancing the motion direction probability values of the tripwire directions according to the preset probability values in the tripwire directions of the preset tripwire areas.
S503: and predicting the target movement direction of the target object according to the movement direction probability value.
In the present application, enhancing the motion direction probability value of the tripwire direction toward the preset tripwire region according to the preset probability value includes:
determining first position information of the target object in the target image, and, if a probability-enhancement trigger condition is determined to hold based on the first position information and second position information of the preset tripwire region, enhancing the motion direction probability value of the tripwire direction toward the preset tripwire region according to the preset probability value.
In the application, when the target object is far from the tripwire region, the motion direction probability value of the tripwire direction is not enhanced; when the target object is near the region, it is enhanced. Specifically, the first position information of the target object in the target image is determined, and the second position information of the tripwire region is stored in advance. Whether the probability-enhancement condition is triggered is judged from the first and second position information: for example, the condition is triggered when the distance between the target object and the tripwire region is determined to be less than a set distance threshold, or when the target object and the tripwire region are determined to intersect. When the condition is triggered, the motion direction probability value of the tripwire direction toward the preset tripwire region is enhanced according to the preset probability value.
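A hedged sketch of the trigger test, assuming both the target object and the tripwire region are given as axis-aligned (x1, y1, x2, y2) boxes; the geometry helpers and the threshold value are assumptions.

```python
# Probability-enhancement trigger: fires when the target box intersects
# the tripwire region or lies closer than a set distance threshold.
def boxes_intersect(a, b):
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def box_gap(a, b):
    # Euclidean gap between two boxes; 0 when they touch or overlap.
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return (dx ** 2 + dy ** 2) ** 0.5

def should_enhance(target_box, tripwire_box, distance_threshold=50.0):
    return boxes_intersect(target_box, tripwire_box) or \
           box_gap(target_box, tripwire_box) < distance_threshold
```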
Fig. 6 is a schematic diagram of a motion direction prediction process provided by the present application, including the following steps:
s601: and determining a detection frame sub-image of the target object to be tracked in the target image.
S602: inputting the sub-image into a pre-trained motion direction prediction model for processing to obtain motion direction probability values of the target object in multiple directions, determining first position information of the target object in the target image, and if a trigger probability enhancing condition is determined according to the first position information and second position information of the preset tripwire area, enhancing the motion direction probability value of the tripwire direction according to the preset probability value in the tripwire direction of the preset tripwire area.
S603: and predicting the target movement direction of the target object according to the movement direction probability value.
Fig. 7 is a schematic diagram of a motion direction prediction process provided by the present application, including the following steps:
s701: and determining a detection frame sub-image of the target object to be tracked in the target image.
S702: inputting the sub-image into a pre-trained motion direction prediction model for processing to obtain motion direction probability values of the target object in multiple directions, determining first position information of the target object in the target image, and if a trigger probability enhancing condition is determined according to the first position information and second position information of the preset tripwire area, enhancing the motion direction probability value of the tripwire direction according to the preset probability value in the tripwire direction of the preset tripwire area.
S703: comparing the motion direction probability values of the multiple motion directions, and selecting a target motion direction probability value according to a comparison result; and predicting the target motion direction of the target object according to the target motion direction probability value.
In the application, optionally, the motion direction probability values of the plurality of motion directions are compared, and the largest value is selected as the target motion direction probability value; alternatively, several probability values are selected, from largest to smallest, as target motion direction probability values. The probability values may be sorted in descending order and the first one selected as the target value, with its corresponding direction taken as the predicted target motion direction; equivalently, they may be sorted in ascending order and the last one selected. In some cases, after ordering the probability values by magnitude, at least two of them may be selected, from largest to smallest, as target motion direction probability values, so that at least two predicted target motion directions are output. The manager can then track, manage or warn about the target object according to the output target motion directions. A sketch of this top-k selection follows.
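A sketch of this selection, assuming the probability values arrive as a direction-to-value mapping; names and values are illustrative.

```python
# Sort the per-direction probability values from largest to smallest and
# report the top-k directions, so the output can contain one or at least
# two predicted motion directions.
def top_k_directions(direction_probs, k=2):
    ranked = sorted(direction_probs.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]

probs = {"forward": 0.70, "backward": 0.10, "left": 0.55, "right": 0.40}
print(top_k_directions(probs, k=2))  # [('forward', 0.7), ('left', 0.55)]
```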
In order to prompt the manager that the target object is likely to enter the tripwire region, the method further includes:
if the target motion direction is the tripwire direction and the motion direction probability value corresponding to the target motion direction is greater than a preset alarm threshold, outputting a high-probability early-warning prompt indicating the tripwire direction.
The preset alarm threshold is, for example, 60% or 70%. Optionally, an audible and visual alarm can output the high-probability early-warning prompt for the tripwire direction, or the prompt can be sent to the manager's terminal device.
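A sketch of the alarm step, where `notify` is a hypothetical stand-in for the audible/visual alarm or the message to the manager's terminal device; all names and the threshold value are assumptions.

```python
# Emit a high-probability early warning when the predicted direction is
# the tripwire direction and its probability exceeds the alarm threshold.
def maybe_alert(target_direction, probability, tripwire_direction,
                alarm_threshold=0.60, notify=print):
    if target_direction == tripwire_direction and probability > alarm_threshold:
        notify(f"High-probability tripwire warning: {target_direction} "
               f"({probability:.0%} > {alarm_threshold:.0%})")

maybe_alert("right", 0.70, "right")  # triggers the warning
```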
Fig. 8 is a schematic diagram of a motion direction prediction process provided by the present application, including the following steps:
s801: and determining a detection frame sub-image of the target object to be tracked in the target image.
S802: inputting the sub-image into a pre-trained motion direction prediction model for processing to obtain a motion direction probability value of the target object; the motion direction prediction model is used for determining gesture key points of the target object in the sub-image, and determining the motion direction probability value based on the sub-image and the gesture key points.
S803: and predicting the target movement direction of the target object according to the movement direction probability value.
S804: if the target moving direction is the tripwire direction, outputting high-probability early warning prompt information for representing the tripwire direction, wherein the probability value of the moving direction corresponding to the target moving direction is larger than a preset warning threshold value.
The following describes in detail the motion direction prediction process provided by the present application with reference to the accompanying drawings.
The motion-direction video and image databases of moving targets are combined to form semantic descriptions of the continuous frames of each target's motion picture and of the moving target's profile and back view in each frame. Using a continuous-frame diffusion model with added pose conditions (CFDM-PC) and a self-attention mechanism (or a GAN) together with these semantic descriptions, a continuous-frame generated data set covering 6-8 future motion directions of the target object is produced. With a self-supervised learning training scheme that fully uses the continuous-frame video data collected on site and the training set generated by the self-attention mechanism (or GAN), the trained model can predict the probability of each motion direction as a moving target advances or retreats, and provide this motion direction probability information to the tracking service, thereby improving tracking efficiency.
Fig. 9 is the motion direction prediction framework diagram provided by the application. As shown in Fig. 9, it comprises a motion direction prediction model training module, a motion direction prediction defense-deployment module and a motion direction prediction application module. The training module comprises semantic data annotation and description, data collected on site, the continuous-frame diffusion model with added pose conditions (CFDM-PC), the continuous-frame generated data set of 6-8 future motion directions of the moving target, the motion direction prediction model obtained by self-supervised learning, and the predicted probability values of the moving target's motion in multiple directions. The defense-deployment module comprises drawing the tripwires of each complex intersection, drawing the tripwire regions (forbidden regions) of each complex intersection, providing reinforcement incremental-learning information, and forward-guiding the moving-target motion direction prediction model. The application module comprises a rectangular search box for each target in the picture, circling the target object, performing target tracking, and outputting a visualized list of the target object's motion direction probability values based on the motion direction prediction model.
When the motion direction prediction model is obtained by self-supervised training, 15% of the sample continuous-frame videos come from a real video library and image library (the first sample continuous-frame videos), and 85% of the data come from data generated by the CFDM-PC self-attention mechanism (or GAN) that contains richer motion-direction possibility information (the second sample continuous-frame videos).
In a square-gathering scene, the running routes in the original video recorded by a surveillance camera are extremely irregular, and fewer than 20,000 continuous frames of such surveillance video have been collected worldwide. Based on this, and in order to enrich the training set, the application first uses the original square-gathering video to extract the skeleton key points of each frame of the personnel video, matches the extracted skeleton points one-to-one with each video frame to produce "frame image-pose key point" data, and at the same time semantically annotates the continuous-frame "frame image-pose key point" video, for example: "A man wearing a yellow-green jersey runs from the middle of the square toward the upper right of the picture, then runs back to the middle of the picture, then runs toward the upper left of the picture, then leaves along the center line of the picture and disappears above it." This forms a "frame image-pose key point (semantic annotation)" video pair, which serves as one piece of video data in the original-video training set. Following these steps, "frame image-pose key point (semantic annotation)" video pairs are matched to the collected original square-gathering surveillance videos.
A large data set of moving-target "irregular motion direction video pairs" is then newly generated with the CFDM-PC. Combining the matched "frame image-pose key point (semantic annotation)" video-pair data set with a human pose transformation algorithm, such as the continuous-frame diffusion model with added pose conditions (Continuous-Frame Diffusion Models with Pose Conditions), ControlNet and its pose function, or a pose-generating adversarial network (PoseGAN), the existing pose key points in the original "frame image-pose key point" data are modified, for example changing a continuous pose that moves away in one direction into one deviating by some angle, such as 20 or 15 degrees, or changing a continuous pose running toward the upper right of the picture into one running toward the upper left. During pose generation, each generated frame is paired with its corresponding pose to form newly generated "frame image-pose key point" data, and a corresponding semantic annotation is added at the same time, for example: "A woman wearing a black coat runs from the middle of the square toward the left of the picture, then toward the upper left, then toward the upper right, and then runs out of the picture and disappears at its upper right." In this way new "frame image-pose key point (semantic annotation)" video pairs are generated, yielding a series of video pairs of irregular motion directions for different targets that cover, as far as possible, every possible motion route direction over the square in the original surveillance video, and producing a large data set of irregular-motion-direction video pairs of different targets on the square. Following these steps, the "frame image-pose key point (semantic annotation)" video pairs of the original videos are extensively modified, newly generating a large data set of moving-target "irregular motion direction video pairs"; a sketch of assembling one such video pair follows.
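A minimal sketch of assembling one such video pair, where `extract_skeleton` is a hypothetical stand-in for the skeleton key-point extractor; names and the data layout are assumptions.

```python
# Assemble a "frame image - pose key point (semantic annotation)" video
# pair: match extracted skeleton points one-to-one with each frame and
# attach the per-video semantic annotation.
def build_video_pair(frames, extract_skeleton, description):
    pairs = []
    for frame in frames:
        # One-to-one match between the frame and its skeleton key points.
        pairs.append({"frame": frame, "keypoints": extract_skeleton(frame)})
    # e.g. "a man wearing a yellow-green jersey runs from the middle of
    # the square toward the upper right of the picture ...".
    return {"pairs": pairs, "description": description}
```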
FIG. 10 is a schematic diagram of the motion direction prediction model training module provided by the application. It comprises the motion-direction video and image database of moving targets; the semantic descriptions of the continuous frames of the target object's motion picture and of the moving target's profile and back view in each frame; the data collected on site; the continuous-frame diffusion model with added pose conditions (CFDM-PC), which augments the continuous-frame videos in the video and image database to obtain the continuous-frame data set of 6-8 motion directions of the moving target and to strengthen the incremental-learning information; and the motion direction prediction model, obtained by self-supervised learning from the on-site data combined with the 6-8-direction continuous-frame data set, which predicts and outputs the probability value of the moving target's forward or backward motion.
The original Stable Diffusion (SD) model produces good results when generating a scene picture for a single-frame image, but because of the randomness and uncertainty of SD scene generation, its ability to generate continuous event video is limited. The application therefore retrains the original Stable Diffusion model: during retraining, the training data set is supplemented with a large number of continuous event-video frames and with the optical flow information of each continuous frame, yielding the retrained continuous-frame diffusion model (CFDM-PC). The CFDM-PC is an improvement on SD: prompts such as continuous-frame images, pose information and optical flow information are added on top of SD, and the CFDM-PC obtained by this training generates continuous event video with better results.
Then, when fine-tuning the CFDM-PC pre-trained model on the downstream task of pose-pair generation, the pose key points of the frames corresponding to the training-set continuous frames are introduced as input conditions, while a Prompt dialog is used to describe the target pose motion of each frame semantically, for example: "a man in red walks sideways at 45 degrees", so as to train a continuous-frame CFDM model that understands "image-pose pairs"; this can serve as the initial motion direction prediction model. At the same time, the whole continuous-frame video pair is described semantically through the Prompt, for example: "A man wearing a yellow-green jersey runs from the middle of the square toward the upper right of the picture, then runs back to the middle of the picture, then runs toward the upper left of the picture, then leaves along the center line of the picture and disappears above it," so that the continuous-frame diffusion model with added pose conditions (Continuous-Frame Diffusion Models with Pose Conditions), with stronger downstream pose-pair generation ability, is finally trained. In the application, the CFDM-PC and the motion direction prediction model can be deployed separately: the training set is expanded by the CFDM-PC, and the independently deployed motion direction prediction model is then trained with the expanded training set. During training, an initial motion direction prediction model that learns the correspondence between image frames and pose key points is obtained first; it is then retrained to obtain the final motion direction prediction model, which learns the probability values of the moving target's motion in every direction. The initial model may serve as the first sub-model of the motion direction prediction model, and the part predicting the per-direction probability values as the second sub-model.
The self-supervised training of the probability prediction model for each irregular motion direction of the moving target is as follows:
After the original-video "frame image-pose key point (semantic annotation)" video pairs and the large data set of moving-target "irregular motion direction video pairs" are obtained, the data set contains the "frame image-pose key point" matched pairs, the semantic descriptions of the video pairs, and the irregular motion directions of different targets. Self-supervised training on this data set therefore yields a probability prediction model that has learned what the real-time motion direction or orientation of an object is, and that can predict the probability of each irregular motion direction of a given moving target from the continuous video frames shot by a surveillance camera.
Meanwhile, the motion direction prediction application module provides a rectangular search box for each target in the picture. The search box of the surveillance picture is configurable: through the operation interface, a rectangular box can be manually specified for a specific single target or for multiple targets on a frame image of the real-time picture and rapidly circled by quick clicking.
FIG. 11 is a diagram of the object detection interface provided by the present application, which includes pedestrian detection, human body detection, pedestrian and rider detection, escalator entrance detection, tripwire region intrusion detection, driver detection, helmet detection, large-luggage detection, smoking and phone-use behavior detection, and the like.
Fig. 12 is a schematic view of target object selection provided by the application. The person in the rectangular box in Fig. 12 is the target object; once the box is drawn, the target object is tracked and its motion direction is predicted at the same time. As soon as the target rectangular box is provided on the frame image, the target tracking algorithm running in the background immediately locks onto the target and starts tracking it. Meanwhile, the target direction estimation algorithm obtains the frame image and outputs a probability estimate of the future motion direction of the target inside the specified rectangular box in the image.
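A hedged sketch of this hand-off at box-selection time; the tracker and direction-model interfaces below are hypothetical stand-ins for the background tracking algorithm and the direction estimation algorithm:

# Sketch only: `tracker` and `direction_model` are hypothetical interfaces.
from dataclasses import dataclass

@dataclass
class BoundingBox:
    x: int
    y: int
    w: int
    h: int

def on_target_selected(frame, box: BoundingBox, tracker, direction_model):
    tracker.init(frame, (box.x, box.y, box.w, box.h))    # start target tracking
    sub_image = frame[box.y:box.y + box.h, box.x:box.x + box.w]
    return direction_model.predict(sub_image)            # per-direction probabilities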
The motion direction prediction algorithm is realized by combining the motion direction prediction model for the "irregular motion directions" of a moving target with the actual business logic of the algorithm, as follows:
Input: the interface of the algorithm application platform provides moving-target selection by quick clicking or box selection, and one or more target objects can be selected; after the target objects are selected, every frame of the real-time video boxes the selected targets and tracks them in real time;
Algorithm start: once a target is box-selected, the algorithm dynamically loads and runs the motion direction prediction model for the "irregular motion directions" of the moving target, estimates the direction probabilities of the moving target in real time, and reports the estimated motion direction probability levels periodically (for example, every 3 seconds);
Reinforcement learning: when the box-selected target meets the triggering condition of forbidden-area detection or tripwire detection, the probability value estimated by the motion direction prediction model for that direction is enhanced, and the probability of the target moving in that direction increases markedly;
Output: while the motion direction of the box-selected target is being estimated, if some motion probability estimate rises markedly above a threshold, for example a 55% probability threshold, a high-probability alarm for the estimated direction is raised and personnel at the front are notified from the background to take action. A consolidated sketch of these four steps is given below.
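In the sketch, the 3-second report interval and the 55% alarm threshold come from the examples in the text, while the tracker, model, tripwire and notification interfaces, and the boost factor, are assumptions:

import time

REPORT_INTERVAL_S = 3.0     # report cadence from the example in the text
ALARM_THRESHOLD = 0.55      # 55% alarm threshold from the example

def boost_direction(probs, idx, factor=1.5):
    """Reinforce one direction's probability and renormalize (assumed rule)."""
    probs = list(probs)
    probs[idx] *= factor
    total = sum(probs)
    return [p / total for p in probs]

def run_direction_estimation(frames, tracker, model, tripwire, notify):
    last_report = time.monotonic()
    for frame in frames:
        x, y, w, h = tracker.update(frame)                # real-time tracking
        probs = model.predict(frame[y:y + h, x:x + w])    # direction probabilities
        if tripwire.is_triggered((x, y, w, h)):           # reinforcement step
            probs = boost_direction(probs, tripwire.direction_index)
        if time.monotonic() - last_report >= REPORT_INTERVAL_S:
            best = max(range(len(probs)), key=probs.__getitem__)
            if probs[best] > ALARM_THRESHOLD:             # high-probability alarm
                notify(direction=best, probability=probs[best])
            last_report = time.monotonic()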
Fig. 13 is a frame diagram of the motion direction prediction application module provided by the application, which comprises a rectangular search box for each target in the picture, quick-click rectangular boxes, the target object, target tracking, and a visualized list of the target object's motion direction probability values output by the motion direction prediction model.
FIG. 14 is a block diagram of a motion direction prediction defense arrangement module provided by the application. While the tracking algorithm and the target motion estimation algorithm are deployed, this module configures the tripwires and forbidden areas of various complex intersections, and the configured tripwire areas provide reinforcement incremental-learning information for the target motion direction estimation algorithm. The motion direction prediction defense arrangement module comprises a module for drawing the tripwire of each complex intersection, a module for drawing the tripwire area (forbidden area) of each complex intersection, a module that provides reinforcement incremental-learning information, and the target motion direction estimation and probability estimation module. The reinforcement incremental-learning information means that, as the detection and estimation algorithm service is exercised in practice, the probability value of a direction in which detection succeeds more often is increased, positively guiding the motion direction prediction model of the moving target.
FIG. 15 is a schematic diagram of a tripwire configured by the algorithm provided by the application, and FIG. 16 is a schematic diagram of a tripwire area configured by the algorithm provided by the application. After the detection and estimation algorithm service has been carried out many times, the probability value of a target motion direction in which tripwire crossings or area intrusions occur more often becomes higher and higher, so that the target direction estimation algorithm is positively guided and the success rate of its direction prediction improves continuously.
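A hedged sketch of this reinforcement incremental-learning rule: each confirmed tripwire crossing or area intrusion in a direction raises that direction's prior, which is then blended with the model output (the counting scheme and blending weight are assumptions):

# Sketch of the incremental rule: success counts per direction form a prior
# that is mixed with the model's probabilities. Weights are assumptions.
from collections import Counter

class DirectionPrior:
    def __init__(self, num_directions: int = 8):
        # start from uniform pseudo-counts so no direction has zero prior
        self.success_counts = Counter({d: 1 for d in range(num_directions)})

    def record_success(self, direction: int):
        """Call whenever a tripwire/forbidden-area detection is confirmed."""
        self.success_counts[direction] += 1

    def blend(self, model_probs, weight: float = 0.2):
        total = sum(self.success_counts.values())
        prior = [self.success_counts[d] / total for d in range(len(model_probs))]
        mixed = [(1 - weight) * p + weight * q for p, q in zip(model_probs, prior)]
        s = sum(mixed)
        return [m / s for m in mixed]    # renormalized direction probabilities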
Fig. 17 is a schematic diagram of the algorithm's preliminary estimates of the possible motion directions of each detected target provided by the application. When the five moving targets are clicked and box-selected in turn, the initial possible motion direction of each target is shown as a circle in the figure; in actual use, each circle displays the predicted probability value for that direction.
Fig. 18 is a schematic diagram of the algorithm's estimates of the possible motion directions of each detected target provided by the application. After a target has moved continuously for more than 1 second, the probability prediction model for the "irregular motion directions" of the moving target can give several higher direction probability values, some of which will exceed a threshold (for example, a 55% probability), triggering an early warning that prompts personnel at the front to take action. At this point no tripwire area has been configured for the algorithm, and the possible motion directions of each moving target are illustrated: the highlighted boxes in the figure give a visual description of the possible motion directions of the moving targets, that is, the possible poses after the motion.
FIG. 19 is a schematic diagram of the result of moving-target estimation with a configured tripwire area provided by the application. When a moving target passes through a forbidden area or crosses a tripwire, the detection probability of the algorithm is reinforced accordingly; the target direction estimation algorithm then gives a more definite estimate of the moving target's direction and triggers an early warning that prompts personnel at the front to take action.
Fig. 20 is a schematic structural diagram of a motion direction prediction apparatus according to the present application, where the apparatus includes:
a determining module 21, configured to determine a detection frame sub-image of a target object to be tracked in the target image;
the input module 22 is configured to input the sub-image into a pre-trained motion direction prediction model for processing, so as to obtain a motion direction probability value of the target object; the motion direction prediction model is used for determining gesture key points of the target object in the sub-image and determining the motion direction probability value based on the sub-image and the gesture key points;
and the prediction module 23 is used for predicting the target motion direction of the target object according to the motion direction probability value.
The determining module 21 is specifically configured to acquire a plurality of images including the target image acquired in a preset period, and determine a detection frame sub-image of the target object in the plurality of images;
The input module 22 is specifically configured to input a plurality of sub-images into the motion direction prediction model for processing, so as to obtain a motion direction probability value of the target object; the motion direction prediction model is used for determining gesture key points of the target object corresponding to the sub-images respectively, and determining the motion direction probability value based on the sub-images and the gesture key points corresponding to the sub-images respectively.
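A short sketch of this multi-frame variant, reusing the sub-model sketches above; the batch layout and the averaging over the preset period are assumptions:

# Multi-frame sketch: sub-images of one target over a preset period are
# stacked and processed together; the aggregation rule is an assumption.
import torch

def predict_from_sequence(pose_model, dir_model, sub_images):
    """sub_images: list of (3, H, W) tensors for one target over a period."""
    batch = torch.stack(sub_images)          # (T, 3, H, W)
    feats = pose_model.backbone(batch)       # shared image features per frame
    keypoints = pose_model.head(feats)       # pose key points per frame
    probs = dir_model(feats, keypoints)      # per-frame direction probabilities
    return probs.mean(dim=0)                 # aggregate over the period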
The apparatus further comprises:
the training module 24 is configured to obtain a sample continuous frame video in a training set, input a sample image in the sample continuous frame video, a gesture key point of a sample object in the sample image, and a corresponding relationship between the labeled sample image and the gesture key point of the sample object into an initial motion direction prediction model, and train the initial motion direction prediction model; inputting a sample image in the sample continuous frame video, gesture key points of a sample object in the sample image and semantic description information corresponding to the sample continuous frame video into a trained initial motion direction prediction model, and training the trained initial motion direction prediction model to obtain the motion direction prediction model.
The training module 24 is further configured to obtain a first continuous frame video of samples stored in the database, and semantic description information corresponding to the first continuous frame video of samples; transforming the human body gestures in the first sample images in the first sample continuous frame video by using a human body gesture transformation algorithm to obtain a second sample continuous frame video, and generating semantic description information corresponding to the second sample continuous frame video; and taking the first sample continuous frame video and the second sample continuous frame video as sample continuous frame videos in a training set.
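A hedged sketch of this training-set expansion; transform_pose and describe_video are hypothetical placeholders for the human body pose transformation algorithm and the generated semantic description:

def transform_pose(frame):
    """Placeholder for the human body pose transformation (hypothetical)."""
    return frame

def describe_video(frames):
    """Placeholder for generating the semantic description (hypothetical)."""
    return "generated semantic description"

def expand_training_set(database):
    training_set = []
    for video, description in database:                    # first sample videos
        training_set.append((video, description))
        new_video = [transform_pose(f) for f in video]     # second sample video
        training_set.append((new_video, describe_video(new_video)))
    return training_set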
The input module 22 is specifically configured to determine movement direction probability values of the target object in a plurality of directions, and to enhance, according to a preset probability value, the movement direction probability value of a tripwire direction pointing towards a preset tripwire area.
The input module 22 is specifically configured to determine first position information of the target object in the target image, and, if a probability-enhancing trigger condition is satisfied according to the first position information and second position information of the preset tripwire area, to enhance, according to a preset probability value, the movement direction probability value of the tripwire direction pointing towards the preset tripwire area.
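A small sketch of this trigger condition; since the text only says the condition is derived from the two pieces of position information, a simple distance-threshold rule is assumed here:

# The distance-threshold rule below is an assumption; the disclosure does
# not specify how the two pieces of position information are compared.
import math

def probability_boost_triggered(target_xy, zone_center_xy, radius: float) -> bool:
    """True when the target is close enough to the tripwire area."""
    return math.dist(target_xy, zone_center_xy) <= radius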
The prediction module 23 is specifically configured to predict a target motion direction of the target object according to at least one motion direction probability value greater than a preset direction probability threshold.
The apparatus further comprises:
and the alarm module 25 is configured to output, if the target movement direction is the tripwire direction and the movement direction probability value corresponding to the target movement direction is greater than a preset alarm threshold, high-probability early-warning prompt information representing the tripwire direction.
The present application also provides an electronic device, as shown in fig. 21, including: the processor 31, the communication interface 32, the memory 33 and the communication bus 34, wherein the processor 31, the communication interface 32 and the memory 33 complete communication with each other through the communication bus 34;
the memory 33 has stored therein a computer program which, when executed by the processor 31, causes the processor 31 to perform any of the above method steps.
The communication bus mentioned above for the electronic device may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, etc. The communication bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface 32 is used for communication between the above-described electronic device and other devices.
The memory may include a random access memory (Random Access Memory, RAM) or a non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit, a network processor (Network Processor, NP) and the like; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The application also provides a computer-readable storage medium having stored thereon a computer program executable by an electronic device, which when run on the electronic device causes the electronic device to perform any of the above method steps.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. A method of motion direction prediction, the method comprising:
determining a detection frame sub-image of a target object to be tracked in the target image;
inputting the sub-image into a pre-trained motion direction prediction model for processing to obtain a motion direction probability value of the target object; the motion direction prediction model is used for determining gesture key points of the target object in the sub-image and determining the motion direction probability value based on the sub-image and the gesture key points;
and predicting the target movement direction of the target object according to the movement direction probability value.
2. The method of claim 1, wherein the method further comprises:
acquiring a plurality of images including the target image acquired in a preset period, and determining a detection frame sub-image of the target object in the images;
Inputting a plurality of sub-images into the motion direction prediction model for processing to obtain a motion direction probability value of the target object; the motion direction prediction model is used for determining gesture key points of the target object corresponding to the sub-images respectively, and determining the motion direction probability value based on the sub-images and the gesture key points corresponding to the sub-images respectively.
3. The method of claim 2, wherein the training process of the motion direction prediction model comprises:
acquiring sample continuous frame videos in a training set, inputting a sample image in the sample continuous frame videos, gesture key points of a sample object in the sample image and a corresponding relation between the marked sample image and the gesture key points of the sample object into an initial motion direction prediction model, and training the initial motion direction prediction model;
inputting a sample image in the sample continuous frame video, gesture key points of a sample object in the sample image and semantic description information corresponding to the sample continuous frame video into a trained initial motion direction prediction model, and training the trained initial motion direction prediction model to obtain the motion direction prediction model.
4. The method of claim 3, wherein the acquiring sample sequential frame video in a training set comprises:
acquiring a first sample continuous frame video stored in a database and semantic description information corresponding to the first sample continuous frame video;
transforming the human body gestures in the first sample images in the first sample continuous frame video by using a human body gesture transformation algorithm to obtain a second sample continuous frame video, and generating semantic description information corresponding to the second sample continuous frame video;
and taking the first sample continuous frame video and the second sample continuous frame video as sample continuous frame videos in a training set.
5. The method of claim 1, wherein determining a motion direction probability value for the target object comprises:
determining movement direction probability values of the target object in a plurality of directions, and enhancing, according to a preset probability value, the movement direction probability value of a tripwire direction pointing towards a preset tripwire area.
6. The method of claim 5, wherein the enhancing, according to the preset probability value, the movement direction probability value of the tripwire direction pointing towards the preset tripwire area comprises:
determining first position information of the target object in the target image, and, if a probability-enhancing trigger condition is satisfied according to the first position information and second position information of the preset tripwire area, enhancing, according to the preset probability value, the movement direction probability value of the tripwire direction pointing towards the preset tripwire area.
7. The method of claim 1, wherein predicting the target direction of motion of the target object based on the direction of motion probability value comprises:
comparing the motion direction probability values of the multiple motion directions, and selecting a target motion direction probability value according to a comparison result; and predicting the target motion direction of the target object according to the target motion direction probability value.
8. The method of claim 5 or 6, wherein the method further comprises:
outputting, if the target movement direction is the tripwire direction and the movement direction probability value corresponding to the target movement direction is greater than a preset alarm threshold, high-probability early-warning prompt information representing the tripwire direction.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
A memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1-8 when executing a program stored on a memory.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-8.
CN202310747820.6A 2023-06-21 2023-06-21 Motion direction prediction method, electronic equipment and storage medium Pending CN116883453A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310747820.6A CN116883453A (en) 2023-06-21 2023-06-21 Motion direction prediction method, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN116883453A true CN116883453A (en) 2023-10-13

Family

ID=88261356



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination