CN113313007B - Pedestrian static state identification method based on video, electronic equipment and storage medium - Google Patents

Publication number: CN113313007B (granted publication of application CN202110575789.3A; earlier publication CN113313007A)
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Active (granted)
Inventors: 董霖, 叶新江, 姚建明, 方毅, 杨玉春, 曹克丹, 魏兴明
Assignee (original and current): Merit Interactive Co Ltd
Priority application: CN202110575789.3A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content

Abstract

The invention discloses a video-based static state identification method, electronic equipment and a storage medium. The method comprises the following steps: acquiring a group of frame images in a first target video, the first target video being a video containing pedestrians; determining, from each frame image of the group, a target object and a target area vector corresponding to the target object; generating a target height vector and a target movement length vector from the target area vector; preprocessing the target height vector and the target movement length vector to obtain a target characteristic vector; and inputting the target characteristic vector into a state recognition model to determine that the state of the target object is static. The method can judge whether a person is static (including slight shaking) from the pedestrian's detection area information alone; it has good robustness, reduces the data volume, simplifies the analysis process, improves the accuracy of the analysis result, and facilitates taking appropriate management and control measures for the target pedestrian.

Description

Pedestrian static state identification method based on video, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a video-based static state identification method, electronic equipment and a readable storage medium.
Background
With the development of the internet, more and more places are equipped with cameras for capturing video; for example, city roads, residential-community gates, shopping-mall entrances and school gates all need video capture, and in most cases the state of a target object in the video is used for monitoring or other management and control measures.
At present, the state of a target object in a video is judged to be moving or static by analysing the target object's behaviour and actions. This approach requires a large amount of data to analyse, and the accuracy of the result is affected by other factors: for example, when another object's movements overlap with those of the target object, it cannot be accurately judged whether the target object is moving or static, which hinders taking appropriate control measures for the target object.
Disclosure of Invention
In order to solve the problems of the prior art, a group of frame images in a first target video is acquired; size characteristics corresponding to the target area are determined from each frame image of the group; two variable vectors are obtained based on the size characteristics and input into a state recognition model to determine the state of a target object. Because the state of the target object can be analysed from the size characteristics alone, the data volume is reduced, the analysis process is simplified, the accuracy of the analysis result is improved, and appropriate control measures can conveniently be taken for the target object. Accordingly, the embodiments of the invention provide a video-based static state identification method, electronic equipment and a readable storage medium. The technical scheme is as follows:
in one aspect, a method for video-based still state recognition includes the steps of:
acquiring a group of frame images in a first target video, wherein the first target video refers to a video containing pedestrians;
determining a target object and a target area vector Z corresponding to the target object according to each frame image of the group of frame images, wherein Z = (Z_1, Z_2, Z_3, ..., Z_n), i = 1...n, and Z_i refers to the diagonal point coordinates of the target area in the i-th frame; the diagonal point coordinates comprise a first coordinate point (a_i, b_i) and a second coordinate point (c_i, d_i), the first coordinate point being the horizontal and vertical coordinates of any vertex of the target area and the second coordinate point being the horizontal and vertical coordinates of the vertex diagonally opposite the first coordinate point;
generating, according to Z, a target height vector H = (H_1, H_2, H_3, ..., H_n) and a target movement length vector L = (L_1, L_2, L_3, ..., L_n), wherein H_i refers to the height of the target area in the i-th frame and satisfies:

H_i = b_i - d_i

and L_i refers to the moving length of the target area in the i-th frame and satisfies:

L_i = [(a_i - a_1) + (c_i - c_1)] / 2

preprocessing the target height vector (H_1, H_2, H_3, ..., H_n) and the target movement length vector (L_1, L_2, L_3, ..., L_n) to obtain a target characteristic vector V = (V_1, V_2, V_3, ..., V_n), wherein V comprises a first target vector X = (X_1, X_2, X_3, ..., X_n) and/or a second target vector Y = (Y_1, Y_2, Y_3, ..., Y_n);
And inputting the first target vector X and/or the second target vector Y into a state recognition model, and determining that the state of the target object is a static state.
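The height and movement-length formulas in the claims above can be checked with a short sketch. This is a minimal Python illustration, not the patent's implementation; it assumes, per the preferred embodiment, that the first coordinate point (a_i, b_i) is the upper-right corner of the detection box and the second coordinate point (c_i, d_i) the lower-left:

```python
from typing import List, Tuple

Corner = Tuple[float, float]

def height_and_move_length(
    z: List[Tuple[Corner, Corner]]
) -> Tuple[List[float], List[float]]:
    """Compute H_i = b_i - d_i and L_i = [(a_i - a_1) + (c_i - c_1)] / 2
    from the diagonal corner points of each frame's target area."""
    (a1, _), (c1, _) = z[0]
    heights, lengths = [], []
    for (a, b), (c, d) in z:
        heights.append(b - d)                      # box height in frame i
        lengths.append(((a - a1) + (c - c1)) / 2)  # mean horizontal drift of both corners
    return heights, lengths
```

A static pedestrian yields near-constant heights and near-zero movement lengths, which is exactly what the downstream state recognition model thresholds on.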
In another aspect, an electronic device includes a processor and a memory, where at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the video-based still state identification method according to any one of the above descriptions.
In another aspect, a computer-readable storage medium stores at least one instruction or at least one program, which is loaded and executed by a processor to implement the video-based still state identification method according to any one of the above aspects.
The video-based static state identification method, the electronic equipment and the readable storage medium have the following technical effects:
the method includes the steps that a group of frame images in a first target video are obtained; according to each frame image of the group of frame images, determining size characteristics corresponding to a target domain area, and acquiring two variable vectors based on the size characteristics; therefore, the technical scheme of the invention determines the state of the target object by inputting the two variable vectors into the recognition state model, can analyze the state of the target object only by size characteristics, reduces the data volume, improves the accuracy of an analysis result, and is convenient for taking appropriate control measures on the target object; meanwhile, in the analysis process, two variable vectors are obtained based on the size characteristics, the two variables do not need to be processed, the analysis process can be simplified, meanwhile, the two variable vectors are not influenced by other factors, the accuracy of an analysis result is also improved, and appropriate management and control measures are convenient to take for a target object.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of a video-based still state identification method according to one to three embodiments of the present invention;
fig. 2 is a schematic flowchart of a dynamic identification method based on a silhouette map according to four to five embodiments of the present invention;
FIG. 3 is a schematic diagram of a target area according to one embodiment of the present invention;
fig. 4 is a schematic diagram of a silhouette image according to the fourth to fifth embodiments of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in other sequences than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example one
As shown in fig. 1, the first embodiment provides a video-based still state identification method, which includes the following steps:
s101, acquiring a group of frame images in a first target video, wherein the first target video is a video containing pedestrians;
The first target video is any video segment of video data; the video data is captured by a fixed capture device, and the first target video is a continuous real-time video.
S103, determining a target object and a target area vector Z corresponding to the target object according to each frame image of the group of frame images, wherein Z = (Z_1, Z_2, Z_3, ..., Z_n), i = 1...n, n is the number of image frames, and Z_i refers to the diagonal point coordinates of the target area in the i-th frame; the diagonal point coordinates comprise a first coordinate point (a_i, b_i) and a second coordinate point (c_i, d_i), the first coordinate point being the horizontal and vertical coordinates of any vertex of the target area and the second coordinate point being the horizontal and vertical coordinates of the vertex diagonally opposite the first coordinate point;
in particular, the target object refers to an object that exhibits a complete contour in each frame image of the set of frame images.
Specifically, the target area refers to a detection area of the target object in each frame image of the group of frame images, and the target area is square.
Preferably, the first coordinate point is a vertex coordinate of an upper right corner of the target area, and the second coordinate point is a vertex coordinate of a lower left corner of the target area.
S105, generating, according to Z, a target height vector H = (H_1, H_2, H_3, ..., H_n) and a target movement length vector L = (L_1, L_2, L_3, ..., L_n), wherein H_i refers to the height of the target area in the i-th frame and satisfies:

H_i = b_i - d_i

and L_i refers to the moving length of the target area in the i-th frame and satisfies:

L_i = [(a_i - a_1) + (c_i - c_1)] / 2
specifically, the H i Refers to the target height corresponding to the target area in the ith frame image, for example, when the target object is from near to far in the first target video, the target area vector (H) 1 ,H 2 ,H 3 ,……,H i ) Middle H 1 To H i Gradually decreases.
Specifically, the L i The image processing method includes the steps that a target distance corresponding to a target area of an ith frame of image is obtained, and the target distance refers to the distance between the center point of a current target area and the center point of a first target area or the distance between a certain vertex of the current target area and the vertex of the corresponding position of the first target area.
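The centre-point variant of the target distance described above can be sketched as follows. The helper names are hypothetical; the patent does not prescribe an implementation, and the box layout (upper-right and lower-left corners) follows the preferred embodiment:

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (a, b, c, d): upper-right (a, b), lower-left (c, d)

def center(box: Box) -> Tuple[float, float]:
    """Centre point of a detection box given its two diagonal corners."""
    a, b, c, d = box
    return ((a + c) / 2, (b + d) / 2)

def center_distance(first: Box, current: Box) -> float:
    """Euclidean distance between the centre of the current target area
    and the centre of the first frame's target area."""
    (x0, y0), (x1, y1) = center(first), center(current)
    return math.hypot(x1 - x0, y1 - y0)
```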
S107, preprocessing based on the target height vector (H_1, H_2, H_3, ..., H_n) and the target movement length vector (L_1, L_2, L_3, ..., L_n) to obtain a target characteristic vector V = (V_1, V_2, V_3, ..., V_n), wherein V comprises a first target vector X = (X_1, X_2, X_3, ..., X_n) and/or a second target vector Y = (Y_1, Y_2, Y_3, ..., Y_n);

In a specific embodiment, the first target vector (X_1, X_2, X_3, ..., X_n) is generated based on the target height vector (H_1, H_2, H_3, ..., H_n) and the target movement length vector (L_1, L_2, L_3, ..., L_n);

and the second target vector (Y_1, Y_2, Y_3, ..., Y_n) is generated according to the target height vector (H_1, H_2, H_3, ..., H_n).
Further, X_i refers to the first variable value of the i-th target area, wherein X_i satisfies:

X_i = E(L_i), wherein E() refers to a first variable function; preferably, E() takes the form given in the original as equation image BDA0003084312060000061 (not reproduced here),

wherein λ is the minimum value in the target height vector (H_1, H_2, H_3, ..., H_n).
Further, Y_i refers to the second variable value of the i-th target area, wherein Y_i satisfies:

Y_i = F(H_i), wherein F() refers to a second variable function; preferably, F() takes the form given in the original as equation image BDA0003084312060000062 (not reproduced here).
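Since the preferred expressions for E() and F() appear only as equation images in the source, the sketch below shows just the structure of the preprocessing step, with E and F as pluggable callables. The scale-by-λ normalisation used in the example (λ = min of the height vector, per the text) is a hypothetical stand-in, not the patent's formula:

```python
from typing import Callable, List, Tuple

def preprocess(
    H: List[float],
    L: List[float],
    E: Callable[[float], float],
    F: Callable[[float], float],
) -> Tuple[List[float], List[float]]:
    """Apply the first variable function E() to each L_i and the second
    variable function F() to each H_i, yielding X and Y."""
    return [E(l) for l in L], [F(h) for h in H]

# Example run with a hypothetical normalisation by lam = min(H).
H = [160.0, 158.0, 161.0]   # target heights per frame
L = [0.0, 1.0, 2.0]         # movement lengths per frame
lam = min(H)
X, Y = preprocess(H, L, E=lambda l: l / lam, F=lambda h: h / lam)
```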
S109, inputting the first target vector X and/or the second target vector Y into the state recognition model, and determining that the state of the target object is a static state.

In a specific embodiment, the first target vector (X_1, X_2, X_3, ..., X_n) and the second target vector (Y_1, Y_2, Y_3, ..., Y_n) are input into the state recognition model, and the state of the target object is determined.
Specifically, the method further includes the following steps of determining that the state of the target object is a static state:
the first target vector (X) 1 ,X 2 ,X 3 ,……,X i ) And the second target vector Y 1 ,Y 2 ,Y 3 ,……,Y i ) Inputting the state value K into an identification state model to obtain a state value K corresponding to the target object;
comparing the state value K with a preset state threshold value;
if the state value K is smaller than the state threshold value, determining that the state of the target object is a static state;
and if the state value K is not less than the state threshold value, determining that the state of the target object is a motion state.
Specifically, K satisfies the following condition:

K = L(W_1, W_2, W_3, ..., W_n), wherein L() is a state function in the state recognition model, and the state vector (W_1, W_2, W_3, ..., W_n) corresponding to the target object is determined in the state recognition model based on the first target vector (X_1, X_2, X_3, ..., X_n) and the second target vector (Y_1, Y_2, Y_3, ..., Y_n).

Specifically, W_i satisfies the following condition:

W_i = G(X_i, Y_i), wherein G() is a nonlinear activation function in the state recognition model.
Specifically, the state threshold is 0.5; for example, when the state value corresponding to the target object is 0.4, which is less than 0.5, the state of the target object is determined to be static; otherwise, the state of the target object is determined to be a motion state.
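The threshold rule above can be sketched together with hypothetical stand-ins for the unspecified state function L() and activation G() (a per-element sigmoid with mean pooling, chosen only for illustration; the patent leaves both abstract):

```python
import math
from typing import List

def classify(K: float, threshold: float = 0.5) -> str:
    """Compare the state value K against the preset state threshold:
    below it the target object is static, otherwise moving."""
    return "static" if K < threshold else "moving"

def state_value(X: List[float], Y: List[float]) -> float:
    """Hypothetical stand-in: sigmoid of (X_i + Y_i) as G(), mean pooling
    over the resulting state vector W as L()."""
    W = [1 / (1 + math.exp(-(x + y))) for x, y in zip(X, Y)]
    return sum(W) / len(W)
```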
Specifically, the video data further comprises a training set, and the training set is used for training the state recognition model to adjust its parameters; the state value corresponding to the training set (denoted by equation image BDA0003084312060000071 in the original) is calculated in the same way as the prediction-set state value K.
In a specific embodiment, the method further comprises: in a first preset database, calculating a first target accuracy according to the state of the target object, the first target accuracy reaching 93%; the first target accuracy refers to the accuracy of determining that the state of the target object is static.
Example two
The second embodiment provides a video-based still state identification method, which includes the following steps:
s101, acquiring a group of frame images in a first target video, wherein the first target video is a video containing pedestrians;
The first target video is any video segment of video data; the video data is captured by a fixed capture device, and the first target video is a continuous real-time video.
S103, determining a target object and a target area vector Z corresponding to the target object according to each frame image of the group of frame images, wherein Z = (Z_1, Z_2, Z_3, ..., Z_n), i = 1...n, n is the number of image frames, and Z_i refers to the diagonal point coordinates of the target area in the i-th frame; the diagonal point coordinates comprise a first coordinate point (a_i, b_i) and a second coordinate point (c_i, d_i), the first coordinate point being the horizontal and vertical coordinates of any vertex of the target area and the second coordinate point being the horizontal and vertical coordinates of the vertex diagonally opposite the first coordinate point;
in particular, the target object refers to an object that exhibits a complete contour in each frame image of the set of frame images.
Specifically, the target area refers to a detection area of the target object in each frame image of the group of frame images, and the target area is square.
Preferably, the first coordinate point is a vertex coordinate of an upper right corner of the target area, and the second coordinate point is a vertex coordinate of a lower left corner of the target area.
S105, generating, according to Z, a target height vector H = (H_1, H_2, H_3, ..., H_n) and a target movement length vector L = (L_1, L_2, L_3, ..., L_n), wherein H_i refers to the height of the target area in the i-th frame and satisfies:

H_i = b_i - d_i

and L_i refers to the moving length of the target area in the i-th frame and satisfies:

L_i = [(a_i - a_1) + (c_i - c_1)] / 2
specifically, the H i Refers to the target height corresponding to the target area in the ith frame image, for example, when the target object is from near to far in the first target video, the target area vector (H) 1 ,H 2 ,H 3 ,……,H i ) In H) 1 To H i Gradually decreases.
Specifically, the L i The image processing method includes the steps that a target distance corresponding to a target area of an ith frame of image is obtained, and the target distance refers to the distance between the center point of a current target area and the center point of a first target area or the distance between a certain vertex of the current target area and the vertex of the corresponding position of the first target area.
S107, preprocessing based on the target height vector (H_1, H_2, H_3, ..., H_n) and the target movement length vector (L_1, L_2, L_3, ..., L_n) to obtain a target characteristic vector V = (V_1, V_2, V_3, ..., V_n), wherein V comprises a first target vector X = (X_1, X_2, X_3, ..., X_n) and/or a second target vector Y = (Y_1, Y_2, Y_3, ..., Y_n);

In a specific embodiment, the first target vector (X_1, X_2, X_3, ..., X_n) is generated based on the target height vector (H_1, H_2, H_3, ..., H_n) and the target movement length vector (L_1, L_2, L_3, ..., L_n);
Further, X_i refers to the first variable value of the i-th target area, wherein X_i satisfies:

X_i = E(L_i), wherein E() refers to a first variable function; preferably, E() takes the form given in the original as equation image BDA0003084312060000091 (not reproduced here),

wherein λ is the minimum value in the target height vector (H_1, H_2, H_3, ..., H_n).
S109, inputting the first target vector X and/or the second target vector Y into the state recognition model, and determining that the state of the target object is a static state;

In a specific embodiment, the first target vector (X_1, X_2, X_3, ..., X_n) is input into the state recognition model, and the state of the target object is determined to be static.
Specifically, the method further includes the following steps of determining that the state of the target object is a static state:
the first target vector (X) 1 ,X 2 ,X 3 ,……,X i ) Inputting the state value K into an identification state model to obtain a state value K corresponding to the target object;
comparing the state value K with a preset state threshold value;
if the state value K is smaller than the state threshold value, determining that the state of the target object is a static state;
and if the state value K is not smaller than the state threshold value, determining that the state of the target object is a motion state.
Specifically, K satisfies the following condition:

K = L(X_1, X_2, X_3, ..., X_n), wherein L() is a state function in the state recognition model.
Specifically, the state threshold is 0.5; for example, when the state value corresponding to the target object is 0.4, which is less than 0.5, the state of the target object is determined to be static; otherwise, the state of the target object is determined to be a motion state.
Specifically, the video data further comprises a training set, and the training set is used for training the state recognition model to adjust its parameters; the state value corresponding to the training set (denoted by equation image BDA0003084312060000101 in the original) is calculated in the same way as the prediction-set state value K.
In a specific embodiment, the method further comprises: in the first preset database, calculating a first target accuracy according to the state of the target object, the first target accuracy reaching 86%; the first target accuracy refers to the accuracy of determining that the state of the target object is static.
Example three
The third embodiment provides a video-based still state identification method, which includes the following steps:
s101, acquiring a group of frame images in a first target video, wherein the first target video is a video containing pedestrians;
The first target video is any video segment of video data; the video data is captured by a fixed capture device, and the first target video is a continuous real-time video.
S103, determining a target object and a target area vector Z corresponding to the target object according to each frame image of the group of frame images, wherein Z = (Z_1, Z_2, Z_3, ..., Z_n), i = 1...n, n is the number of image frames, and Z_i refers to the diagonal point coordinates of the target area in the i-th frame; the diagonal point coordinates comprise a first coordinate point (a_i, b_i) and a second coordinate point (c_i, d_i), the first coordinate point being the horizontal and vertical coordinates of any vertex of the target area and the second coordinate point being the horizontal and vertical coordinates of the vertex diagonally opposite the first coordinate point;
in particular, the target object refers to an object that exhibits a complete contour in each frame image of the set of frame images.
Specifically, the target area refers to a detection area of the target object in each frame image of the group of frame images, and the target area is square.
Preferably, the first coordinate point is a vertex coordinate of an upper right corner of the target area, and the second coordinate point is a vertex coordinate of a lower left corner of the target area.
S105, generating, according to Z, a target height vector H = (H_1, H_2, H_3, ..., H_n) and a target movement length vector L = (L_1, L_2, L_3, ..., L_n), wherein H_i refers to the height of the target area in the i-th frame and satisfies:

H_i = b_i - d_i

and L_i refers to the moving length of the target area in the i-th frame and satisfies:

L_i = [(a_i - a_1) + (c_i - c_1)] / 2
specifically, the H i Refers to the target height corresponding to the target area in the ith frame image, for example, when the target object is from near to far in the first target video, the target height vector ((H) 1 ,H 2 ,H 3 ,……,H i ) Middle H 1 To H i Gradually decreases.
S107, preprocessing based on the target height vector (H_1, H_2, H_3, ..., H_n) and the target movement length vector (L_1, L_2, L_3, ..., L_n) to obtain a target characteristic vector V = (V_1, V_2, V_3, ..., V_n), wherein V comprises a first target vector X = (X_1, X_2, X_3, ..., X_n) and/or a second target vector Y = (Y_1, Y_2, Y_3, ..., Y_n);

In a specific embodiment, the second target vector (Y_1, Y_2, Y_3, ..., Y_n) is generated based on the target height vector (H_1, H_2, H_3, ..., H_n) and the target movement length vector (L_1, L_2, L_3, ..., L_n);
Further, Y_i refers to the second variable value of the i-th target area, wherein Y_i satisfies:

Y_i = F(H_i), wherein F() refers to a second variable function; preferably, F() takes the form given in the original as equation image BDA0003084312060000112 (not reproduced here).
S109, inputting the first target vector X and/or the second target vector Y into the state recognition model, and determining that the state of the target object is a static state;

In a specific embodiment, the second target vector (Y_1, Y_2, Y_3, ..., Y_n) is input into the state recognition model, and the state of the target object is determined.
Specifically, the method further comprises the following steps of determining the state of the target object:
the second target vector (Y) 1 ,Y 2 ,Y 3 ,……,Y i ) Inputting the state value K into an identification state model to obtain a state value K corresponding to the target object;
comparing the state value K with a preset state threshold value;
if the state value K is smaller than the state threshold value, determining that the state of the target object is a static state;
and if the state value K is not smaller than the state threshold value, determining that the state of the target object is a motion state.
Specifically, K satisfies the following condition:

K = L(Y_1, Y_2, Y_3, ..., Y_n), wherein L() is a state function in the state recognition model.
Specifically, the state threshold is 0.5; for example, when the state value corresponding to the target object is 0.4, which is less than 0.5, the state of the target object is determined to be static; otherwise, the state of the target object is determined to be a motion state.
Specifically, the video data further comprises a training set, and the training set is used for training the state recognition model to adjust its parameters; the state value corresponding to the training set (denoted by equation image BDA0003084312060000121 in the original) is calculated in the same way as the prediction-set state value K.
In a specific embodiment, the method further comprises: in the first preset database, calculating a first target accuracy according to the state of the target object, the first target accuracy reaching 81%; the first target accuracy refers to the accuracy of determining that the state of the target object is static.
Specifically, in Embodiments One to Three, the preprocessing may be a combination of feature processing and applied calculation; a person skilled in the art may implement the specific preprocessing process using any existing technique, which is not described here again.
In summary, Table One compares the first target accuracies corresponding to Embodiments One to Three. It shows that the state determined by inputting the first target vector and the second target vector jointly into the state recognition model is the most accurate: whether a person is still (including slight shaking) can be judged from the pedestrian's detection area information alone, with very good robustness, reduced data volume, a simplified analysis process and improved accuracy of the analysis result, which facilitates taking appropriate control measures for the target object. Meanwhile, because the two variable vectors are obtained directly from the size characteristics, no further processing of the two variables is needed, which simplifies the analysis process; and because the two variable vectors are not influenced by other factors, the accuracy of the analysis result is further improved.
Table 1

Example                 Example 1    Example 2    Example 3
First target accuracy   93%          86%          81%
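The two size-based variables discussed above can be sketched from the diagonal corner points of the detection box. The height H_i = b_i − d_i follows the claims; the moving length formula appears only as an image in the original patent, so the inter-frame displacement of the box centre used below is an assumption for illustration:

```python
import math

def height_and_move_length(boxes):
    """boxes: list of ((a, b), (c, d)) diagonal corner points, one per frame."""
    # Height of the target area in each frame, per the claims: H_i = b_i - d_i.
    heights = [b - d for (a, b), (c, d) in boxes]
    # ASSUMED moving length: displacement of the box centre between frames.
    centres = [((a + c) / 2, (b + d) / 2) for (a, b), (c, d) in boxes]
    moves = [0.0] + [
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(centres, centres[1:])
    ]
    return heights, moves

H, L = height_and_move_length([((0, 10), (4, 0)), ((3, 10), (7, 0))])
print(H)  # [10, 10]
print(L)  # [0.0, 3.0]
```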
Example four
With reference to fig. 2 and fig. 4, the present embodiment provides a video-based motion state identification method, where the method includes:
S201, acquiring a group of frame images in a state video segment, wherein the state video refers to a video containing a pedestrian motion state.
In a specific embodiment, the method further comprises determining the motion state video by:
acquiring a video to be processed;
carrying out scene recognition processing on the video to be processed to obtain an appointed video, wherein the appointed video is a video with a target object in any group of frame images;
and determining a motion state video according to the designated video.
Specifically, the designated video refers to a video in which a target object exists in any group of frame images; that is, the video to be processed is subjected to scene recognition processing, video segments in which a target object exists in the scene are identified, and video segments in which no target object exists are deleted.
Specifically, determining a motion state video according to the designated video, further comprising:
acquiring a group of frame images in the designated video;
determining the number of target objects and a second target video corresponding to any one of the target objects according to the group of frame images;
determining the state of the target object according to the second target video; and
determining a motion state video according to the state of the target object.
Preferably, only a single target object and a region corresponding to the target object exist in the second target video, while a plurality of target objects, and a region corresponding to each target object, may exist in the first target video.
In a specific embodiment, the state of the target object is determined according to the second target video using any one of the first to third embodiments of the video-based state identification method; preferably, the state of the target object is determined according to the second target video using the video-based state identification method provided in the first embodiment, that is, any video segment in the video segment vector is the second target video.
In a specific embodiment, determining the motion state video according to the state of the target object further comprises:
judging whether the state of the target object is a motion state or not;
when the state of the target object is a static state, deleting a second target video corresponding to the target object;
and when the state of the target object is a motion state, reserving a second target video corresponding to the state of the target object.
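The retain/delete step above can be sketched as a simple filter over per-target video segments; the segment representation here is an illustrative assumption:

```python
# Keep only the second target videos whose recognized state is a motion state;
# segments whose target is static are dropped.
def filter_motion_segments(segments):
    """segments: list of (target_id, state) pairs, state in {'static', 'motion'}."""
    return [(tid, state) for tid, state in segments if state == "motion"]

segments = [(1, "motion"), (2, "static"), (3, "motion")]
print(filter_motion_segments(segments))  # [(1, 'motion'), (3, 'motion')]
```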
S203, preprocessing each frame image of the group of frame images to obtain a designated image vector (A1, A2, A3, ..., Aj), j = 1 ... m, where m is the number of image frames and Aj is the designated image of the target object corresponding to the j-th frame;
Specifically, the designated image is the image obtained after segmentation (cropping), and the designated image is a grayscale image.
Specifically, the preprocessing in this embodiment may be a combination of detection, tracking and contour segmentation; a person skilled in the art can implement the preprocessing using any prior-art scheme, which is not described herein again.
S205, binarizing the designated image vector (A1, A2, A3, ..., Aj) to obtain a silhouette image vector (B1, B2, B3, ..., Bj), where Bj refers to the silhouette image of the target object corresponding to the j-th frame;
In particular, any silhouette image in the silhouette image vector (B1, B2, B3, ..., Bj) has a size of 64 × 64; that is, after each frame image of the group of frame images is preprocessed, it is binarized into an image of size 64 × 64, which facilitates processing by the motion state recognition model and improves computational efficiency.
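The binarization of S205 can be sketched as follows. The patent does not specify the resampling method or threshold, so the nearest-neighbour resize and the value 128 below are assumptions:

```python
import numpy as np

def to_silhouette(gray, size=64, thresh=128):
    """Resize a grayscale frame to size x size and binarize it."""
    h, w = gray.shape
    rows = np.arange(size) * h // size        # nearest-neighbour row indices
    cols = np.arange(size) * w // size        # nearest-neighbour column indices
    resized = gray[rows[:, None], cols]       # (size, size) resampled image
    return (resized >= thresh).astype(np.uint8)  # 1 = foreground, 0 = background

frame = np.full((128, 96), 200, dtype=np.uint8)  # dummy bright frame
sil = to_silhouette(frame)
print(sil.shape)  # (64, 64)
```

In practice the same effect is typically obtained with an image library's resize and threshold routines; the point is only that every frame ends up as a fixed-size binary silhouette before entering the model.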
S207, inputting the silhouette image vector (B1, B2, B3, ..., Bj) into a motion state recognition model, and determining the actual motion state corresponding to the target object, where the actual motion state corresponding to the target object comprises one or more combinations of a walking state, a running state, a riding state and other motion states.
In a specific embodiment, the method further includes determining an actual state corresponding to the target object by:
inputting the silhouette image vector (B1, B2, B3, ..., Bj) into the motion state recognition model to obtain a motion state probability vector (S1, S2, S3, S4), where S1, S2, S3 and S4 are the probability values of the motion states corresponding to the target object: S1 is the probability value corresponding to the walking state, S2 to the running state, S3 to the riding state, and S4 to the other motion states, and S1 + S2 + S3 + S4 = 1;
screening, according to the motion state probability vector (S1, S2, S3, S4), the motion state corresponding to the maximum probability value as the actual state of the target object. In a specific embodiment, the method further comprises: in a second preset database, calculating a second target accuracy rate according to the actual state corresponding to the target object, where the second target accuracy rate can reach 93%; the second target accuracy rate refers to the accuracy of determining that the state of the target object is any one of the motion states.
Example five
The embodiment provides a motion state identification method based on a video, which comprises the following steps:
S201, acquiring a group of frame images in a state video segment, wherein the state video refers to a video containing a pedestrian motion state.
Specifically, the motion state video refers to a video in which a motion behavior exists in a target object.
In a specific embodiment, the method further comprises determining the motion state video by:
acquiring a video to be processed;
carrying out scene recognition processing on the video to be processed to obtain an appointed video;
and determining a motion state video according to the designated video.
Specifically, the designated video refers to a video in which a target object exists in any group of frame images; that is, the video to be processed is subjected to scene recognition processing, video segments in which a target object exists in the scene are identified, and video segments in which no target object exists are deleted.
Specifically, determining a motion state video according to the designated video, further comprising:
acquiring a group of frame images in the designated video;
determining the number of target objects and a second target video corresponding to any one of the target objects according to the group of frame images;
determining the state of the target object according to the second target video; and
determining a motion state video according to the state of the target object.
Preferably, only a single target object and a region corresponding to the target object exist in the second target video, while a plurality of target objects, and a region corresponding to each target object, may exist in the first target video.
In a specific embodiment, the state of the target object is determined according to the second target video using any one of the first to third embodiments of the video-based state identification method; preferably, the state of the target object is determined according to the second target video using the video-based state identification method provided in the first embodiment, that is, any video segment in the video segment vector is the second target video.
In a specific embodiment, determining the motion state video according to the state of the target object further comprises:
judging whether the state of the target object is a motion state or not;
when the state of the target object is a static state, deleting a second target video corresponding to the target object;
and when the state of the target object is a motion state, reserving a second target video corresponding to the state of the target object.
S203, preprocessing each frame image of the group of frame images to obtain a designated image vector (A1, A2, A3, ..., Aj), j = 1 ... m, where m is the number of image frames and Aj is the designated image of the target object corresponding to the j-th frame;
Specifically, the designated image is the image obtained after segmentation (cropping), and the designated image is a grayscale image. The preprocessing in this embodiment may be a combination of detection, tracking and contour segmentation; a person skilled in the art can implement the preprocessing using any prior-art scheme, which is not described herein again.
S205, binarizing the designated image vector (A1, A2, A3, ..., Aj) to obtain a silhouette image vector (B1, B2, B3, ..., Bj), where Bj refers to the silhouette image of the target object corresponding to the j-th frame;
In particular, any silhouette image in the silhouette image vector (B1, B2, B3, ..., Bj) has a size of 64 × 64, which facilitates processing by the motion state recognition model and improves computational efficiency.
S207, inputting the silhouette image vector (B1, B2, B3, ..., Bj) into the motion state recognition model, and determining the actual motion state corresponding to the target object, where the actual motion state corresponding to the target object comprises one or more combinations of a walking state, a running state, a riding state and other motion states.
In a specific embodiment, the method further includes determining an actual state corresponding to the target object by:
inputting the silhouette image vector (B1, B2, B3, ..., Bj) into the motion state recognition model to obtain a motion state probability vector (S1, S2, S3, S4), where S1, S2, S3 and S4 are the probability values of the motion states corresponding to the target object: S1 is the probability value corresponding to the walking state, S2 to the running state, S3 to the riding state, and S4 to the other motion states, and S1 + S2 + S3 + S4 = 1;
screening, according to the motion state probability vector (S1, S2, S3, S4), the motion state corresponding to the maximum probability value as the actual state of the target object. In a specific embodiment, the method further comprises: in a second preset database, calculating a second target accuracy rate according to the actual state corresponding to the target object, where the second target accuracy rate can reach 86%; the second target accuracy rate refers to the accuracy of determining that the state of the target object is any one of the motion states.
In summary, Table 2 compares the second target accuracy rates corresponding to the target object in the fourth and fifth embodiments. It can be seen that a silhouette image vector is obtained by preprocessing and binarizing each frame image of the group of frame images, and inputting the silhouette image vector into the motion state recognition model determines the motion state of the target object. This makes it convenient to accurately acquire the motion characteristics of the target object, improves the accuracy of the analysis result, and facilitates taking appropriate control measures for the target object.
Table 2

Example                  Example 4    Example 5
Second target accuracy   93%          86%
Embodiments of the present invention also provide an electronic device, including a processor and a memory, where at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the video-based still state identification method as described above.
The computer device of embodiments of the present invention exists in a variety of forms, including but not limited to:
(1) Mobile communication devices: such devices are characterized by mobile communication capability and are primarily aimed at providing voice and data communication. Such terminals include smart phones (e.g., iPhone), multimedia phones, feature phones, low-end phones, etc.
(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile internet access. Such terminals include PDA, MID and UMPC devices, e.g., iPad.
(3) Portable entertainment devices: such devices can display and play multimedia content. This type of device comprises audio and video players (e.g., iPod), handheld game consoles, electronic books, smart toys, and portable car navigation devices.
(4) Servers: devices that provide computing services, comprising a processor, a hard disk, memory, a system bus, and the like. A server is similar in architecture to a general-purpose computer, but has higher requirements on processing capacity, stability, reliability, security, scalability and manageability, because it must provide highly reliable services.
(5) And other electronic devices with data interaction functions.
Embodiments of the present invention also provide a computer-readable storage medium, which may be disposed in an electronic device to store at least one instruction or at least one program, where the at least one instruction or the at least one program is loaded and executed by a processor to implement the video-based still state identification method provided in the method embodiments.
Optionally, in this embodiment, the storage medium may be located in at least one of a plurality of network servers of a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media capable of storing program code.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (7)

1. A method for identifying a still state based on a video, the method comprising the steps of:
acquiring a group of frame images in a first target video, wherein the first target video is a video containing pedestrians;
determining a target object and a target area vector Z corresponding to the target object according to each frame image of the group of frame images, wherein the target area vector Z = (Z1, Z2, Z3, ..., Zi), i = 1 ... n, and Zi refers to the diagonal point coordinates of the target area in the i-th frame, the diagonal point coordinates comprising a first coordinate point (ai, bi) and a second coordinate point (ci, di), where the first coordinate point refers to the coordinates of any corner of the target area, and the second coordinate point refers to the coordinates of the corner of the target area diagonally opposite the first coordinate point;
generating, according to Z, a target height vector H = (H1, H2, H3, ..., Hi) and a target moving length vector L = (L1, L2, L3, ..., Li), wherein Hi refers to the height of the target area in the i-th frame and satisfies:

Hi = bi - di;

Li refers to the moving length of the target area in the i-th frame, and Li satisfies:

[moving-length formula, reproduced only as an image in the original document]
preprocessing the target height vector (H1, H2, H3, ..., Hi) and the target moving length vector (L1, L2, L3, ..., Li) to obtain a target characteristic vector V = (V1, V2, V3, ..., Vi), wherein V comprises a first target vector X = (X1, X2, X3, ..., Xi) and/or a second target vector Y = (Y1, Y2, Y3, ..., Yi),

wherein Xi is the first variable value of the i-th target area and satisfies: Xi = E(Li), where E refers to a first variable function,

and Yi is the second variable value of the i-th target area and satisfies: Yi = F(Hi), where F refers to a second variable function;
inputting the first target vector X and/or the second target vector Y into a state recognition model, and determining that the state of the target object is a static state, wherein determining that the state of the target object is a static state further comprises:
inputting the first target vector X and/or the second target vector Y into the recognition state model to obtain a state value K corresponding to the target object;
comparing the state value K with a preset state threshold value;
if the state value K is smaller than the state threshold value, determining that the state of the target object is a static state;
and if the state value K is not less than the state threshold value, determining that the state of the target object is a motion state.
2. The video-based still state identification method according to claim 1, wherein the first target video is any one video segment in video data, the video data is collected by a fixed collection device, and the first target video belongs to a continuous real-time video.
3. The video-based still state recognition method of claim 1, wherein the target object is an object that exhibits a complete contour in each image of the set of frame images.
4. The video-based still state recognition method according to claim 1 or 3, wherein the target region refers to the detection region of the target object in each frame image of the group of frame images.
5. The video-based still state recognition method of claim 1, wherein K satisfies the following condition:
K = L(W1, W2, W3, ..., Wi), wherein L is a state function, and the state vector (W1, W2, W3, ..., Wi) corresponding to the target object is determined in the recognition state model based on X and Y.
6. An electronic device comprising a processor and a memory, wherein the memory stores at least one instruction or at least one program, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the video-based still state recognition method according to any one of claims 1 to 5.
7. A computer-readable storage medium having stored therein at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by a processor to implement the method for video-based still state recognition according to any one of claims 1-5.
CN202110575789.3A 2021-05-26 2021-05-26 Pedestrian static state identification method based on video, electronic equipment and storage medium Active CN113313007B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110575789.3A CN113313007B (en) 2021-05-26 2021-05-26 Pedestrian static state identification method based on video, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN113313007A CN113313007A (en) 2021-08-27
CN113313007B true CN113313007B (en) 2022-10-14

Family

ID=77374843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110575789.3A Active CN113313007B (en) 2021-05-26 2021-05-26 Pedestrian static state identification method based on video, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113313007B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103076884A (en) * 2013-02-07 2013-05-01 韩铮 Data acquisition method and data acquisition device for motion recognition, and motion recognition system
CN106204640A (en) * 2016-06-29 2016-12-07 长沙慧联智能科技有限公司 A kind of moving object detection system and method
CN109118516A (en) * 2018-07-13 2019-01-01 高新兴科技集团股份有限公司 A kind of target is from moving to static tracking and device
CN110751678A (en) * 2018-12-12 2020-02-04 北京嘀嘀无限科技发展有限公司 Moving object detection method and device and electronic equipment
CN111311644A (en) * 2020-01-15 2020-06-19 电子科技大学 Moving target detection method based on video SAR
CN112735163A (en) * 2020-12-25 2021-04-30 北京百度网讯科技有限公司 Method for determining static state of target object, road side equipment and cloud control platform

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6029041B2 (en) * 2013-07-18 2016-11-24 Necソリューションイノベータ株式会社 Face impression degree estimation method, apparatus, and program
US20190130583A1 (en) * 2017-10-30 2019-05-02 Qualcomm Incorporated Still and slow object tracking in a hybrid video analytics system
US11756291B2 (en) * 2018-12-18 2023-09-12 Slyce Acquisition Inc. Scene and user-input context aided visual search
CN112115978A (en) * 2020-08-24 2020-12-22 中国人民健康保险股份有限公司 Motion recognition method and device and storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Liu Jian; Xu Meng; Zhao Yue; Zhang Rui; Gao Enyang. A pedestrian detection method inside buildings based on depth-image features. Journal of Chinese Computer Systems, 2018. *
Wang Hailiang. Henan Science and Technology, 2020-11-15, pp. 10-13. *

Also Published As

Publication number Publication date
CN113313007A (en) 2021-08-27

Similar Documents

Publication Publication Date Title
CN110427905B (en) Pedestrian tracking method, device and terminal
CN107358149B (en) Human body posture detection method and device
US11830230B2 (en) Living body detection method based on facial recognition, and electronic device and storage medium
CN107679448B (en) Eyeball action-analysing method, device and storage medium
US20180018503A1 (en) Method, terminal, and storage medium for tracking facial critical area
WO2021139324A1 (en) Image recognition method and apparatus, computer-readable storage medium and electronic device
CN110188719B (en) Target tracking method and device
CN109886951A (en) Method for processing video frequency, device and electronic equipment
CN112131978A (en) Video classification method and device, electronic equipment and storage medium
CN112381104A (en) Image identification method and device, computer equipment and storage medium
CN112183356A (en) Driving behavior detection method and device and readable storage medium
CN113688839B (en) Video processing method and device, electronic equipment and computer readable storage medium
CN107944381A (en) Face tracking method, device, terminal and storage medium
CN114168768A (en) Image retrieval method and related equipment
CN110222576B (en) Boxing action recognition method and device and electronic equipment
CN113313007B (en) Pedestrian static state identification method based on video, electronic equipment and storage medium
CN115294162B (en) Target identification method, device, equipment and storage medium
CN113221809B (en) Motion state identification method based on silhouette picture, electronic device and medium
CN115205779A (en) People number detection method based on crowd image template
CN111291756B (en) Method and device for detecting text region in image, computer equipment and computer storage medium
CN108171149B (en) Face recognition method, device and equipment and readable storage medium
CN113569600A (en) Method and device for identifying weight of object, electronic equipment and storage medium
CN111814865A (en) Image identification method, device, equipment and storage medium
CN113326829B (en) Method and device for recognizing gesture in video, readable storage medium and electronic equipment
CN111079472A (en) Image comparison method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant