CN110619286A

CN110619286A - Vehicle door opening and closing action identification method and system and storage medium

Info

Publication number: CN110619286A
Application number: CN201910809419.4A
Authority: CN
Inventors: 张晓春; 李熙莹; 陈振武; 邱铭凯; 张枭勇
Original assignee: Shenzhen Urban Transport Planning Center Co Ltd; National Sun Yat Sen University
Current assignee: Sun Yat Sen University; Shenzhen Urban Transport Planning Center Co Ltd; National Sun Yat Sen University
Priority date: 2019-08-29
Filing date: 2019-08-29
Publication date: 2019-12-27

Abstract

The invention discloses a method, a system and a storage medium for recognizing vehicle door opening and closing actions. The method comprises the following steps: performing vehicle door state feature extraction and recognition on each frame of vehicle picture to obtain the vehicle door state feature and the vehicle door state recognition result of each frame; and, when a change of the vehicle door state is determined from the vehicle door state recognition results, performing vehicle door opening and closing action recognition on the vehicle door state feature sequence of continuous frames, wherein the continuous frames include the vehicle picture frame in which the vehicle door state changed. The invention decomposes vehicle door opening and closing action recognition into two parts, vehicle door state recognition and vehicle door action recognition. In practical application, the vehicle door state is first recognized for each frame of the video, and action recognition over continuous frames is performed only when the vehicle door state is determined to have changed, so continuous frames need not be input for action recognition at every prediction. This improves the efficiency and practicability of action recognition, and the invention can be widely applied in the field of image processing.

Description

Vehicle door opening and closing action identification method and system and storage medium

Technical Field

The invention relates to the field of image processing, in particular to a method and a system for identifying vehicle door opening and closing actions and a storage medium.

Background

In everyday traffic, some commercial vehicles stop illegally in areas of the road where parking is expressly prohibited in order to pick up or drop off passengers nearby, and in areas with relatively heavy traffic this illegal boarding and alighting may cause short-term road congestion. Because the time and place of such behavior are uncertain, relying on traffic police patrols to discover it is impractical. Surveillance cameras without blind spots are now installed at all main locations of urban roads, so analyzing road surveillance video with computer vision and artificial intelligence technology to detect illegal boarding and alighting automatically is an effective way to greatly improve monitoring efficiency.

The detection of illegal boarding and alighting can be decomposed into two parts: behavior analysis of pedestrians and recognition of vehicle door opening and closing actions. Pedestrian behavior analysis is based on pedestrian detection and tracking, and judges from the pedestrian's motion trajectory whether the pedestrian is moving away from a vehicle or approaching it from a distance. However, in places where pedestrians appear frequently, a pedestrian in the video may simply be walking past the vehicle rather than boarding or alighting, so pedestrian behavior analysis alone cannot reliably determine whether illegal boarding or alighting has occurred; it must be combined with recognition of the vehicle's door opening and closing actions. Boarding or alighting can be judged to have occurred only when a pedestrian is moving away from or approaching the vehicle and this is accompanied by a door opening or closing action of the vehicle.

Vehicle door opening and closing action recognition belongs to the category of action recognition. Existing action recognition methods can be divided into two types: recognition based on hand-crafted features and recognition based on deep learning features. Among the methods based on hand-crafted features, a typical approach is to extract feature trajectories with a Kanade-Lucas-Tomasi (KLT) tracker and to extract spatio-temporal descriptors between consecutive frames, such as the Scale-Invariant Feature Transform (SIFT) descriptor, the 3-dimensional Histogram of Oriented Gradients (HOG3D) descriptor, and the Speeded-Up Robust Features (SURF) descriptor, and then send these features to a classifier for action recognition. However, methods based on hand-crafted features have low recognition accuracy and are seldom adopted. The action recognition methods that are more commonly applied in practice and have higher recognition accuracy are based on deep learning features, and include the following: 1) using a 3-dimensional convolutional neural network (3D CNN) to connect the pictures of consecutive frames along the time dimension and learn the spatio-temporal features in the consecutive frames, so as to capture motion information between adjacent frames; 2) constructing a spatial network and a temporal network separately, using the spatial network to capture features that are strongly discriminative for action understanding and the temporal network to learn effective motion features. However, when the existing recognition methods based on deep learning features are applied to an actual scene, every prediction requires multiple consecutive frames to be input for action recognition, which is inefficient and impractical.

Disclosure of Invention

To solve the above technical problem, an embodiment of the present invention aims to provide an efficient and practical method, system and storage medium for recognizing vehicle door opening and closing actions.

In a first aspect, the technical solution adopted in the embodiment of the present invention is:

a vehicle door opening and closing action recognition method comprises the following steps:

carrying out vehicle door state feature extraction and identification on each frame of vehicle picture to obtain vehicle door state features and vehicle door state identification results of each frame of vehicle picture;

and when the change of the vehicle door state is determined and recognized according to the vehicle door state recognition result, performing vehicle door opening and closing action recognition on a vehicle door opening and closing state characteristic sequence of continuous frames, wherein the continuous frames comprise vehicle picture frames with the changed vehicle door state.

Further, the method also comprises a step-by-step training step of two stages, wherein the step-by-step training step of two stages specifically comprises the following steps:

training a vehicle door state feature encoder, wherein the vehicle door state feature encoder is used for encoding and identifying the state features of a vehicle door in a single-frame vehicle picture;

and fixing parameters of the vehicle door state feature encoder, and training a motion recognition classifier, wherein the motion recognition classifier is used for recognizing the opening and closing motions of the vehicle door in continuous frames.

Further, the step of extracting and identifying the vehicle door state features of each frame of vehicle picture to obtain the vehicle door state features and the vehicle door state identification result of each frame of vehicle picture specifically includes:

acquiring each frame of vehicle picture from an input video;

and respectively inputting each frame of vehicle picture into the trained vehicle door state feature encoder to extract and identify the vehicle door state features, so as to obtain the vehicle door state features and the vehicle door state identification results of each frame of vehicle picture.

Further, the vehicle door state includes a vehicle door opening state and a vehicle door closing state, and the step of training the vehicle door state feature encoder specifically includes:

obtaining a given single-frame vehicle picture as a training sample, and obtaining the vehicle door state of the training sample as a label;

training an improved VGG16 network according to the training sample and the label to obtain the vehicle door state feature encoder, wherein the improved VGG16 network comprises 14 layers, which are, in sequence, a first convolution layer, a first maximum pooling layer, a second convolution layer, a second maximum pooling layer, a third convolution layer, a third maximum pooling layer, a fourth convolution layer, a fourth maximum pooling layer, a fifth convolution layer, a fifth maximum pooling layer, a fully-connected layer and a sigmoid classification layer, and wherein the fully-connected layer and the first to fifth convolution layers all adopt the linear rectification function (ReLU).

Further, the step of performing door opening and closing operation recognition on the door opening and closing state feature sequence of the continuous frames when it is determined that the door state is changed according to the door state recognition result specifically includes:

determining a vehicle picture frame with a changed vehicle door state according to the vehicle door state identification result, wherein the vehicle picture frame with the changed vehicle door state is a vehicle picture frame with a different vehicle door state from that of a previous vehicle picture;

obtaining continuous frames according to the vehicle picture frames with the changed vehicle door states;

inputting the continuous frames into a vehicle door state feature encoder to obtain a vehicle door state feature sequence of the continuous frames;

and inputting the vehicle door state characteristic sequence of the continuous frames into a trained motion recognition classifier to obtain a vehicle door opening and closing motion recognition result.

Further, the door opening and closing actions comprise keeping open, changing from open to closed, changing from closed to open, and keeping closed, and the action recognition classifier is obtained by training a double-layer LSTM network.

In a second aspect, the technical solution adopted in the embodiment of the present invention is:

a vehicle door opening and closing action recognition system comprises the following modules:

the vehicle door state feature extraction and identification module is used for extracting and identifying vehicle door state features of each frame of vehicle picture to obtain vehicle door state features and vehicle door state identification results of each frame of vehicle picture;

and the vehicle door opening and closing action recognition module is used for carrying out vehicle door opening and closing action recognition on a vehicle door opening and closing state characteristic sequence of continuous frames when the vehicle door state change is determined and recognized according to the vehicle door state recognition result, wherein the continuous frames comprise vehicle picture frames with the changed vehicle door state.

Further, the device also comprises a double-stage step-by-step training module, wherein the double-stage step-by-step training module specifically comprises:

the first-stage training unit is used for training a vehicle door state feature encoder, and the vehicle door state feature encoder is used for encoding and identifying the state features of a vehicle door in a single-frame vehicle picture;

and the second-stage training unit is used for fixing parameters of the vehicle door state feature encoder and training a motion recognition classifier, and the motion recognition classifier is used for recognizing the opening and closing motions of the vehicle door in continuous frames.

In a third aspect, the technical solution adopted in the embodiment of the present invention is:

a vehicle door opening and closing motion recognition system, comprising:

at least one processor;

at least one memory for storing at least one program;

when the at least one program is executed by the at least one processor, the at least one processor is enabled to implement the vehicle door opening and closing action identification method.

In a fourth aspect, the technical solution adopted in the embodiment of the present invention is:

a storage medium having stored therein processor-executable instructions for implementing a vehicle door opening and closing motion recognition method when executed by a processor.

One or more of the above-described embodiments of the present invention have the following advantages: the embodiment of the invention first performs vehicle door state feature extraction and recognition on each frame of vehicle picture, and then performs vehicle door opening and closing action recognition on the vehicle door state feature sequence of continuous frames only when a change of the vehicle door state is determined. Vehicle door opening and closing action recognition is thus decomposed into two parts, vehicle door state recognition and vehicle door action recognition. In practical application, the vehicle door state is first recognized for each frame of the video, and action recognition over continuous frames is performed only when the vehicle door state is determined to have changed, so continuous frames need not be input for action recognition at every prediction, which improves the efficiency and practicability of action recognition.

Drawings

Fig. 1 is a flowchart of a method for identifying a door opening and closing action of a vehicle according to an embodiment of the present invention;

Fig. 2 is an architecture diagram of a vehicle door opening and closing motion recognition scheme in accordance with an embodiment of the present invention;

Fig. 3 is a flowchart illustrating an implementation of a two-stage step-by-step training method according to an embodiment of the present invention;

Fig. 4 is a network architecture diagram of a door status feature encoder in accordance with an embodiment of the present invention;

Fig. 5 is a network architecture diagram of a motion recognition classifier in accordance with an embodiment of the present invention;

Fig. 6 is a block diagram of a vehicle door opening and closing action recognition system according to an embodiment of the present invention;

Fig. 7 is another structural block diagram of a vehicle door opening and closing action recognition system according to an embodiment of the present invention.

Detailed Description

The invention will be further explained and illustrated with reference to the drawings and the embodiments in the specification. The step numbers in the following embodiments are provided only for convenience of illustration; they do not limit the order of the steps in any way, and the execution order of the steps in the embodiments can be adapted according to the understanding of those skilled in the art.

Referring to fig. 1, an embodiment of the present invention provides a method for identifying a vehicle door opening and closing action, including the following steps:

S101, performing vehicle door state feature extraction and recognition on each frame of vehicle picture to obtain the vehicle door state feature and the vehicle door state recognition result of each frame of vehicle picture;

Specifically, the vehicle pictures can be obtained by decomposing an input video (which contains images of the vehicle and can be captured by a surveillance camera) into frames.

The present embodiment recognizes the state (including two states of opening and closing) of the vehicle door in a single frame of vehicle picture by feature extraction and recognition and extracts the corresponding state feature. The feature extraction and identification can be completed by adopting a vehicle door state feature encoder which is trained by a deep learning algorithm (such as a convolutional neural network algorithm) in advance.

S102, when a change of the vehicle door state is determined from the vehicle door state recognition results, performing vehicle door opening and closing action recognition on the vehicle door state feature sequence of continuous frames, wherein the continuous frames include the vehicle picture frame in which the vehicle door state changed.

Specifically, since the door state of each frame of vehicle picture is recognized in step S101, the vehicle picture frame in which the door state changed (which indicates that the vehicle is in the process of opening or closing a door) can be found by comparing the door state of the current frame of vehicle picture with that of the previous frame. All frames of vehicle pictures within a preset time before and after that vehicle picture frame can then be extracted from the input video as the continuous frames. After the continuous frames are extracted, their door state features can be extracted through step S101 to form a door state feature sequence, and door opening and closing action recognition is then performed on this feature sequence, which completes the action recognition for the continuous frames. Because action recognition over continuous frames is performed only when the door state is determined to have changed, this embodiment does not need to input continuous frames for action recognition at every prediction, which improves the efficiency and practicability of action recognition. The door opening and closing action recognition can be completed by an action recognition classifier trained in advance with a deep learning algorithm (such as a convolutional neural network algorithm).

As can be seen from the above, this embodiment decomposes vehicle door opening and closing action recognition into two parts, vehicle door state recognition and vehicle door action recognition. In practical application, the vehicle door state is recognized for each frame of the video, and action recognition over continuous frames is performed only when the vehicle door state is determined to have changed, which greatly improves the efficiency and practicability of action recognition.
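The control flow of steps S101 and S102 can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: `recognize_door_actions`, the `encode` and `classify` callables, the `half_window` parameter, and the toy stand-ins in which a "frame" is simply 0 (closed) or 1 (open). The sketch only shows that the sequence classifier is invoked at state changes rather than at every frame.

```python
from typing import Callable, List, Sequence, Tuple


def recognize_door_actions(
    frames: Sequence[object],
    encode: Callable[[object], Tuple[List[float], int]],
    classify: Callable[[List[List[float]]], str],
    half_window: int = 2,  # assumed window size, in frames, on each side
) -> List[Tuple[int, str]]:
    """Run S101 on every frame; run S102 only around frames whose state changed."""
    results = []
    prev_state = None
    for i, frame in enumerate(frames):
        _, state = encode(frame)  # S101: per-frame door state recognition
        if prev_state is not None and state != prev_state:
            lo = max(0, i - half_window)
            hi = min(len(frames), i + half_window + 1)
            # S102: feature sequence of the continuous frames around the change
            feature_seq = [encode(f)[0] for f in frames[lo:hi]]
            results.append((i, classify(feature_seq)))
        prev_state = state
    return results


# Toy stand-ins for the trained encoder and classifier (assumptions).
def toy_encode(frame):
    return [float(frame)], int(frame)


calls = []


def toy_classify(feature_seq):
    calls.append(len(feature_seq))
    first, last = feature_seq[0][0], feature_seq[-1][0]
    return "closed->open" if first < last else "open->closed"


frames = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
actions = recognize_door_actions(frames, toy_encode, toy_classify)
print(actions)
print(len(calls), "classifier calls for", len(frames), "frames")
```

In this toy run the sequence classifier is called twice for ten frames, once per state change, which is the efficiency argument made above.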

Further as a preferred embodiment, the method further includes a two-stage step training step S100, where the two-stage step training step specifically includes:

S1001, training a vehicle door state feature encoder, wherein the vehicle door state feature encoder is used for encoding and recognizing the state feature of a vehicle door in a single-frame vehicle picture;

S1002, fixing the parameters of the vehicle door state feature encoder and training an action recognition classifier, wherein the action recognition classifier is used for recognizing the opening and closing actions of the vehicle door in continuous frames.

Specifically, in practical applications video data is troublesome to collect and small in volume. To obtain higher recognition accuracy from fewer training samples, this embodiment exploits two facts: a video is composed of many frames of pictures, so a small number of videos yields a large number of picture samples, and labeling pictures is simpler than labeling video samples. The embodiment therefore adopts a two-stage step-by-step training method. In the first stage, a vehicle door state feature encoder is constructed and trained to extract and recognize the state feature of the vehicle door in a single-frame vehicle picture. In the second stage, the parameters of the vehicle door state feature encoder are fixed, and an action recognition classifier is trained to recognize the vehicle door opening and closing actions of continuous frames.
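The point that a few videos yield many picture samples can be made concrete with a small sketch of how the two training sets might be derived from per-frame door state annotations. The patent does not describe how sequence samples are cut or labeled, so the sliding window, the function names, and the rule of labeling a window by its first and last state (which assumes at most one state change per window) are all assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from (first state, last state) of a window to an action
# label; 1 = open, 0 = closed. Assumes at most one state change per window.
ACTIONS: Dict[Tuple[int, int], str] = {
    (1, 1): "keep open",
    (1, 0): "open -> closed",
    (0, 1): "closed -> open",
    (0, 0): "keep closed",
}


def stage1_samples(videos: Dict[str, List[int]]):
    """Stage 1: every annotated frame becomes one (frame id, door state) sample."""
    return [((vid, i), s) for vid, states in videos.items() for i, s in enumerate(states)]


def stage2_samples(videos: Dict[str, List[int]], length: int):
    """Stage 2: every sliding window of `length` frames becomes one sequence sample."""
    out = []
    for vid, states in videos.items():
        for start in range(len(states) - length + 1):
            window = states[start:start + length]
            out.append(((vid, start), ACTIONS[(window[0], window[-1])]))
    return out


# Two toy "videos", given as per-frame door state annotations.
videos = {"clip_a": [0, 0, 1, 1, 1, 0], "clip_b": [1, 1, 1, 1]}
pictures = stage1_samples(videos)
sequences = stage2_samples(videos, length=4)
print(len(pictures), "picture samples;", len(sequences), "sequence samples")
```

Even in this toy case the first stage has more samples than the second, and each picture sample needs only a single open/closed label.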

Further as a preferred embodiment, the step S101 of performing door state feature extraction and identification on each frame of vehicle picture to obtain the door state feature and the door state identification result of each frame of vehicle picture specifically includes:

S1011, acquiring each frame of vehicle picture from the input video;

Specifically, in practical applications a video is composed of multiple frames of pictures, so an input video can be split frame by frame into a number of vehicle pictures.

S1012, inputting each frame of vehicle picture into the trained vehicle door state feature encoder for vehicle door state feature extraction and recognition, so as to obtain the vehicle door state feature and the vehicle door state recognition result of each frame of vehicle picture.

Specifically, because the vehicle door state feature encoder has already been trained, in practical application each frame of vehicle picture only needs to be input into the encoder to obtain the door state of that frame and the corresponding door state feature, which is convenient and requires no manual intervention.

Further as a preferred embodiment, the step S1001 of training the door state feature encoder includes:

S10011, acquiring a given single-frame vehicle picture as a training sample, and acquiring the vehicle door state of the training sample as a label;

S10012, training an improved VGG16 network according to the training sample and the label to obtain the vehicle door state feature encoder, wherein the improved VGG16 network comprises 14 layers, which are, in sequence, a first convolution layer, a first maximum pooling layer, a second convolution layer, a second maximum pooling layer, a third convolution layer, a third maximum pooling layer, a fourth convolution layer, a fourth maximum pooling layer, a fifth convolution layer, a fifth maximum pooling layer, a fully-connected layer and a sigmoid classification layer, and wherein the fully-connected layer and the first to fifth convolution layers all adopt the linear rectification function (ReLU).

Specifically, VGG is named after the Visual Geometry Group of the Department of Engineering Science at the University of Oxford, which proposed this convolutional neural network model in 2014; the model has good feature extraction capability and portability. According to the number of layers, VGG networks can be divided into the 11-layer VGG11 network (8 convolutional layers and 3 fully-connected layers), the 13-layer VGG13 network (10 convolutional layers and 3 fully-connected layers), the 16-layer VGG16 network (13 convolutional layers and 3 fully-connected layers) and the 19-layer VGG19 network (16 convolutional layers and 3 fully-connected layers), of which VGG16 and VGG19 are widely applied.

The vehicle door state feature encoder of this embodiment adopts an improved VGG16 network structure, which reduces the number of network layers and the feature dimension of each layer relative to the VGG16 network structure, thereby reducing the network parameters and increasing the running speed of the network while keeping the network structure effective. Specifically, the improved VGG16 network structure comprises 6 convolutional layers (2 in the first convolution layer and 1 in each of the second to fifth convolution layers) and 2 fully-connected layers, and its 5 maximum pooling layers (the first to fifth maximum pooling layers) are the same as the maximum pooling layers of the conventional VGG16 network. Together with the sigmoid classification layer, this gives the 14 layers. In addition, since the vehicle door has only two states, open and closed, the task is a binary classification problem, so the improved VGG16 network structure of this embodiment uses a sigmoid classification layer instead of the conventional softmax classifier as its output.
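The reason a single sigmoid output suffices for a two-class problem can be checked numerically: a sigmoid applied to one logit z gives the same probability as a two-class softmax applied to the logits (z, 0). The following sketch demonstrates this; treating the sigmoid output as the probability of "open" is an assumption made only for the variable names.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


z = 1.7  # arbitrary example logit
p_open_sigmoid = sigmoid(z)
p_open_softmax = softmax([z, 0.0])[0]
print(abs(p_open_sigmoid - p_open_softmax) < 1e-12)
```

One output unit therefore carries the same information as two softmax units for an open/closed decision, with fewer parameters in the final layer.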

Therefore, by using the improved VGG16 network to train the vehicle door state feature encoder, this embodiment increases the training speed and obtains a simpler and more efficient network structure.

Further preferably, the step S102 of performing door opening and closing operation recognition on the door opening and closing state feature sequence of consecutive frames when it is determined that the door state change is recognized according to the door state recognition result specifically includes:

S1021, determining the vehicle picture frame in which the vehicle door state changed according to the vehicle door state recognition results, wherein the vehicle picture frame in which the vehicle door state changed is a vehicle picture frame whose vehicle door state differs from that of the previous frame of vehicle picture;

Specifically, the vehicle picture frame in which the door state changed can be determined by comparing the state of the current frame of vehicle picture with that of the previous frame: if the door state of the previous frame is recognized as open and that of the current frame as closed, or the door state of the previous frame is recognized as closed and that of the current frame as open, the door state is determined to have changed at the current frame. The current frame can be any frame of vehicle picture in the input video.
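The comparison rule of step S1021 can be sketched as follows. The 0.5 threshold applied to the sigmoid output, the encoding 1 = open and 0 = closed, and the function names are assumptions for illustration; the patent only specifies that a frame counts as changed when its recognized state differs from that of the previous frame.

```python
from typing import List


def door_states(probabilities: List[float], threshold: float = 0.5) -> List[int]:
    """Turn per-frame sigmoid outputs into states (1 = open, 0 = closed)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def changed_frames(states: List[int]) -> List[int]:
    """Indices of frames whose state differs from the previous frame."""
    return [i for i in range(1, len(states)) if states[i] != states[i - 1]]


probs = [0.05, 0.10, 0.20, 0.80, 0.95, 0.90, 0.30, 0.10]
states = door_states(probs)
changes = changed_frames(states)
print(states)   # [0, 0, 0, 1, 1, 1, 0, 0]
print(changes)  # [3, 6]
```

The first frame has no predecessor, so it is never reported as a change.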

S1022, obtaining the continuous frames according to the vehicle picture frame in which the vehicle door state changed;

Specifically, after the vehicle picture frame in which the door state changed is determined, all frames of pictures within a certain time before or after that frame can be selected from the input video as the continuous frames.
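Selecting the frames within a certain time of the changed frame reduces to index arithmetic once the frame rate is known. The sketch below is illustrative only: the function name, the frame rate, and the one-second spans are assumed values, and the range is clamped so it never runs past the start or end of the video.

```python
def continuous_frame_range(
    change_idx: int,
    total_frames: int,
    fps: float,
    seconds_before: float,
    seconds_after: float,
) -> range:
    """Frame indices within the given time spans around the changed frame."""
    before = round(fps * seconds_before)
    after = round(fps * seconds_after)
    start = max(0, change_idx - before)               # clamp at the first frame
    stop = min(total_frames, change_idx + after + 1)  # clamp at the last frame
    return range(start, stop)


# Assumed example: 25 fps video of 60 frames, state change detected at frame 40.
window = continuous_frame_range(40, 60, fps=25, seconds_before=1.0, seconds_after=1.0)
print(window.start, window.stop, len(window))
```

Setting `seconds_before` or `seconds_after` to zero gives a window that lies entirely after or entirely before the changed frame.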

S1023, inputting the continuous frames into the vehicle door state feature encoder to obtain the vehicle door state feature sequence of the continuous frames;

S1024, inputting the vehicle door state feature sequence of the continuous frames into the trained action recognition classifier to obtain the vehicle door opening and closing action recognition result.

Specifically, because the action recognition classifier has already been trained, in practical application the complete vehicle door opening and closing action recognition only requires the vehicle door state feature sequence of the continuous frames to be input into the action recognition classifier, which is convenient and requires no manual intervention.

Further as a preferred embodiment, the door opening and closing actions comprise keeping open, changing from open to closed, changing from closed to open, and keeping closed, and the action recognition classifier is obtained by training a double-layer LSTM network.

Specifically, keeping open means that the door remains in the open state throughout, and keeping closed means that the door remains in the closed state throughout. Considering that the state changes of the vehicle door across continuous frames have a temporal relationship, the action recognition classifier of this embodiment uses a double-layer LSTM (Long Short-Term Memory) network to extract the temporal information of the door state changes.
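How a double-layer LSTM consumes a door state feature sequence can be sketched with a forward pass written in plain Python. This is not the patent's classifier: the weights are random and untrained, the feature and hidden sizes are toy values, and the final linear layer with softmax over the four actions is an assumption, since the output head is not described in this passage.

```python
import math
import random

ACTIONS = ["keep open", "open -> closed", "closed -> open", "keep closed"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, candidate, output.
        self.w = {k: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for k in "ifgo"}
        self.b = {k: [0.0] * hidden_size for k in "ifgo"}

    def _gate(self, name, act, xh):
        return [act(sum(w * v for w, v in zip(row, xh)) + bias)
                for row, bias in zip(self.w[name], self.b[name])]

    def step(self, x, h, c):
        xh = x + h  # concatenate the input with the previous hidden state
        i = self._gate("i", sigmoid, xh)
        f = self._gate("f", sigmoid, xh)
        o = self._gate("o", sigmoid, xh)
        cand = self._gate("g", math.tanh, xh)
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def classify_sequence(features, layers, w_out):
    """Feed the sequence through the stacked LSTM, then apply a softmax head."""
    states = [([0.0] * l.hidden_size, [0.0] * l.hidden_size) for l in layers]
    for x in features:
        for k, layer in enumerate(layers):
            h, c = layer.step(x, *states[k])
            states[k] = (h, c)
            x = h  # the output of layer k is the input of layer k + 1
    top_h = states[-1][0]  # final hidden state of the second layer
    logits = [sum(w * v for w, v in zip(row, top_h)) for row in w_out]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
feature_dim, hidden = 8, 6  # toy sizes; the encoder output may be up to 4096
layers = [LSTMCell(feature_dim, hidden, rng), LSTMCell(hidden, hidden, rng)]
w_out = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in ACTIONS]
sequence = [[rng.random() for _ in range(feature_dim)] for _ in range(5)]
probs = classify_sequence(sequence, layers, w_out)
print(dict(zip(ACTIONS, (round(p, 3) for p in probs))))
```

Because the weights are untrained, the printed probabilities carry no meaning; the sketch only shows the data flow from a feature sequence to a four-way prediction.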

In view of the disadvantages of the prior art, this embodiment provides a separable network structure that can efficiently determine the vehicle door opening and closing actions in a video in practical applications, together with a two-stage training method suited to training on a small number of samples. The separable network structure comprises a vehicle door state feature encoder and an action recognition classifier. In the first stage, the vehicle door state feature encoder is trained with single-frame vehicle pictures; in the second stage, the parameters of the vehicle door state feature encoder are fixed and the action recognition classifier is trained with sequences of continuous frames, so that higher recognition accuracy can still be obtained with a small number of samples. As shown in fig. 2, when the method is applied to an actual scene, this embodiment decomposes vehicle door opening and closing action recognition into two parts, a door state recognition part and an action recognition part. The door state feature encoder first recognizes and extracts the open or closed state of the vehicle door for each frame of vehicle picture in the input video, and the action recognition classifier then performs complete vehicle door opening and closing action recognition on a sequence of continuous frames including that frame only when a change of the door state is recognized, which greatly improves the efficiency and practical feasibility of vehicle door action recognition. The main contents of this embodiment are described in detail below:

(1) a training method.

In practical application, video data is troublesome to collect and small in volume. To obtain higher recognition accuracy from fewer training samples, and considering that a video consists of many frames of pictures, that a small number of videos yields a large number of picture samples, and that labeling pictures is simpler than labeling video samples, this embodiment adopts the two-stage step-by-step training method shown in fig. 3. In the first stage, the door state feature encoder shown in fig. 4 is constructed; a single-frame picture is input, the open or closed state of the door in that picture is recognized, and the encoder is trained to learn features that represent the open or closed state of the door. In the second stage, the parameters of the door state feature encoder are fixed; continuous frames are input and encoded by the door state feature encoder to obtain the door state feature sequence of the continuous frames, this feature sequence is input into the action recognition classifier shown in fig. 5, and the action recognition classifier is trained.

(2) A network structure.

The network structure design of the door opening and closing state characteristic encoder is shown in fig. 4. The encoder reduces the number of network layers and the characteristic dimension of each layer on the basis of a VGG16 network structure, reduces network parameters on the basis of ensuring the effectiveness of the network structure, and improves the speed of network operation.

The output feature dimension of the door state feature encoder can be kept consistent with that of the VGG16 network, that is, set to 4096 dimensions. In this case, as shown in fig. 4, the network structure processes its input in the following sequence:

1) the input picture is convolved twice, with the linear rectification function ReLU, by the 32 convolution kernels of size 3 × 3 of the first convolution layer c1, and the size becomes 224 × 224 × 32;

2) max pooling is performed by the first max pooling layer p1, whose window size is 2 × 2 (the effect is to halve the image size), and the size after pooling becomes 112 × 112 × 32;

3) one convolution + ReLU is performed by the 64 convolution kernels of size 3 × 3 of the second convolution layer c2, and the size becomes 112 × 112 × 64;

4) 2 × 2 max pooling is performed by the second max pooling layer p2, and the size becomes 56 × 56 × 64;

5) one convolution + ReLU is performed by the 128 convolution kernels of size 3 × 3 of the third convolution layer c3, and the size becomes 56 × 56 × 128;

6) 2 × 2 max pooling is performed by the third max pooling layer p3, and the size becomes 28 × 28 × 128;

7) one convolution + ReLU is performed by the 256 convolution kernels of size 3 × 3 of the fourth convolution layer c4, and the size becomes 28 × 28 × 256;

8) 2 × 2 max pooling is performed by the fourth max pooling layer p4, and the size becomes 14 × 14 × 256;

9) one convolution is performed by the 256 convolution kernels of size 3 × 3 of the fifth convolution layer c5, and the size remains 14 × 14 × 256;

10) 2 × 2 max pooling is performed by the fifth max pooling layer p5, and the size becomes 7 × 7 × 256;

11) full connection + ReLU is performed by the two fully connected layers Fc1 and Fc2, each of size 1 × 1 × 4096;

12) one prediction result is output through a sigmoid.
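The sizes listed in steps 1) to 10) can be checked with a short shape trace. This is only a sketch under stated assumptions: the source does not give stride or padding, so the code assumes stride-1 convolutions with padding 1 (which leaves the spatial size unchanged, as the listed sizes imply) and a 3-channel 224 × 224 input. The helper names are hypothetical. The number of convolutions inside a layer does not affect the output shape, so each layer is traced once.

```python
def conv3x3_same(shape, out_channels):
    """3x3 convolution, stride 1, padding 1 (assumed): height and width are kept."""
    height, width, _ = shape
    return (height, width, out_channels)


def maxpool2x2(shape):
    """2x2 max pooling: height and width are halved, channels are kept."""
    height, width, channels = shape
    return (height // 2, width // 2, channels)


def encoder_shapes(input_shape=(224, 224, 3), channels=(32, 64, 128, 256, 256)):
    """Return the (layer_name, output_shape) trace for c1/p1 ... c5/p5."""
    shape = input_shape
    trace = []
    for i, out_channels in enumerate(channels, start=1):
        shape = conv3x3_same(shape, out_channels)
        trace.append((f"c{i}", shape))
        shape = maxpool2x2(shape)
        trace.append((f"p{i}", shape))
    return trace


trace = encoder_shapes()
for name, shape in trace:
    print(name, shape)

# Number of values flattened into the first fully connected layer Fc1.
last_h, last_w, last_c = trace[-1][1]
flat = last_h * last_w * last_c
print("flattened input to Fc1:", flat)
```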

Taking both resource usage and network prediction accuracy into account, the output feature dimension of the encoder can be chosen from the set {512, 1024, 2048, 4096} according to the actual device configuration. The larger the feature dimension, the higher the prediction accuracy, but the more resources are occupied and the higher the requirements on the device configuration.
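To give a rough sense of the resource side of this trade-off, the sketch below counts the parameters of the fully connected part of the encoder for each candidate dimension. It rests on assumptions not stated in the source: both Fc1 and Fc2 use the chosen dimension, every layer has a bias term, the sigmoid output is a single unit fed by Fc2, and weights are stored as 32-bit floats. It says nothing about accuracy, which would have to be measured.

```python
FLAT = 7 * 7 * 256  # size of the p5 output flattened into Fc1


def fc_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def head_params(dim):
    """Fc1 (FLAT -> dim), Fc2 (dim -> dim) and one sigmoid output unit (dim -> 1)."""
    return fc_params(FLAT, dim) + fc_params(dim, dim) + fc_params(dim, 1)


param_table = {dim: head_params(dim) for dim in (512, 1024, 2048, 4096)}
for dim, count in param_table.items():
    mib = count * 4 / 2**20  # 4 bytes per float32 parameter (assumed)
    print(f"dim={dim:5d}  parameters={count:10d}  about {mib:6.1f} MiB")
```

Under these assumptions the fully connected layers dominate the parameter count, so the choice of output dimension largely determines the memory footprint of the encoder.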

The network structure of the action recognition classifier is shown in fig. 5. Because the change of the door state follows a certain temporal order across consecutive frames, the action recognition classifier uses a double-layer LSTM (Long Short-Term Memory) network to extract the timing information of the door state change. The classifier takes the sequence of encoded features of consecutive frames as input. As shown in fig. 2 and fig. 5, for an input sequence of N consecutive frames {f_1, f_2, …, f_N}, each frame is passed through the door state feature encoder to obtain the feature sequence {F_1, F_2, …, F_N}, where F_i is the door state feature of size 1 × 4096 encoded from the i-th frame. The N door state features of size 1 × 4096 are input in order into the double-layer LSTM network of fig. 5, which extracts the state features {C_1, C_2, …, C_4096} representing the door action in the consecutive frames. After the door action state features {C_1, C_2, …, C_4096} are encoded by the two fully connected layers FC1 and FC2, a Softmax classifier outputs the prediction result for the door action class in the consecutive frames.
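The data flow through the classifier can be illustrated with a small pure-Python sketch of a two-layer LSTM followed by a softmax. It is a toy under explicit assumptions, not the network of fig. 5: the weights are random and untrained, bias terms are omitted, the two fully connected layers FC1 and FC2 are collapsed into one linear layer, and the dimensions are tiny (8-dimensional features instead of 4096). The standard LSTM gate equations are used, and the four outputs stand for the four action classes named in claim 6.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Standard LSTM cell with random, untrained weights (illustration only)."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # One weight matrix per gate, applied to the concatenation [x, h].
        self.w = {gate: rand_matrix(n_hidden, n_in + n_hidden, rng) for gate in "ifog"}

    def step(self, x, h, c):
        xh = x + h  # list concatenation
        i = [sigmoid(z) for z in matvec(self.w["i"], xh)]    # input gate
        f = [sigmoid(z) for z in matvec(self.w["f"], xh)]    # forget gate
        o = [sigmoid(z) for z in matvec(self.w["o"], xh)]    # output gate
        g = [math.tanh(z) for z in matvec(self.w["g"], xh)]  # candidate values
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_sequence(features, layers, w_out):
    """Feed F_1..F_N through the stacked LSTM layers, then apply softmax."""
    states = [([0.0] * l.n_hidden, [0.0] * l.n_hidden) for l in layers]
    for x in features:
        for k, layer in enumerate(layers):
            h, c = layer.step(x, *states[k])
            states[k] = (h, c)
            x = h  # output of layer k is the input of layer k + 1
    return softmax(matvec(w_out, states[-1][0]))


rng = random.Random(0)
DIM, HIDDEN, N, CLASSES = 8, 6, 5, 4  # toy sizes chosen for the example
layers = [LSTMCell(DIM, HIDDEN, rng), LSTMCell(HIDDEN, HIDDEN, rng)]
w_out = rand_matrix(CLASSES, HIDDEN, rng)
features = [[rng.uniform(0, 1) for _ in range(DIM)] for _ in range(N)]
probs = classify_sequence(features, layers, w_out)
print(probs)
```

Only the final hidden state of the second layer is classified here, so the whole N-frame sequence yields a single action prediction.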

(3) Vehicle door opening and closing action recognition.

In practical applications of the vehicle door opening and closing action recognition algorithm, performing a complete door opening and closing action recognition on every sequence of consecutive frames is time-consuming and impractical. Based on the two-stage step-by-step training adopted in the present application, this embodiment can instead use the door state feature encoder to determine the door state in each frame of vehicle picture in the video. Only when a change of the door state is detected, which indicates that a door opening or closing action is occurring, is the complete door opening and closing action recognition performed on the sequence of consecutive frames that contains that frame. This detection process greatly improves the running efficiency of the algorithm and makes it more feasible in practical applications.
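The change-triggered procedure described above can be sketched as a simple loop. This is a minimal illustration with assumptions labeled: `recognize_stream`, `toy_encode` and `toy_classify` are hypothetical names, the two toy functions stand in for the trained encoder and classifier, and the window is taken as the last `window` frames ending at the changed frame, whereas the source only requires that the consecutive frames contain the changed frame.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Tuple


def recognize_stream(
    frames: Iterable[object],
    encode: Callable[[object], Tuple[list, int]],
    classify: Callable[[List[list]], str],
    window: int = 4,
) -> List[Tuple[int, str]]:
    """Encode every frame, but run the sequence classifier only on frames
    whose recognized door state differs from that of the previous frame."""
    buffer: Deque[list] = deque(maxlen=window)  # features of the latest frames
    prev_state = None
    events = []
    for idx, frame in enumerate(frames):
        feature, state = encode(frame)
        buffer.append(feature)
        if prev_state is not None and state != prev_state:
            events.append((idx, classify(list(buffer))))
        prev_state = state
    return events


# Toy stand-ins: each "frame" is already its door state (0 = closed, 1 = open).
def toy_encode(frame):
    return [float(frame)], int(frame)


calls = []


def toy_classify(sequence):
    calls.append(len(sequence))
    return "closed_to_open" if sequence[-1][0] > sequence[0][0] else "open_to_closed"


frames = [0, 0, 0, 1, 1, 1, 1, 0, 0]
events = recognize_stream(frames, toy_encode, toy_classify)
print(events)
print("classifier calls:", len(calls), "for", len(frames), "frames")
```

The sequence classifier runs once per detected state change rather than once per frame, which is where the efficiency gain comes from.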

Referring to fig. 6, an embodiment of the present invention provides a vehicle door opening and closing action recognition system, including the following modules:

the vehicle door state feature extraction and identification module 201 is used for extracting and identifying vehicle door state features of each frame of vehicle picture to obtain vehicle door state features and vehicle door state identification results of each frame of vehicle picture;

and the door opening and closing action recognition module 202 is configured to perform door opening and closing action recognition on the door opening and closing state feature sequence of consecutive frames when a change of the door state is determined from the door state recognition result, where the consecutive frames include the vehicle picture frame in which the door state changes.

Referring to fig. 6, as a further preferred embodiment, the system further includes a two-stage step-by-step training module 200, where the two-stage step-by-step training module 200 specifically includes:

a first stage training unit 2001, configured to train a door state feature encoder, where the door state feature encoder is used for encoding and identifying a state feature of a door in a single frame of vehicle picture;

and a second-stage training unit 2002, configured to fix the parameters of the door state feature encoder and train an action recognition classifier, where the action recognition classifier is used for door opening and closing action recognition on consecutive frames.

The contents in the above method embodiments are all applicable to the present system embodiment, the functions specifically implemented by the present system embodiment are the same as those in the above method embodiment, and the beneficial effects achieved by the present system embodiment are also the same as those achieved by the above method embodiment.

Referring to fig. 7, an embodiment of the present invention provides a vehicle door opening and closing motion recognition system, including:

at least one processor 301;

at least one memory 302 for storing at least one program;

when the at least one program is executed by the at least one processor 301, the at least one processor 301 implements the method for identifying a vehicle door opening and closing action.

The embodiment of the invention also provides a storage medium in which processor-executable instructions are stored, and the processor-executable instructions, when executed by a processor, are used to implement the vehicle door opening and closing action recognition method. The storage medium may be a floppy disk, an optical disk, a DVD, a hard disk, a flash memory, a USB flash drive, a CF card, an SD card, an MMC card, an SM card, a Memory Stick, an xD card, or the like.

While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A vehicle door opening and closing action recognition method, characterized by comprising the following steps:

2. The vehicle door opening and closing action recognition method according to claim 1, characterized in that: the method further comprises a step-by-step training step of two stages, wherein the step-by-step training step of two stages specifically comprises the following steps:

3. The vehicle door opening and closing action recognition method according to claim 2, characterized in that: the step of extracting and identifying the vehicle door state features of each frame of vehicle picture to obtain the vehicle door state features and the vehicle door state identification result of each frame of vehicle picture specifically comprises the following steps:

acquiring each frame of vehicle picture from an input video;

4. The vehicle door opening and closing action recognition method according to claim 2, characterized in that: the vehicle door state comprises a vehicle door opening state and a vehicle door closing state, and the step of training the vehicle door state feature encoder specifically comprises the following steps:

5. The vehicle door opening and closing action recognition method according to claim 2, characterized in that: when the change of the vehicle door state is determined and recognized according to the vehicle door state recognition result, the step of recognizing the vehicle door opening and closing actions of the vehicle door opening and closing state characteristic sequence of the continuous frames specifically comprises the following steps:

6. The vehicle door opening and closing action recognition method according to claim 5, wherein: the door opening and closing actions comprise opening keeping, opening changing to closing, closing changing to opening and closing keeping, and the action recognition classifier is obtained by adopting double-layer LSTM network training.

7. A vehicle door opening and closing action recognition system is characterized in that: the system comprises the following modules:

8. The vehicle door opening and closing action recognition system according to claim 7, characterized in that: the system further comprises a two-stage step-by-step training module, wherein the two-stage step-by-step training module specifically comprises:

9. A vehicle door opening and closing action recognition system, characterized by comprising:

at least one processor;

at least one memory for storing at least one program;

the at least one program, when executed by the at least one processor, causes the at least one processor to implement the vehicle door opening and closing action recognition method according to any one of claims 1-6.

10. A storage medium having stored therein processor-executable instructions, characterized in that: the processor-executable instructions, when executed by a processor, are used to implement the vehicle door opening and closing action recognition method according to any one of claims 1 to 6.