CN114973425A - Traffic police gesture recognition method and device - Google Patents

Traffic police gesture recognition method and device

Info

Publication number
CN114973425A
Authority
CN
China
Prior art keywords
traffic police
gesture
time sequence
historical
key point
Prior art date
Legal status
Pending
Application number
CN202210914213.XA
Other languages
Chinese (zh)
Inventor
黄冠英
敬思远
杨骏
Current Assignee
Leshan Normal University
Original Assignee
Leshan Normal University
Priority date
Filing date
Publication date
Application filed by Leshan Normal University filed Critical Leshan Normal University
Priority to CN202210914213.XA
Publication of CN114973425A
Legal status: Pending

Classifications

    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06V: Image or Video Recognition or Understanding
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G06V40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements using pattern recognition or machine learning
    • G06V10/764: Arrangements using classification, e.g. of video objects
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V20/00: Scenes; scene-specific elements
    • G06V20/40: Scenes; scene-specific elements in video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of gesture recognition, and in particular to a traffic police gesture recognition method and device. The method comprises: acquiring a plurality of historical traffic police command gesture videos, each comprising P consecutive frames of traffic police gesture images; determining a key point movement trajectory time series in each historical traffic police command gesture video based on the plurality of videos, where the key points are human skeleton key points of the traffic police; constructing a recognition model for recognizing traffic police gestures based on the key point movement trajectory time series; and recognizing collected target traffic police gestures with the recognition model. Because recognition then operates on the key point movement trajectory time series rather than on raw images, the processing load is reduced compared with image processing and the recognition efficiency of traffic police gestures is improved.

Description

Traffic police gesture recognition method and device
Technical Field
The invention relates to the technical field of gesture recognition, in particular to a method and a device for recognizing a traffic police gesture.
Background
Gesture recognition has been successfully applied in many fields, including driver assistance, security authentication, and activity recognition. Human gesture recognition mostly tracks human gestures and acquires gesture data with cameras, sensors, and similar devices.
Automatic driving combines vision, GPS, radar, monitoring systems, and other components so that a vehicle can sense its surroundings and control itself without human operation, bringing convenience to daily life.
However, when traffic signals fail, at peak traffic, or during traffic police vehicle inspections, automatic driving alone cannot fully handle driving on the road.
To address this, cameras or other sensors are conventionally used to track and recognize traffic police gestures, but such techniques process images directly, which consumes time and power; the resulting recognition efficiency is too low for a high-speed traffic system and cannot meet its requirements.
Therefore, how to improve the recognition efficiency of the traffic police gesture is a technical problem to be solved urgently at present.
Disclosure of Invention
In view of the above, the present invention has been made to provide a method and apparatus for recognizing a traffic police gesture that overcomes or at least partially solves the above problems.
In a first aspect, the present invention provides a traffic police gesture recognition method, including:
acquiring a plurality of historical traffic police command gesture videos, wherein each historical traffic police command gesture video comprises P frames of continuous traffic police gesture images, and P is an integer greater than or equal to 30;
determining a key point movement trajectory time series in each historical traffic police command gesture video based on the plurality of historical traffic police command gesture videos, wherein the key points are human skeleton key points of the traffic police;
constructing a recognition model for recognizing the traffic police gesture based on the key point movement track time sequence;
and identifying the collected target traffic police gestures based on the identification model.
Further, after obtaining a plurality of historical traffic police command gesture videos, the method further includes:
and deleting the background image of each historical traffic police command gesture video, and keeping the body outline of the traffic police.
Further, the determining a keypoint movement trajectory time sequence in each historical traffic police command gesture video based on the plurality of historical traffic police command gesture videos includes:
determining the positions of the left elbow, the left wrist, the right elbow and the right wrist of the traffic police in each historical traffic police directing gesture video and determining the position of a reference point based on the plurality of historical traffic police directing gesture videos;
determining the distances between the left elbow, the left wrist, the right elbow and the right wrist of the traffic police and the reference points in each frame of traffic police gesture images based on the positions of the left elbow, the left wrist, the right elbow and the right wrist and the positions of the reference points;
and determining a time sequence of the movement tracks of the key points in each historical traffic police command gesture video based on the distance.
Further, determining the reference point position includes:
acquiring the left ankle position and the right ankle position of the traffic police;
determining a midpoint of the left ankle position and the right ankle position as a reference point position.
Further, the determining a time sequence of the movement tracks of the key points in each historical traffic police command gesture video based on the distance comprises:
in each historical traffic police command gesture video, arranging the distances between the reference point and the left elbow, left wrist, right elbow and right wrist of the traffic police in each frame of traffic police gesture image according to a preset key point numbering sequence as the vertical data of a matrix, wherein the preset key point numbering sequence is the order in which the left elbow, left wrist, right elbow and right wrist are arranged according to a preset rule, and the matrix is the data formed by the key point movement trajectory time series;
and taking, for each frame of traffic police gesture image in time order, the distances between the reference point and the left elbow, left wrist, right elbow and right wrist of the traffic police as the horizontal data of the matrix, thereby determining the key point movement trajectory time series in each historical traffic police command gesture video.
Further, constructing a recognition model for recognizing a traffic police gesture based on the time sequence of the key point movement tracks comprises:
and processing the time sequence of the movement tracks of the key points by adopting a self-adaptive dynamic time warping algorithm to obtain a recognition model for recognizing the traffic police gestures.
Further, the processing the time sequence of the movement tracks of the key points by using a self-adaptive dynamic time warping algorithm to obtain a recognition model for recognizing the traffic police gesture includes:
labeling the key point movement trajectory time series with the corresponding traffic police gesture signal name, wherein the traffic police gesture signal names comprise: stop, go straight, turn left, left-turn waiting, turn right, change lane, slow down and pull over;
taking the key point movement track time sequence and the corresponding traffic police gesture signal name as samples, and dividing the samples into training samples and testing samples;
processing the training sample by adopting a self-adaptive dynamic time warping algorithm to obtain a recognition model for recognizing the traffic police gesture;
and testing the recognition model based on the test sample.
In a second aspect, the present invention further provides a traffic police gesture recognition apparatus, including:
the video data acquisition module is used for acquiring a plurality of historical traffic police directing gesture videos, and each historical traffic police directing gesture video comprises P frames of continuous traffic police gesture images;
a key point data extraction module, which is used for determining a key point moving track time sequence in each historical traffic police command gesture video based on the plurality of historical traffic police command gesture videos, wherein the key points are human skeleton key points of traffic police;
the model building module is used for building a recognition model for recognizing the traffic police gesture based on the key point movement track time sequence;
and the gesture recognition module is used for recognizing the acquired target traffic police gestures based on the recognition model.
In a third aspect, the present invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method steps described in the first aspect when executing the program.
In a fourth aspect, the invention also provides a computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, performs the method steps as described in the first aspect.
One or more technical solutions in the embodiments of the present invention have at least the following technical effects or advantages:
the invention provides a traffic police gesture recognition method, which comprises the steps of obtaining a plurality of historical traffic police command gesture videos, wherein each historical traffic police command gesture video comprises P frames of continuous traffic police gesture images, and P is an integer greater than or equal to 30; determining a key point moving track time sequence in each historical traffic police directing gesture video based on a plurality of historical traffic police directing gesture videos, wherein the key point is a human skeleton key point of a traffic police; constructing a recognition model for recognizing the traffic police gesture based on the key point movement track time sequence; based on the recognition model, the collected target traffic police gestures are recognized, and then when the traffic police gestures are recognized, the key point moving track time sequence is processed, so that the processing amount is reduced compared with image processing, and when the traffic police gestures are recognized, the recognition efficiency of the traffic police gestures is improved.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart illustrating steps of a traffic police gesture recognition method according to an embodiment of the invention;
FIG. 2 is a diagram illustrating key points in a traffic police mannequin in an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for recognizing a traffic police gesture according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a traffic police gesture recognition apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a computer device for implementing a traffic police gesture recognition method in an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Example one
The embodiment of the invention provides a traffic police gesture recognition method, which comprises the following steps as shown in figure 1:
s101, obtaining a plurality of historical traffic police directing gesture videos, wherein each historical traffic police directing gesture video comprises P frames of continuous traffic police gesture images, and P is an integer greater than or equal to 30;
s102, determining a key point movement track time sequence in each historical traffic police command gesture video based on a plurality of historical traffic police command gesture videos;
s103, constructing a recognition model for recognizing the traffic police gesture based on the key point movement track time sequence;
and S104, identifying the collected target traffic police gestures based on the identification model.
In a specific embodiment, S101, a plurality of historical traffic police conducting gesture videos are obtained, where each historical traffic police conducting gesture video includes P frames of consecutive traffic police gesture images, and P is an integer greater than or equal to 30.
Specifically, a plurality of traffic police command gesture videos are collected in different scenes and at different movement speeds, each comprising P consecutive frames of traffic police gesture images, with P an integer greater than or equal to 30. Specifically, 15 frames are collected per second and one traffic police command gesture video lasts about 4 s, so one historical traffic police command gesture video comprises about 40-50 frames of images.
The collected traffic police command gesture videos can be generated by different traffic polices, such as the traffic polices with different heights and weights.
After the plurality of historical traffic police command gesture videos are obtained, the background image is deleted from each video and the body outline of the traffic police is kept, so as to improve the accuracy of the later data extraction.
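The patent does not specify how the background is removed. As a rough sketch under that caveat, simple per-pixel differencing against a known static background could look like the following (the `extract_silhouette` helper and its threshold are illustrative assumptions, not taken from the patent):

```python
def extract_silhouette(frame, background, thresh=30):
    # Keep only pixels that differ noticeably from the static background,
    # zeroing the rest so that only the body outline of the traffic police
    # remains. frame/background: 2-D lists of grey values. The differencing
    # method and the threshold value are assumptions.
    return [[px if abs(px - bg) > thresh else 0
             for px, bg in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

# Toy 2x2 frame: one pixel (value 200) differs from the uniform background.
silhouette = extract_silhouette([[10, 200], [10, 10]], [[10, 10], [10, 10]])
```

Real systems would more likely use a learned segmentation or background-subtraction model, but the role of this step in the pipeline is the same: discard background pixels before key point extraction.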
Next, S102 is executed, and based on the plurality of historical traffic police conducting gesture videos, a time sequence of movement trajectories of key points in each historical traffic police conducting gesture video is determined, where the key points are human skeleton key points of the traffic police.
First, based on the plurality of historical traffic police command gesture videos, the positions of the left elbow, left wrist, right elbow, and right wrist of the traffic police in each historical traffic police command gesture video are determined, and the reference point positions are determined.
And then, determining the distances between the left elbow, the left wrist, the right elbow and the right wrist of the traffic police and the reference point in each frame of traffic police gesture image based on the positions of the left elbow, the left wrist, the right elbow and the right wrist and the position of the reference point.
Finally, based on the distance, a time sequence of the movement track of the key point in each historical traffic police command gesture video is determined.
When the key point position of the traffic police in each historical traffic police conducting gesture video and the reference point position are determined, taking the traffic police skeleton model shown in fig. 2 as an example, the left elbow 201, the left wrist 202, the right elbow 203 and the right wrist 204 in the skeleton model are found.
Because the joints of the human body have different degrees of freedom, they contribute differently to the motion of a gesture. Extracting an effective region according to the characteristics of traffic police gestures therefore reduces the computational complexity of the whole system and speeds up recognition. Analysis of traffic police commands shows that the torso remains upright and the lower limbs convey little effective information; the information is transmitted mainly by upper-limb movements, involving arm motion and head rotation. The invention therefore discards the lower-body key point data, and because head and shoulder rotation contributes relatively little to gesture recognition, only the movement trajectories of the left wrist, right wrist, left elbow and right elbow are considered.
Besides the key points left elbow 201, left wrist 202, right elbow 203 and right wrist 204, the traffic police human skeleton model also includes: head, neck, left shoulder, right shoulder, left hip, right hip, left knee, right knee, left ankle 205 and right ankle 206. The invention, however, considers only the four key points above; the others could be included, but doing so would increase the data processing load without much effect on the result, so the remaining 10 key points are not processed and only the left elbow 201, left wrist 202, right elbow 203 and right wrist 204 are used.
In determining the reference point location, comprising: acquiring the position of a left ankle 205 and the position of a right ankle 206 of a traffic police; the midpoint of the left ankle 205 and right ankle 206 positions is determined as reference point O position.
Taking one of the frames as an example, let the left ankle 205 have coordinates (x1, y1) and the right ankle 206 have coordinates (x2, y2). The coordinates (x0, y0) of the reference point O are then calculated as follows:

    x0 = (x1 + x2) / 2,  y0 = (y1 + y2) / 2    (1)
The overall height H of the traffic police human skeleton model is taken as the reference scale when obtaining the distance between each key point and the reference point O.
Since each historical traffic police command gesture video comprises P consecutive frames of traffic police gesture images, the video of, for example, the "stop" command gesture yields a key point movement trajectory time series for the "stop" traffic police command gesture.
Taking the left wrist 202 with coordinates (x, y) in the current frame as an example, the distance d between the left wrist 202 and the reference point O is calculated as:

    d = sqrt((x - x0)² + (y - y0)²) / H    (2)

where the division by the overall height H scales the distance by the skeleton size, consistent with using H as the reference.
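Equations (1) and (2) can be sketched together as follows; the function names are illustrative, and the division by the skeleton height H follows the statement above that H is used as the reference scale:

```python
import math

def reference_point(left_ankle, right_ankle):
    # Eq. (1): reference point O is the midpoint of the two ankle positions.
    return ((left_ankle[0] + right_ankle[0]) / 2,
            (left_ankle[1] + right_ankle[1]) / 2)

def keypoint_distance(point, ref, height):
    # Eq. (2): distance from a key point (e.g. the left wrist) to O,
    # normalised by the overall skeleton height H so the feature does not
    # depend on the officer's stature.
    return math.hypot(point[0] - ref[0], point[1] - ref[1]) / height

O = reference_point((0.0, 0.0), (2.0, 0.0))   # midpoint of the two ankles
d = keypoint_distance((1.0, 4.0), O, 2.0)     # wrist 4 units above O, H = 2
```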
determining a time sequence of the movement tracks of the key points in each historical traffic police command gesture video based on the distance between each key point and the reference point O, wherein the time sequence comprises the following steps:
in each historical traffic police command gesture video, the distances between the reference point and the left elbow, left wrist, right elbow and right wrist of the traffic police in each frame of traffic police gesture image are arranged according to the preset key point numbering sequence as the vertical data of a matrix; the preset key point numbering sequence is the order in which the left elbow, left wrist, right elbow and right wrist are arranged according to a preset rule, and the matrix is the data formed by the key point movement trajectory time series.
And respectively taking the distances between the left elbow, the left wrist, the right elbow and the right wrist of the traffic police corresponding to each frame of traffic police gesture image and a reference point as the transverse data of the matrix according to the time sequence of each frame of traffic police gesture image, so as to determine the time sequence of the movement track of the key point in each historical traffic police directing gesture video.
For example, when the key point movement trajectory time series are assembled into a matrix, the element d(1,1) denotes the distance between the first key point and the reference point in the first frame image: the first index is the frame number and the second index is the key point number. Likewise, d(1,2) denotes the distance between the second key point and the reference point in the first frame image.
By analogy, any vertical (column) entry of the matrix, (d(t,1), d(t,2), d(t,3), d(t,4)), represents the four-dimensional data of an arbitrary frame t.
The preset numbering sequence may specifically be the order left elbow, left wrist, right elbow, right wrist; it can be set arbitrarily and is not limited here.
d(t,1) represents the distance between the first key point of an arbitrary frame image and the reference point; as time advances, these distances form the array (d(1,1), d(2,1), …, d(P,1)), which serves as the horizontal (row) data of the matrix.
When all four key points are considered, the following key point movement trajectory time series is obtained:

    T = | d(1,1)  d(2,1)  …  d(P,1) |
        | d(1,2)  d(2,2)  …  d(P,2) |
        | d(1,3)  d(2,3)  …  d(P,3) |
        | d(1,4)  d(2,4)  …  d(P,4) |    (3)
the key point movement track time sequence finally obtained by the historical traffic police command gesture video is obtained, and a plurality of key point movement track time sequences are obtained for a plurality of historical traffic police command gesture videos.
In a specific implementation, traffic police gesture features are extracted from the historical traffic police command gesture videos according to the traffic safety law and related regulations, which define 8 standard traffic police gesture signals: stop, go straight, turn left, left-turn waiting, turn right, change lane, slow down and pull over. In addition, traffic police often hold a fixed pose while the traffic state is unchanged; accordingly, the historical traffic police command gesture videos are divided into 800 meaningful traffic police command gestures.
Then, according to the above-mentioned mode for determining the time sequence of the key point moving track in each historical traffic police command gesture video, 800 time sequence data of the key point moving track of the traffic police gesture are obtained in total.
Next, S103 is executed, and a recognition model for recognizing the traffic police gesture is constructed based on the time series of the key point movement tracks.
In the embodiment of the present invention, a self-adaptive Dynamic Time Warping (DTW) is specifically adopted to process the Time sequence of the movement trajectory of the key point, so as to obtain an identification model for identifying a traffic police gesture.
The following describes in detail the obtaining of a recognition model for recognizing a traffic police gesture.
Taking the time sequence of the movement tracks of the key points and the corresponding names of the traffic police gesture signals as samples, and dividing the samples into training samples and testing samples;
then, processing the training sample by adopting a self-adaptive dynamic time warping algorithm to obtain a recognition model for recognizing the traffic police gesture;
and testing the recognition model based on the test sample.
Specifically, following the above example, the key point movement trajectory time series in each historical traffic police command gesture video is labelled with its gesture category Y, i.e. Y ∈ {1, 2, …, 8}, corresponding respectively to stop, go straight, turn left, left-turn waiting, turn right, change lane, slow down and pull over.
The 800 traffic police gesture key point movement trajectory time series are divided into a training set of 400 (the training samples) and a test set of 400 (the test samples).
The training samples are processed with the adaptive dynamic time warping algorithm to obtain a recognition model for recognising traffic police gestures. Adaptive DTW is a scheme for choosing between independent DTW and dependent DTW: when NN-DTW classifies a traffic police gesture key point movement trajectory time series Q, and the classes of Q's nearest neighbours under dependent DTW and independent DTW differ, the algorithm must predict which distance function is the correct one to trust.
Suppose the m-th dimension data of any two sequences Q and C are Q_m = (q(m,1), …, q(m,n)) and C_m = (c(m,1), …, c(m,n)), both of length n. The DTW distance between Q_m and C_m is given by the recurrence

    D(i, j) = d(q(m,i), c(m,j)) + min{ D(i-1, j), D(i, j-1), D(i-1, j-1) },
    DTW(Q_m, C_m) = D(n, n)    (4)

where d(·, ·) is the distance function between two points, usually chosen as the Manhattan distance, the Euclidean distance, or similar.
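The recurrence of equation (4) is the classic dynamic-programming form of DTW. A compact sketch for one dimension, using the Manhattan (absolute-difference) point cost mentioned above:

```python
def dtw(a, b, dist=lambda x, y: abs(x - y)):
    # Eq. (4): D[i][j] accumulates the cheapest warping-path cost aligning
    # the first i points of a with the first j points of b.
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = dist(a[i - 1], b[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

same = dtw([1, 2, 3], [1, 2, 3])      # identical sequences align at no cost
warp = dtw([1, 2, 3], [1, 2, 2, 3])   # the repeated 2 is absorbed by warping
```

Unlike a plain Euclidean comparison, the warping lets sequences of different speeds (and even different lengths) be compared, which is exactly why it suits gesture videos recorded at different movement speeds.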
Independent DTW (DTW_I) measures the cumulative distance of all dimensions independently using DTW. Defining DTW(Q_m, C_m) as the DTW distance between the m-th dimension of Q and the m-th dimension of C, independent DTW is calculated as:

    DTW_I(Q, C) = Σ_{m=1}^{M} DTW(Q_m, C_m)    (5)
In equation (5) the four dimensions of the traffic police gesture key point movement trajectory time series are treated as independent, and DTW may warp each dimension freely on its own.
Dependent DTW (DTW_D) instead forces all four dimensions of the traffic police gesture key point movement trajectory time series to share a single warping, so every dimension is warped identically:

    DTW_D(Q, C) = D(n, n), computed with the recurrence of equation (4)
    but with point cost d(i, j) = Σ_{m=1}^{M} (q(m,i) - c(m,j))²    (6)
In equation (6), d(i, j) is the squared Euclidean distance between the i-th data point of Q and the j-th data point of C, taken over all M dimensions. This is the generalisation of single-dimensional DTW (equation (4)) to multi-dimensional time series: the point-to-point cost is redefined as the cumulative distance over the M data points, where q(m,i) is the i-th data point in the m-th dimension of Q and c(m,j) is the j-th data point in the m-th dimension of C.
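Equations (5) and (6) differ only in where the warping is allowed. A self-contained sketch of both variants (a fresh copy of the single-dimension DTW is included so the snippet stands alone):

```python
def dtw(a, b, dist):
    # Single-sequence DTW (Eq. 4) with a pluggable point cost.
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = dist(a[i - 1], b[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def dtw_independent(Q, C):
    # DTW_I (Eq. 5): warp each of the M dimensions on its own and sum the
    # per-dimension DTW distances.
    return sum(dtw(q, c, lambda x, y: abs(x - y)) for q, c in zip(Q, C))

def dtw_dependent(Q, C):
    # DTW_D (Eq. 6): one warping shared by all dimensions; the point cost is
    # the squared Euclidean distance across the M dimensions of a frame.
    qv, cv = list(zip(*Q)), list(zip(*C))  # sequences of M-dim frames
    return dtw(qv, cv, lambda u, v: sum((a - b) ** 2 for a, b in zip(u, v)))

Q = [[0, 0, 1], [1, 1, 0]]        # M = 2 dimensions, n = 3 points
zero_i = dtw_independent(Q, Q)    # identical sequences under DTW_I
zero_d = dtw_dependent(Q, Q)      # identical sequences under DTW_D
```

For the four-dimensional trajectory matrices above, Q and C would each be the 4 x P matrix of equation (3).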
Four different cases can occur when classifying a time series T with the independent and dependent DTW distance measures. First, T is correctly classified by both dependent and independent DTW. Second, T is misclassified by both. Third, T is correctly classified by independent DTW but misclassified by dependent DTW. Fourth, T is correctly classified by dependent DTW but misclassified by independent DTW. The data sets collecting the third and fourth cases are called iSuccess and dSuccess, respectively. The method uses the training set of traffic police gesture movement trajectories to calculate a threshold. When classifying the test set, a score function S(x) is calculated, and independent DTW or dependent DTW is trusted according to the value of S(x) relative to the threshold, as shown in equation (7):
    DTW_A(x) = DTW_I(x) if S(x) > threshold, otherwise DTW_D(x)    (7)
The threshold is obtained according to whether iSuccess and dSuccess are empty sets; in the four cases it is set as follows: in the first case the threshold is set to 1; in the second case it is determined by a decision-tree function; in the third case it is the value that minimizes the score function S(x); and in the fourth case it is the value that maximizes S(x). The second case is the most common of the four; there, a decision-tree function finds the split point that maximizes the information gain.
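The decision-tree step used in the second case can be sketched as a one-node split over the score values that maximizes information gain (a minimal illustration; the binary labels marking whether a sample is better served by independent DTW are an assumption about how the training scores are partitioned):

```python
import math

def entropy(labels):
    """Binary Shannon entropy of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def best_threshold(scores, labels):
    """Pick the split on S(x) that maximizes information gain, i.e. a
    one-node decision tree. labels[i] is 1 if sample i is better served
    by DTW_I, else 0 (illustrative labeling)."""
    pairs = sorted(zip(scores, labels))
    base = entropy(labels)
    best_gain, best_t = -1.0, pairs[0][0]
    for k in range(1, len(pairs)):
        t = (pairs[k - 1][0] + pairs[k][0]) / 2   # midpoint candidate
        left = [l for s, l in pairs[:k]]
        right = [l for s, l in pairs[k:]]
        gain = base - (len(left) * entropy(left)
                       + len(right) * entropy(right)) / len(pairs)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t
```

For scores [0.2, 0.3, 0.8, 0.9] with labels [0, 0, 1, 1], the split lands between 0.3 and 0.8, giving a threshold of 0.55.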
The adaptive dynamic time warping algorithm is applied to the 400 training samples, with the key point movement trajectory time series as input data and the corresponding traffic police gesture signal names as output data, yielding a recognition model for recognizing traffic police gestures. The model is then tested on the 400 test samples and corrected accordingly.
After the corrected recognition model is obtained, S104 is executed: the acquired target traffic police gesture is recognized based on the recognition model.
When the method is applied in the automatic driving mode, the gesture meaning of an intersection traffic police officer can be quickly recognized and the corresponding driving mode executed according to that meaning.
As shown in fig. 3, the traffic police gesture recognition method provided by the present invention includes: S301, obtaining a plurality of historical traffic police command gesture videos; S302, extracting the body contour of the traffic police from the plurality of videos; S303, obtaining the positions of the traffic police gesture key points from the body contour information; S304, converting the key point positions into movement trajectory data features, namely the key point movement trajectory time series; and S305, thereby obtaining a time series data set. The time series data set is then used as samples, which are divided into training samples and test samples. S306, the training set is processed with the preset algorithm to obtain a recognition model for recognizing traffic police gestures; S307, the recognition model is tested to obtain an accurate recognition model; and finally S308, the recognition model is used to recognize traffic police gestures.
One or more technical solutions in the embodiments of the present invention have at least the following technical effects or advantages:
the invention provides a traffic police gesture recognition method, which comprises: obtaining a plurality of historical traffic police command gesture videos, wherein each historical traffic police command gesture video comprises P frames of continuous traffic police gesture images and P is an integer greater than or equal to 30; determining a key point movement trajectory time series in each historical traffic police command gesture video based on the plurality of videos, wherein the key points are human skeleton key points of the traffic police; constructing a recognition model for recognizing traffic police gestures based on the key point movement trajectory time series; and recognizing the collected target traffic police gestures based on the recognition model. Because the data are processed as key point movement trajectory time series rather than as full images, the amount of processing is reduced and the efficiency of traffic police gesture recognition is improved.
Moreover, because human body contour recognition is a mature and widely applied technology, the invention places low requirements on the resolution of the captured video images.
Example two
Based on the same inventive concept, an embodiment of the present invention further provides a traffic police gesture recognition apparatus, as shown in fig. 4, including:
the video data acquisition module 401 is used for acquiring a plurality of historical traffic police command gesture videos, wherein each historical traffic police command gesture video comprises P frames of continuous traffic police gesture images;
a key point data extraction module 402, configured to determine a key point movement trajectory time series in each historical traffic police command gesture video based on the plurality of historical traffic police command gesture videos, where the key points are human skeleton key points of the traffic police;
a model construction module 403 for constructing a recognition model for recognizing traffic police gestures based on the key point movement trajectory time series;
and the gesture recognition module 404 is used for recognizing the collected target traffic police gestures based on the recognition model.
In an optional implementation manner, the apparatus further includes a deleting module, configured to:
and deleting the background image of each historical traffic police command gesture video, and keeping the body outline of the traffic police.
In an alternative embodiment, the key point data extraction module 402 includes:
the first determination unit is used for determining the positions of the left elbow, the left wrist, the right elbow and the right wrist of the traffic police in each historical traffic police directing gesture video and determining the position of a reference point based on the historical traffic police directing gesture videos;
the second determining unit is used for determining the distances between the left elbow, the left wrist, the right elbow and the right wrist of the traffic police and the reference point in each frame of traffic police gesture image based on the positions of the left elbow, the left wrist, the right elbow and the right wrist and the position of the reference point;
and the third determining unit is used for determining a key point movement track time sequence in each historical traffic police conducting gesture video based on the distance.
In an alternative embodiment, the first determining unit is configured to: acquiring the left ankle position and the right ankle position of a traffic police; determining a midpoint of the left ankle position and the right ankle position as a reference point position.
In an alternative embodiment, the third determining unit is configured to:
in each historical traffic police command gesture video, the distances between the left elbow, the left wrist, the right elbow and the right wrist of a corresponding traffic police in each frame of traffic police gesture image and a reference point are respectively arranged according to a preset numbering sequence of the reference point to serve as longitudinal data of a matrix, the preset numbering sequence of the reference point is specifically the numbering sequence in which the left elbow, the left wrist, the right elbow and the right wrist are arranged according to a preset rule, and the matrix is data formed by the key point movement track time sequence;
and respectively taking the distances between the left elbow, the left wrist, the right elbow and the right wrist of the traffic police corresponding to each frame of traffic police gesture image and the reference point as the transverse data of the matrix according to the time sequence of each frame of traffic police gesture image, so as to determine the time sequence of the movement track of the key point in each historical traffic police directing gesture video.
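The construction of the reference point and the trajectory matrix can be sketched as follows (a minimal illustration; the per-frame keypoint coordinates are assumed to come from a pose estimator, and the dictionary keys and function names are our own):

```python
import numpy as np

# Fixed keypoint numbering: the preset order described above.
KEYPOINTS = ("left_elbow", "left_wrist", "right_elbow", "right_wrist")

def reference_point(frame):
    """Reference point: midpoint of the left and right ankle positions."""
    return (np.asarray(frame["left_ankle"])
            + np.asarray(frame["right_ankle"])) / 2.0

def trajectory_matrix(frames):
    """Build the 4 x P matrix: row m holds the distance of keypoint m to
    the reference point, and column t holds one frame, so rows are the
    longitudinal (keypoint-order) data and columns the time-ordered
    (frame-order) data."""
    cols = []
    for frame in frames:
        ref = reference_point(frame)
        cols.append([np.linalg.norm(np.asarray(frame[k]) - ref)
                     for k in KEYPOINTS])
    return np.array(cols).T    # shape (4, P)
```

Each video thus yields one four-dimensional time series whose length P equals the number of frames.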
In an alternative embodiment, the model building module 403 is configured to process the key point movement trajectory time series with an adaptive dynamic time warping algorithm to obtain a recognition model for recognizing traffic police gestures.
In an alternative embodiment, the model building module 403 is specifically configured to:
labeling the key point movement track time sequence with a corresponding traffic police gesture signal name, wherein the traffic police gesture signal name comprises: parking, straight traveling, left turning waiting, right turning, lane changing, speed reducing and side leaning;
taking the key point movement track time sequence and the corresponding traffic police gesture signal name as samples, and dividing the samples into training samples and testing samples;
processing the training sample by adopting a self-adaptive dynamic time warping algorithm to obtain a recognition model for recognizing the traffic police gesture;
and testing the recognition model based on the test sample.
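The labeling and splitting steps can be sketched as follows (a minimal sketch; the English identifiers are our rendering of the signal names listed above, and the 50/50 split mirrors the 400 training / 400 test samples mentioned earlier):

```python
import random

# Our English rendering of the traffic police gesture signal names above.
GESTURES = ("parking", "straight_traveling", "left_turn_waiting",
            "right_turn", "lane_change", "slow_down", "pull_over")

def split_samples(samples, train_ratio=0.5, seed=0):
    """samples: list of (trajectory_matrix, gesture_name) pairs.
    Shuffles deterministically and splits into training and test sets
    at the given ratio."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

The training half is fed to the adaptive DTW algorithm, and the held-out half is used to test and correct the resulting model.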
EXAMPLE III
Based on the same inventive concept, an embodiment of the present invention provides a computer device, as shown in fig. 5, including a memory 504, a processor 502, and a computer program stored on the memory 504 and runnable on the processor 502, wherein the processor 502 implements the steps of the traffic police gesture recognition method when executing the program.
Where in fig. 5 a bus architecture (represented by bus 500) is shown, bus 500 may include any number of interconnected buses and bridges, and bus 500 links together various circuits including one or more processors, represented by processor 502, and memory, represented by memory 504. The bus 500 may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface 506 provides an interface between the bus 500 and the receiver 501 and transmitter 503. The receiver 501 and the transmitter 503 may be the same element, i.e., a transceiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 502 is responsible for managing the bus 500 and general processing, and the memory 504 may be used for storing data used by the processor 502 in performing operations.
Example four
Based on the same inventive concept, embodiments of the present invention provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the traffic police gesture recognition method described above.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the traffic police gesture recognition apparatus, computer device, and/or the like in accordance with embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Claims (8)

1. A traffic police gesture recognition method is characterized by comprising the following steps:
acquiring a plurality of historical traffic police command gesture videos, wherein each historical traffic police command gesture video comprises P frames of continuous traffic police gesture images, and P is an integer greater than or equal to 30;
determining a key point moving track time sequence in each historical traffic police directing gesture video based on the plurality of historical traffic police directing gesture videos, wherein the key points are key points of human skeleton of traffic police;
constructing a recognition model for recognizing the traffic police gesture based on the key point movement track time sequence;
and identifying the collected target traffic police gestures based on the identification model.
2. The method of claim 1, wherein after obtaining the plurality of videos of historical traffic police headings gestures, further comprising:
and deleting the background image of each historical traffic police command gesture video, and keeping the body outline of the traffic police.
3. The method of claim 1, wherein determining a time series of keypoint movement trajectories in each historical traffic police command gesture video based on the plurality of historical traffic police command gesture videos comprises:
determining the positions of the left elbow, the left wrist, the right elbow and the right wrist of the traffic police in each historical traffic police directing gesture video and determining the position of a reference point based on the plurality of historical traffic police directing gesture videos;
determining the distances between the left elbow, the left wrist, the right elbow and the right wrist of the traffic police and the reference points in each frame of traffic police gesture images based on the positions of the left elbow, the left wrist, the right elbow and the right wrist and the positions of the reference points;
and determining a time sequence of the movement tracks of the key points in each historical traffic police command gesture video based on the distance.
4. The method of claim 3, wherein determining a reference point location comprises:
acquiring the left ankle position and the right ankle position of a traffic police;
determining a midpoint of the left ankle position and the right ankle position as a reference point position.
5. The method of claim 3, wherein the determining a time series of keypoint movement trajectories in each historical traffic police command gesture video based on the distance comprises:
in each historical traffic police command gesture video, the distances between the left elbow, the left wrist, the right elbow and the right wrist of a corresponding traffic police in each frame of traffic police gesture image and a reference point are respectively arranged according to a preset numbering sequence of the reference point to serve as longitudinal data of a matrix, the preset numbering sequence of the reference point is specifically the numbering sequence in which the left elbow, the left wrist, the right elbow and the right wrist are arranged according to a preset rule, and the matrix is data formed by a time sequence of the movement track of the key point;
and respectively taking the distances between the left elbow, the left wrist, the right elbow and the right wrist of the traffic police corresponding to each frame of traffic police gesture image and the reference point as the transverse data of the matrix according to the time sequence of each frame of traffic police gesture image, thereby determining the time sequence of the movement track of the key point in each historical traffic police command gesture video.
6. The method of claim 1, wherein constructing a recognition model for recognizing a traffic police gesture based on the time series of keypoint movement trajectories comprises:
and processing the time sequence of the movement tracks of the key points by adopting a self-adaptive dynamic time warping algorithm to obtain a recognition model for recognizing the traffic police gestures.
7. The method as claimed in claim 6, wherein the processing the time series of the key point movement tracks by using an adaptive dynamic time warping algorithm to obtain a recognition model for recognizing the traffic police gesture comprises:
labeling the key point movement track time sequence with a corresponding traffic police gesture signal name, wherein the traffic police gesture signal name comprises: parking, straight traveling, left turning waiting, right turning, lane changing, speed reducing and side leaning;
taking the key point movement track time sequence and the corresponding traffic police gesture signal name as samples, and dividing the samples into training samples and testing samples;
processing the training sample by adopting a self-adaptive dynamic time warping algorithm to obtain a recognition model for recognizing the traffic police gesture;
and testing the recognition model based on the test sample.
8. A traffic police gesture recognition device, comprising:
the video data acquisition module is used for acquiring a plurality of historical traffic police directing gesture videos, and each historical traffic police directing gesture video comprises P frames of continuous traffic police gesture images;
a key point data extraction module, which is used for determining a key point moving track time sequence in each historical traffic police command gesture video based on the plurality of historical traffic police command gesture videos, wherein the key points are human skeleton key points of traffic police;
the model building module is used for building a recognition model for recognizing the traffic police gesture based on the key point movement track time sequence;
and the gesture recognition module is used for recognizing the acquired target traffic police gestures based on the recognition model.
CN202210914213.XA 2022-08-01 2022-08-01 Traffic police gesture recognition method and device Pending CN114973425A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210914213.XA CN114973425A (en) 2022-08-01 2022-08-01 Traffic police gesture recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210914213.XA CN114973425A (en) 2022-08-01 2022-08-01 Traffic police gesture recognition method and device

Publications (1)

Publication Number Publication Date
CN114973425A true CN114973425A (en) 2022-08-30

Family

ID=82969203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210914213.XA Pending CN114973425A (en) 2022-08-01 2022-08-01 Traffic police gesture recognition method and device

Country Status (1)

Country Link
CN (1) CN114973425A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115565253A (en) * 2022-12-08 2023-01-03 季华实验室 Dynamic gesture real-time recognition method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110837778A (en) * 2019-10-12 2020-02-25 南京信息工程大学 Traffic police command gesture recognition method based on skeleton joint point sequence
CN111797709A (en) * 2020-06-14 2020-10-20 浙江工业大学 Real-time dynamic gesture track recognition method based on regression detection
CN113887547A (en) * 2021-12-08 2022-01-04 北京世纪好未来教育科技有限公司 Key point detection method and device and electronic equipment
CN114299422A (en) * 2021-12-20 2022-04-08 中国人民解放军海军航空大学 Flight quality self-adaptive evaluation method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110837778A (en) * 2019-10-12 2020-02-25 南京信息工程大学 Traffic police command gesture recognition method based on skeleton joint point sequence
CN111797709A (en) * 2020-06-14 2020-10-20 浙江工业大学 Real-time dynamic gesture track recognition method based on regression detection
CN113887547A (en) * 2021-12-08 2022-01-04 北京世纪好未来教育科技有限公司 Key point detection method and device and electronic equipment
CN114299422A (en) * 2021-12-20 2022-04-08 中国人民解放军海军航空大学 Flight quality self-adaptive evaluation method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SHOKOOHI-YEKTA M等: ""Generalizing DTW to the multi-dimensional case requires an adaptive approach"", 《DATA MIN KNOWL DISCOV》 *
张备伟等: "基于DTW的交警指挥手势识别方法", 《计算机应用研究》 *
赵思蕊 等: "基于3D骨架的交警指挥姿势动作识别仿真", 《计算机仿真》 *


Similar Documents

Publication Publication Date Title
CN113269073B (en) Ship multi-target tracking method based on YOLO V5 algorithm
CN111860155B (en) Lane line detection method and related equipment
US8005263B2 (en) Hand sign recognition using label assignment
CN112700470B (en) Target detection and track extraction method based on traffic video stream
US20190065872A1 (en) Behavior recognition apparatus, learning apparatus, and method and program therefor
CN110377025A (en) Sensor aggregation framework for automatic driving vehicle
CN108171112A (en) Vehicle identification and tracking based on convolutional neural networks
CN112805730A (en) Trajectory prediction method and related equipment
CN108388871B (en) Vehicle detection method based on vehicle body regression
CN110738101A (en) Behavior recognition method and device and computer readable storage medium
JP7078021B2 (en) Object detection device, object detection method and computer program for object detection
US10762440B1 (en) Sensor fusion and deep learning
CN111027381A (en) Method, device, equipment and storage medium for recognizing obstacle by monocular camera
CN111931764A (en) Target detection method, target detection framework and related equipment
CN112298194A (en) Lane changing control method and device for vehicle
CN114973425A (en) Traffic police gesture recognition method and device
CN110263664A (en) A kind of more occupant lanes are broken rules and regulations recognition methods and device
CN111008622B (en) Image object detection method and device and computer readable storage medium
CN113011285A (en) Lane line detection method and device, automatic driving vehicle and readable storage medium
US10867192B1 (en) Real-time robust surround view parking space detection and tracking
CN112232257A (en) Traffic abnormity determining method, device, equipment and medium
CN115392407B (en) Non-supervised learning-based danger source early warning method, device, equipment and medium
CN116964588A (en) Target detection method, target detection model training method and device
CN112232317B (en) Target detection method and device, equipment and medium for target orientation recognition
CN114842660A (en) Unmanned lane track prediction method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220830