CN114973425A - Traffic police gesture recognition method and device - Google Patents

Traffic police gesture recognition method and device

Info

Publication number
CN114973425A
Authority
CN
China
Prior art keywords
traffic police
gesture
time sequence
historical
key point
Prior art date
Legal status
Pending
Application number
CN202210914213.XA
Other languages
Chinese (zh)
Inventor
黄冠英
敬思远
杨骏
Current Assignee
Leshan Normal University
Original Assignee
Leshan Normal University
Priority date
Filing date
Publication date
Application filed by Leshan Normal University filed Critical Leshan Normal University
Priority to CN202210914213.XA
Publication of CN114973425A
Legal status: Pending

Classifications

    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06V: Image or Video Recognition or Understanding
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G06V40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements using pattern recognition or machine learning
    • G06V10/764: Arrangements using classification, e.g. of video objects
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V20/00: Scenes; scene-specific elements
    • G06V20/40: Scenes; scene-specific elements in video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of gesture recognition, and in particular to a traffic police gesture recognition method and device. The method comprises: acquiring a plurality of historical traffic police command gesture videos, each comprising P consecutive frames of traffic police gesture images; determining a key point movement trajectory time series in each historical traffic police command gesture video based on the plurality of videos, where the key points are human skeleton key points of the traffic police; constructing a recognition model for recognizing traffic police gestures based on the key point movement trajectory time series; and recognizing collected target traffic police gestures with the recognition model. Because recognition then operates on the key point movement trajectory time series rather than on raw images, the processing load is reduced compared with image processing and the recognition efficiency of traffic police gestures is improved.

Description

Traffic police gesture recognition method and device
Technical Field
The invention relates to the technical field of gesture recognition, in particular to a method and a device for recognizing a traffic police gesture.
Background
Gesture recognition has been successfully applied in many fields, including driver assistance, security authentication, and activity recognition. Human gesture recognition mostly tracks human gestures and acquires gesture data with cameras, sensors, and similar devices.
Automatic driving combines vision, GPS, radar, monitoring systems, and other components so that a vehicle can sense its surroundings and control itself without human operation, bringing convenience to daily life.
However, when traffic signals fail, at peak traffic, or during traffic police vehicle inspections, automatic driving alone cannot fully handle driving on the road.
To address this, cameras or other sensors are conventionally used to track and recognize traffic police gestures, but such techniques process images directly, which consumes time and power; the resulting recognition efficiency is too low for a high-speed traffic system and cannot meet its requirements.
Therefore, how to improve the recognition efficiency of the traffic police gesture is a technical problem to be solved urgently at present.
Disclosure of Invention
In view of the above, the present invention has been made to provide a method and apparatus for recognizing a traffic police gesture that overcomes or at least partially solves the above problems.
In a first aspect, the present invention provides a traffic police gesture recognition method, including:
acquiring a plurality of historical traffic police command gesture videos, wherein each historical traffic police command gesture video comprises P frames of continuous traffic police gesture images, and P is an integer greater than or equal to 30;
determining a key point movement trajectory time series in each historical traffic police command gesture video based on the plurality of historical traffic police command gesture videos, wherein the key points are human skeleton key points of the traffic police;
constructing a recognition model for recognizing the traffic police gesture based on the key point movement track time sequence;
and identifying the collected target traffic police gestures based on the identification model.
Further, after obtaining a plurality of historical traffic police command gesture videos, the method further includes:
and deleting the background image of each historical traffic police command gesture video, and keeping the body outline of the traffic police.
Further, the determining a keypoint movement trajectory time sequence in each historical traffic police command gesture video based on the plurality of historical traffic police command gesture videos includes:
determining the positions of the left elbow, the left wrist, the right elbow and the right wrist of the traffic police in each historical traffic police directing gesture video and determining the position of a reference point based on the plurality of historical traffic police directing gesture videos;
determining the distances between the left elbow, the left wrist, the right elbow and the right wrist of the traffic police and the reference points in each frame of traffic police gesture images based on the positions of the left elbow, the left wrist, the right elbow and the right wrist and the positions of the reference points;
and determining a time sequence of the movement tracks of the key points in each historical traffic police command gesture video based on the distance.
Further, determining the reference point position includes:
acquiring the left ankle position and the right ankle position of the traffic police;
determining a midpoint of the left ankle position and the right ankle position as a reference point position.
Further, the determining a time sequence of the movement tracks of the key points in each historical traffic police command gesture video based on the distance comprises:
in each historical traffic police command gesture video, arranging the distances between the reference point and the left elbow, left wrist, right elbow and right wrist of the traffic police in each frame of traffic police gesture image according to a preset key point numbering sequence as the vertical data of a matrix, wherein the preset key point numbering sequence is the order in which the left elbow, left wrist, right elbow and right wrist are arranged according to a preset rule, and the matrix is the data formed by the key point movement trajectory time series;
and taking, for each frame of traffic police gesture image in time order, the distances between the reference point and the left elbow, left wrist, right elbow and right wrist of the traffic police as the horizontal data of the matrix, thereby determining the key point movement trajectory time series in each historical traffic police command gesture video.
Further, constructing a recognition model for recognizing a traffic police gesture based on the time sequence of the key point movement tracks comprises:
and processing the time sequence of the movement tracks of the key points by adopting a self-adaptive dynamic time warping algorithm to obtain a recognition model for recognizing the traffic police gestures.
Further, the processing the time sequence of the movement tracks of the key points by using a self-adaptive dynamic time warping algorithm to obtain a recognition model for recognizing the traffic police gesture includes:
labeling the key point movement trajectory time series with the corresponding traffic police gesture signal name, wherein the traffic police gesture signal names comprise: stop, go straight, turn left, left-turn waiting, turn right, change lane, slow down and pull over;
taking the key point movement track time sequence and the corresponding traffic police gesture signal name as samples, and dividing the samples into training samples and testing samples;
processing the training sample by adopting a self-adaptive dynamic time warping algorithm to obtain a recognition model for recognizing the traffic police gesture;
and testing the recognition model based on the test sample.
In a second aspect, the present invention further provides a traffic police gesture recognition apparatus, including:
the video data acquisition module is used for acquiring a plurality of historical traffic police directing gesture videos, and each historical traffic police directing gesture video comprises P frames of continuous traffic police gesture images;
a key point data extraction module, which is used for determining a key point moving track time sequence in each historical traffic police command gesture video based on the plurality of historical traffic police command gesture videos, wherein the key points are human skeleton key points of traffic police;
the model building module is used for building a recognition model for recognizing the traffic police gesture based on the key point movement track time sequence;
and the gesture recognition module is used for recognizing the acquired target traffic police gestures based on the recognition model.
In a third aspect, the present invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method steps described in the first aspect when executing the program.
In a fourth aspect, the invention also provides a computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, performs the method steps as described in the first aspect.
One or more technical solutions in the embodiments of the present invention have at least the following technical effects or advantages:
the invention provides a traffic police gesture recognition method, which comprises the steps of obtaining a plurality of historical traffic police command gesture videos, wherein each historical traffic police command gesture video comprises P frames of continuous traffic police gesture images, and P is an integer greater than or equal to 30; determining a key point moving track time sequence in each historical traffic police directing gesture video based on a plurality of historical traffic police directing gesture videos, wherein the key point is a human skeleton key point of a traffic police; constructing a recognition model for recognizing the traffic police gesture based on the key point movement track time sequence; based on the recognition model, the collected target traffic police gestures are recognized, and then when the traffic police gestures are recognized, the key point moving track time sequence is processed, so that the processing amount is reduced compared with image processing, and when the traffic police gestures are recognized, the recognition efficiency of the traffic police gestures is improved.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart illustrating steps of a traffic police gesture recognition method according to an embodiment of the invention;
FIG. 2 is a diagram illustrating key points in a traffic police mannequin in an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for recognizing a traffic police gesture according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a traffic police gesture recognition apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a computer device for implementing a traffic police gesture recognition method in an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Example one
The embodiment of the invention provides a traffic police gesture recognition method, which comprises the following steps as shown in figure 1:
s101, obtaining a plurality of historical traffic police directing gesture videos, wherein each historical traffic police directing gesture video comprises P frames of continuous traffic police gesture images, and P is an integer greater than or equal to 30;
s102, determining a key point movement track time sequence in each historical traffic police command gesture video based on a plurality of historical traffic police command gesture videos;
s103, constructing a recognition model for recognizing the traffic police gesture based on the key point movement track time sequence;
and S104, identifying the collected target traffic police gestures based on the identification model.
In a specific embodiment, S101, a plurality of historical traffic police conducting gesture videos are obtained, where each historical traffic police conducting gesture video includes P frames of consecutive traffic police gesture images, and P is an integer greater than or equal to 30.
Specifically, a plurality of traffic police command gesture videos are collected in different scenes and at different movement speeds, each comprising P consecutive frames of traffic police gesture images, with P an integer greater than or equal to 30. Specifically, 15 frames are collected per second and one traffic police command gesture video lasts about 4 s, so one historical traffic police command gesture video comprises about 40-50 frames of images.
The collected traffic police command gesture videos can be generated by different traffic polices, such as the traffic polices with different heights and weights.
After the plurality of historical traffic police command gesture videos are obtained, the background image is deleted from each video and the body outline of the traffic police is kept, so as to improve the accuracy of the later data extraction.
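The patent does not specify how the background is removed. As a rough sketch under that caveat, simple per-pixel differencing against a known static background could look like the following (the `extract_silhouette` helper and its threshold are illustrative assumptions, not taken from the patent):

```python
def extract_silhouette(frame, background, thresh=30):
    # Keep only pixels that differ noticeably from the static background,
    # zeroing the rest so that only the body outline of the traffic police
    # remains. frame/background: 2-D lists of grey values. The differencing
    # method and the threshold value are assumptions.
    return [[px if abs(px - bg) > thresh else 0
             for px, bg in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

# Toy 2x2 frame: one pixel (value 200) differs from the uniform background.
silhouette = extract_silhouette([[10, 200], [10, 10]], [[10, 10], [10, 10]])
```

Real systems would more likely use a learned segmentation or background-subtraction model, but the role of this step in the pipeline is the same: discard background pixels before key point extraction.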
Next, S102 is executed, and based on the plurality of historical traffic police conducting gesture videos, a time sequence of movement trajectories of key points in each historical traffic police conducting gesture video is determined, where the key points are human skeleton key points of the traffic police.
First, based on the plurality of historical traffic police command gesture videos, the positions of the left elbow, left wrist, right elbow, and right wrist of the traffic police in each historical traffic police command gesture video are determined, and the reference point positions are determined.
And then, determining the distances between the left elbow, the left wrist, the right elbow and the right wrist of the traffic police and the reference point in each frame of traffic police gesture image based on the positions of the left elbow, the left wrist, the right elbow and the right wrist and the position of the reference point.
Finally, based on the distance, a time sequence of the movement track of the key point in each historical traffic police command gesture video is determined.
When the key point position of the traffic police in each historical traffic police conducting gesture video and the reference point position are determined, taking the traffic police skeleton model shown in fig. 2 as an example, the left elbow 201, the left wrist 202, the right elbow 203 and the right wrist 204 in the skeleton model are found.
Because the joints of the human body have different degrees of freedom, they contribute differently to the motion of a gesture. Extracting an effective region according to the characteristics of traffic police gestures therefore reduces the computational complexity of the whole system and speeds up recognition. Analysis of traffic police commands shows that the torso remains upright and the lower limbs convey little effective information; the information is transmitted mainly by upper-limb movements, involving arm motion and head rotation. The invention therefore discards the lower-body key point data, and because head and shoulder rotation contributes relatively little to gesture recognition, only the movement trajectories of the left wrist, right wrist, left elbow and right elbow are considered.
Besides the key points left elbow 201, left wrist 202, right elbow 203 and right wrist 204, the traffic police human skeleton model also includes: head, neck, left shoulder, right shoulder, left hip, right hip, left knee, right knee, left ankle 205 and right ankle 206. The invention, however, considers only the four key points above; the others could be included, but doing so would increase the data processing load without much effect on the result, so the remaining 10 key points are not processed and only the left elbow 201, left wrist 202, right elbow 203 and right wrist 204 are used.
In determining the reference point location, comprising: acquiring the position of a left ankle 205 and the position of a right ankle 206 of a traffic police; the midpoint of the left ankle 205 and right ankle 206 positions is determined as reference point O position.
Taking one of the frames as an example, let the left ankle 205 have coordinates (x1, y1) and the right ankle 206 have coordinates (x2, y2). The coordinates (x0, y0) of the reference point O are then calculated as follows:

    x0 = (x1 + x2) / 2,  y0 = (y1 + y2) / 2    (1)
The overall height H of the traffic police human skeleton model is taken as the reference scale when obtaining the distance between each key point and the reference point O.
Since each historical traffic police command gesture video comprises P consecutive frames of traffic police gesture images, the video of, for example, the "stop" command gesture yields a key point movement trajectory time series for the "stop" traffic police command gesture.
Taking the left wrist 202 with coordinates (x, y) in the current frame as an example, the distance d between the left wrist 202 and the reference point O is calculated as:

    d = sqrt((x - x0)² + (y - y0)²) / H    (2)

where the division by the overall height H scales the distance by the skeleton size, consistent with using H as the reference.
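Equations (1) and (2) can be sketched together as follows; the function names are illustrative, and the division by the skeleton height H follows the statement above that H is used as the reference scale:

```python
import math

def reference_point(left_ankle, right_ankle):
    # Eq. (1): reference point O is the midpoint of the two ankle positions.
    return ((left_ankle[0] + right_ankle[0]) / 2,
            (left_ankle[1] + right_ankle[1]) / 2)

def keypoint_distance(point, ref, height):
    # Eq. (2): distance from a key point (e.g. the left wrist) to O,
    # normalised by the overall skeleton height H so the feature does not
    # depend on the officer's stature.
    return math.hypot(point[0] - ref[0], point[1] - ref[1]) / height

O = reference_point((0.0, 0.0), (2.0, 0.0))   # midpoint of the two ankles
d = keypoint_distance((1.0, 4.0), O, 2.0)     # wrist 4 units above O, H = 2
```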
determining a time sequence of the movement tracks of the key points in each historical traffic police command gesture video based on the distance between each key point and the reference point O, wherein the time sequence comprises the following steps:
in each historical traffic police command gesture video, the distances between the reference point and the left elbow, left wrist, right elbow and right wrist of the traffic police in each frame of traffic police gesture image are arranged according to the preset key point numbering sequence as the vertical data of a matrix; the preset key point numbering sequence is the order in which the left elbow, left wrist, right elbow and right wrist are arranged according to a preset rule, and the matrix is the data formed by the key point movement trajectory time series.
And respectively taking the distances between the left elbow, the left wrist, the right elbow and the right wrist of the traffic police corresponding to each frame of traffic police gesture image and a reference point as the transverse data of the matrix according to the time sequence of each frame of traffic police gesture image, so as to determine the time sequence of the movement track of the key point in each historical traffic police directing gesture video.
For example, when the key point movement trajectory time series are assembled into a matrix, the element d(1,1) denotes the distance between the first key point and the reference point in the first frame image: the first index is the frame number and the second index is the key point number. Likewise, d(1,2) denotes the distance between the second key point and the reference point in the first frame image.
By analogy, any vertical (column) entry of the matrix, (d(t,1), d(t,2), d(t,3), d(t,4)), represents the four-dimensional data of an arbitrary frame t.
The preset numbering sequence may specifically be the order left elbow, left wrist, right elbow, right wrist; it can be set arbitrarily and is not limited here.
d(t,1) represents the distance between the first key point of an arbitrary frame image and the reference point; as time advances, these distances form the array (d(1,1), d(2,1), …, d(P,1)), which serves as the horizontal (row) data of the matrix.
When all four key points are considered, the following key point movement trajectory time series is obtained:

    T = | d(1,1)  d(2,1)  …  d(P,1) |
        | d(1,2)  d(2,2)  …  d(P,2) |
        | d(1,3)  d(2,3)  …  d(P,3) |
        | d(1,4)  d(2,4)  …  d(P,4) |    (3)
the key point movement track time sequence finally obtained by the historical traffic police command gesture video is obtained, and a plurality of key point movement track time sequences are obtained for a plurality of historical traffic police command gesture videos.
In a specific implementation, traffic police gesture features are extracted from the historical traffic police command gesture videos according to the traffic safety law and related regulations, which define 8 standard traffic police gesture signals: stop, go straight, turn left, left-turn waiting, turn right, change lane, slow down and pull over. In addition, traffic police often hold a fixed pose while the traffic state is unchanged; accordingly, the historical traffic police command gesture videos are divided into 800 meaningful traffic police command gestures.
Then, according to the above-mentioned mode for determining the time sequence of the key point moving track in each historical traffic police command gesture video, 800 time sequence data of the key point moving track of the traffic police gesture are obtained in total.
Next, S103 is executed, and a recognition model for recognizing the traffic police gesture is constructed based on the time series of the key point movement tracks.
In the embodiment of the present invention, a self-adaptive Dynamic Time Warping (DTW) is specifically adopted to process the Time sequence of the movement trajectory of the key point, so as to obtain an identification model for identifying a traffic police gesture.
The following describes in detail the obtaining of a recognition model for recognizing a traffic police gesture.
Taking the time sequence of the movement tracks of the key points and the corresponding names of the traffic police gesture signals as samples, and dividing the samples into training samples and testing samples;
then, processing the training sample by adopting a self-adaptive dynamic time warping algorithm to obtain a recognition model for recognizing the traffic police gesture;
and testing the recognition model based on the test sample.
Specifically, following the above example, the key point movement trajectory time series in each historical traffic police command gesture video is labelled with its gesture category Y, i.e. Y ∈ {1, 2, …, 8}, corresponding respectively to stop, go straight, turn left, left-turn waiting, turn right, change lane, slow down and pull over.
The 800 traffic police gesture key point movement trajectory time series are divided into a training set of 400 (the training samples) and a test set of 400 (the test samples).
The training samples are processed with the adaptive dynamic time warping algorithm to obtain a recognition model for recognising traffic police gestures. Adaptive DTW is a scheme for choosing between independent DTW and dependent DTW: when NN-DTW classifies a traffic police gesture key point movement trajectory time series Q, and the classes of Q's nearest neighbours under dependent DTW and independent DTW differ, the algorithm must predict which distance function is the correct one to trust.
Suppose the m-th dimension data of any two sequences Q and C are Q_m = (q(m,1), …, q(m,n)) and C_m = (c(m,1), …, c(m,n)), both of length n. The DTW distance between Q_m and C_m is given by the recurrence

    D(i, j) = d(q(m,i), c(m,j)) + min{ D(i-1, j), D(i, j-1), D(i-1, j-1) },
    DTW(Q_m, C_m) = D(n, n)    (4)

where d(·, ·) is the distance function between two points, usually chosen as the Manhattan distance, the Euclidean distance, or similar.
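The recurrence of equation (4) is the classic dynamic-programming form of DTW. A compact sketch for one dimension, using the Manhattan (absolute-difference) point cost mentioned above:

```python
def dtw(a, b, dist=lambda x, y: abs(x - y)):
    # Eq. (4): D[i][j] accumulates the cheapest warping-path cost aligning
    # the first i points of a with the first j points of b.
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = dist(a[i - 1], b[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

same = dtw([1, 2, 3], [1, 2, 3])      # identical sequences align at no cost
warp = dtw([1, 2, 3], [1, 2, 2, 3])   # the repeated 2 is absorbed by warping
```

Unlike a plain Euclidean comparison, the warping lets sequences of different speeds (and even different lengths) be compared, which is exactly why it suits gesture videos recorded at different movement speeds.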
Independent DTW (DTW_I) measures the cumulative distance of all dimensions independently using DTW. Defining DTW(Q_m, C_m) as the DTW distance between the m-th dimension of Q and the m-th dimension of C, independent DTW is calculated as:

    DTW_I(Q, C) = Σ_{m=1}^{M} DTW(Q_m, C_m)    (5)
In equation (5) the four dimensions of the traffic police gesture key point movement trajectory time series are treated as independent, and DTW may warp each dimension freely on its own.
Dependent DTW (DTW_D) instead forces all four dimensions of the traffic police gesture key point movement trajectory time series to share a single warping, so every dimension is warped identically:

    DTW_D(Q, C) = D(n, n), computed with the recurrence of equation (4)
    but with point cost d(i, j) = Σ_{m=1}^{M} (q(m,i) - c(m,j))²    (6)
In equation (6), d(i, j) is the squared Euclidean distance between the i-th data point of Q and the j-th data point of C, taken over all M dimensions. This is the generalisation of single-dimensional DTW (equation (4)) to multi-dimensional time series: the point-to-point cost is redefined as the cumulative distance over the M data points, where q(m,i) is the i-th data point in the m-th dimension of Q and c(m,j) is the j-th data point in the m-th dimension of C.
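Equations (5) and (6) differ only in where the warping is allowed. A self-contained sketch of both variants (a fresh copy of the single-dimension DTW is included so the snippet stands alone):

```python
def dtw(a, b, dist):
    # Single-sequence DTW (Eq. 4) with a pluggable point cost.
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = dist(a[i - 1], b[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def dtw_independent(Q, C):
    # DTW_I (Eq. 5): warp each of the M dimensions on its own and sum the
    # per-dimension DTW distances.
    return sum(dtw(q, c, lambda x, y: abs(x - y)) for q, c in zip(Q, C))

def dtw_dependent(Q, C):
    # DTW_D (Eq. 6): one warping shared by all dimensions; the point cost is
    # the squared Euclidean distance across the M dimensions of a frame.
    qv, cv = list(zip(*Q)), list(zip(*C))  # sequences of M-dim frames
    return dtw(qv, cv, lambda u, v: sum((a - b) ** 2 for a, b in zip(u, v)))

Q = [[0, 0, 1], [1, 1, 0]]        # M = 2 dimensions, n = 3 points
zero_i = dtw_independent(Q, Q)    # identical sequences under DTW_I
zero_d = dtw_dependent(Q, Q)      # identical sequences under DTW_D
```

For the four-dimensional trajectory matrices above, Q and C would each be the 4 x P matrix of equation (3).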
Four different cases can occur when classifying a time series T with the independent and dependent DTW distance measures. First, T is correctly classified by both dependent and independent DTW. Second, T is misclassified by both. Third, T is correctly classified by independent DTW but misclassified by dependent DTW. Fourth, T is correctly classified by dependent DTW but misclassified by independent DTW. The data sets collecting the third and fourth cases are called iSuccess and dSuccess, respectively. The method uses the training set of traffic police gesture movement trajectories to calculate a threshold. When classifying the test set, a score function S(x) is calculated, and independent DTW or dependent DTW is trusted according to the value of S(x) relative to the threshold, as shown in equation (7):
    DTW_A(x) = DTW_I(x) if S(x) > threshold, otherwise DTW_D(x)    (7)
The threshold is obtained according to whether iSuccess and dSuccess are empty sets; in the four cases it is set as follows: in the first case the threshold is set to 1; in the second case it is determined by a decision-tree function; in the third case it is the value that minimizes the score function S(x); and in the fourth case it is the value that maximizes S(x). The second case is the most common of the four; there, a decision-tree function finds the split point that maximizes the information gain.
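The decision-tree step used in the second case can be sketched as a one-node split over the score values that maximizes information gain (a minimal illustration; the binary labels marking whether a sample is better served by independent DTW are an assumption about how the training scores are partitioned):

```python
import math

def entropy(labels):
    """Binary Shannon entropy of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def best_threshold(scores, labels):
    """Pick the split on S(x) that maximizes information gain, i.e. a
    one-node decision tree. labels[i] is 1 if sample i is better served
    by DTW_I, else 0 (illustrative labeling)."""
    pairs = sorted(zip(scores, labels))
    base = entropy(labels)
    best_gain, best_t = -1.0, pairs[0][0]
    for k in range(1, len(pairs)):
        t = (pairs[k - 1][0] + pairs[k][0]) / 2   # midpoint candidate
        left = [l for s, l in pairs[:k]]
        right = [l for s, l in pairs[k:]]
        gain = base - (len(left) * entropy(left)
                       + len(right) * entropy(right)) / len(pairs)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t
```

For scores [0.2, 0.3, 0.8, 0.9] with labels [0, 0, 1, 1], the split lands between 0.3 and 0.8, giving a threshold of 0.55.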
The adaptive dynamic time warping algorithm is applied to the 400 training samples, with the key point movement trajectory time series as input data and the corresponding traffic police gesture signal names as output data, yielding a recognition model for recognizing traffic police gestures. The model is then tested on the 400 test samples and corrected accordingly.
After the corrected recognition model is obtained, S104 is executed: the acquired target traffic police gesture is recognized based on the recognition model.
When the method is applied in the automatic driving mode, the gesture meaning of an intersection traffic police officer can be quickly recognized and the corresponding driving mode executed according to that meaning.
As shown in fig. 3, the traffic police gesture recognition method provided by the present invention includes: S301, obtaining a plurality of historical traffic police command gesture videos; S302, extracting the body contour of the traffic police from the plurality of videos; S303, obtaining the positions of the traffic police gesture key points from the body contour information; S304, converting the key point positions into movement trajectory data features, namely the key point movement trajectory time series; and S305, thereby obtaining a time series data set. The time series data set is then used as samples, which are divided into training samples and test samples. S306, the training set is processed with the preset algorithm to obtain a recognition model for recognizing traffic police gestures; S307, the recognition model is tested to obtain an accurate recognition model; and finally S308, the recognition model is used to recognize traffic police gestures.
One or more technical solutions in the embodiments of the present invention have at least the following technical effects or advantages:
the invention provides a traffic police gesture recognition method, which comprises: obtaining a plurality of historical traffic police command gesture videos, wherein each historical traffic police command gesture video comprises P frames of continuous traffic police gesture images and P is an integer greater than or equal to 30; determining a key point movement trajectory time series in each historical traffic police command gesture video based on the plurality of videos, wherein the key points are human skeleton key points of the traffic police; constructing a recognition model for recognizing traffic police gestures based on the key point movement trajectory time series; and recognizing the collected target traffic police gestures based on the recognition model. Because the data are processed as key point movement trajectory time series rather than as full images, the amount of processing is reduced and the efficiency of traffic police gesture recognition is improved.
Moreover, because human body contour recognition is a mature and widely applied technology, the invention places low requirements on the resolution of the captured video images.
Example two
Based on the same inventive concept, an embodiment of the present invention further provides a traffic police gesture recognition apparatus, as shown in fig. 4, including:
the video data acquisition module 401 is used for acquiring a plurality of historical traffic police command gesture videos, wherein each historical traffic police command gesture video comprises P frames of continuous traffic police gesture images;
a key point data extraction module 402, configured to determine a key point movement trajectory time series in each historical traffic police command gesture video based on the plurality of historical traffic police command gesture videos, where the key points are human skeleton key points of the traffic police;
a model construction module 403 for constructing a recognition model for recognizing traffic police gestures based on the key point movement trajectory time series;
and the gesture recognition module 404 is used for recognizing the collected target traffic police gestures based on the recognition model.
In an optional implementation manner, the apparatus further includes a deleting module, configured to:
and deleting the background image of each historical traffic police command gesture video, and keeping the body outline of the traffic police.
In an alternative embodiment, the key point data extraction module 402 includes:
the first determination unit is used for determining the positions of the left elbow, the left wrist, the right elbow and the right wrist of the traffic police in each historical traffic police directing gesture video and determining the position of a reference point based on the historical traffic police directing gesture videos;
the second determining unit is used for determining the distances between the left elbow, the left wrist, the right elbow and the right wrist of the traffic police and the reference point in each frame of traffic police gesture image based on the positions of the left elbow, the left wrist, the right elbow and the right wrist and the position of the reference point;
and the third determining unit is used for determining a key point movement track time sequence in each historical traffic police conducting gesture video based on the distance.
In an alternative embodiment, the first determining unit is configured to: acquiring the left ankle position and the right ankle position of a traffic police; determining a midpoint of the left ankle position and the right ankle position as a reference point position.
In an alternative embodiment, the third determining unit is configured to:
in each historical traffic police command gesture video, the distances between the left elbow, the left wrist, the right elbow and the right wrist of a corresponding traffic police in each frame of traffic police gesture image and a reference point are respectively arranged according to a preset numbering sequence of the reference point to serve as longitudinal data of a matrix, the preset numbering sequence of the reference point is specifically the numbering sequence in which the left elbow, the left wrist, the right elbow and the right wrist are arranged according to a preset rule, and the matrix is data formed by the key point movement track time sequence;
and respectively taking the distances between the left elbow, the left wrist, the right elbow and the right wrist of the traffic police corresponding to each frame of traffic police gesture image and the reference point as the transverse data of the matrix according to the time sequence of each frame of traffic police gesture image, so as to determine the time sequence of the movement track of the key point in each historical traffic police directing gesture video.
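The construction of the reference point and the trajectory matrix can be sketched as follows (a minimal illustration; the per-frame keypoint coordinates are assumed to come from a pose estimator, and the dictionary keys and function names are our own):

```python
import numpy as np

# Fixed keypoint numbering: the preset order described above.
KEYPOINTS = ("left_elbow", "left_wrist", "right_elbow", "right_wrist")

def reference_point(frame):
    """Reference point: midpoint of the left and right ankle positions."""
    return (np.asarray(frame["left_ankle"])
            + np.asarray(frame["right_ankle"])) / 2.0

def trajectory_matrix(frames):
    """Build the 4 x P matrix: row m holds the distance of keypoint m to
    the reference point, and column t holds one frame, so rows are the
    longitudinal (keypoint-order) data and columns the time-ordered
    (frame-order) data."""
    cols = []
    for frame in frames:
        ref = reference_point(frame)
        cols.append([np.linalg.norm(np.asarray(frame[k]) - ref)
                     for k in KEYPOINTS])
    return np.array(cols).T    # shape (4, P)
```

Each video thus yields one four-dimensional time series whose length P equals the number of frames.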
In an alternative embodiment, the model building module 403 is configured to process the key point movement trajectory time series with an adaptive dynamic time warping algorithm to obtain a recognition model for recognizing traffic police gestures.
In an alternative embodiment, the model building module 403 is specifically configured to:
labeling the key point movement track time sequence with a corresponding traffic police gesture signal name, wherein the traffic police gesture signal name comprises: parking, straight traveling, left turning waiting, right turning, lane changing, speed reducing and side leaning;
taking the key point movement track time sequence and the corresponding traffic police gesture signal name as samples, and dividing the samples into training samples and testing samples;
processing the training sample by adopting a self-adaptive dynamic time warping algorithm to obtain a recognition model for recognizing the traffic police gesture;
and testing the recognition model based on the test sample.
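The labeling and splitting steps can be sketched as follows (a minimal sketch; the English identifiers are our rendering of the signal names listed above, and the 50/50 split mirrors the 400 training / 400 test samples mentioned earlier):

```python
import random

# Our English rendering of the traffic police gesture signal names above.
GESTURES = ("parking", "straight_traveling", "left_turn_waiting",
            "right_turn", "lane_change", "slow_down", "pull_over")

def split_samples(samples, train_ratio=0.5, seed=0):
    """samples: list of (trajectory_matrix, gesture_name) pairs.
    Shuffles deterministically and splits into training and test sets
    at the given ratio."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

The training half is fed to the adaptive DTW algorithm, and the held-out half is used to test and correct the resulting model.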
EXAMPLE III
Based on the same inventive concept, an embodiment of the present invention provides a computer device, as shown in fig. 5, including a memory 504, a processor 502, and a computer program stored on the memory 504 and runnable on the processor 502, wherein the processor 502 implements the steps of the traffic police gesture recognition method when executing the program.
Where in fig. 5 a bus architecture (represented by bus 500) is shown, bus 500 may include any number of interconnected buses and bridges, and bus 500 links together various circuits including one or more processors, represented by processor 502, and memory, represented by memory 504. The bus 500 may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface 506 provides an interface between the bus 500 and the receiver 501 and transmitter 503. The receiver 501 and the transmitter 503 may be the same element, i.e., a transceiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 502 is responsible for managing the bus 500 and general processing, and the memory 504 may be used for storing data used by the processor 502 in performing operations.
Example four
Based on the same inventive concept, embodiments of the present invention provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the traffic police gesture recognition method described above.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the traffic police gesture recognition apparatus, computer device, and/or the like in accordance with embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Claims (8)

1. A traffic police gesture recognition method is characterized by comprising the following steps:
acquiring a plurality of historical traffic police command gesture videos, wherein each historical traffic police command gesture video comprises P frames of continuous traffic police gesture images, and P is an integer greater than or equal to 30;
determining a key point moving track time sequence in each historical traffic police directing gesture video based on the plurality of historical traffic police directing gesture videos, wherein the key points are key points of human skeleton of traffic police;
constructing a recognition model for recognizing the traffic police gesture based on the key point movement track time sequence;
and identifying the collected target traffic police gestures based on the identification model.
2. The method of claim 1, wherein after obtaining the plurality of videos of historical traffic police headings gestures, further comprising:
and deleting the background image of each historical traffic police command gesture video, and keeping the body outline of the traffic police.
3. The method of claim 1, wherein determining a time series of keypoint movement trajectories in each historical traffic police command gesture video based on the plurality of historical traffic police command gesture videos comprises:
determining the positions of the left elbow, the left wrist, the right elbow and the right wrist of the traffic police in each historical traffic police directing gesture video and determining the position of a reference point based on the plurality of historical traffic police directing gesture videos;
determining the distances between the left elbow, the left wrist, the right elbow and the right wrist of the traffic police and the reference points in each frame of traffic police gesture images based on the positions of the left elbow, the left wrist, the right elbow and the right wrist and the positions of the reference points;
and determining a time sequence of the movement tracks of the key points in each historical traffic police command gesture video based on the distance.
4. The method of claim 3, wherein determining a reference point location comprises:
acquiring the left ankle position and the right ankle position of a traffic police;
determining a midpoint of the left ankle position and the right ankle position as a reference point position.
5. The method of claim 3, wherein the determining a time series of keypoint movement trajectories in each historical traffic police command gesture video based on the distance comprises:
in each historical traffic police command gesture video, the distances between the left elbow, the left wrist, the right elbow and the right wrist of a corresponding traffic police in each frame of traffic police gesture image and a reference point are respectively arranged according to a preset numbering sequence of the reference point to serve as longitudinal data of a matrix, the preset numbering sequence of the reference point is specifically the numbering sequence in which the left elbow, the left wrist, the right elbow and the right wrist are arranged according to a preset rule, and the matrix is data formed by a time sequence of the movement track of the key point;
and respectively taking the distances between the left elbow, the left wrist, the right elbow and the right wrist of the traffic police corresponding to each frame of traffic police gesture image and the reference point as the transverse data of the matrix according to the time sequence of each frame of traffic police gesture image, thereby determining the time sequence of the movement track of the key point in each historical traffic police command gesture video.
6. The method of claim 1, wherein constructing a recognition model for recognizing a traffic police gesture based on the time series of keypoint movement trajectories comprises:
and processing the time sequence of the movement tracks of the key points by adopting a self-adaptive dynamic time warping algorithm to obtain a recognition model for recognizing the traffic police gestures.
7. The method as claimed in claim 6, wherein the processing the time series of the key point movement tracks by using an adaptive dynamic time warping algorithm to obtain a recognition model for recognizing the traffic police gesture comprises:
labeling the key point movement track time sequence with a corresponding traffic police gesture signal name, wherein the traffic police gesture signal name comprises: parking, straight traveling, left turning waiting, right turning, lane changing, speed reducing and side leaning;
taking the key point movement track time sequence and the corresponding traffic police gesture signal name as samples, and dividing the samples into training samples and testing samples;
processing the training sample by adopting a self-adaptive dynamic time warping algorithm to obtain a recognition model for recognizing the traffic police gesture;
and testing the recognition model based on the test sample.
8. A traffic police gesture recognition device, comprising:
the video data acquisition module is used for acquiring a plurality of historical traffic police directing gesture videos, and each historical traffic police directing gesture video comprises P frames of continuous traffic police gesture images;
a key point data extraction module, which is used for determining a key point moving track time sequence in each historical traffic police command gesture video based on the plurality of historical traffic police command gesture videos, wherein the key points are human skeleton key points of traffic police;
the model building module is used for building a recognition model for recognizing the traffic police gesture based on the key point movement track time sequence;
and the gesture recognition module is used for recognizing the acquired target traffic police gestures based on the recognition model.
CN202210914213.XA 2022-08-01 2022-08-01 Traffic police gesture recognition method and device Pending CN114973425A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210914213.XA CN114973425A (en) 2022-08-01 2022-08-01 Traffic police gesture recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210914213.XA CN114973425A (en) 2022-08-01 2022-08-01 Traffic police gesture recognition method and device

Publications (1)

Publication Number Publication Date
CN114973425A true CN114973425A (en) 2022-08-30

Family

ID=82969203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210914213.XA Pending CN114973425A (en) 2022-08-01 2022-08-01 Traffic police gesture recognition method and device

Country Status (1)

Country Link
CN (1) CN114973425A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115565253A (en) * 2022-12-08 2023-01-03 季华实验室 Dynamic gesture real-time recognition method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110837778A (en) * 2019-10-12 2020-02-25 南京信息工程大学 Traffic police command gesture recognition method based on skeleton joint point sequence
CN111797709A (en) * 2020-06-14 2020-10-20 浙江工业大学 Real-time dynamic gesture track recognition method based on regression detection
CN113887547A (en) * 2021-12-08 2022-01-04 北京世纪好未来教育科技有限公司 Key point detection method and device and electronic equipment
CN114299422A (en) * 2021-12-20 2022-04-08 中国人民解放军海军航空大学 Flight quality self-adaptive evaluation method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110837778A (en) * 2019-10-12 2020-02-25 南京信息工程大学 Traffic police command gesture recognition method based on skeleton joint point sequence
CN111797709A (en) * 2020-06-14 2020-10-20 浙江工业大学 Real-time dynamic gesture track recognition method based on regression detection
CN113887547A (en) * 2021-12-08 2022-01-04 北京世纪好未来教育科技有限公司 Key point detection method and device and electronic equipment
CN114299422A (en) * 2021-12-20 2022-04-08 中国人民解放军海军航空大学 Flight quality self-adaptive evaluation method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SHOKOOHI-YEKTA M等: ""Generalizing DTW to the multi-dimensional case requires an adaptive approach"", 《DATA MIN KNOWL DISCOV》 *
张备伟等: "基于DTW的交警指挥手势识别方法", 《计算机应用研究》 *
赵思蕊 等: "基于3D骨架的交警指挥姿势动作识别仿真", 《计算机仿真》 *


Similar Documents

Publication Publication Date Title
CN113269073B (en) Ship multi-target tracking method based on YOLO V5 algorithm
CN111860155B (en) Lane line detection method and related equipment
US8005263B2 (en) Hand sign recognition using label assignment
CN112700470B (en) Target detection and track extraction method based on traffic video stream
US20190065872A1 (en) Behavior recognition apparatus, learning apparatus, and method and program therefor
CN110377025A (en) Sensor aggregation framework for automatic driving vehicle
CN108171112A (en) Vehicle identification and tracking based on convolutional neural networks
CN112805730A (en) Trajectory prediction method and related equipment
CN108388871B (en) Vehicle detection method based on vehicle body regression
CN110738101A (en) Behavior recognition method and device and computer readable storage medium
JP7078021B2 (en) Object detection device, object detection method and computer program for object detection
US10762440B1 (en) Sensor fusion and deep learning
CN111027381A (en) Method, device, equipment and storage medium for recognizing obstacle by monocular camera
CN111931764A (en) Target detection method, target detection framework and related equipment
CN112298194A (en) Lane changing control method and device for vehicle
CN114973425A (en) Traffic police gesture recognition method and device
CN110263664A (en) A kind of more occupant lanes are broken rules and regulations recognition methods and device
CN111008622B (en) Image object detection method and device and computer readable storage medium
CN113011285A (en) Lane line detection method and device, automatic driving vehicle and readable storage medium
US10867192B1 (en) Real-time robust surround view parking space detection and tracking
CN112232257A (en) Traffic abnormity determining method, device, equipment and medium
CN115392407B (en) Non-supervised learning-based danger source early warning method, device, equipment and medium
CN116964588A (en) Target detection method, target detection model training method and device
CN112232317B (en) Target detection method and device, equipment and medium for target orientation recognition
CN114842660A (en) Unmanned lane track prediction method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220830