CN115641610A - Hand-waving distress recognition system and method
- Publication number: CN115641610A (application CN202211259423.6A)
- Authority: CN (China)
- Prior art keywords: waving, hand, information, distress, posture
- Legal status: Pending
Abstract
The invention belongs to the field of artificial intelligence recognition and in particular discloses a hand-waving distress recognition system and method. The system comprises a feature extraction unit, a calculation module, and a hand-waving distress detection unit. The feature extraction unit extracts acoustic features and preprocessed face images and sends them to the hand-waving distress detection unit; it also obtains human skeleton key point information and transmits it to the calculation module. The calculation module detects the person's waving action, waving amplitude, and sitting, lying, or standing posture, obtains the waving frequency, and sends the detection results, including the extension degree index, the waving action state, the sitting, lying, or standing posture, and the waving frequency, to the hand-waving distress detection unit. The hand-waving distress detection unit judges whether a person is waving for help using a comprehensive weight method. The invention proposes a hand-waving distress fusion strategy: a multi-state fusion strategy that makes the final hand-waving distress judgment more robust.
Description
Technical Field
The invention belongs to the field of artificial intelligence recognition, and particularly relates to a hand-waving distress recognition system and method.
Background
With the development of deep learning and machine vision, applications based on artificial intelligence have gradually matured and become widely deployed. Existing hand-waving distress recognition systems are mainly based on visual images: they acquire posture key points, build a distress recognition model using a deep learning model or hand-crafted logic rules, judge whether a hand is being waved, detect the waving frequency, and recognize distress behavior.

Such methods depend on key point recognition. Factors such as occlusion, viewing angle, illumination, scale, and motion blur interfere with the recognition of human posture key points, making the recognition result unstable and inaccurate and failing to meet the high-accuracy requirements of real usage scenarios.
Disclosure of Invention
Aiming at the above problems, the invention provides a stable, highly reliable hand-waving distress recognition method, together with several strategies that improve the distress recognition effect.
The technical scheme adopted by the invention to achieve this aim is a hand-waving distress recognition system comprising: a feature extraction unit, a calculation module, and a hand-waving distress detection unit;

the feature extraction unit is used for receiving the audio stream and video stream sent by the camera, extracting MFCC acoustic features, acquiring a preprocessed face image, and sending the face image and the acoustic features to the hand-waving distress detection unit; meanwhile, it extracts human skeleton posture information from the video stream, obtains the skeleton key point information, and transmits it to the calculation module;

the calculation module is used for detecting the waving action, waving amplitude, and sitting, lying, or standing posture of the person according to the skeleton key point information, and sending the detection results, including the extension degree index, the waving action state, and the sitting, lying, or standing posture, to the hand-waving distress detection unit; meanwhile, it obtains the extension degree index from the key point coordinates, derives the waving frequency from it, and sends the waving frequency to the hand-waving distress detection unit;

the hand-waving distress detection unit is used for processing the MFCC acoustic features and face image from the feature extraction unit together with the detection results from the calculation module, and judging whether the person is waving for help using a comprehensive weight method.
The feature extraction unit comprises: a sound feature extraction module, a facial feature preprocessing module, and a human posture detection module;

the sound feature extraction module receives the audio stream sent by the camera, processes the continuous sound in it, obtains N-dimensional MFCC features, and sends them to the hand-waving distress detection unit;

the facial feature preprocessing module acquires images from the camera's video stream, detects faces by a machine learning method, removes background and non-face regions, obtains face landmark coordinates, normalizes the face image according to those coordinates, and sends the processed face image to the hand-waving distress detection unit;

the human posture detection module receives the video stream sent by the camera, extracts human skeleton posture information through a deep neural network, obtains the skeleton key point information, and transmits it to the calculation module.
The calculation module comprises: a hand-waving action detection module and an extension degree index calculation module;

the hand-waving action detection module normalizes the coordinates and scale of the key point information to obtain key point features, judges whether a person in the standing, sitting, or lying posture is in a waving action state, and obtains waving amplitude information from that state; meanwhile, it outputs the waving posture confidence for the standing, sitting, or lying state according to a posture detection model, and sends the standing/sitting/lying posture information, the waving amplitude information, and the waving posture confidence to the hand-waving distress detection unit;

the extension degree index calculation module obtains the extension degree index from the key point coordinates to reflect how extended the person's posture is, calculates the waving frequency from the index, and sends the waving frequency to the hand-waving distress detection unit.
The hand-waving distress detection unit comprises: a sound event recognition module, a facial expression recognition module, and a hand-waving distress detection module;

the sound event recognition module performs event detection with a deep learning model on the MFCC features extracted by the feature extraction unit, obtaining a sound classification result containing the classified sound and the corresponding confidence output for the current frame;

the facial expression recognition module extracts facial features with a deep learning model from the face images sent by the feature extraction unit, performs training and inference, and obtains a facial expression classification result containing the classified expression and the corresponding confidence output for the current frame;

the hand-waving distress detection module judges whether the person is waving for help using a comprehensive weight method, based on the sound classification result, the facial expression classification result, and the standing, sitting, and lying posture information, waving amplitude information, waving posture confidence, and waving frequency sent by the calculation module.
A recognition method of the hand-waving distress recognition system comprises the following steps:

1) The audio stream sent by the camera is fed to the sound feature extraction module, and the video stream is fed to the facial feature preprocessing module and the human posture detection module;

2-1) the sound feature extraction module receives the audio stream, processes the continuous sound, obtains MFCC acoustic features, and sends them to the hand-waving distress detection unit;

2-2) the facial feature preprocessing module acquires images from the video stream, detects faces by a machine learning method, removes background and non-face regions, obtains face landmark coordinates, normalizes the face image according to those coordinates, and sends the processed face image to the hand-waving distress detection unit;

2-3) the human posture detection module receives the video stream, extracts human skeleton posture information through a deep neural network, obtains the skeleton key point information, and transmits it to the calculation module;

3-1) the calculation module normalizes the coordinates and scale of the key point information to obtain key point features, judges whether a person in the standing, sitting, or lying posture is in a waving action state, and obtains waving amplitude information from that state; meanwhile, it outputs the waving posture confidence for the standing, sitting, or lying state according to the posture detection model, and sends the standing/sitting/lying posture information, the waving amplitude information, and the waving posture confidence to the hand-waving distress detection unit;

3-2) the calculation module obtains the extension degree index δ_HPS from the key point information, calculates the waving frequency from δ_HPS, and sends the waving frequency to the hand-waving distress detection unit;

4) The hand-waving distress detection unit performs event detection with a deep learning model on the MFCC acoustic features extracted by the feature extraction unit, obtaining a sound classification result containing the classified sound and the corresponding confidence output for the current frame; meanwhile, it extracts facial features with a deep learning model from the face images, performs training and inference, and obtains a facial expression classification result containing the classified expression and the corresponding confidence output for the current frame;

5) The hand-waving distress detection unit judges whether the person is waving for help using a comprehensive weight method, based on the sound classification result, the facial expression classification result, and the standing/sitting/lying posture information, waving amplitude information, waving posture confidence, and waving frequency sent by the calculation module.
In step 2-1), the MFCC acoustic features are obtained from the continuous sound as follows:

pre-emphasis, framing, and windowing are applied in sequence to the continuous sound of the audio stream to obtain preprocessed sound information;

fast Fourier transform, Mel filter bank, logarithm, discrete cosine transform, and dynamic feature extraction are then applied in sequence to the preprocessed sound information, finally yielding the N-dimensional MFCC acoustic features.
Step 3-1) specifically comprises the following steps:

3-1-1) according to the key point information, the calculation module obtains the person's current standing, sitting, or lying posture information and the current-frame waving posture confidence for that posture through a posture detection model; the posture detection model is a CNN model;

3-1-2) at the same time, angles between joints are computed from the key point information, and the waving amplitude is obtained from the angle between the forearm and the upper arm and the angle between the upper arm and the shoulder;

3-1-3) the calculation module sends the standing/sitting/lying posture information, the waving amplitude information, and the waving posture confidence to the hand-waving distress detection unit.
In step 3-2), the calculation module obtains the extension degree index δ_HPS from the key point information as follows:

3-2-1) the calculation module arranges the key point information of the given posture into a matrix:

$$X = [x_1, \dots, x_n] \in \mathbb{R}^{D \times n}$$

where D is the dimension of each key point, n is the number of key points, x_n is the coordinate of the n-th key point, and ℝ is the set of real numbers;

3-2-2) let $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ denote the mean of all key point coordinates in X. The sum of the variances along the principal axes of the posture key points, δ_HPS, is defined as:

$$\delta_{HPS} = \frac{1}{n}\sum_{i=1}^{n} \left\| U^{\top}(x_i - \bar{x}) \right\|_2^2 = \operatorname{tr}\!\left(U^{\top} \Sigma\, U\right) = \sum_{j=1}^{d} \lambda_j$$

where U is the projection matrix, x_i is the i-th key point coordinate, d is a positive number smaller than the key point dimension D, λ_j is the j-th eigenvalue of the covariance matrix $\Sigma = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})^{\top}$, and tr denotes the trace of a matrix;

3-2-3) the waving frequency is calculated from δ_HPS as follows:

δ_HPS is the sum of the variances of the posture key points along the principal axes, and thus reflects how extended the person's posture is;

when the hands swing out to the sides of the body, δ_HPS is largest; when a hand swings to its apex, in line with the trunk, δ_HPS is smallest. By counting the periodic variation of δ_HPS along the time axis, the waving frequency of the waving action is obtained; the waving frequency information is then sent to the hand-waving distress detection unit.
Step 4) is specifically as follows:

4-1) for sound event classification:

during model training, the input MFCC features are normalized, label text representing the corresponding class names is read to generate label vectors, the MFCC features are bound to the label vectors, preprocessed, and then fed into a deep learning model to learn its parameters, yielding a sound classification model;

the deep learning model is any one of a CNN, RCNN, or LSTM model;

during model prediction, the input MFCC features are normalized, preprocessed, and fed into the sound classification model for prediction; the output is post-processed to obtain a sound classification result containing the classified sound and the corresponding confidence output for the current frame;

4-2) for facial expression classification:

facial features are extracted from the face image through a deep learning model;

the deep learning model is any one of a convolutional neural network, a deep belief network, a deep autoencoder, or a recurrent neural network;

after the facial features are extracted, the expressions are classified by a convolutional neural network, yielding a facial expression classification result containing the classified expression and the corresponding confidence output for the current frame.
Step 5) is specifically as follows:

(1) The hand-waving distress detection unit combines the sound classification result (the classified sound and the confidence output for the current frame), the facial expression classification result (the classified expression and the confidence output for the current frame), and the standing/sitting/lying posture information, waving amplitude information, waving posture confidence, and waving frequency sent by the calculation module, using a comprehensive weight method, into a global hand-waving distress score:

$$V_t^{(sos)} = w_{pose}\, V_t^{(pose)} + w_{sound}\, V_t^{(sound)} + w_{face}\, V_t^{(face)}$$

where $V_t^{(sos)}$ is the hand-waving distress confidence at the current time t, $V_t^{(pose)}$ the waving distress posture confidence at time t, $V_t^{(sound)}$ the sound event detection confidence at time t, and $V_t^{(face)}$ the facial expression confidence at time t; $w_{pose}$, $w_{sound}$, $w_{face}$ are the weights of the waving posture, the sound event, and the facial expression respectively;

where:

$$w_{pose} + w_{sound} + w_{face} = 1$$

$w_{pose}$, $w_{sound}$, $w_{face}$ are preset values, with $w_{pose} > w_{sound}$ and $w_{pose} > w_{face}$;

(2) To avoid single-frame false detections, $V_t^{(pose)}$, $V_t^{(sound)}$, and $V_t^{(face)}$ are obtained by an exponentially weighted moving average of the per-frame outputs:

$$V_t^{(\cdot)} = \beta\, V_{t-1}^{(\cdot)} + (1-\beta)\, v_t^{(\cdot)}$$

where β is a weighting coefficient, $v_t^{(pose)}$ is the current-frame waving distress posture confidence, $v_t^{(sound)}$ the confidence output by the current-frame sound detection, and $v_t^{(face)}$ the confidence output for the current-frame facial expression;

(3) The current-frame waving distress confidence $v_t^{(pose)}$ is obtained by combining the current-frame waving posture confidence $v_t^{(conf)}$, the waving amplitude term $v_t^{(amp)}$, the waving frequency term $v_t^{(freq)}$, and the posture addition coefficient $v_t^{(add)}$;

(4) the waving amplitude term is

$$v_t^{(amp)} = \min\!\left(\frac{a_t}{a_{max}},\; 1\right)$$

where $a_t$ is the detected waving amplitude and $a_{max}$ the preset maximum waving amplitude; $v_t^{(amp)}$ lies between 0 and 1, and the larger the waving amplitude, the higher its value;

(5) the current-frame waving frequency term is

$$v_t^{(freq)} = \min\!\left(\frac{f_t}{f_{max}},\; 1\right)$$

where $f_t$ is the detected waving frequency and $f_{max}$ the preset maximum waving frequency; $v_t^{(freq)}$ lies between 0 and 1, and the higher the waving frequency, the higher its value;

(6) Finally, $V_t^{(sos)}$ is compared with a preset threshold; if it exceeds the threshold, the current frame is judged to be in a hand-waving distress state, and if n consecutive frames are in that state, a hand-waving distress alarm is raised.
The invention has the following beneficial effects and advantages:

1. Sound information is introduced, so that the video stream and audio stream are combined and the audio information assists in improving the accuracy of distress recognition;

2. Facial expression recognition is introduced, so that a panicked or fearful facial expression assists in judging the distress state;

3. For posture recognition, when the posture is considered a waving state, the person's standing/sitting/lying posture is determined and the arm waving amplitude is obtained at the same time; when the waving amplitude is large, the standing, sitting, or lying posture information assists in judging the distress state;

4. Waving is a periodic action; its periodicity is judged through the extension degree index, yielding the waving frequency, which assists in judging whether a person is waving normally or waving for help;

5. The invention proposes a hand-waving distress fusion strategy: a multi-state fusion strategy over the information produced by the waving posture model, the sound event detection model, and the expression recognition model, making the final hand-waving distress judgment more robust.
Drawings
FIG. 1 is a system framework diagram of the present invention;
FIG. 2 is a flowchart illustrating the operation of the sound feature extraction module of the present invention;

FIG. 3 is a schematic diagram of building the sound classification model of the present invention;

FIG. 4 is a circuit diagram of an equalization management module of the present invention;
FIG. 5 is a schematic diagram of the human body key points of the present invention;
wherein the key point numbers in FIG. 5 correspond to: 0: nose, 1: neck, 2: right shoulder, 3: right elbow, 4: right wrist, 5: left shoulder, 6: left elbow, 7: left wrist, 8: mid hip, 9: right hip, 10: right knee, 11: right ankle, 12: left hip, 13: left knee, 14: left ankle, 15: right eye, 16: left eye, 17: right ear, 18: left ear, 19: left big toe, 20: left small toe, 21: left heel, 22: right big toe, 23: right small toe, 24: right heel.
Detailed Description
As shown in FIG. 1, the system framework diagram of the present invention, the overall structure of the system is divided into two major parts: a front-end camera and a back-end application server.

The existing front-end camera connects to the back-end AppServer through RTSP or an API, and the modules within the AppServer perform artificial intelligence analysis and processing on the video and audio streams.
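By way of illustration only (the patent gives no code), a minimal sketch of pulling the camera's stream into the AppServer over RTSP with OpenCV might look as follows; the URL and credentials are hypothetical placeholders:

```python
# Minimal sketch: ingesting the front-end camera's video stream over RTSP.
# The URL/credentials are placeholders, not values from the patent.
import cv2

cap = cv2.VideoCapture("rtsp://user:pass@192.0.2.10:554/stream")  # placeholder URL
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # here the BGR frame would be handed to the feature extraction unit
cap.release()
```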
The AppServer comprises a feature extraction unit, a calculation module, and a hand-waving distress detection unit; the functions of each unit/module are as follows:

the feature extraction unit is used for receiving the audio stream and video stream sent by the camera, extracting MFCC acoustic features, acquiring a preprocessed face image, and sending the face image and the MFCC acoustic features to the hand-waving distress detection unit; meanwhile, it extracts human skeleton posture information from the video stream, obtains the skeleton key point information, and transmits it to the calculation module;

the calculation module is used for detecting the waving action, waving amplitude, and sitting, lying, or standing posture of the person according to the skeleton key point information, and sending the detection results, including the extension degree index, the waving action state, and the sitting, lying, or standing posture, to the hand-waving distress detection unit; meanwhile, it obtains the extension degree index from the key point coordinates, derives the waving frequency from it, and sends the waving frequency to the hand-waving distress detection unit;

the hand-waving distress detection unit is used for processing the MFCC acoustic features and face image from the feature extraction unit together with the detection results from the calculation module, and judging whether the person is waving for help using a comprehensive weight method.
The feature extraction unit comprises: a sound feature extraction module, a facial feature preprocessing module, and a human posture detection module;

the sound feature extraction module receives the audio stream sent by the camera, processes the continuous sound in it, obtains N-dimensional MFCC features, and sends them to the hand-waving distress detection unit;

the facial feature preprocessing module acquires images from the camera's video stream, detects faces by a machine learning method, removes background and non-face regions, obtains face landmark coordinates, normalizes the face image according to those coordinates, and sends the processed face image to the hand-waving distress detection unit;

the human posture detection module receives the video stream sent by the camera, extracts human skeleton posture information through a deep neural network, obtains the skeleton key point information, and transmits it to the calculation module.
The calculation module comprises: a hand-waving action detection module and an extension degree index calculation module;

the hand-waving action detection module normalizes the coordinates and scale of the key point information to obtain key point features, judges whether a person in the standing, sitting, or lying posture is in a waving action state, and obtains waving amplitude information from that state; meanwhile, it outputs the waving posture confidence for the standing, sitting, or lying state according to the posture detection model, and sends the standing/sitting/lying posture information, the waving amplitude information, and the waving posture confidence to the hand-waving distress detection unit;

the extension degree index calculation module obtains the extension degree index from the key point coordinates to reflect how extended the person's posture is, calculates the waving frequency from the index, and sends the waving frequency to the hand-waving distress detection unit.
The hand-waving distress detection unit comprises: a sound event recognition module, a facial expression recognition module, and a hand-waving distress detection module;

the sound event recognition module performs event detection with a deep learning model on the MFCC features extracted by the feature extraction unit, obtaining a sound classification result containing the classified sound and the corresponding confidence output for the current frame;

the facial expression recognition module extracts facial features with a deep learning model from the face images sent by the feature extraction unit, performs training and inference, and obtains a facial expression classification result containing the classified expression and the corresponding confidence output for the current frame;

the hand-waving distress detection module judges whether the person is waving for help using a comprehensive weight method, based on the sound classification result, the facial expression classification result, and the standing, sitting, and lying posture information, waving amplitude information, waving posture confidence, and waving frequency sent by the calculation module.
As shown in FIG. 1, the workflow of the present invention is implemented on the data streams transmitted between the modules, and comprises the following steps:

1) The audio stream sent by the camera is fed to the sound feature extraction module, and the video stream is fed to the facial feature preprocessing module and the human posture detection module;

2-1) the sound feature extraction module receives the audio stream, processes the continuous sound, obtains MFCC acoustic features, and sends them to the hand-waving distress detection unit;

2-2) the facial feature preprocessing module acquires images from the video stream, detects faces by a machine learning method, removes background and non-face regions, obtains face landmark coordinates, normalizes the face image according to those coordinates, and sends the processed face image to the hand-waving distress detection unit;

2-3) the human posture detection module receives the video stream, extracts human skeleton posture information through a deep neural network, obtains the skeleton key point information, and transmits it to the calculation module;

3-1) the calculation module normalizes the coordinates and scale of the key point information to obtain key point features, judges whether a person in the standing, sitting, or lying posture is in a waving action state, and obtains waving amplitude information from that state; meanwhile, it outputs the waving posture confidence for the standing, sitting, or lying state according to the posture detection model, and sends the standing/sitting/lying posture information, the waving amplitude information, and the waving posture confidence to the hand-waving distress detection unit;

3-2) the calculation module obtains the extension degree index δ_HPS from the key point information, calculates the waving frequency from δ_HPS, and sends the waving frequency to the hand-waving distress detection unit;

4) The hand-waving distress detection unit performs event detection with a deep learning model on the MFCC acoustic features extracted by the feature extraction unit, obtaining a sound classification result containing the classified sound and the corresponding confidence output for the current frame; meanwhile, it extracts facial features with a deep learning model from the face images, performs training and inference, and obtains a facial expression classification result containing the classified expression and the corresponding confidence output for the current frame;

5) The hand-waving distress detection unit judges whether the person is waving for help using a comprehensive weight method, based on the sound classification result, the facial expression classification result, and the standing/sitting/lying posture information, waving amplitude information, waving posture confidence, and waving frequency sent by the calculation module.
As shown in FIG. 2, the workflow of feature extraction in the sound feature extraction module comprises the following steps:

preprocessing of the continuous sound, including pre-emphasis, framing, and windowing;

then fast Fourier transform, Mel filter bank, logarithm, discrete cosine transform, and dynamic feature extraction are applied to the preprocessed sound information; in this embodiment, 39-dimensional MFCC features (Mel-frequency cepstral coefficients, MFCC) are finally extracted.
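As a concrete illustration of this pipeline (not part of the patent), the following sketch computes 39-dimensional MFCCs with librosa, which performs the framing, windowing, FFT, Mel filter bank, logarithm, and DCT steps internally; the sample rate and pre-emphasis coefficient are assumptions:

```python
# Sketch of the 39-dimensional MFCC pipeline: 13 static coefficients plus
# delta and delta-delta dynamic features (13 x 3 = 39 dimensions).
import librosa
import numpy as np

def extract_mfcc39(wav_path: str) -> np.ndarray:
    y, sr = librosa.load(wav_path, sr=16000)        # 16 kHz is an assumed rate
    y = np.append(y[0], y[1:] - 0.97 * y[:-1])      # pre-emphasis (0.97 assumed)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # framing/FFT/Mel/log/DCT
    d1 = librosa.feature.delta(mfcc)                # dynamic features (delta)
    d2 = librosa.feature.delta(mfcc, order=2)       # delta-delta
    return np.vstack([mfcc, d1, d2])                # shape: (39, n_frames)
```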
As shown in FIG. 3, the sound classification model is built on a deep learning model, including but not limited to CNN, RCNN, or LSTM, to recognize the input sound features.

During model training, the input MFCC features are normalized, label text representing the true classes is read to generate label vectors, the features and labels are bound and preprocessed, and then fed into the CNN/RCNN/LSTM model to learn its parameters, yielding a reliable sound classification model.

As shown in FIG. 4, during model prediction the 39-dimensional MFCC features are normalized, preprocessed, and fed into the sound classification model for prediction; the output is post-processed to obtain a sound classification result containing the classified sound and the corresponding confidence output for the current frame.
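A minimal sketch of one of the named options, a small CNN classifier over MFCC windows in PyTorch; the layer sizes, class set, and input window shape are illustrative assumptions, not taken from the patent:

```python
# Assumed-architecture CNN over (1, 39, n_frames) MFCC windows; softmax over
# the logits gives the per-class confidence used by the sound event module.
import torch
import torch.nn as nn

class SoundEventCNN(nn.Module):
    def __init__(self, n_classes: int = 3):  # e.g. background / speech / cry for help
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x):                    # x: (batch, 1, 39, n_frames)
        z = self.features(x).flatten(1)
        return self.classifier(z)            # logits; softmax -> confidences

model = SoundEventCNN()
logits = model(torch.randn(1, 1, 39, 100))   # placeholder MFCC window
conf = torch.softmax(logits, dim=1)          # per-class confidence
```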
As shown in FIG. 1, the facial feature preprocessing module in the system acquires images from the video stream, performs face detection by a machine learning method, and then removes background and non-face regions.

Face feature normalization: the brightness of the face image is normalized so that the distribution of facial brightness pixels becomes as uniform as possible, while the facial contrast is enhanced.
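One plausible realization of this brightness normalization and contrast enhancement (the patent does not fix a method) is histogram equalization of the luma channel, e.g. with CLAHE:

```python
# Assumed implementation: CLAHE on the Y channel equalizes brightness and
# boosts contrast without distorting face colors.
import cv2

def normalize_face(face_bgr):
    ycrcb = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2YCrCb)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    ycrcb[:, :, 0] = clahe.apply(ycrcb[:, :, 0])   # equalize luma channel only
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
```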
The facial expression recognition module extracts features through a deep learning model, such as a convolutional neural network (CNN), a deep belief network (DBN), a deep autoencoder (DAE), or a recurrent neural network (RNN).

After feature extraction, the facial expressions are classified. Expression recognition can be performed end-to-end with deep learning, or classification can use a traditional machine learning method; this embodiment uses an SVM (support vector machine) classifier to determine whether the face shows a panicked or fearful expression.
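A minimal sketch of the traditional-classifier option named above: an SVM with probability outputs over deep facial features; the feature dimension, labels, and random placeholder data are assumptions:

```python
# SVM over (placeholder) deep facial features; predict_proba supplies the
# per-expression confidence consumed by the fusion step.
import numpy as np
from sklearn.svm import SVC

X_train = np.random.rand(100, 128)              # stand-in for deep features
y_train = np.random.randint(0, 2, size=100)     # e.g. 0 = neutral, 1 = panic/fear

clf = SVC(kernel="rbf", probability=True)       # probability=True -> confidences
clf.fit(X_train, y_train)
conf = clf.predict_proba(np.random.rand(1, 128))  # current-frame expression confidence
```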
As shown in FIGs. 1 and 5, the human posture detection module trains and predicts the human posture key points through a deep-learning human posture network model; the key points are shown in the schematic diagram of FIG. 5.
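For illustration, skeleton key points like those in FIG. 5 can be obtained with an off-the-shelf estimator; MediaPipe Pose is an assumed substitute for the patent's unnamed network, and it uses its own 33-point landmark scheme rather than the 25 points of FIG. 5:

```python
# Assumed substitute estimator: MediaPipe Pose yields normalized (x, y)
# landmarks analogous to (but indexed differently from) the FIG. 5 key points.
import cv2
import mediapipe as mp

pose = mp.solutions.pose.Pose(static_image_mode=True)
frame = cv2.imread("person.jpg")                 # placeholder input frame
res = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
if res.pose_landmarks:
    pts = [(lm.x, lm.y) for lm in res.pose_landmarks.landmark]  # 33 keypoints
```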
In the hand-waving distress recognition method, step 3-1) specifically comprises:

3-1-1) according to the human key point information (see FIG. 5), the hand-waving action detection module obtains the person's current standing, sitting, or lying posture information and the current-frame waving posture confidence for that posture through a posture detection model; the posture detection model is a CNN model. Knowing whether the person in the posture information is waving, together with the confidence that the person is standing, sitting, or lying, provides useful auxiliary evidence for judging whether to output a distress alarm, particularly in the sitting and lying states;

3-1-2) angles between joints are computed from the key point information, and the waving amplitude is obtained from the angle between the forearm and the upper arm (i.e., the angle formed by key points 2, 3, and 4 in FIG. 5) and the angle between the upper arm and the shoulder (i.e., the angles formed by key points 1-4 and by key points 1, 5-7), as illustrated in the sketch following this list;

3-1-3) the hand-waving action detection module sends the standing/sitting/lying posture information (likewise judged from the key points identified in FIG. 5), the waving amplitude information, and the waving posture confidence to the hand-waving distress detection unit.
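A sketch of the included-angle computation referenced in step 3-1-2), e.g. the elbow angle formed at key point 3 by key points 2, 3, and 4; the helper name is illustrative:

```python
# Angle at joint b (in degrees) between segments b->a and b->c.
import numpy as np

def joint_angle(a, b, c):
    v1 = np.asarray(a) - np.asarray(b)
    v2 = np.asarray(c) - np.asarray(b)
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

# e.g. forearm/upper-arm angle: joint_angle(shoulder_xy, elbow_xy, wrist_xy)
print(joint_angle((0.0, 0.0), (1.0, 0.0), (1.0, 1.0)))  # -> 90.0
```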
In step 3-2), the calculation module obtains the extension degree index δ_HPS from the human key point information as follows:

the index δ_HPS is introduced to measure the extension degree of the human posture and thereby assist in judging the waving frequency in waving detection. δ_HPS is based on the sum of the variances of the posture key points along their principal axes.

3-2-1) As shown in FIG. 4, the extension degree index calculation module arranges the key point information of the given posture into a matrix:

$$X = [x_1, \dots, x_n] \in \mathbb{R}^{D \times n}$$

where D is the dimension of each key point, n is the number of key points, x_n is the coordinate of the n-th key point, and ℝ is the set of real numbers;

3-2-2) Let $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ denote the mean of all key point coordinates in X. The sum of the variances along the principal axes of the posture key points, δ_HPS, is defined as Eq. 1:

$$\delta_{HPS} = \frac{1}{n}\sum_{i=1}^{n} \left\| U^{\top}(x_i - \bar{x}) \right\|_2^2 \qquad (\text{Eq. } 1)$$

Eq. 1 is only one form of δ_HPS; it can also be written as Eq. 2:

$$\delta_{HPS} = \operatorname{tr}\!\left(U^{\top} \Sigma\, U\right) = \sum_{j=1}^{d} \lambda_j \qquad (\text{Eq. } 2)$$

where U is the projection matrix, x_i is the i-th key point coordinate, d is a positive number smaller than the key point dimension D (in this embodiment D = 2 and d = 1), λ_j is the j-th eigenvalue of the covariance matrix $\Sigma = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})^{\top}$, and tr denotes the trace of a matrix.

Thus δ_HPS is the sum of the variances of the posture key points along the principal axes and reflects how extended the person's posture is.

3-2-3) The extension degree index calculation module calculates the waving frequency from δ_HPS as follows:

the δ_HPS index is applied to the waving action. Since waving is a periodic action, δ_HPS is large when the hands swing out to the sides of the body and small when a hand swings to its apex, in line with the trunk. By counting the periodic variation of δ_HPS along the time axis, the waving frequency of the waving action can be obtained, and the resulting waving frequency information is sent to the hand-waving distress detection unit.

Advantages of δ_HPS: the index depends only on the person's motion, is independent of camera viewpoint, scale, and similar factors, is not affected by posture interference, and effectively reflects the periodicity of the person's waving, so it can help judge whether the person is waving normally or waving vigorously for help.
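The following sketch implements δ_HPS as reconstructed above (sum of variances along the top-d principal axes, with d = 1 as in the embodiment), plus a simple peak-counting frequency estimate; the peak-counting method is an assumption about how the periodic variation is counted:

```python
# delta_hps: top-d principal-axis variance of the keypoint cloud.
# waving_frequency: peaks of the per-frame delta_hps series over time.
import numpy as np
from scipy.signal import find_peaks

def delta_hps(X: np.ndarray, d: int = 1) -> float:
    """X: (D, n) matrix of n keypoint coordinates of dimension D."""
    Xc = X - X.mean(axis=1, keepdims=True)        # subtract keypoint mean
    cov = Xc @ Xc.T / X.shape[1]                  # covariance matrix (D x D)
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
    return float(eigvals[:d].sum())               # sum of top-d variances

def waving_frequency(series, fps: float) -> float:
    """Estimate waving frequency (Hz) from per-frame delta_hps values."""
    peaks, _ = find_peaks(np.asarray(series))
    duration = len(series) / fps
    return len(peaks) / duration if duration > 0 else 0.0
```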
Step 5) of the invention specifically comprises:

(1) The hand-waving distress detection unit combines the sound classification result (the classified sound and the confidence output for the current frame), the facial expression classification result (the classified expression and the confidence output for the current frame), and the standing/sitting/lying posture information, waving amplitude information, waving posture confidence, and waving frequency sent by the calculation module, using a comprehensive weight method, into a global hand-waving distress score:

$$V_t^{(sos)} = w_{pose}\, V_t^{(pose)} + w_{sound}\, V_t^{(sound)} + w_{face}\, V_t^{(face)}$$

where $V_t^{(sos)}$ is the hand-waving distress confidence at the current time t, $V_t^{(pose)}$ the waving distress posture confidence at time t, $V_t^{(sound)}$ the sound event detection confidence at time t, and $V_t^{(face)}$ the facial expression confidence at time t; $w_{pose}$, $w_{sound}$, $w_{face}$ are the weights of the waving posture, the sound event, and the facial expression respectively;

where:

$$w_{pose} + w_{sound} + w_{face} = 1$$

$w_{pose}$, $w_{sound}$, $w_{face}$ are preset values, with $w_{pose} > w_{sound}$ and $w_{pose} > w_{face}$; in this embodiment, since the task is hand-waving distress, $w_{pose}$ may take a higher value, e.g.:

$$w_{pose} = 0.8, \quad w_{sound} = 0.1, \quad w_{face} = 0.1$$

The above is merely an example; other values may be chosen in practical applications as the situation requires.

To avoid false detections from single-frame model outputs, $V_t^{(pose)}$, $V_t^{(sound)}$, and $V_t^{(face)}$ are obtained with an exponentially weighted moving average, i.e. step (2):

(2) The exponentially weighted moving average considers both current and historical data, giving higher weight to the current data:

$$V_t^{(\cdot)} = \beta\, V_{t-1}^{(\cdot)} + (1-\beta)\, v_t^{(\cdot)}$$

where β is a weighting coefficient, $v_t^{(pose)}$ is the current-frame waving distress posture confidence, $v_t^{(sound)}$ the confidence output by the current-frame sound detection, and $v_t^{(face)}$ the confidence output for the current-frame facial expression.

At the initial time the average is biased, so a bias correction is introduced:

$$\hat{V}_t^{(\cdot)} = \frac{V_t^{(\cdot)}}{1 - \beta^{\,t}}$$

(3) The current-frame waving distress confidence $v_t^{(pose)}$ is obtained by combining the current-frame waving posture confidence $v_t^{(conf)}$, the waving amplitude term $v_t^{(amp)}$, the waving frequency term $v_t^{(freq)}$, and the posture addition coefficient $v_t^{(add)}$;

(4) the waving amplitude term is

$$v_t^{(amp)} = \min\!\left(\frac{a_t}{a_{max}},\; 1\right)$$

where $a_t$ is the detected waving amplitude and $a_{max}$ the preset maximum waving amplitude; $v_t^{(amp)}$ lies between 0 and 1, and the larger the waving amplitude, the higher its value;

(5) the current-frame waving frequency term is

$$v_t^{(freq)} = \min\!\left(\frac{f_t}{f_{max}},\; 1\right)$$

where $f_t$ is the detected waving frequency and $f_{max}$ the preset maximum waving frequency; $v_t^{(freq)}$ lies between 0 and 1, and the higher the waving frequency, the higher its value.

In practical applications a gamma transform may be applied to nonlinearly compensate the waving amplitude and waving frequency terms, adjusting their final weight in the waving posture judgment:

$$v \leftarrow v^{\gamma}$$

where γ is an empirical value, typically slightly less than 1.

If the posture detection model detects that the person is sitting or lying, the expected probability that waving in that posture is a call for help is considered higher, so the posture addition coefficient can be set to an empirical value of 0.1 (and increased or decreased as needed):

$$v_t^{(add)} = 0.1$$

otherwise, if the person's posture is standing, no addition is applied:

$$v_t^{(add)} = 0$$

(6) Finally, $V_t^{(sos)}$ is compared with a preset threshold; if it exceeds the threshold, the current frame is judged to be in a hand-waving distress state, and if n consecutive frames are in that state, a hand-waving distress alarm is raised.
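Putting the reconstructed formulas together, a sketch of the comprehensive weight method with the exponentially weighted moving average, bias correction, and the n-consecutive-frame alarm rule; the weights are the embodiment's example values, while β, the threshold, n, and the multiplicative form of the per-frame pose combination are assumptions:

```python
# Fusion sketch under stated assumptions; not the patent's exact formulas.
W_POSE, W_SOUND, W_FACE = 0.8, 0.1, 0.1   # example weights from the embodiment
BETA, THRESH, N_FRAMES = 0.9, 0.6, 10     # assumed values, not given in the patent

def frame_pose_confidence(v_conf, v_amp, v_freq, v_add):
    # Current-frame waving-distress confidence from posture confidence,
    # amplitude term, frequency term and posture addition (form assumed).
    return min(v_conf * v_amp * v_freq + v_add, 1.0)

class SosFusion:
    def __init__(self):
        self.v = {"pose": 0.0, "sound": 0.0, "face": 0.0}
        self.t = 0
        self.streak = 0

    def step(self, v_pose, v_sound, v_face):
        """Feed one frame's confidences; True when the alarm should fire.
        v_pose would typically come from frame_pose_confidence()."""
        self.t += 1
        corrected = {}
        for k, x in (("pose", v_pose), ("sound", v_sound), ("face", v_face)):
            self.v[k] = BETA * self.v[k] + (1 - BETA) * x     # EWMA
            corrected[k] = self.v[k] / (1 - BETA ** self.t)   # bias correction
        v_sos = (W_POSE * corrected["pose"] + W_SOUND * corrected["sound"]
                 + W_FACE * corrected["face"])
        self.streak = self.streak + 1 if v_sos > THRESH else 0
        return self.streak >= N_FRAMES
```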
The above description is only an embodiment of the present invention and is not intended to limit its scope. Any modification, equivalent replacement, improvement, or extension made within the spirit and principles of the present invention falls within its protection scope.
Claims (10)
1. A hand-waving distress recognition system, characterized by comprising: a feature extraction unit, a calculation module, and a hand-waving distress detection unit;
the feature extraction unit is used for receiving the audio stream and video stream sent by the camera, extracting MFCC acoustic features, acquiring a preprocessed face image, and sending the face image and the MFCC acoustic features to the hand-waving distress detection unit; meanwhile, it extracts human skeleton posture information from the video stream, obtains the skeleton key point information, and transmits it to the calculation module;
the calculation module is used for detecting the waving action, waving amplitude, and sitting, lying, or standing posture of the person according to the skeleton key point information, and sending the detection results, including the extension degree index, the waving action state, and the sitting, lying, or standing posture, to the hand-waving distress detection unit; meanwhile, it obtains the extension degree index from the key point coordinates, derives the waving frequency from it, and sends the waving frequency to the hand-waving distress detection unit;
the hand-waving distress detection unit is used for processing the MFCC acoustic features and face image from the feature extraction unit together with the detection results from the calculation module, and judging whether the person is waving for help using a comprehensive weight method.
2. The system as claimed in claim 1, wherein the feature extraction unit comprises: a sound feature extraction module, a facial feature preprocessing module, and a human posture detection module;
the sound feature extraction module receives the audio stream sent by the camera, processes the continuous sound in it, obtains N-dimensional MFCC features, and sends them to the hand-waving distress detection unit;
the facial feature preprocessing module acquires images from the camera's video stream, detects faces by a machine learning method, removes background and non-face regions, obtains face landmark coordinates, normalizes the face image according to those coordinates, and sends the processed face image to the hand-waving distress detection unit;
the human posture detection module receives the video stream sent by the camera, extracts human skeleton posture information through a deep neural network, obtains the skeleton key point information, and transmits it to the calculation module.
3. The system as claimed in claim 1, wherein the calculation module comprises: a hand-waving action detection module and an extension degree index calculation module;
the hand-waving action detection module normalizes the coordinates and scale of the key point information to obtain key point features, judges whether a person in the standing, sitting, or lying posture is in a waving action state, and obtains waving amplitude information from that state; meanwhile, it outputs the waving posture confidence for the standing, sitting, or lying state according to a posture detection model, and sends the standing/sitting/lying posture information, the waving amplitude information, and the waving posture confidence to the hand-waving distress detection unit;
the extension degree index calculation module obtains the extension degree index from the key point coordinates to reflect how extended the person's posture is, calculates the waving frequency from the index, and sends the waving frequency to the hand-waving distress detection unit.
4. The system as claimed in claim 1, wherein the hand-waving distress detection unit comprises: a sound event recognition module, a facial expression recognition module, and a hand-waving distress detection module;
the sound event recognition module performs event detection with a deep learning model on the MFCC features extracted by the feature extraction unit, obtaining a sound classification result containing the classified sound and the corresponding confidence output for the current frame;
the facial expression recognition module extracts facial features with a deep learning model from the face images sent by the feature extraction unit, performs training and inference, and obtains a facial expression classification result containing the classified expression and the corresponding confidence output for the current frame;
the hand-waving distress detection module judges whether the person is waving for help using a comprehensive weight method, based on the sound classification result, the facial expression classification result, and the standing, sitting, and lying posture information, waving amplitude information, waving posture confidence, and waving frequency sent by the calculation module.
5. The identification method of a hand-waving distress identification system as claimed in claim 1, characterized by comprising the following steps:
1) Sending the audio stream sent by the camera to a sound feature extraction module, and sending the video stream to a facial feature preprocessing module and a human body posture detection module respectively;
2-1) the sound feature extraction module receives the audio stream sent by the camera, processes the continuous voice, obtains MFCC acoustic features and sends the MFCC acoustic features to the waving help-seeking detection unit;
2-2) the facial feature preprocessing module acquires images through video stream of a camera, performs face detection on personnel through a machine learning method, removes background and non-face areas, acquires face point coordinates, performs face feature normalization processing on the face images according to the face point coordinates, and sends the processed face images to a waving help detection unit;
2-3) the human body posture detection module receives the video stream sent by the camera, extracts the personnel skeleton posture information through the deep neural network, acquires the personnel skeleton key point information and transmits the personnel skeleton key point information to the calculation module;
3-1) the calculation module performs normalization preprocessing on coordinates and scales according to the key point information of the human body to obtain key point characteristics, judges whether a person standing, sitting and lying in the posture information has a hand waving action state or not, and obtains hand waving amplitude information according to the hand waving action state; meanwhile, outputting the confidence coefficients of the hand waving postures of standing, sitting and lying corresponding to the person at the moment according to the posture detection model; sending posture information of the human body corresponding to standing, sitting and lying, hand waving amplitude information and hand waving posture confidence coefficient to a hand waving distress detection unit;
3-2) the calculation module obtains the extension degree index delta according to the key point information of the human body HPS And according to the elongation index delta HPS Calculating the hand waving frequency, and sending the hand waving frequency to a hand waving distress detection unit;
4) The hand-waving help-seeking detection unit detects an event by using a deep learning model according to the MFCC acoustic features extracted by the feature extraction unit, and the event comprises classified voices and a voice classification result of confidence information corresponding to the current frame voice output; meanwhile, facial features are extracted through a deep learning model according to the facial images sent by the feature extraction unit, training reasoning is carried out, and a facial expression classification result containing the facial expressions of the classified human body and confidence information output by the corresponding facial expressions of the current frame is obtained;
5) the hand-waving distress detection unit judges, by a comprehensive weight method, whether the person is waving for help according to the voice classification result, the facial expression classification result, and the standing/sitting/lying posture information, hand-waving amplitude information, hand-waving posture confidence and hand-waving frequency sent by the calculation module.
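For illustration only, a minimal Python sketch of the coordinate-and-scale normalization referred to in step 3-1); the function name and the (n, 2) keypoint layout are assumptions, not part of the claimed method:

```python
import numpy as np

def normalize_keypoints(kpts: np.ndarray) -> np.ndarray:
    """Center skeleton keypoints on their mean and rescale to unit size.

    kpts: (n, 2) array of (x, y) coordinates for one frame.
    Returns translation- and scale-normalized keypoint features.
    """
    center = kpts.mean(axis=0)                      # translation normalization
    centered = kpts - center
    scale = np.linalg.norm(centered, axis=1).max()  # farthest point from center
    return centered / (scale + 1e-8)                # scale normalization
```

Normalizing this way makes the downstream posture model insensitive to where the person stands in the frame and how far they are from the camera.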
6. The identification method of the hand-waving distress identification system according to claim 5, wherein in the step 2-1), the MFCC acoustic features are obtained from the continuous speech as follows:
carrying out pre-emphasis, framing and windowing operations on continuous voice of an audio stream in sequence to obtain pre-processed voice information;
and sequentially performing fast Fourier transform, Mel filter bank filtering, logarithm operation, discrete cosine transform and dynamic feature extraction on the pre-processed voice information, finally obtaining the N-dimensional MFCC acoustic features.
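A minimal sketch of this MFCC pipeline, assuming the `librosa` library and a 16 kHz stream; the frame sizes, the 0.97 pre-emphasis coefficient and the use of first- and second-order deltas as the dynamic features are illustrative choices rather than values fixed by the claim:

```python
import numpy as np
import librosa

def extract_mfcc(wav_path: str, n_mfcc: int = 13) -> np.ndarray:
    y, sr = librosa.load(wav_path, sr=16000)       # continuous voice from the audio stream
    y = librosa.effects.preemphasis(y, coef=0.97)  # pre-emphasis
    # framing + Hamming windowing + FFT + Mel filter bank + log + DCT
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=400, hop_length=160, window="hamming")
    # dynamic feature extraction: first- and second-order deltas
    d1 = librosa.feature.delta(mfcc)
    d2 = librosa.feature.delta(mfcc, order=2)
    return np.vstack([mfcc, d1, d2])               # N-dimensional MFCC features
```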
7. The identification method of the hand-waving distress identification system according to claim 5, wherein the step 3-1) is specifically as follows:
3-1-1) the calculation module obtains, through a posture detection model and according to the human body key point information, the standing, sitting or lying posture of the person at that moment and the current-frame hand-waving posture confidence corresponding to standing, sitting and lying; the posture detection model is a CNN model;
3-1-2) simultaneously acquiring the inter-joint angle data from the human body key point information, and deriving the hand-waving amplitude information from the angle between the forearm and the upper arm and the angle between the upper arm and the shoulder;
3-1-3) the calculation module sends the standing/sitting/lying posture information, the hand-waving amplitude information and the hand-waving posture confidence to the hand-waving distress detection unit.
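The inter-joint angles of step 3-1-2) might be computed as in the following sketch; which joints are passed in and the normalization of the amplitude to [0, 1] are assumptions for illustration:

```python
import numpy as np

def joint_angle(a, b, c) -> float:
    """Angle at joint b (degrees) formed by the segments b->a and b->c,
    e.g. the elbow angle between the upper arm and the forearm."""
    v1 = np.asarray(a, float) - np.asarray(b, float)
    v2 = np.asarray(c, float) - np.asarray(b, float)
    cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def waving_amplitude(shoulder, elbow, wrist, hip) -> float:
    """Combine the forearm/upper-arm angle and the upper-arm/shoulder angle
    into a rough amplitude estimate in [0, 1] (wider swing -> larger value)."""
    elbow_angle = joint_angle(shoulder, elbow, wrist)   # forearm vs. upper arm
    shoulder_angle = joint_angle(hip, shoulder, elbow)  # upper arm vs. shoulder line
    return (elbow_angle + shoulder_angle) / 360.0
```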
8. The identification method of the hand-waving distress identification system according to claim 5, wherein in the step 3-2), the calculation module obtains the extension degree index $\delta_{HPS}$ according to the human body key point information, specifically:
3-2-1) the calculation module arranges the human body key point information of a given posture into the matrix

$$X = [x_1, \ldots, x_n] \in \mathbb{R}^{D \times n}$$

where $D$ is the dimension of the key points, $n$ is the number of key points, $x_n$ is the coordinate of the $n$-th key point, and $\mathbb{R}$ is the set of real numbers;
3-2-2) defining the mean of all key point coordinates in $X$ as $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, the sum of the variances along the principal axes of the human posture key points, $\delta_{HPS}$, is defined as

$$\delta_{HPS} = \operatorname{tr}\left(U^{T} \Sigma U\right) = \sum_{j=1}^{d} \lambda_j, \qquad \Sigma = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})^{T}$$

where $U$ is the projection matrix onto the first $d$ principal axes, $x_i$ is the coordinate of the $i$-th key point, $d$ is a positive integer smaller than the key point dimension $D$, $\lambda_j$ is the $j$-th eigenvalue of the covariance matrix $\Sigma$, and $\operatorname{tr}$ denotes the trace of a matrix;
3-2-3) calculating the hand-waving frequency according to the extension degree index $\delta_{HPS}$, specifically:

$\delta_{HPS}$, as the sum of the variances along the principal axes of the posture key points, reflects the degree of extension of the human posture: when the hands swing out to the two sides of the body, $\delta_{HPS}$ is at its maximum; when the hands swing to the top of the head, in line with the trunk, $\delta_{HPS}$ is at its minimum. By counting the periodic variation of $\delta_{HPS}$ along the time axis, the hand-waving frequency of the waving action is obtained; the obtained hand-waving frequency information is sent to the hand-waving distress detection unit.
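One possible reading of this claim in code: $\delta_{HPS}$ as the sum of the top-$d$ eigenvalues of the keypoint covariance, and the frequency estimated by counting peaks of $\delta_{HPS}$ over time; the peak-counting step and the `scipy` dependency are assumptions:

```python
import numpy as np
from scipy.signal import find_peaks

def delta_hps(X: np.ndarray, d: int = 1) -> float:
    """Sum of the variances along the top-d principal axes of the pose.

    X: (D, n) matrix of n keypoints of dimension D, as in step 3-2-1).
    """
    mean = X.mean(axis=1, keepdims=True)
    cov = (X - mean) @ (X - mean).T / X.shape[1]  # keypoint covariance matrix
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
    return float(eigvals[:d].sum())               # tr(U^T Σ U) = λ_1 + ... + λ_d

def waving_frequency(hps_series: np.ndarray, fps: float) -> float:
    """Estimate the waving frequency (Hz) from the periodic variation of
    δ_HPS on the time axis by counting its peaks."""
    peaks, _ = find_peaks(hps_series)
    duration = len(hps_series) / fps
    return len(peaks) / duration if duration > 0 else 0.0
```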
9. The identification method of the hand-waving distress identification system according to claim 5, wherein the step 4) is specifically as follows:
4-1) for sound event classification, specifically:
during model training, the input MFCC features are normalized; the label texts representing the corresponding class names are read to generate label vectors; the MFCC features and label vectors are bound together, preprocessed, and then fed into a deep learning model to learn its parameters, yielding a voice classification model;
wherein, the deep learning model is any one of a CNN model, an RCNN model and an LSTM model;
during model prediction, the input MFCC features are normalized and preprocessed, then fed into the voice classification model for prediction; the resulting data are post-processed to obtain a voice classification result containing the classified voice and the corresponding confidence information output for the current frame of voice;
4-2) classifying facial expressions, specifically:
extracting facial features from the face image through a deep learning model;
the deep learning model is any one of a convolutional neural network, a deep belief network, a deep autoencoder and a recurrent neural network;
after the facial features are extracted, the facial expressions are classified by a convolutional neural network, obtaining a facial expression classification result that contains the classified human facial expression and the corresponding confidence information output for the current-frame facial expression.
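For orientation, a compact PyTorch CNN of the kind this claim names might look as follows; the layer sizes are arbitrary, and the same layout could in principle serve the facial-expression branch (face crops) or the sound branch (MFCC feature maps treated as single-channel images):

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Compact CNN classifier emitting per-class confidences per frame."""
    def __init__(self, n_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),    # fixed-size features for any input
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, H, W) -- a face crop or an MFCC map
        z = self.features(x).flatten(1)
        return torch.softmax(self.classifier(z), dim=1)
```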
10. The identification method of the hand-waving distress identification system according to claim 5, wherein the step 5) is specifically as follows:
(1) the hand-waving distress detection unit obtains the global hand-waving distress score by a comprehensive weight method, from the voice classification result (the classified voice and the corresponding current-frame confidence), the facial expression classification result (the classified human facial expression and the corresponding current-frame confidence), and the standing/sitting/lying posture information, hand-waving amplitude information, hand-waving posture confidence and hand-waving frequency sent by the calculation module:
$$V_t^{(sos)} = w_{pose} V_t^{(pose)} + w_{sound} V_t^{(sound)} + w_{face} V_t^{(face)}$$

where $V_t^{(sos)}$ is the hand-waving distress confidence at the current time $t$; $V_t^{(pose)}$ is the hand-waving distress posture confidence at time $t$; $V_t^{(sound)}$ is the sound event detection confidence at time $t$; $V_t^{(face)}$ is the facial expression confidence at time $t$; and $w_{pose}$, $w_{sound}$, $w_{face}$ are the weights of the hand-waving posture, the sound event and the facial expression respectively;
where

$$w_{pose} + w_{sound} + w_{face} = 1$$

and $w_{pose}$, $w_{sound}$, $w_{face}$ are preset values satisfying $w_{pose} > w_{sound}$ and $w_{pose} > w_{face}$;
(2) $V_t^{(pose)}$, $V_t^{(sound)}$ and $V_t^{(face)}$ are obtained by exponentially weighted moving average, namely

$$V_t^{(\cdot)} = \beta V_{t-1}^{(\cdot)} + (1 - \beta)\,\hat{V}_t^{(\cdot)}, \qquad (\cdot) \in \{pose, sound, face\}$$

where $\beta$ is the weighting coefficient, $\hat{V}_t^{(pose)}$ is the current-frame hand-waving distress posture confidence, $\hat{V}_t^{(sound)}$ is the confidence information output by the current-frame sound detection, and $\hat{V}_t^{(face)}$ is the confidence information output for the current-frame facial expression;
(3) the current-frame hand-waving distress posture confidence $\hat{V}_t^{(pose)}$ is obtained by comprehensively combining the current-frame hand-waving posture confidence, the current-frame hand-waving amplitude information $\hat{V}_t^{(amp)}$, the current-frame hand-waving frequency information $\hat{V}_t^{(freq)}$ and the posture addition coefficient;
(4) the current-frame hand-waving amplitude information $\hat{V}_t^{(amp)}$ is specifically

$$\hat{V}_t^{(amp)} = \min\left(\frac{S_t}{S_{max}},\, 1\right)$$

where $S_t$ is the detected hand-waving amplitude and $S_{max}$ is the preset maximum hand-waving amplitude; $\hat{V}_t^{(amp)}$ is a value between 0 and 1, and the larger the waving amplitude, the higher the value;
(5) the current-frame hand-waving frequency information $\hat{V}_t^{(freq)}$ is specifically

$$\hat{V}_t^{(freq)} = \min\left(\frac{f_t}{f_{max}},\, 1\right)$$

where $f_t$ is the detected hand-waving frequency and $f_{max}$ is the preset maximum hand-waving frequency; $\hat{V}_t^{(freq)}$ is a value between 0 and 1, and the higher the waving frequency, the higher the value;
(6) finally, $V_t^{(sos)}$ is compared with a preset threshold; if it exceeds the threshold, the current frame is judged to be in a hand-waving distress state; and if $n$ consecutive frames are in the hand-waving distress state, a hand-waving distress alarm is raised.
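Putting this claim together, a minimal sketch of the comprehensive-weight fusion with exponentially weighted moving averages and the $n$-consecutive-frame alarm; every numeric value (the weights, $\beta$, the threshold, $n$) is an illustrative preset, not one disclosed by the patent:

```python
from dataclasses import dataclass

@dataclass
class SosFusion:
    """Comprehensive-weight fusion of posture, sound and facial-expression
    confidences, with EWMA smoothing and a consecutive-frame alarm rule."""
    w_pose: float = 0.6      # w_pose > w_sound and w_pose > w_face
    w_sound: float = 0.2
    w_face: float = 0.2      # weights sum to 1
    beta: float = 0.9        # EWMA weighting coefficient
    threshold: float = 0.5   # per-frame distress threshold on V_t^(sos)
    n_frames: int = 10       # consecutive distress frames before alarming

    def __post_init__(self):
        self.v_pose = self.v_sound = self.v_face = 0.0
        self.streak = 0

    def update(self, pose_hat: float, sound_hat: float, face_hat: float) -> bool:
        # EWMA smoothing of the current-frame confidences
        self.v_pose = self.beta * self.v_pose + (1 - self.beta) * pose_hat
        self.v_sound = self.beta * self.v_sound + (1 - self.beta) * sound_hat
        self.v_face = self.beta * self.v_face + (1 - self.beta) * face_hat
        v_sos = (self.w_pose * self.v_pose
                 + self.w_sound * self.v_sound
                 + self.w_face * self.v_face)
        # alarm only after n consecutive frames above the threshold
        self.streak = self.streak + 1 if v_sos > self.threshold else 0
        return self.streak >= self.n_frames
```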
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211259423.6A | 2022-10-14 | 2022-10-14 | Hand-waving help-seeking identification system and method |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN115641610A | 2023-01-24 |
Family
ID=84944112
Family Applications (1)

| Application Number | Publication | Priority Date | Filing Date |
|---|---|---|---|
| CN202211259423.6A | CN115641610A (Pending) | 2022-10-14 | 2022-10-14 |
Country Status (1)

| Country | Link |
|---|---|
| CN | CN115641610A |
Cited By (3)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116229581A | 2023-03-23 | 2023-06-06 | 珠海市安克电子技术有限公司 | Intelligent interconnection first-aid system based on big data |
| CN116229581B | 2023-03-23 | 2023-09-19 | 珠海市安克电子技术有限公司 | Intelligent interconnection first-aid system based on big data |
| CN118280552A | 2024-05-31 | 2024-07-02 | 西安四腾环境科技有限公司 | Hospital management method based on video monitoring |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |