CN115641610A - Hand-waving distress recognition system and method
- Publication number: CN115641610A (application CN202211259423.6A)
- Authority: CN (China)
- Prior art keywords: waving, hand, information, distress, posture
- Legal status: Pending
Abstract
The invention belongs to the field of artificial intelligence recognition and in particular discloses a hand-waving distress recognition system and method. The system comprises a feature extraction unit, a calculation module, and a hand-waving distress detection unit. The feature extraction unit extracts acoustic features and preprocessed face images and sends them to the hand-waving distress detection unit; it also obtains human skeleton key point information and transmits it to the calculation module. The calculation module detects the person's waving action, waving amplitude, and sitting, lying, or standing posture, obtains the waving frequency, and sends the detection results, including the extension degree index, the waving action state, the sitting, lying, or standing posture, and the waving frequency, to the hand-waving distress detection unit. The hand-waving distress detection unit judges whether a person is waving for help using a comprehensive weight method. The invention proposes a hand-waving distress fusion strategy: a multi-state fusion strategy that makes the final hand-waving distress judgment more robust.
Description
Technical Field
The invention belongs to the field of artificial intelligence recognition, and particularly relates to a hand-waving distress recognition system and method.
Background
With the development of deep learning and machine vision, applications based on artificial intelligence have gradually matured and become widely deployed. Existing hand-waving distress recognition systems are mainly based on visual images: they acquire posture key points, build a distress recognition model using a deep learning model or hand-crafted logic rules, judge whether a hand is being waved, detect the waving frequency, and recognize distress behavior.

Such methods depend on key point recognition. Factors such as occlusion, viewing angle, illumination, scale, and motion blur interfere with the recognition of human posture key points, making the recognition result unstable and inaccurate and failing to meet the high-accuracy requirements of real usage scenarios.
Disclosure of Invention
Aiming at the above problems, the invention provides a stable, highly reliable hand-waving distress recognition method, together with several strategies that improve the distress recognition effect.
The technical scheme adopted by the invention to achieve this aim is a hand-waving distress recognition system comprising: a feature extraction unit, a calculation module, and a hand-waving distress detection unit;

the feature extraction unit is used for receiving the audio stream and video stream sent by the camera, extracting MFCC acoustic features, acquiring a preprocessed face image, and sending the face image and the acoustic features to the hand-waving distress detection unit; meanwhile, it extracts human skeleton posture information from the video stream, obtains the skeleton key point information, and transmits it to the calculation module;

the calculation module is used for detecting the waving action, waving amplitude, and sitting, lying, or standing posture of the person according to the skeleton key point information, and sending the detection results, including the extension degree index, the waving action state, and the sitting, lying, or standing posture, to the hand-waving distress detection unit; meanwhile, it obtains the extension degree index from the key point coordinates, derives the waving frequency from it, and sends the waving frequency to the hand-waving distress detection unit;

the hand-waving distress detection unit is used for processing the MFCC acoustic features and face image from the feature extraction unit together with the detection results from the calculation module, and judging whether the person is waving for help using a comprehensive weight method.
The feature extraction unit comprises: a sound feature extraction module, a facial feature preprocessing module, and a human posture detection module;

the sound feature extraction module receives the audio stream sent by the camera, processes the continuous sound in it, obtains N-dimensional MFCC features, and sends them to the hand-waving distress detection unit;

the facial feature preprocessing module acquires images from the camera's video stream, detects faces by a machine learning method, removes background and non-face regions, obtains face landmark coordinates, normalizes the face image according to those coordinates, and sends the processed face image to the hand-waving distress detection unit;

the human posture detection module receives the video stream sent by the camera, extracts human skeleton posture information through a deep neural network, obtains the skeleton key point information, and transmits it to the calculation module.
The calculation module comprises: a hand-waving action detection module and an extension degree index calculation module;

the hand-waving action detection module normalizes the coordinates and scale of the key point information to obtain key point features, judges whether a person in the standing, sitting, or lying posture is in a waving action state, and obtains waving amplitude information from that state; meanwhile, it outputs the waving posture confidence for the standing, sitting, or lying state according to a posture detection model, and sends the standing/sitting/lying posture information, the waving amplitude information, and the waving posture confidence to the hand-waving distress detection unit;

the extension degree index calculation module obtains the extension degree index from the key point coordinates to reflect how extended the person's posture is, calculates the waving frequency from the index, and sends the waving frequency to the hand-waving distress detection unit.
The hand-waving distress detection unit comprises: a sound event recognition module, a facial expression recognition module, and a hand-waving distress detection module;

the sound event recognition module performs event detection with a deep learning model on the MFCC features extracted by the feature extraction unit, obtaining a sound classification result containing the classified sound and the corresponding confidence output for the current frame;

the facial expression recognition module extracts facial features with a deep learning model from the face images sent by the feature extraction unit, performs training and inference, and obtains a facial expression classification result containing the classified expression and the corresponding confidence output for the current frame;

the hand-waving distress detection module judges whether the person is waving for help using a comprehensive weight method, based on the sound classification result, the facial expression classification result, and the standing, sitting, and lying posture information, waving amplitude information, waving posture confidence, and waving frequency sent by the calculation module.
A recognition method of the hand-waving distress recognition system comprises the following steps:

1) The audio stream sent by the camera is fed to the sound feature extraction module, and the video stream is fed to the facial feature preprocessing module and the human posture detection module;

2-1) the sound feature extraction module receives the audio stream, processes the continuous sound, obtains MFCC acoustic features, and sends them to the hand-waving distress detection unit;

2-2) the facial feature preprocessing module acquires images from the video stream, detects faces by a machine learning method, removes background and non-face regions, obtains face landmark coordinates, normalizes the face image according to those coordinates, and sends the processed face image to the hand-waving distress detection unit;

2-3) the human posture detection module receives the video stream, extracts human skeleton posture information through a deep neural network, obtains the skeleton key point information, and transmits it to the calculation module;

3-1) the calculation module normalizes the coordinates and scale of the key point information to obtain key point features, judges whether a person in the standing, sitting, or lying posture is in a waving action state, and obtains waving amplitude information from that state; meanwhile, it outputs the waving posture confidence for the standing, sitting, or lying state according to the posture detection model, and sends the standing/sitting/lying posture information, the waving amplitude information, and the waving posture confidence to the hand-waving distress detection unit;

3-2) the calculation module obtains the extension degree index δ_HPS from the key point information, calculates the waving frequency from δ_HPS, and sends the waving frequency to the hand-waving distress detection unit;

4) The hand-waving distress detection unit performs event detection with a deep learning model on the MFCC acoustic features extracted by the feature extraction unit, obtaining a sound classification result containing the classified sound and the corresponding confidence output for the current frame; meanwhile, it extracts facial features with a deep learning model from the face images, performs training and inference, and obtains a facial expression classification result containing the classified expression and the corresponding confidence output for the current frame;

5) The hand-waving distress detection unit judges whether the person is waving for help using a comprehensive weight method, based on the sound classification result, the facial expression classification result, and the standing/sitting/lying posture information, waving amplitude information, waving posture confidence, and waving frequency sent by the calculation module.
In step 2-1), the MFCC acoustic features are obtained from the continuous sound as follows:

pre-emphasis, framing, and windowing are applied in sequence to the continuous sound of the audio stream to obtain preprocessed sound information;

fast Fourier transform, Mel filter bank, logarithm, discrete cosine transform, and dynamic feature extraction are then applied in sequence to the preprocessed sound information, finally yielding the N-dimensional MFCC acoustic features.
Step 3-1) specifically comprises the following steps:

3-1-1) according to the key point information, the calculation module obtains the person's current standing, sitting, or lying posture information and the current-frame waving posture confidence for that posture through a posture detection model; the posture detection model is a CNN model;

3-1-2) at the same time, angles between joints are computed from the key point information, and the waving amplitude is obtained from the angle between the forearm and the upper arm and the angle between the upper arm and the shoulder;

3-1-3) the calculation module sends the standing/sitting/lying posture information, the waving amplitude information, and the waving posture confidence to the hand-waving distress detection unit.
In step 3-2), the calculation module obtains the extension degree index δ_HPS from the key point information as follows:

3-2-1) the calculation module arranges the key point information of the given posture into a matrix:

$$X = [x_1, \dots, x_n] \in \mathbb{R}^{D \times n}$$

where D is the dimension of each key point, n is the number of key points, x_n is the coordinate of the n-th key point, and ℝ is the set of real numbers;

3-2-2) let $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ denote the mean of all key point coordinates in X. The sum of the variances along the principal axes of the posture key points, δ_HPS, is defined as:

$$\delta_{HPS} = \frac{1}{n}\sum_{i=1}^{n} \left\| U^{\top}(x_i - \bar{x}) \right\|_2^2 = \operatorname{tr}\!\left(U^{\top} \Sigma\, U\right) = \sum_{j=1}^{d} \lambda_j$$

where U is the projection matrix, x_i is the i-th key point coordinate, d is a positive number smaller than the key point dimension D, λ_j is the j-th eigenvalue of the covariance matrix $\Sigma = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})^{\top}$, and tr denotes the trace of a matrix;

3-2-3) the waving frequency is calculated from δ_HPS as follows:

δ_HPS is the sum of the variances of the posture key points along the principal axes, and thus reflects how extended the person's posture is;

when the hands swing out to the sides of the body, δ_HPS is largest; when a hand swings to its apex, in line with the trunk, δ_HPS is smallest. By counting the periodic variation of δ_HPS along the time axis, the waving frequency of the waving action is obtained; the waving frequency information is then sent to the hand-waving distress detection unit.
Step 4) is specifically as follows:

4-1) for sound event classification:

during model training, the input MFCC features are normalized, label text representing the corresponding class names is read to generate label vectors, the MFCC features are bound to the label vectors, preprocessed, and then fed into a deep learning model to learn its parameters, yielding a sound classification model;

the deep learning model is any one of a CNN, RCNN, or LSTM model;

during model prediction, the input MFCC features are normalized, preprocessed, and fed into the sound classification model for prediction; the output is post-processed to obtain a sound classification result containing the classified sound and the corresponding confidence output for the current frame;

4-2) for facial expression classification:

facial features are extracted from the face image through a deep learning model;

the deep learning model is any one of a convolutional neural network, a deep belief network, a deep autoencoder, or a recurrent neural network;

after the facial features are extracted, the expressions are classified by a convolutional neural network, yielding a facial expression classification result containing the classified expression and the corresponding confidence output for the current frame.
Step 5) is specifically as follows:

(1) The hand-waving distress detection unit combines the sound classification result (the classified sound and the confidence output for the current frame), the facial expression classification result (the classified expression and the confidence output for the current frame), and the standing/sitting/lying posture information, waving amplitude information, waving posture confidence, and waving frequency sent by the calculation module, using a comprehensive weight method, into a global hand-waving distress score:

$$V_t^{(sos)} = w_{pose}\, V_t^{(pose)} + w_{sound}\, V_t^{(sound)} + w_{face}\, V_t^{(face)}$$

where $V_t^{(sos)}$ is the hand-waving distress confidence at the current time t, $V_t^{(pose)}$ the waving distress posture confidence at time t, $V_t^{(sound)}$ the sound event detection confidence at time t, and $V_t^{(face)}$ the facial expression confidence at time t; $w_{pose}$, $w_{sound}$, $w_{face}$ are the weights of the waving posture, the sound event, and the facial expression respectively;

where:

$$w_{pose} + w_{sound} + w_{face} = 1$$

$w_{pose}$, $w_{sound}$, $w_{face}$ are preset values, with $w_{pose} > w_{sound}$ and $w_{pose} > w_{face}$;

(2) To avoid single-frame false detections, $V_t^{(pose)}$, $V_t^{(sound)}$, and $V_t^{(face)}$ are obtained by an exponentially weighted moving average of the per-frame outputs:

$$V_t^{(\cdot)} = \beta\, V_{t-1}^{(\cdot)} + (1-\beta)\, v_t^{(\cdot)}$$

where β is a weighting coefficient, $v_t^{(pose)}$ is the current-frame waving distress posture confidence, $v_t^{(sound)}$ the confidence output by the current-frame sound detection, and $v_t^{(face)}$ the confidence output for the current-frame facial expression;

(3) The current-frame waving distress confidence $v_t^{(pose)}$ is obtained by combining the current-frame waving posture confidence $v_t^{(conf)}$, the waving amplitude term $v_t^{(amp)}$, the waving frequency term $v_t^{(freq)}$, and the posture addition coefficient $v_t^{(add)}$;

(4) the waving amplitude term is

$$v_t^{(amp)} = \min\!\left(\frac{a_t}{a_{max}},\; 1\right)$$

where $a_t$ is the detected waving amplitude and $a_{max}$ the preset maximum waving amplitude; $v_t^{(amp)}$ lies between 0 and 1, and the larger the waving amplitude, the higher its value;

(5) the current-frame waving frequency term is

$$v_t^{(freq)} = \min\!\left(\frac{f_t}{f_{max}},\; 1\right)$$

where $f_t$ is the detected waving frequency and $f_{max}$ the preset maximum waving frequency; $v_t^{(freq)}$ lies between 0 and 1, and the higher the waving frequency, the higher its value;

(6) Finally, $V_t^{(sos)}$ is compared with a preset threshold; if it exceeds the threshold, the current frame is judged to be in a hand-waving distress state, and if n consecutive frames are in that state, a hand-waving distress alarm is raised.
The invention has the following beneficial effects and advantages:

1. Sound information is introduced, so that the video stream and audio stream are combined and the audio information assists in improving the accuracy of distress recognition;

2. Facial expression recognition is introduced, so that a panicked or fearful facial expression assists in judging the distress state;

3. For posture recognition, when the posture is considered a waving state, the person's standing/sitting/lying posture is determined and the arm waving amplitude is obtained at the same time; when the waving amplitude is large, the standing, sitting, or lying posture information assists in judging the distress state;

4. Waving is a periodic action; its periodicity is judged through the extension degree index, yielding the waving frequency, which assists in judging whether a person is waving normally or waving for help;

5. The invention proposes a hand-waving distress fusion strategy: a multi-state fusion strategy over the information produced by the waving posture model, the sound event detection model, and the expression recognition model, making the final hand-waving distress judgment more robust.
Drawings
FIG. 1 is a system framework diagram of the present invention;
FIG. 2 is a flowchart illustrating the operation of the sound feature extraction module of the present invention;

FIG. 3 is a schematic diagram of building the sound classification model of the present invention;

FIG. 4 is a circuit diagram of an equalization management module of the present invention;
FIG. 5 is a schematic diagram of the human body key points of the present invention;
wherein the key point numbers in FIG. 5 correspond to: 0: nose, 1: neck, 2: right shoulder, 3: right elbow, 4: right wrist, 5: left shoulder, 6: left elbow, 7: left wrist, 8: mid hip, 9: right hip, 10: right knee, 11: right ankle, 12: left hip, 13: left knee, 14: left ankle, 15: right eye, 16: left eye, 17: right ear, 18: left ear, 19: left big toe, 20: left small toe, 21: left heel, 22: right big toe, 23: right small toe, 24: right heel.
Detailed Description
As shown in FIG. 1, the system framework diagram of the present invention, the overall structure of the system is divided into two major parts: a front-end camera and a back-end application server.

The existing front-end camera connects to the back-end AppServer through RTSP or an API, and the modules within the AppServer perform artificial intelligence analysis and processing on the video and audio streams.
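By way of illustration only (the patent gives no code), a minimal sketch of pulling the camera's stream into the AppServer over RTSP with OpenCV might look as follows; the URL and credentials are hypothetical placeholders:

```python
# Minimal sketch: ingesting the front-end camera's video stream over RTSP.
# The URL/credentials are placeholders, not values from the patent.
import cv2

cap = cv2.VideoCapture("rtsp://user:pass@192.0.2.10:554/stream")  # placeholder URL
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # here the BGR frame would be handed to the feature extraction unit
cap.release()
```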
The AppServer comprises a feature extraction unit, a calculation module, and a hand-waving distress detection unit; the functions of each unit/module are as follows:

the feature extraction unit is used for receiving the audio stream and video stream sent by the camera, extracting MFCC acoustic features, acquiring a preprocessed face image, and sending the face image and the MFCC acoustic features to the hand-waving distress detection unit; meanwhile, it extracts human skeleton posture information from the video stream, obtains the skeleton key point information, and transmits it to the calculation module;

the calculation module is used for detecting the waving action, waving amplitude, and sitting, lying, or standing posture of the person according to the skeleton key point information, and sending the detection results, including the extension degree index, the waving action state, and the sitting, lying, or standing posture, to the hand-waving distress detection unit; meanwhile, it obtains the extension degree index from the key point coordinates, derives the waving frequency from it, and sends the waving frequency to the hand-waving distress detection unit;

the hand-waving distress detection unit is used for processing the MFCC acoustic features and face image from the feature extraction unit together with the detection results from the calculation module, and judging whether the person is waving for help using a comprehensive weight method.
The feature extraction unit comprises: a sound feature extraction module, a facial feature preprocessing module, and a human posture detection module;

the sound feature extraction module receives the audio stream sent by the camera, processes the continuous sound in it, obtains N-dimensional MFCC features, and sends them to the hand-waving distress detection unit;

the facial feature preprocessing module acquires images from the camera's video stream, detects faces by a machine learning method, removes background and non-face regions, obtains face landmark coordinates, normalizes the face image according to those coordinates, and sends the processed face image to the hand-waving distress detection unit;

the human posture detection module receives the video stream sent by the camera, extracts human skeleton posture information through a deep neural network, obtains the skeleton key point information, and transmits it to the calculation module.
The calculation module comprises: a hand-waving action detection module and an extension degree index calculation module;

the hand-waving action detection module normalizes the coordinates and scale of the key point information to obtain key point features, judges whether a person in the standing, sitting, or lying posture is in a waving action state, and obtains waving amplitude information from that state; meanwhile, it outputs the waving posture confidence for the standing, sitting, or lying state according to the posture detection model, and sends the standing/sitting/lying posture information, the waving amplitude information, and the waving posture confidence to the hand-waving distress detection unit;

the extension degree index calculation module obtains the extension degree index from the key point coordinates to reflect how extended the person's posture is, calculates the waving frequency from the index, and sends the waving frequency to the hand-waving distress detection unit.
The hand-waving distress detection unit comprises: a sound event recognition module, a facial expression recognition module, and a hand-waving distress detection module;

the sound event recognition module performs event detection with a deep learning model on the MFCC features extracted by the feature extraction unit, obtaining a sound classification result containing the classified sound and the corresponding confidence output for the current frame;

the facial expression recognition module extracts facial features with a deep learning model from the face images sent by the feature extraction unit, performs training and inference, and obtains a facial expression classification result containing the classified expression and the corresponding confidence output for the current frame;

the hand-waving distress detection module judges whether the person is waving for help using a comprehensive weight method, based on the sound classification result, the facial expression classification result, and the standing, sitting, and lying posture information, waving amplitude information, waving posture confidence, and waving frequency sent by the calculation module.
As shown in FIG. 1, the workflow of the present invention is implemented on the data streams transmitted between the modules, and comprises the following steps:

1) The audio stream sent by the camera is fed to the sound feature extraction module, and the video stream is fed to the facial feature preprocessing module and the human posture detection module;

2-1) the sound feature extraction module receives the audio stream, processes the continuous sound, obtains MFCC acoustic features, and sends them to the hand-waving distress detection unit;

2-2) the facial feature preprocessing module acquires images from the video stream, detects faces by a machine learning method, removes background and non-face regions, obtains face landmark coordinates, normalizes the face image according to those coordinates, and sends the processed face image to the hand-waving distress detection unit;

2-3) the human posture detection module receives the video stream, extracts human skeleton posture information through a deep neural network, obtains the skeleton key point information, and transmits it to the calculation module;

3-1) the calculation module normalizes the coordinates and scale of the key point information to obtain key point features, judges whether a person in the standing, sitting, or lying posture is in a waving action state, and obtains waving amplitude information from that state; meanwhile, it outputs the waving posture confidence for the standing, sitting, or lying state according to the posture detection model, and sends the standing/sitting/lying posture information, the waving amplitude information, and the waving posture confidence to the hand-waving distress detection unit;

3-2) the calculation module obtains the extension degree index δ_HPS from the key point information, calculates the waving frequency from δ_HPS, and sends the waving frequency to the hand-waving distress detection unit;

4) The hand-waving distress detection unit performs event detection with a deep learning model on the MFCC acoustic features extracted by the feature extraction unit, obtaining a sound classification result containing the classified sound and the corresponding confidence output for the current frame; meanwhile, it extracts facial features with a deep learning model from the face images, performs training and inference, and obtains a facial expression classification result containing the classified expression and the corresponding confidence output for the current frame;

5) The hand-waving distress detection unit judges whether the person is waving for help using a comprehensive weight method, based on the sound classification result, the facial expression classification result, and the standing/sitting/lying posture information, waving amplitude information, waving posture confidence, and waving frequency sent by the calculation module.
As shown in FIG. 2, the workflow of feature extraction in the sound feature extraction module comprises the following steps:

preprocessing of the continuous sound, including pre-emphasis, framing, and windowing;

then fast Fourier transform, Mel filter bank, logarithm, discrete cosine transform, and dynamic feature extraction are applied to the preprocessed sound information; in this embodiment, 39-dimensional MFCC features (Mel-frequency cepstral coefficients, MFCC) are finally extracted.
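As a concrete illustration of this pipeline (not part of the patent), the following sketch computes 39-dimensional MFCCs with librosa, which performs the framing, windowing, FFT, Mel filter bank, logarithm, and DCT steps internally; the sample rate and pre-emphasis coefficient are assumptions:

```python
# Sketch of the 39-dimensional MFCC pipeline: 13 static coefficients plus
# delta and delta-delta dynamic features (13 x 3 = 39 dimensions).
import librosa
import numpy as np

def extract_mfcc39(wav_path: str) -> np.ndarray:
    y, sr = librosa.load(wav_path, sr=16000)        # 16 kHz is an assumed rate
    y = np.append(y[0], y[1:] - 0.97 * y[:-1])      # pre-emphasis (0.97 assumed)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # framing/FFT/Mel/log/DCT
    d1 = librosa.feature.delta(mfcc)                # dynamic features (delta)
    d2 = librosa.feature.delta(mfcc, order=2)       # delta-delta
    return np.vstack([mfcc, d1, d2])                # shape: (39, n_frames)
```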
As shown in FIG. 3, the sound classification model is built on a deep learning model, including but not limited to CNN, RCNN, or LSTM, to recognize the input sound features.

During model training, the input MFCC features are normalized, label text representing the true classes is read to generate label vectors, the features and labels are bound and preprocessed, and then fed into the CNN/RCNN/LSTM model to learn its parameters, yielding a reliable sound classification model.

As shown in FIG. 4, during model prediction the 39-dimensional MFCC features are normalized, preprocessed, and fed into the sound classification model for prediction; the output is post-processed to obtain a sound classification result containing the classified sound and the corresponding confidence output for the current frame.
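A minimal sketch of one of the named options, a small CNN classifier over MFCC windows in PyTorch; the layer sizes, class set, and input window shape are illustrative assumptions, not taken from the patent:

```python
# Assumed-architecture CNN over (1, 39, n_frames) MFCC windows; softmax over
# the logits gives the per-class confidence used by the sound event module.
import torch
import torch.nn as nn

class SoundEventCNN(nn.Module):
    def __init__(self, n_classes: int = 3):  # e.g. background / speech / cry for help
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x):                    # x: (batch, 1, 39, n_frames)
        z = self.features(x).flatten(1)
        return self.classifier(z)            # logits; softmax -> confidences

model = SoundEventCNN()
logits = model(torch.randn(1, 1, 39, 100))   # placeholder MFCC window
conf = torch.softmax(logits, dim=1)          # per-class confidence
```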
As shown in FIG. 1, the facial feature preprocessing module in the system acquires images from the video stream, performs face detection by a machine learning method, and then removes background and non-face regions.

Face feature normalization: the brightness of the face image is normalized so that the distribution of facial brightness pixels becomes as uniform as possible, while the facial contrast is enhanced.
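One plausible realization of this brightness normalization and contrast enhancement (the patent does not fix a method) is histogram equalization of the luma channel, e.g. with CLAHE:

```python
# Assumed implementation: CLAHE on the Y channel equalizes brightness and
# boosts contrast without distorting face colors.
import cv2

def normalize_face(face_bgr):
    ycrcb = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2YCrCb)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    ycrcb[:, :, 0] = clahe.apply(ycrcb[:, :, 0])   # equalize luma channel only
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
```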
The facial expression recognition module extracts features through a deep learning model, such as a convolutional neural network (CNN), a deep belief network (DBN), a deep autoencoder (DAE), or a recurrent neural network (RNN).

After feature extraction, the facial expressions are classified. Expression recognition can be performed end-to-end with deep learning, or classification can use a traditional machine learning method; this embodiment uses an SVM (support vector machine) classifier to determine whether the face shows a panicked or fearful expression.
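A minimal sketch of the traditional-classifier option named above: an SVM with probability outputs over deep facial features; the feature dimension, labels, and random placeholder data are assumptions:

```python
# SVM over (placeholder) deep facial features; predict_proba supplies the
# per-expression confidence consumed by the fusion step.
import numpy as np
from sklearn.svm import SVC

X_train = np.random.rand(100, 128)              # stand-in for deep features
y_train = np.random.randint(0, 2, size=100)     # e.g. 0 = neutral, 1 = panic/fear

clf = SVC(kernel="rbf", probability=True)       # probability=True -> confidences
clf.fit(X_train, y_train)
conf = clf.predict_proba(np.random.rand(1, 128))  # current-frame expression confidence
```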
As shown in FIGs. 1 and 5, the human posture detection module trains and predicts the human posture key points through a deep-learning human posture network model; the key points are shown in the schematic diagram of FIG. 5.
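For illustration, skeleton key points like those in FIG. 5 can be obtained with an off-the-shelf estimator; MediaPipe Pose is an assumed substitute for the patent's unnamed network, and it uses its own 33-point landmark scheme rather than the 25 points of FIG. 5:

```python
# Assumed substitute estimator: MediaPipe Pose yields normalized (x, y)
# landmarks analogous to (but indexed differently from) the FIG. 5 key points.
import cv2
import mediapipe as mp

pose = mp.solutions.pose.Pose(static_image_mode=True)
frame = cv2.imread("person.jpg")                 # placeholder input frame
res = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
if res.pose_landmarks:
    pts = [(lm.x, lm.y) for lm in res.pose_landmarks.landmark]  # 33 keypoints
```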
In the hand-waving distress recognition method, step 3-1) specifically comprises:

3-1-1) according to the human key point information (see FIG. 5), the hand-waving action detection module obtains the person's current standing, sitting, or lying posture information and the current-frame waving posture confidence for that posture through a posture detection model; the posture detection model is a CNN model. Knowing whether the person in the posture information is waving, together with the confidence that the person is standing, sitting, or lying, provides useful auxiliary evidence for judging whether to output a distress alarm, particularly in the sitting and lying states;

3-1-2) angles between joints are computed from the key point information, and the waving amplitude is obtained from the angle between the forearm and the upper arm (i.e., the angle formed by key points 2, 3, and 4 in FIG. 5) and the angle between the upper arm and the shoulder (i.e., the angles formed by key points 1-4 and by key points 1, 5-7), as illustrated in the sketch following this list;

3-1-3) the hand-waving action detection module sends the standing/sitting/lying posture information (likewise judged from the key points identified in FIG. 5), the waving amplitude information, and the waving posture confidence to the hand-waving distress detection unit.
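A sketch of the included-angle computation referenced in step 3-1-2), e.g. the elbow angle formed at key point 3 by key points 2, 3, and 4; the helper name is illustrative:

```python
# Angle at joint b (in degrees) between segments b->a and b->c.
import numpy as np

def joint_angle(a, b, c):
    v1 = np.asarray(a) - np.asarray(b)
    v2 = np.asarray(c) - np.asarray(b)
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

# e.g. forearm/upper-arm angle: joint_angle(shoulder_xy, elbow_xy, wrist_xy)
print(joint_angle((0.0, 0.0), (1.0, 0.0), (1.0, 1.0)))  # -> 90.0
```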
In step 3-2), the calculation module obtains the extension degree index δ_HPS from the human key point information as follows:

the index δ_HPS is introduced to measure the extension degree of the human posture and thereby assist in judging the waving frequency in waving detection. δ_HPS is based on the sum of the variances of the posture key points along their principal axes.

3-2-1) As shown in FIG. 4, the extension degree index calculation module arranges the key point information of the given posture into a matrix:

$$X = [x_1, \dots, x_n] \in \mathbb{R}^{D \times n}$$

where D is the dimension of each key point, n is the number of key points, x_n is the coordinate of the n-th key point, and ℝ is the set of real numbers;

3-2-2) Let $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ denote the mean of all key point coordinates in X. The sum of the variances along the principal axes of the posture key points, δ_HPS, is defined as Eq. 1:

$$\delta_{HPS} = \frac{1}{n}\sum_{i=1}^{n} \left\| U^{\top}(x_i - \bar{x}) \right\|_2^2 \qquad (\text{Eq. } 1)$$

Eq. 1 is only one form of δ_HPS; it can also be written as Eq. 2:

$$\delta_{HPS} = \operatorname{tr}\!\left(U^{\top} \Sigma\, U\right) = \sum_{j=1}^{d} \lambda_j \qquad (\text{Eq. } 2)$$

where U is the projection matrix, x_i is the i-th key point coordinate, d is a positive number smaller than the key point dimension D (in this embodiment D = 2 and d = 1), λ_j is the j-th eigenvalue of the covariance matrix $\Sigma = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})^{\top}$, and tr denotes the trace of a matrix.

Thus δ_HPS is the sum of the variances of the posture key points along the principal axes and reflects how extended the person's posture is.

3-2-3) The extension degree index calculation module calculates the waving frequency from δ_HPS as follows:

the δ_HPS index is applied to the waving action. Since waving is a periodic action, δ_HPS is large when the hands swing out to the sides of the body and small when a hand swings to its apex, in line with the trunk. By counting the periodic variation of δ_HPS along the time axis, the waving frequency of the waving action can be obtained, and the resulting waving frequency information is sent to the hand-waving distress detection unit.

Advantages of δ_HPS: the index depends only on the person's motion, is independent of camera viewpoint, scale, and similar factors, is not affected by posture interference, and effectively reflects the periodicity of the person's waving, so it can help judge whether the person is waving normally or waving vigorously for help.
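The following sketch implements δ_HPS as reconstructed above (sum of variances along the top-d principal axes, with d = 1 as in the embodiment), plus a simple peak-counting frequency estimate; the peak-counting method is an assumption about how the periodic variation is counted:

```python
# delta_hps: top-d principal-axis variance of the keypoint cloud.
# waving_frequency: peaks of the per-frame delta_hps series over time.
import numpy as np
from scipy.signal import find_peaks

def delta_hps(X: np.ndarray, d: int = 1) -> float:
    """X: (D, n) matrix of n keypoint coordinates of dimension D."""
    Xc = X - X.mean(axis=1, keepdims=True)        # subtract keypoint mean
    cov = Xc @ Xc.T / X.shape[1]                  # covariance matrix (D x D)
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
    return float(eigvals[:d].sum())               # sum of top-d variances

def waving_frequency(series, fps: float) -> float:
    """Estimate waving frequency (Hz) from per-frame delta_hps values."""
    peaks, _ = find_peaks(np.asarray(series))
    duration = len(series) / fps
    return len(peaks) / duration if duration > 0 else 0.0
```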
Step 5) of the invention specifically comprises:

(1) The hand-waving distress detection unit combines the sound classification result (the classified sound and the confidence output for the current frame), the facial expression classification result (the classified expression and the confidence output for the current frame), and the standing/sitting/lying posture information, waving amplitude information, waving posture confidence, and waving frequency sent by the calculation module, using a comprehensive weight method, into a global hand-waving distress score:

$$V_t^{(sos)} = w_{pose}\, V_t^{(pose)} + w_{sound}\, V_t^{(sound)} + w_{face}\, V_t^{(face)}$$

where $V_t^{(sos)}$ is the hand-waving distress confidence at the current time t, $V_t^{(pose)}$ the waving distress posture confidence at time t, $V_t^{(sound)}$ the sound event detection confidence at time t, and $V_t^{(face)}$ the facial expression confidence at time t; $w_{pose}$, $w_{sound}$, $w_{face}$ are the weights of the waving posture, the sound event, and the facial expression respectively;

where:

$$w_{pose} + w_{sound} + w_{face} = 1$$

$w_{pose}$, $w_{sound}$, $w_{face}$ are preset values, with $w_{pose} > w_{sound}$ and $w_{pose} > w_{face}$; in this embodiment, since the task is hand-waving distress, $w_{pose}$ may take a higher value, e.g.:

$$w_{pose} = 0.8, \quad w_{sound} = 0.1, \quad w_{face} = 0.1$$

The above is merely an example; other values may be chosen in practical applications as the situation requires.

To avoid false detections from single-frame model outputs, $V_t^{(pose)}$, $V_t^{(sound)}$, and $V_t^{(face)}$ are obtained with an exponentially weighted moving average, i.e. step (2):

(2) The exponentially weighted moving average considers both current and historical data, giving higher weight to the current data:

$$V_t^{(\cdot)} = \beta\, V_{t-1}^{(\cdot)} + (1-\beta)\, v_t^{(\cdot)}$$

where β is a weighting coefficient, $v_t^{(pose)}$ is the current-frame waving distress posture confidence, $v_t^{(sound)}$ the confidence output by the current-frame sound detection, and $v_t^{(face)}$ the confidence output for the current-frame facial expression.

At the initial time the average is biased, so a bias correction is introduced:

$$\hat{V}_t^{(\cdot)} = \frac{V_t^{(\cdot)}}{1 - \beta^{\,t}}$$

(3) The current-frame waving distress confidence $v_t^{(pose)}$ is obtained by combining the current-frame waving posture confidence $v_t^{(conf)}$, the waving amplitude term $v_t^{(amp)}$, the waving frequency term $v_t^{(freq)}$, and the posture addition coefficient $v_t^{(add)}$;

(4) the waving amplitude term is

$$v_t^{(amp)} = \min\!\left(\frac{a_t}{a_{max}},\; 1\right)$$

where $a_t$ is the detected waving amplitude and $a_{max}$ the preset maximum waving amplitude; $v_t^{(amp)}$ lies between 0 and 1, and the larger the waving amplitude, the higher its value;

(5) the current-frame waving frequency term is

$$v_t^{(freq)} = \min\!\left(\frac{f_t}{f_{max}},\; 1\right)$$

where $f_t$ is the detected waving frequency and $f_{max}$ the preset maximum waving frequency; $v_t^{(freq)}$ lies between 0 and 1, and the higher the waving frequency, the higher its value.

In practical applications a gamma transform may be applied to nonlinearly compensate the waving amplitude and waving frequency terms, adjusting their final weight in the waving posture judgment:

$$v \leftarrow v^{\gamma}$$

where γ is an empirical value, typically slightly less than 1.

If the posture detection model detects that the person is sitting or lying, the expected probability that waving in that posture is a call for help is considered higher, so the posture addition coefficient can be set to an empirical value of 0.1 (and increased or decreased as needed):

$$v_t^{(add)} = 0.1$$

otherwise, if the person's posture is standing, no addition is applied:

$$v_t^{(add)} = 0$$

(6) Finally, $V_t^{(sos)}$ is compared with a preset threshold; if it exceeds the threshold, the current frame is judged to be in a hand-waving distress state, and if n consecutive frames are in that state, a hand-waving distress alarm is raised.
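Putting the reconstructed formulas together, a sketch of the comprehensive weight method with the exponentially weighted moving average, bias correction, and the n-consecutive-frame alarm rule; the weights are the embodiment's example values, while β, the threshold, n, and the multiplicative form of the per-frame pose combination are assumptions:

```python
# Fusion sketch under stated assumptions; not the patent's exact formulas.
W_POSE, W_SOUND, W_FACE = 0.8, 0.1, 0.1   # example weights from the embodiment
BETA, THRESH, N_FRAMES = 0.9, 0.6, 10     # assumed values, not given in the patent

def frame_pose_confidence(v_conf, v_amp, v_freq, v_add):
    # Current-frame waving-distress confidence from posture confidence,
    # amplitude term, frequency term and posture addition (form assumed).
    return min(v_conf * v_amp * v_freq + v_add, 1.0)

class SosFusion:
    def __init__(self):
        self.v = {"pose": 0.0, "sound": 0.0, "face": 0.0}
        self.t = 0
        self.streak = 0

    def step(self, v_pose, v_sound, v_face):
        """Feed one frame's confidences; True when the alarm should fire.
        v_pose would typically come from frame_pose_confidence()."""
        self.t += 1
        corrected = {}
        for k, x in (("pose", v_pose), ("sound", v_sound), ("face", v_face)):
            self.v[k] = BETA * self.v[k] + (1 - BETA) * x     # EWMA
            corrected[k] = self.v[k] / (1 - BETA ** self.t)   # bias correction
        v_sos = (W_POSE * corrected["pose"] + W_SOUND * corrected["sound"]
                 + W_FACE * corrected["face"])
        self.streak = self.streak + 1 if v_sos > THRESH else 0
        return self.streak >= N_FRAMES
```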
The above description is only an embodiment of the present invention and is not intended to limit its scope. Any modification, equivalent replacement, improvement, or extension made within the spirit and principles of the present invention falls within its protection scope.
Claims (10)
1. A hand-waving distress recognition system, characterized by comprising: a feature extraction unit, a calculation module, and a hand-waving distress detection unit;
the feature extraction unit is used for receiving the audio stream and video stream sent by the camera, extracting MFCC acoustic features, acquiring a preprocessed face image, and sending the face image and the MFCC acoustic features to the hand-waving distress detection unit; meanwhile, it extracts human skeleton posture information from the video stream, obtains the skeleton key point information, and transmits it to the calculation module;
the calculation module is used for detecting the waving action, waving amplitude, and sitting, lying, or standing posture of the person according to the skeleton key point information, and sending the detection results, including the extension degree index, the waving action state, and the sitting, lying, or standing posture, to the hand-waving distress detection unit; meanwhile, it obtains the extension degree index from the key point coordinates, derives the waving frequency from it, and sends the waving frequency to the hand-waving distress detection unit;
the hand-waving distress detection unit is used for processing the MFCC acoustic features and face image from the feature extraction unit together with the detection results from the calculation module, and judging whether the person is waving for help using a comprehensive weight method.
2. The system as claimed in claim 1, wherein the feature extraction unit comprises: a sound feature extraction module, a facial feature preprocessing module, and a human posture detection module;
the sound feature extraction module receives the audio stream sent by the camera, processes the continuous sound in it, obtains N-dimensional MFCC features, and sends them to the hand-waving distress detection unit;
the facial feature preprocessing module acquires images from the camera's video stream, detects faces by a machine learning method, removes background and non-face regions, obtains face landmark coordinates, normalizes the face image according to those coordinates, and sends the processed face image to the hand-waving distress detection unit;
the human posture detection module receives the video stream sent by the camera, extracts human skeleton posture information through a deep neural network, obtains the skeleton key point information, and transmits it to the calculation module.
3. The system as claimed in claim 1, wherein the calculation module comprises: a hand-waving action detection module and an extension degree index calculation module;
the hand-waving action detection module normalizes the coordinates and scale of the key point information to obtain key point features, judges whether a person in the standing, sitting, or lying posture is in a waving action state, and obtains waving amplitude information from that state; meanwhile, it outputs the waving posture confidence for the standing, sitting, or lying state according to a posture detection model, and sends the standing/sitting/lying posture information, the waving amplitude information, and the waving posture confidence to the hand-waving distress detection unit;
the extension degree index calculation module obtains the extension degree index from the key point coordinates to reflect how extended the person's posture is, calculates the waving frequency from the index, and sends the waving frequency to the hand-waving distress detection unit.
4. The system as claimed in claim 1, wherein the hand-waving distress detection unit comprises: a sound event recognition module, a facial expression recognition module, and a hand-waving distress detection module;
the sound event recognition module performs event detection with a deep learning model on the MFCC features extracted by the feature extraction unit, obtaining a sound classification result containing the classified sound and the corresponding confidence output for the current frame;
the facial expression recognition module extracts facial features with a deep learning model from the face images sent by the feature extraction unit, performs training and inference, and obtains a facial expression classification result containing the classified expression and the corresponding confidence output for the current frame;
the hand-waving distress detection module judges whether the person is waving for help using a comprehensive weight method, based on the sound classification result, the facial expression classification result, and the standing, sitting, and lying posture information, waving amplitude information, waving posture confidence, and waving frequency sent by the calculation module.
5. The identification method of a hand-waving distress identification system as claimed in claim 1, characterized by comprising the following steps:
1) Sending the audio stream sent by the camera to a sound feature extraction module, and sending the video stream to a facial feature preprocessing module and a human body posture detection module respectively;
2-1) the sound feature extraction module receives the audio stream sent by the camera, processes the continuous voice, obtains MFCC acoustic features and sends the MFCC acoustic features to the waving help-seeking detection unit;
2-2) the facial feature preprocessing module acquires images through video stream of a camera, performs face detection on personnel through a machine learning method, removes background and non-face areas, acquires face point coordinates, performs face feature normalization processing on the face images according to the face point coordinates, and sends the processed face images to a waving help detection unit;
2-3) the human body posture detection module receives the video stream sent by the camera, extracts the personnel skeleton posture information through the deep neural network, acquires the personnel skeleton key point information and transmits the personnel skeleton key point information to the calculation module;
3-1) the calculation module performs normalization preprocessing on coordinates and scales according to the key point information of the human body to obtain key point characteristics, judges whether a person standing, sitting and lying in the posture information has a hand waving action state or not, and obtains hand waving amplitude information according to the hand waving action state; meanwhile, outputting the confidence coefficients of the hand waving postures of standing, sitting and lying corresponding to the person at the moment according to the posture detection model; sending posture information of the human body corresponding to standing, sitting and lying, hand waving amplitude information and hand waving posture confidence coefficient to a hand waving distress detection unit;
3-2) the calculation module obtains the extension degree index delta according to the key point information of the human body HPS And according to the elongation index delta HPS Calculating the hand waving frequency, and sending the hand waving frequency to a hand waving distress detection unit;
4) The hand-waving help-seeking detection unit detects an event by using a deep learning model according to the MFCC acoustic features extracted by the feature extraction unit, and the event comprises classified voices and a voice classification result of confidence information corresponding to the current frame voice output; meanwhile, facial features are extracted through a deep learning model according to the facial images sent by the feature extraction unit, training reasoning is carried out, and a facial expression classification result containing the facial expressions of the classified human body and confidence information output by the corresponding facial expressions of the current frame is obtained;
5) the hand-waving distress detection unit judges, by a comprehensive weight method, whether the person is waving for help according to the voice classification result, the facial expression classification result, and the standing/sitting/lying posture information, hand-waving amplitude information, hand-waving posture confidence and hand-waving frequency sent by the calculation module.
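For illustration only, a minimal Python sketch of the coordinate-and-scale normalization referred to in step 3-1); the function name and the (n, 2) keypoint layout are assumptions, not part of the claimed method:

```python
import numpy as np

def normalize_keypoints(kpts: np.ndarray) -> np.ndarray:
    """Center skeleton keypoints on their mean and rescale to unit size.

    kpts: (n, 2) array of (x, y) coordinates for one frame.
    Returns translation- and scale-normalized keypoint features.
    """
    center = kpts.mean(axis=0)                      # translation normalization
    centered = kpts - center
    scale = np.linalg.norm(centered, axis=1).max()  # farthest point from center
    return centered / (scale + 1e-8)                # scale normalization
```

Normalizing this way makes the downstream posture model insensitive to where the person stands in the frame and how far they are from the camera.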
6. The identification method of the hand-waving distress identification system according to claim 5, wherein in the step 2-1), the MFCC acoustic features are obtained from the continuous speech as follows:
carrying out pre-emphasis, framing and windowing operations on continuous voice of an audio stream in sequence to obtain pre-processed voice information;
and sequentially performing fast Fourier transform, Mel filter bank filtering, logarithm operation, discrete cosine transform and dynamic feature extraction on the pre-processed voice information, finally obtaining the N-dimensional MFCC acoustic features.
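A minimal sketch of this MFCC pipeline, assuming the `librosa` library and a 16 kHz stream; the frame sizes, the 0.97 pre-emphasis coefficient and the use of first- and second-order deltas as the dynamic features are illustrative choices rather than values fixed by the claim:

```python
import numpy as np
import librosa

def extract_mfcc(wav_path: str, n_mfcc: int = 13) -> np.ndarray:
    y, sr = librosa.load(wav_path, sr=16000)       # continuous voice from the audio stream
    y = librosa.effects.preemphasis(y, coef=0.97)  # pre-emphasis
    # framing + Hamming windowing + FFT + Mel filter bank + log + DCT
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=400, hop_length=160, window="hamming")
    # dynamic feature extraction: first- and second-order deltas
    d1 = librosa.feature.delta(mfcc)
    d2 = librosa.feature.delta(mfcc, order=2)
    return np.vstack([mfcc, d1, d2])               # N-dimensional MFCC features
```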
7. The identification method of the hand-waving distress identification system according to claim 5, wherein the step 3-1) is specifically as follows:
3-1-1) the calculation module obtains, through a posture detection model and according to the human body key point information, the standing, sitting or lying posture of the person at that moment and the current-frame hand-waving posture confidence corresponding to standing, sitting and lying; the posture detection model is a CNN model;
3-1-2) simultaneously acquiring the inter-joint angle data from the human body key point information, and deriving the hand-waving amplitude information from the angle between the forearm and the upper arm and the angle between the upper arm and the shoulder;
3-1-3) the calculation module sends the standing/sitting/lying posture information, the hand-waving amplitude information and the hand-waving posture confidence to the hand-waving distress detection unit.
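The inter-joint angles of step 3-1-2) might be computed as in the following sketch; which joints are passed in and the normalization of the amplitude to [0, 1] are assumptions for illustration:

```python
import numpy as np

def joint_angle(a, b, c) -> float:
    """Angle at joint b (degrees) formed by the segments b->a and b->c,
    e.g. the elbow angle between the upper arm and the forearm."""
    v1 = np.asarray(a, float) - np.asarray(b, float)
    v2 = np.asarray(c, float) - np.asarray(b, float)
    cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def waving_amplitude(shoulder, elbow, wrist, hip) -> float:
    """Combine the forearm/upper-arm angle and the upper-arm/shoulder angle
    into a rough amplitude estimate in [0, 1] (wider swing -> larger value)."""
    elbow_angle = joint_angle(shoulder, elbow, wrist)   # forearm vs. upper arm
    shoulder_angle = joint_angle(hip, shoulder, elbow)  # upper arm vs. shoulder line
    return (elbow_angle + shoulder_angle) / 360.0
```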
8. The identification method of the hand-waving distress identification system according to claim 5, wherein in the step 3-2), the calculation module obtains the extension degree index $\delta_{HPS}$ according to the human body key point information, specifically:
3-2-1) the calculation module arranges the human body key point information of a given posture into the matrix

$$X = [x_1, \ldots, x_n] \in \mathbb{R}^{D \times n}$$

where $D$ is the dimension of the key points, $n$ is the number of key points, $x_n$ is the coordinate of the $n$-th key point, and $\mathbb{R}$ is the set of real numbers;
3-2-2) defining the mean of all key point coordinates in $X$ as $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, the sum of the variances along the principal axes of the human posture key points, $\delta_{HPS}$, is defined as

$$\delta_{HPS} = \operatorname{tr}\left(U^{T} \Sigma U\right) = \sum_{j=1}^{d} \lambda_j, \qquad \Sigma = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})^{T}$$

where $U$ is the projection matrix onto the first $d$ principal axes, $x_i$ is the coordinate of the $i$-th key point, $d$ is a positive integer smaller than the key point dimension $D$, $\lambda_j$ is the $j$-th eigenvalue of the covariance matrix $\Sigma$, and $\operatorname{tr}$ denotes the trace of a matrix;
3-2-3) calculating the hand-waving frequency according to the extension degree index $\delta_{HPS}$, specifically:

$\delta_{HPS}$, as the sum of the variances along the principal axes of the posture key points, reflects the degree of extension of the human posture: when the hands swing out to the two sides of the body, $\delta_{HPS}$ is at its maximum; when the hands swing to the top of the head, in line with the trunk, $\delta_{HPS}$ is at its minimum. By counting the periodic variation of $\delta_{HPS}$ along the time axis, the hand-waving frequency of the waving action is obtained; the obtained hand-waving frequency information is sent to the hand-waving distress detection unit.
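One possible reading of this claim in code: $\delta_{HPS}$ as the sum of the top-$d$ eigenvalues of the keypoint covariance, and the frequency estimated by counting peaks of $\delta_{HPS}$ over time; the peak-counting step and the `scipy` dependency are assumptions:

```python
import numpy as np
from scipy.signal import find_peaks

def delta_hps(X: np.ndarray, d: int = 1) -> float:
    """Sum of the variances along the top-d principal axes of the pose.

    X: (D, n) matrix of n keypoints of dimension D, as in step 3-2-1).
    """
    mean = X.mean(axis=1, keepdims=True)
    cov = (X - mean) @ (X - mean).T / X.shape[1]  # keypoint covariance matrix
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
    return float(eigvals[:d].sum())               # tr(U^T Σ U) = λ_1 + ... + λ_d

def waving_frequency(hps_series: np.ndarray, fps: float) -> float:
    """Estimate the waving frequency (Hz) from the periodic variation of
    δ_HPS on the time axis by counting its peaks."""
    peaks, _ = find_peaks(hps_series)
    duration = len(hps_series) / fps
    return len(peaks) / duration if duration > 0 else 0.0
```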
9. The identification method of the hand-waving distress identification system according to claim 5, wherein the step 4) is specifically as follows:
4-1) for sound event classification, specifically:
during model training, the input MFCC features are normalized; the label texts representing the corresponding class names are read to generate label vectors; the MFCC features and label vectors are bound together, preprocessed, and then fed into a deep learning model to learn its parameters, yielding a voice classification model;
wherein, the deep learning model is any one of a CNN model, an RCNN model and an LSTM model;
during model prediction, the input MFCC features are normalized and preprocessed, then fed into the voice classification model for prediction; the resulting data are post-processed to obtain a voice classification result containing the classified voice and the corresponding confidence information output for the current frame of voice;
4-2) classifying facial expressions, specifically:
extracting facial features from the face image through a deep learning model;
the deep learning model is any one of a convolutional neural network, a deep belief network, a deep autoencoder and a recurrent neural network;
after the facial features are extracted, the facial expressions are classified by a convolutional neural network, obtaining a facial expression classification result that contains the classified human facial expression and the corresponding confidence information output for the current-frame facial expression.
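For orientation, a compact PyTorch CNN of the kind this claim names might look as follows; the layer sizes are arbitrary, and the same layout could in principle serve the facial-expression branch (face crops) or the sound branch (MFCC feature maps treated as single-channel images):

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Compact CNN classifier emitting per-class confidences per frame."""
    def __init__(self, n_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),    # fixed-size features for any input
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, H, W) -- a face crop or an MFCC map
        z = self.features(x).flatten(1)
        return torch.softmax(self.classifier(z), dim=1)
```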
10. The identification method of the hand-waving distress identification system according to claim 5, wherein the step 5) is specifically as follows:
(1) the hand-waving distress detection unit obtains the global hand-waving distress score by a comprehensive weight method, from the voice classification result (the classified voice and the corresponding current-frame confidence), the facial expression classification result (the classified human facial expression and the corresponding current-frame confidence), and the standing/sitting/lying posture information, hand-waving amplitude information, hand-waving posture confidence and hand-waving frequency sent by the calculation module:
$$V_t^{(sos)} = w_{pose} V_t^{(pose)} + w_{sound} V_t^{(sound)} + w_{face} V_t^{(face)}$$

where $V_t^{(sos)}$ is the hand-waving distress confidence at the current time $t$; $V_t^{(pose)}$ is the hand-waving distress posture confidence at time $t$; $V_t^{(sound)}$ is the sound event detection confidence at time $t$; $V_t^{(face)}$ is the facial expression confidence at time $t$; and $w_{pose}$, $w_{sound}$, $w_{face}$ are the weights of the hand-waving posture, the sound event and the facial expression respectively;
where

$$w_{pose} + w_{sound} + w_{face} = 1$$

and $w_{pose}$, $w_{sound}$, $w_{face}$ are preset values satisfying $w_{pose} > w_{sound}$ and $w_{pose} > w_{face}$;
(2) $V_t^{(pose)}$, $V_t^{(sound)}$ and $V_t^{(face)}$ are obtained by exponentially weighted moving average, namely

$$V_t^{(\cdot)} = \beta V_{t-1}^{(\cdot)} + (1 - \beta)\,\hat{V}_t^{(\cdot)}, \qquad (\cdot) \in \{pose, sound, face\}$$

where $\beta$ is the weighting coefficient, $\hat{V}_t^{(pose)}$ is the current-frame hand-waving distress posture confidence, $\hat{V}_t^{(sound)}$ is the confidence information output by the current-frame sound detection, and $\hat{V}_t^{(face)}$ is the confidence information output for the current-frame facial expression;
(3) the current-frame hand-waving distress posture confidence $\hat{V}_t^{(pose)}$ is obtained by comprehensively combining the current-frame hand-waving posture confidence, the current-frame hand-waving amplitude information $\hat{V}_t^{(amp)}$, the current-frame hand-waving frequency information $\hat{V}_t^{(freq)}$ and the posture addition coefficient;
(4) the current-frame hand-waving amplitude information $\hat{V}_t^{(amp)}$ is specifically

$$\hat{V}_t^{(amp)} = \min\left(\frac{S_t}{S_{max}},\, 1\right)$$

where $S_t$ is the detected hand-waving amplitude and $S_{max}$ is the preset maximum hand-waving amplitude; $\hat{V}_t^{(amp)}$ is a value between 0 and 1, and the larger the waving amplitude, the higher the value;
(5) the current-frame hand-waving frequency information $\hat{V}_t^{(freq)}$ is specifically

$$\hat{V}_t^{(freq)} = \min\left(\frac{f_t}{f_{max}},\, 1\right)$$

where $f_t$ is the detected hand-waving frequency and $f_{max}$ is the preset maximum hand-waving frequency; $\hat{V}_t^{(freq)}$ is a value between 0 and 1, and the higher the waving frequency, the higher the value;
(6) finally, $V_t^{(sos)}$ is compared with a preset threshold; if it exceeds the threshold, the current frame is judged to be in a hand-waving distress state; and if $n$ consecutive frames are in the hand-waving distress state, a hand-waving distress alarm is raised.
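Putting this claim together, a minimal sketch of the comprehensive-weight fusion with exponentially weighted moving averages and the $n$-consecutive-frame alarm; every numeric value (the weights, $\beta$, the threshold, $n$) is an illustrative preset, not one disclosed by the patent:

```python
from dataclasses import dataclass

@dataclass
class SosFusion:
    """Comprehensive-weight fusion of posture, sound and facial-expression
    confidences, with EWMA smoothing and a consecutive-frame alarm rule."""
    w_pose: float = 0.6      # w_pose > w_sound and w_pose > w_face
    w_sound: float = 0.2
    w_face: float = 0.2      # weights sum to 1
    beta: float = 0.9        # EWMA weighting coefficient
    threshold: float = 0.5   # per-frame distress threshold on V_t^(sos)
    n_frames: int = 10       # consecutive distress frames before alarming

    def __post_init__(self):
        self.v_pose = self.v_sound = self.v_face = 0.0
        self.streak = 0

    def update(self, pose_hat: float, sound_hat: float, face_hat: float) -> bool:
        # EWMA smoothing of the current-frame confidences
        self.v_pose = self.beta * self.v_pose + (1 - self.beta) * pose_hat
        self.v_sound = self.beta * self.v_sound + (1 - self.beta) * sound_hat
        self.v_face = self.beta * self.v_face + (1 - self.beta) * face_hat
        v_sos = (self.w_pose * self.v_pose
                 + self.w_sound * self.v_sound
                 + self.w_face * self.v_face)
        # alarm only after n consecutive frames above the threshold
        self.streak = self.streak + 1 if v_sos > self.threshold else 0
        return self.streak >= self.n_frames
```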
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211259423.6A | 2022-10-14 | 2022-10-14 | Hand-waving help-seeking identification system and method |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN115641610A | 2023-01-24 |
Family
ID=84944112
Family Applications (1)

| Application Number | Publication | Priority Date | Filing Date |
|---|---|---|---|
| CN202211259423.6A | CN115641610A (Pending) | 2022-10-14 | 2022-10-14 |
Country Status (1)

| Country | Link |
|---|---|
| CN | CN115641610A |
Cited By (3)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116229581A | 2023-03-23 | 2023-06-06 | 珠海市安克电子技术有限公司 | Intelligent interconnection first-aid system based on big data |
| CN116229581B | 2023-03-23 | 2023-09-19 | 珠海市安克电子技术有限公司 | Intelligent interconnection first-aid system based on big data |
| CN118280552A | 2024-05-31 | 2024-07-02 | 西安四腾环境科技有限公司 | Hospital management method based on video monitoring |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |