CN116616747A - Gesture recognition method and device, electronic equipment and storage medium - Google Patents

Gesture recognition method and device, electronic equipment and storage medium

Info

Publication number
CN116616747A
Authority
CN
China
Prior art keywords
target
target object
gesture
gesture recognition
feature
Prior art date
Legal status
Pending
Application number
CN202211092665.0A
Other languages
Chinese (zh)
Inventor
曾昭泽
宋志龙
杨景
姚沁
刘莹胜
Current Assignee
Lumi United Technology Co Ltd
Original Assignee
Lumi United Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Lumi United Technology Co Ltd filed Critical Lumi United Technology Co Ltd
Priority to CN202211092665.0A
Publication of CN116616747A

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/103 Detecting, measuring or recording devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
    • A61B5/11 Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb
    • A61B5/1116 Determining posture transitions
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/48 Other medical applications
    • A61B5/4806 Sleep evaluation
    • A61B5/4812 Detecting sleep stages or cycles

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Veterinary Medicine (AREA)
  • Biophysics (AREA)
  • Pathology (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Physiology (AREA)
  • Dentistry (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The application provides a gesture recognition method, a gesture recognition device, electronic equipment and a storage medium, and relates to the technical field of computers. Wherein the method comprises the following steps: based on the positioning of the target object, obtaining target data related to the gesture of the target object in a detection area; extracting time-frequency domain characteristics of the target data to obtain a first target characteristic and a second target characteristic; the first target feature is a frequency domain representation of the pose of the target object in the detection region, and the second target feature is a time domain representation of the pose of the target object in the detection region; and recognizing the gesture of the target object according to the first target feature and the second target feature to obtain a gesture recognition result. The application solves the problem of low accuracy of gesture recognition in the related art.

Description

Gesture recognition method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of computers, in particular to a gesture recognition method, a gesture recognition device, electronic equipment and a storage medium.
Background
With the development of computer technology, gesture recognition is increasingly used to assist users in various kinds of information analysis; for example, based on sleeping-posture recognition, users can be assisted in analyzing their sleep state, sleep stages, and the like.
Currently, gesture recognition is mainly implemented based on various types of detection devices, which may be, for example, cameras, radar devices, or sensors. Whichever type of detection device is used, its limitations affect the accuracy of gesture recognition. Taking a pressure sensor as an example, it is easily affected by the environment, which can cause detection failures and thus reduce the accuracy of gesture recognition.
For this reason, how to improve the accuracy of gesture recognition remains an open problem.
Disclosure of Invention
The application provides a gesture recognition method, a gesture recognition device, electronic equipment and a storage medium, which can solve the problem of low accuracy of gesture recognition in the related technology. The technical scheme is as follows:
according to one aspect of the present application, a gesture recognition method includes: based on the positioning of the target object, obtaining target data related to the gesture of the target object in a detection area; extracting time-frequency domain characteristics of the target data to obtain a first target characteristic and a second target characteristic; the first target feature is a frequency domain representation of the pose of the target object in the detection region, and the second target feature is a time domain representation of the pose of the target object in the detection region; and recognizing the gesture of the target object according to the first target feature and the second target feature to obtain a gesture recognition result.
According to one aspect of the present application, a gesture recognition apparatus includes: the data acquisition module is used for acquiring target data related to the gesture of the target object in the detection area based on the positioning of the target object; the feature extraction module is used for extracting time-frequency domain features of the target data to obtain a first target feature and a second target feature; the first target feature is a frequency domain representation of the pose of the target object in the detection region, and the second target feature is a time domain representation of the pose of the target object in the detection region; and the gesture recognition module is used for recognizing the gesture of the target object according to the first target feature and the second target feature to obtain a gesture recognition result.
In an exemplary embodiment, the feature extraction module includes: the frequency domain transformation unit is used for carrying out frequency domain transformation on the target data to obtain a plurality of frequency domain signals of the target data in a frequency domain, wherein each frequency domain signal corresponds to each frequency point in the frequency domain; the time domain transformation unit is used for extracting frequency domain characteristics of frequency domain signals corresponding to all frequency points in the set frequency band to obtain first target characteristics; and performing time domain transformation on the frequency domain signals corresponding to the frequency points in the set frequency band to obtain a second target feature.
In an exemplary embodiment, the gesture recognition module includes: the feature fusion unit is used for fusing the first target feature and the second target feature to obtain a target joint feature; and the gesture prediction unit is used for inputting the target joint characteristics into a gesture recognition model to perform gesture category prediction, so as to obtain the gesture recognition result.
In an exemplary embodiment, the apparatus further comprises: the model training module is used for carrying out model training on the basic model according to the sample joint characteristics in the training set and the corresponding sample labels thereof to obtain the gesture recognition model, and the sample labels are used for indicating gesture categories marked for samples to which the corresponding sample joint characteristics belong; the model training module comprises: the sample input unit is used for inputting the current sample joint characteristic in the training set into the basic model, and carrying out gesture recognition on the sample joint characteristic through the basic model to obtain a gesture prediction result; the loss calculation unit is used for obtaining a corresponding loss value based on the difference between the gesture prediction result and a sample label corresponding to the current sample joint characteristic; and the convergence unit is used for adjusting the gradient of each model parameter in the basic model according to the loss value and continuing training until the gradient of each model parameter meets the set convergence condition, and training by the basic model to obtain the gesture recognition model.
In an exemplary embodiment, the apparatus further comprises: the signal acquisition module is used for acquiring echo signals matched with the set window length from a plurality of echo signals based on a sliding window with the set window length; the plurality of echo signals are formed by reflecting a plurality of radar signals emitted by the detection equipment in the positioning process through the target object; the frequency spectrum analysis module is used for carrying out frequency spectrum analysis on the acquired echo signals to obtain a plurality of distance data, wherein the distance data are used for indicating the radial distance between the target object and the detection equipment in the detection area; a stationary detection module for determining whether the target object in the detection area is stationary according to a plurality of the distance data; if not, the sliding window is controlled to continue sliding in the echo signals.
In an exemplary embodiment, the gesture is a sleeping gesture, and the gesture recognition result is used to indicate the sleeping gesture of the target object in the detection area.
In an exemplary embodiment, the apparatus further comprises: and the first automatic control module is used for notifying the equipment to execute the action corresponding to the sleeping gesture indicated by the gesture recognition result.
In an exemplary embodiment, the apparatus further comprises: the vital sign detection module is used for detecting vital signs of the target object based on the sleeping gesture indicated by the gesture recognition result to obtain vital sign data of the target object, wherein the vital sign data are used for indicating vital signs of the target object; the sleep stage module is used for determining a sleep stage result according to vital signs of the target object indicated by the vital sign data and the reliability configured for the vital signs of the target object in the sleeping posture; and the second automatic control module is used for notifying the equipment to execute actions corresponding to the sleep stage indicated by the sleep stage result.
In an exemplary embodiment, the apparatus further comprises: and the configuration module is used for carrying out credibility configuration on vital signs of the target object under the sleeping gesture according to the sleeping gesture indicated by the gesture recognition result, the first target feature and/or the second target feature.
According to one aspect of the application, an electronic device comprises: at least one processor, at least one memory, and at least one communication bus, wherein the memory stores a computer program, and the processor reads the computer program from the memory through the communication bus; the computer program, when executed by the processor, implements the gesture recognition method as described above.
According to one aspect of the present application, a storage medium has stored thereon a computer program which, when executed by a processor, implements the gesture recognition method as described above.
According to one aspect of the present application, a computer program product comprises a computer program stored in a storage medium; a processor of a computer device reads the computer program from the storage medium and executes it, so that the computer device implements the gesture recognition method as described above.
The technical scheme provided by the application has the beneficial effects that:
in the technical scheme, after the target data related to the gesture of the target object in the detection area is obtained, time-frequency domain feature extraction is performed on the target data to obtain a first target feature and a second target feature. The first target feature is the frequency domain expression of the gesture of the target object in the detection area, and the second target feature is the time domain expression of that gesture. In other words, the first target feature and the second target feature, namely the different expressions of the gesture of the target object in the detection area in the time domain and the frequency domain, are used together to comprehensively predict the gesture of the target object, so higher recognition accuracy can be obtained, which solves the problem of low accuracy of gesture recognition in the related art.
Drawings
In order to more clearly illustrate the technical solutions provided by the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
FIG. 1 is a schematic illustration of an implementation environment in accordance with an embodiment of the present application;
FIG. 2 is a flowchart illustrating a gesture recognition method according to an example embodiment;
FIG. 3 is a flow chart of step 370 in one embodiment of the corresponding embodiment of FIG. 2;
FIG. 4 is a flowchart illustrating another gesture recognition method according to an example embodiment;
FIG. 5 is a flow chart of step 410 in one embodiment of the corresponding embodiment of FIG. 4;
FIG. 6 is a schematic diagram illustrating a sliding window movement according to an exemplary embodiment;
FIG. 7 is a flowchart illustrating another gesture recognition method according to an example embodiment;
FIG. 8 is a flow chart of step 373 in one embodiment of the corresponding embodiment of FIG. 3;
FIG. 9 is a block diagram illustrating a gesture recognition apparatus according to an exemplary embodiment;
FIG. 10 is a hardware block diagram of a server shown in accordance with an exemplary embodiment;
fig. 11 is a block diagram illustrating a structure of an electronic device according to an exemplary embodiment.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes all or any element and all combinations of one or more of the associated listed items.
The following is an introduction and explanation of several terms involved in the present application:
CFAR: Constant False-Alarm Rate.
FFT: Fast Fourier Transform.
IFFT: Inverse Fast Fourier Transform.
SNR: Signal-to-Noise Ratio.
PCA: Principal Component Analysis.
SVM: Support Vector Machine.
CNN: Convolutional Neural Network.
LSTM: Long Short-Term Memory.
GRU: Gated Recurrent Unit.
Currently, a user can perform various kinds of information analysis, such as analysis of his or her own sleep condition, through data obtained by various types of detection devices. For example, a piezoelectric sensor can collect fluctuation signals from the user's chest region to detect vital signs such as heart rate, respiratory rate and body movement, and the user's sleep state, sleep stages and the like are then analyzed based on these vital signs.
In general, the accuracy of the vital signs detected in different sleeping postures differs. For example, if the radar device detects vertically downwards, a user lying flat faces the detection device, and the chest movement caused by respiration and heartbeat is relatively large, so the detected respiration rate and heart rate are more accurate. When the user lies on the side, the chest movement caused by respiration and heartbeat is smaller along the radial direction of the radar than when lying flat, so the respiration and heartbeat signals are weaker than in the lying-flat state and the accuracy of the respiration rate and heart rate is reduced; in particular, when lying on the left side the heart is farthest from the radar device, the heartbeat is weakest, and the accuracy of the heart rate is therefore lowest.
Thus, sleep state judgments, sleep stage analysis and the like that are based on these vital signs deviate accordingly. To analyze the sleep state, sleep stages and the like of the user more accurately, sleeping-posture recognition is performed during vital sign detection, so as to provide an effective data basis for judging the sleep state and analyzing the sleep stages.
However, as described above, sleeping-posture recognition based on any of these detection devices, whether camera, radar device or sensor, is not highly accurate. In addition, a camera raises user-privacy concerns, a radar device that must detect the body profile during gesture recognition can be too costly, and a sensor is easily affected by the environment or easily damaged, which undermines the validity of detection and further lowers the accuracy of gesture recognition.
It can be seen from the above that, taking the sleeping posture as an example, the related art still suffers from low accuracy of sleeping-posture recognition; likewise, gesture recognition implemented with various types of detection devices is still limited by low accuracy.
Therefore, the gesture recognition method provided by the application can effectively improve the accuracy of gesture recognition. It is correspondingly applicable to a gesture recognition apparatus that can be deployed in an electronic device. For example, the electronic device may be a computer device with a von Neumann architecture, including but not limited to a desktop computer, a notebook computer, a tablet computer, a server and the like; or the electronic device may be a detection device, such as a human body sensor.
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an implementation environment involved in a gesture recognition method. The implementation environment includes user terminal 110, intelligent device 130, gateway 150, server side 170, and router 190.
Specifically, the user terminal 110, which may also be regarded simply as a terminal, is where the client associated with the smart device 130 is deployed (that is, installed). The user terminal 110 may be an electronic device such as a smart phone, a tablet computer, a notebook computer, a desktop computer, an intelligent control panel, or another device having display and control functions, which is not limited herein.
The client is associated with the smart device 130, and is essentially that the user registers an account in the client, and configures the smart device 130 in the client, for example, the configuration includes adding a device identifier to the smart device 130, so that when the client is operated in the user terminal 110, functions related to device display, device control, and the like of the smart device 130 can be provided for the user, where the client may be in the form of an application program or a web page, and correspondingly, an interface where the client performs device display may be in the form of a program window or a web page, which is not limited herein.
The intelligent device 130 is disposed in the gateway 150 and communicates with the gateway 150 through its own configured communication module, and is further controlled by the gateway 150. It should be understood that smart device 130 is generally referred to as one of a plurality of smart devices 130, and embodiments of the present application are merely illustrated with smart device 130, i.e., embodiments of the present application are not limited in the number and type of smart devices deployed in gateway 150. In one application scenario, intelligent device 130 accesses gateway 150 via a local area network, thereby being deployed in gateway 150. The process of intelligent device 130 accessing gateway 150 through a local area network includes: a local area network is first established by gateway 150 and intelligent device 130 joins the local area network established by gateway 150 by connecting to gateway 150. Such local area networks include, but are not limited to: ZIGBEE or bluetooth. The intelligent device 130 may be an intelligent printer, an intelligent fax machine, an intelligent camera, an intelligent air conditioner, an intelligent door lock, an intelligent lamp, or an electronic device such as a human body sensor, a door and window sensor, a temperature and humidity sensor, a water immersion sensor, a natural gas alarm, a smoke alarm, a wall switch, a wall socket, a wireless switch, a wireless wall-mounted switch, a magic cube controller, a curtain motor, etc. which are configured with a communication module.
Interaction between the user terminal 110 and the intelligent device 130 may be accomplished through a local area network or through a wide area network. In one application scenario, the user terminal 110 establishes a wired or wireless communication connection (including but not limited to WIFI) with the gateway 150 through the router 190, so that the user terminal 110 and the gateway 150 are deployed in the same local area network and the user terminal 110 can interact with the smart device 130 through a local-area-network path. In another application scenario, the user terminal 110 establishes a wired or wireless communication connection (for example, but not limited to, 2G, 3G, 4G, 5G or WIFI) with the gateway 150 through the server side 170, so that the user terminal 110 and the gateway 150 are deployed in the same wide area network and the user terminal 110 can interact with the smart device 130 through a wide-area-network path.
The server 170 may be considered as a cloud, a cloud platform, a server, etc., where the server 170 may be a server, a server cluster formed by a plurality of servers, or a cloud computing center formed by a plurality of servers, so as to better provide background services to a large number of user terminals 110. For example, the background service includes a gesture recognition service.
Taking the example of providing the gesture recognition service by the server 170, the following description is given to the gesture recognition process:
for the smart device 130, locating the target object yields target data related to the gesture of the target object in the detection area; the target data is then sent to the server 170, so as to request the server 170 to provide the gesture recognition service.
Then, with the interaction between the intelligent device 130 and the server 170 through the local area network path or the wide area network path, for the server 170, the target data sent by the intelligent device 130 can be received, and then the time-frequency domain feature extraction is performed on the target data, so as to obtain a first target feature and a second target feature, so that the gesture of the target object is identified by combining the first target feature and the second target feature, and a gesture identification result with high identification accuracy is obtained.
Based on the gesture recognition result with high recognition accuracy, the user can be better assisted in various information analyses, such as analysis of sleep states, sleep phases and the like based on sleep gesture recognition.
Referring to fig. 2, an embodiment of the present application provides a gesture recognition method, which is suitable for an electronic device, and the electronic device may specifically be a server 170 in the implementation environment shown in fig. 1, or may be an intelligent device 130 in the implementation environment shown in fig. 1, for example, a body sensor.
In the following method embodiments, for convenience of description, the execution subject of each step of the method is described as an electronic device, but this configuration is not particularly limited.
As shown in fig. 2, the method may include the steps of:
in step 310, based on the positioning of the target object, target data related to the posture of the target object in the detection area is obtained.
First, the target object refers to any object that can appear in the detection area, and the object may specifically be an object having a posture such as a person, a robot, an animal, or the like. The detection area refers to an area where the detection device can emit an effective detection signal to the target object.
Secondly, the positioning of the target object is achieved by the detection device. In one possible implementation, the detection device is mounted directly above the target object; for example, if the target object is a person, the detection device may be mounted on the ceiling directly above the person's chest.
Taking a human body sensor with a millimeter wave radar as an example, the millimeter wave radar in the human body sensor is provided with an antenna array comprising a transmitting antenna and a receiving antenna. The human body sensor transmits millimeter wave signals to the target object through the transmitting antenna, the receiving antenna receives the echo signals formed when those millimeter wave signals are reflected by the target object, and the positioning of the target object is achieved through the relevant processing.
By locating the target object, the detection device is able to obtain position data of the target object in the detection area. Wherein the position data is used to indicate the position of the target object in the detection area. For a target object that remains in pose, in one possible implementation, the position data includes, but is not limited to: the pose of the target object in the detection area, the radial distance of the target object in the detection area from the detection device.
Thus, the target data, which is related to the posture of the target object in the detection area, is extracted from the position data of the target object in the detection area. In one possible implementation, the target data is described by the azimuth angles measured at the receiving antenna, which may include a horizontal angle and a pitch angle, used to indicate the pose of the target object in the detection area.
For example, the position data of the target object in the detection area can be expressed as (ρ*, θ*, φ*), where ρ* represents the radial distance between the target object in the detection area and the detection device, θ* represents the horizontal angle used to indicate the gesture of the target object in the detection area, and φ* represents the pitch angle used to indicate that gesture. Target data can then be extracted from the position data and expressed as (θ*, φ*). In one possible implementation, if the pose is a sleeping pose, the target data specifically indicates the horizontal angle and the pitch angle of the sleeping pose of the target object in the detection area.
It should be noted that, if no target object is present in the detection area but the detection device still attempts to locate one, the power consumption of the detection device is high, its battery must be replaced frequently, its service life is affected, and the hardware cost of gesture recognition is not reduced. For this reason, in one possible implementation, it is first determined whether a target object is present in the detection area before the target object is located. For example, the detection device performs target object detection on the detection area by using a two-dimensional CFAR (Constant False-Alarm Rate) algorithm, so that if it detects that a target object exists in the detection area, it then locates the target object. Of course, in other embodiments, target object detection and target object positioning are not limited to being performed by the same detection device; they may also be performed by different detection devices, which is not particularly limited herein.
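The two-dimensional CFAR algorithm is only named here, without parameters. Purely as an illustration of the idea, the following is a minimal one-dimensional cell-averaging CFAR sketch in Python; the training/guard window sizes, the scaling factor and the 1-D simplification are assumptions, not values from this application.

```python
# Minimal cell-averaging CFAR sketch along the range dimension (illustrative only).
import numpy as np

def ca_cfar_detect(power, num_train=8, num_guard=2, scale=3.0):
    """Return indices of range bins whose power exceeds the local noise estimate."""
    n = len(power)
    detections = []
    for i in range(num_train + num_guard, n - num_train - num_guard):
        # Training cells on both sides of the cell under test, excluding guard cells.
        lead = power[i - num_guard - num_train : i - num_guard]
        lag = power[i + num_guard + 1 : i + num_guard + num_train + 1]
        noise = np.mean(np.concatenate([lead, lag]))
        if power[i] > scale * noise:
            detections.append(i)
    return detections

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spectrum = rng.exponential(1.0, 256)   # noise floor of a range spectrum
    spectrum[100] += 30.0                  # strong reflection, e.g. a person
    print(ca_cfar_detect(spectrum))        # the bin around 100 should be reported
```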
And step 350, extracting time-frequency domain features of the target data to obtain a first target feature and a second target feature.
The first target feature refers to a feature of a frequency domain signal corresponding to a frequency point in a set frequency band, namely, a frequency domain feature, and can be considered as a frequency domain expression of the gesture of the target object in the detection area; the second target feature refers to a feature of the time domain signal obtained by performing time domain transformation on the frequency domain signal, namely, the time domain feature, and can also be considered as a time domain expression of the gesture of the target object in the detection area.
It should be understood that, after the time-frequency domain transformation, the expressions of the target object in the time domain and in the frequency domain will differ, that is, the first target feature is different from the second target feature. In other words, the first target feature accurately describes the pose of the target object in the detection area from the frequency domain perspective, and the second target feature accurately describes it from the time domain perspective.
In one possible implementation manner, the time-frequency domain feature extraction process specifically refers to: performing frequency domain transformation on the target data to obtain a plurality of frequency domain signals of the target data in a frequency domain; extracting frequency domain characteristics of frequency domain signals corresponding to all frequency points in a set frequency band to obtain first target characteristics; and performing time domain transformation on the frequency domain signals corresponding to the frequency points in the set frequency band to obtain a second target feature. Wherein, each frequency domain signal corresponds to each frequency point in the frequency domain.
The frequency domain is a coordinate system describing characteristics of signals in terms of frequency, and can describe signal quantities in each specified frequency band in the frequency range. Wherein, the frequency point may be a specified frequency, and the frequency may be an amount describing how frequently the object vibrates.
In one possible implementation, the frequency domain transform is a fast Fourier transform. Specifically, assuming that the target data is represented as x(n), performing a frequency domain transformation such as a fast Fourier transform on the target data yields a frequency domain signal X(k), where n represents the number of vectors in the target data and k represents the frequency bin index of the frequency domain signal; the frequency domain signal can also be considered to correspond to the k-th frequency bin in the frequency domain.
In one possible implementation manner, the target data is windowed, and the frequency domain transformation is performed on the processed target data, thereby preventing the spectrum leakage caused by waveform discontinuity of the target data, which helps to improve the recognition accuracy. The windowing may use a Hamming window or a rectangular window, which is not limited herein.
In one possible implementation, the set frequency bands include a respiratory band and a heartbeat band. Wherein the breathing frequency range specifically refers to 0.1 Hz-0.6 Hz; the heartbeat frequency band is specifically 0.8 Hz-4 Hz.
In one possible implementation, the time domain transform is an inverse fast Fourier transform. Specifically, for the frequency domain signal X(k) corresponding to the k-th frequency bin in the frequency domain, if the k-th frequency bin belongs to a set frequency band, an inverse fast Fourier transform or the like is performed on the frequency domain signals corresponding to all frequency bins in that set frequency band to obtain the time domain signal corresponding to the set frequency band. For example, if the set frequency band is the respiratory band, the corresponding time domain signal is the respiration signal x_b(n); if the set frequency band is the heartbeat band, the corresponding time domain signal is the heartbeat signal x_h(n), where n represents the number of vectors in the time domain signal.
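As an illustration of the decomposition just described (windowing, FFT, selection of the bins of a set frequency band, inverse FFT back to a band-limited time-domain signal), the following Python/NumPy sketch can be used. The slow-time sampling rate and the choice to zero the out-of-band bins before the inverse transform are assumptions; the text only states that the in-band bins are transformed back to the time domain.

```python
# Sketch of the time-frequency decomposition: window the target-data sequence,
# FFT it, keep only the bins of a set frequency band, and IFFT that band back
# to a time-domain signal (zeroing out-of-band bins is an assumption).
import numpy as np

def band_signals(x, fs, band):
    """Return (X, band_bins, x_band): the spectrum, in-band bin indices,
    and the band-limited time-domain signal."""
    n = len(x)
    xw = x * np.hamming(n)                    # windowing against spectral leakage
    X = np.fft.rfft(xw)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band_bins = np.where((freqs >= band[0]) & (freqs <= band[1]))[0]
    X_band = np.zeros_like(X)
    X_band[band_bins] = X[band_bins]
    x_band = np.fft.irfft(X_band, n)
    return X, band_bins, x_band

if __name__ == "__main__":
    fs = 20.0                                 # assumed slow-time sampling rate (Hz)
    t = np.arange(0, 60, 1.0 / fs)
    # Synthetic chest-motion sequence: 0.3 Hz breathing plus a weak 1.2 Hz heartbeat.
    x = np.sin(2 * np.pi * 0.3 * t) + 0.1 * np.sin(2 * np.pi * 1.2 * t)
    X, resp_bins, x_b = band_signals(x, fs, (0.1, 0.6))   # respiratory band
    _, heart_bins, x_h = band_signals(x, fs, (0.8, 4.0))  # heartbeat band
    print(len(resp_bins), len(heart_bins))
```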
The first target feature and the second target feature are illustrated below with the pose as a sleeping pose:
in one possible implementation, the first target features include, but are not limited to: the frequency domain characteristics such as respiratory frequency band energy, heartbeat frequency band energy, energy ratio between respiratory frequency band and heartbeat frequency band, respiratory frequency band peak signal to noise ratio, heartbeat frequency band peak signal to noise ratio, peak signal to noise ratio between respiratory frequency band and heartbeat frequency band, ratio between different peaks of heartbeat frequency band, ratio between corresponding frequency points of different peaks of heartbeat frequency band, and the like.
The frequency domain features are specifically as follows (see the illustrative sketch after this list):
1. Respiratory band energy: E_b = Σ_{k=k_{b,s}}^{k_{b,e}} |X(k)|^2, where k_{b,s} represents the starting frequency bin of the respiratory band, k_{b,e} represents the ending frequency bin of the respiratory band, and X(k) represents the frequency domain signal corresponding to the k-th frequency bin in the respiratory band;
2. Heartbeat band energy: E_h = Σ_{k=k_{h,s}}^{k_{h,e}} |X(k)|^2, where k_{h,s} represents the starting frequency bin of the heartbeat band, k_{h,e} represents the ending frequency bin of the heartbeat band, and X(k) represents the frequency domain signal corresponding to the k-th frequency bin in the heartbeat band;
3. Energy ratio between the respiratory band and the heartbeat band: R_{bh,E} = E_b / E_h, where E_b represents the respiratory band energy and E_h represents the heartbeat band energy;
4. Respiratory band peak signal-to-noise ratio: SNR_b = |X(k_{b,1})| / M, where k_{b,1} represents the frequency bin of the largest peak found by a peak search within the respiratory band, X(k_{b,1}) represents the frequency domain signal at bin k_{b,1}, and M represents the noise-floor mean, computed as the mean of the magnitudes |X(k)| over the bins from k_{b,e}+1 (the bin after the end of the respiratory band) up to the largest frequency bin k_e in the frequency domain, that is, M = (1 / (k_e - k_{b,e})) Σ_{k=k_{b,e}+1}^{k_e} |X(k)|;
5. Heartbeat band peak signal-to-noise ratio: SNR_h = |X(k_{h,1})| / M, where k_{h,1} represents the frequency bin of the largest peak found by a peak search within the heartbeat band, X(k_{h,1}) represents the frequency domain signal at bin k_{h,1}, and M represents the noise-floor mean, computed as the mean of the magnitudes |X(k)| over the bins from k_{h,e}+1 up to the largest frequency bin k_e in the frequency domain, that is, M = (1 / (k_e - k_{h,e})) Σ_{k=k_{h,e}+1}^{k_e} |X(k)|;
6. Peak signal-to-noise ratio between the respiratory band and the heartbeat band: R_{bh,SNR} = SNR_b / SNR_h, where SNR_b represents the peak signal-to-noise ratio of the respiratory band and SNR_h represents the peak signal-to-noise ratio of the heartbeat band;
7. Ratio between the highest peak and the second highest peak of the heartbeat band: R_{h,12} = |X(k_{h,1})| / |X(k_{h,2})|;
8. Ratio between the frequency bins corresponding to the highest peak and the second highest peak of the heartbeat band: R_{h,k12} = k_{h,1} / k_{h,2}, where k_{h,1} represents the frequency bin of the largest peak in the heartbeat band and k_{h,2} represents the frequency bin of the second largest peak in the heartbeat band.
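A possible computation of the eight frequency-domain features above from the magnitude spectrum |X(k)| is sketched below. The peak search is simplified to picking the largest and second-largest in-band magnitudes rather than true local maxima, and the function and variable names are illustrative, not taken from this application.

```python
# Sketch of the eight frequency-domain features, following the formulas above.
import numpy as np

def freq_features(X_mag, resp_bins, heart_bins):
    """X_mag: |X(k)| over the whole spectrum; resp_bins / heart_bins: in-band bin indices."""
    E_b = np.sum(X_mag[resp_bins] ** 2)            # respiratory band energy
    E_h = np.sum(X_mag[heart_bins] ** 2)           # heartbeat band energy
    R_bh_E = E_b / E_h

    def noise_floor(band_bins):
        # Mean magnitude from the bin after the band end to the last bin.
        return np.mean(X_mag[band_bins[-1] + 1 :])

    k_b1 = resp_bins[np.argmax(X_mag[resp_bins])]            # respiratory peak bin
    SNR_b = X_mag[k_b1] / noise_floor(resp_bins)

    heart_order = heart_bins[np.argsort(X_mag[heart_bins])[::-1]]
    k_h1, k_h2 = heart_order[0], heart_order[1]              # highest / second highest
    SNR_h = X_mag[k_h1] / noise_floor(heart_bins)

    return [E_b, E_h, R_bh_E, SNR_b, SNR_h,
            SNR_b / SNR_h,                                   # R_bh_SNR
            X_mag[k_h1] / X_mag[k_h2],                       # R_h_12
            k_h1 / k_h2]                                     # R_h_k12

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_mag = rng.uniform(0.01, 0.1, 512)
    X_mag[15] = 5.0        # pretend respiratory peak
    X_mag[60] = 1.0        # pretend heartbeat peak
    print(np.round(freq_features(X_mag, np.arange(5, 31), np.arange(41, 205)), 3))
```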
In one possible implementation, the second target features include, but are not limited to: respiration average peak, heartbeat average peak, average peak ratio between respiration and heartbeat, respiration waveform average symmetry ratio, respiration rate fluctuation, heart rate fluctuation, and the like.
The time domain features are specifically as follows (see the illustrative sketch after this list):
1. Respiration average peak: in the respiration signal x_b(n), all peaks n_{b,peak}(m), m = 1, ..., N_{b,peak}, are found by a peak search, where N_{b,peak} represents the number of peaks in x_b(n); the respiration average peak is then computed as the mean of the signal values at those peaks, M_{b,peak} = (1 / N_{b,peak}) Σ_{m=1}^{N_{b,peak}} x_b(n_{b,peak}(m));
2. Heartbeat average peak: in the heartbeat signal x_h(n), all peaks n_{h,peak}(m), m = 1, ..., N_{h,peak}, are found by a peak search, where N_{h,peak} represents the number of peaks in x_h(n); the heartbeat average peak is computed in the same way, M_{h,peak} = (1 / N_{h,peak}) Σ_{m=1}^{N_{h,peak}} x_h(n_{h,peak}(m));
3. Ratio of the average peaks between respiration and heartbeat: R_{bh,P} = M_{b,peak} / M_{h,peak}, where M_{b,peak} represents the respiration average peak and M_{h,peak} represents the heartbeat average peak;
4. Average symmetry ratio of the respiratory waveform: in the respiration signal x_b(n), all troughs n_{b,valley}(m), m = 1, ..., N_{b,valley}, are found by a peak search, where N_{b,valley} represents the number of troughs in x_b(n); the time difference between a trough and the corresponding crest is n_{valley,peak}(m) = n_{b,valley}(m) - n_{b,peak}(m), m = 1, ..., min(N_{b,valley}, N_{b,peak}); the time difference between the following crest and the trough is n_{peak,valley}(m) = n_{b,peak}(m+1) - n_{b,valley}(m), m = 1, ..., min(N_{b,valley}, N_{b,peak}); based on these time differences, the average symmetry ratio of the respiratory waveform is R_{b,symm} = mean(n_{valley,peak}(m) / n_{peak,valley}(m));
5. Respiration rate fluctuation: in the respiration signal x_b(n), the time differences between adjacent peaks, n_{b,peak}(m+1) - n_{b,peak}(m), and the time differences between adjacent troughs, n_{b,valley}(m+1) - n_{b,valley}(m), are computed; based on these time differences, the respiration rate fluctuation W_b is calculated;
6. Heart rate fluctuation: in the heartbeat signal x_h(n), all troughs n_{h,valley}(m), m = 1, ..., N_{h,valley}, are found by a peak search, where N_{h,valley} represents the number of troughs in x_h(n); the time differences between adjacent peaks, n_{h,peak}(m+1) - n_{h,peak}(m), and between adjacent troughs, n_{h,valley}(m+1) - n_{h,valley}(m), are computed; based on these time differences, the heart rate fluctuation W_h is calculated.
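The six time-domain features above can be sketched as follows, with scipy.signal.find_peaks standing in for the peak/trough search. The fluctuation measures W_b and W_h are taken here as the standard deviation of the adjacent-peak and adjacent-trough time differences, which is an assumption; the text does not spell out their exact formula.

```python
# Sketch of the six time-domain features of the respiration and heartbeat signals.
import numpy as np
from scipy.signal import find_peaks

def time_features(x_b, x_h):
    pk_b, _ = find_peaks(x_b)      # respiration peaks
    vl_b, _ = find_peaks(-x_b)     # respiration troughs
    pk_h, _ = find_peaks(x_h)      # heartbeat peaks
    vl_h, _ = find_peaks(-x_h)     # heartbeat troughs

    M_b = np.mean(x_b[pk_b])                     # respiration average peak
    M_h = np.mean(x_h[pk_h])                     # heartbeat average peak
    R_bh_P = M_b / M_h

    # Respiratory waveform symmetry: trough-to-crest vs. crest-to-trough spacing.
    m = min(len(pk_b), len(vl_b)) - 1
    valley_peak = vl_b[:m] - pk_b[:m]
    peak_valley = pk_b[1 : m + 1] - vl_b[:m]
    R_symm = np.mean(valley_peak / peak_valley)

    W_b = np.std(np.concatenate([np.diff(pk_b), np.diff(vl_b)]))  # assumed definition
    W_h = np.std(np.concatenate([np.diff(pk_h), np.diff(vl_h)]))  # assumed definition
    return [M_b, M_h, R_bh_P, R_symm, W_b, W_h]

if __name__ == "__main__":
    t = np.arange(0, 60, 0.05)
    x_b = np.sin(2 * np.pi * 0.3 * t)            # synthetic respiration signal
    x_h = 0.1 * np.sin(2 * np.pi * 1.2 * t)      # synthetic heartbeat signal
    print(np.round(time_features(x_b, x_h), 3))
```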
of course, in other embodiments, the frequency domain features and the time domain features are not limited to the above, but may be other time-frequency domain features that are beneficial to gesture recognition, for example, may be frequency domain features in a power spectrum or a higher-order spectrum, and may also be other effective waveform features, which are not particularly limited herein.
And step 370, recognizing the gesture of the target object according to the first target feature and the second target feature to obtain a gesture recognition result.
The gesture recognition result is used to indicate the gesture category of the target object. In one possible implementation, if the posture is a sleeping posture, the posture categories include lying flat, lying on the left side, lying on the right side, lying on the back, and lying on the stomach.
In one possible implementation, gesture recognition is implemented by invoking a gesture recognition model. Of course, in other embodiments, gesture recognition may also be implemented by a gesture recognition algorithm, which may refer to a principal component analysis (Principal Component Analysis, PCA) algorithm, for example.
FIG. 3 illustrates a method flow diagram of the gesture recognition process in one embodiment, as shown in FIG. 3, in particular, step 370 may include the steps of:
and 371, fusing the first target feature and the second target feature to obtain a target joint feature.
The fusing may be adding the first target feature and the second target feature, that is, target joint feature = first target feature + second target feature; it may also be concatenating (splicing) the first target feature with the second target feature, that is, target joint feature = [first target feature, second target feature], which is not limited herein.
Taking the concatenation as an example for illustration, based on the 8 frequency domain features and the 6 time domain features involved in step 350, the target joint feature F may be represented as follows:
F = [E_b, E_h, R_{bh,E}, SNR_b, SNR_h, R_{bh,SNR}, R_{h,12}, R_{h,k12}, M_{b,peak}, M_{h,peak}, R_{bh,P}, R_{b,symm}, W_b, W_h].
and 373, inputting the target combined features into a gesture recognition model to perform gesture category prediction, so as to obtain a gesture recognition result.
The gesture category prediction refers to calculating the probability that the target joint feature belongs to different gesture categories.
Taking the gesture as the sleeping gesture and supposing, for example, that the gesture categories include lying flat, lying on the left side and lying on the right side: let the probability that the target joint feature belongs to lying flat be P1, the probability that it belongs to lying on the left side be P2, and the probability that it belongs to lying on the right side be P3. If P1 is the largest, the prediction result is that the target joint feature belongs to lying flat. Further, if P1 >= 0.8 (a set decision condition), the prediction result is taken as the gesture recognition result, that is, the gesture recognition result indicates that the gesture category of the target object is lying flat.
It should be noted that the set decision condition is used to prevent mis-recognition of the gesture category and thus further improve the accuracy of gesture recognition; it can be set flexibly according to the actual needs of the application scenario and is not limited here. The set decision condition can also be regarded as a rejection criterion for refusing to take the prediction result as the gesture recognition result: in the foregoing example, if P1 < 0.8, the prediction result is rejected and not used as the gesture recognition result.
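As a sketch of this probability-plus-rejection decision, the following uses a scikit-learn SVM with probability outputs (an SVM is one of the base-model types listed later) and the 0.8 threshold as the set decision condition; the training data, the three categories and the feature dimensionality are placeholders only.

```python
# Sketch of gesture-category prediction with a rejection threshold.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 14))         # 14-dim joint features (placeholder)
y_train = rng.integers(0, 3, size=300)       # 0: lying flat, 1: left side, 2: right side

clf = SVC(probability=True).fit(X_train, y_train)

def recognise(joint_feature, threshold=0.8):
    probs = clf.predict_proba(joint_feature.reshape(1, -1))[0]
    best = int(np.argmax(probs))
    if probs[best] >= threshold:             # set decision condition satisfied
        return best, probs[best]
    return None, probs[best]                 # rejected: not taken as the recognition result

print(recognise(rng.normal(size=14)))
```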
After determining the gesture category of the target object according to the indication of the gesture recognition result, a corresponding intelligent service can be provided for the target object based on the gesture category of the target object.
In one application scenario, a device is notified to perform an action corresponding to the gesture category. For example, if the user's posture is lying flat and the user has remained lying flat for more than a set period (for example, 20 minutes), the gateway can send a start instruction to the humidifier to control it to turn on, so as to relieve the dry throat and similar discomfort that easily results from snoring while lying flat for a long time.
Through the process, the gesture of the target object is comprehensively predicted by utilizing the first target feature and the second target feature, namely different expressions of the gesture of the target object in the detection area in the time-frequency domain, so that higher recognition accuracy can be obtained.
Referring to fig. 4, in an exemplary embodiment, prior to step 330, the method may further include the steps of:
in step 410, it is determined whether the target object in the detection area is stationary.
It will be appreciated that if the gesture is a sleeping posture, there is no obvious movement once the target object enters a sleep state; therefore, before gesture recognition is performed, it can first be detected whether the target object is stationary in the detection area.
If it is detected that the target object is stationary in the detection area, it indicates that there is no obvious action of the target object, that is, the target object enters a sleep state, and at this time, gesture recognition is performed on the target object, that is, the process proceeds to step 430 to execute.
On the other hand, if it is detected that the target object is not stationary in the detection area, it means that the target object does not enter a sleep state, and at this time, it is unnecessary to recognize the gesture of the target object.
In this way, gesture recognition is only performed on the stationary target object in the detection area, which is beneficial to reducing the task processing amount of the electronic equipment and further beneficial to improving the processing efficiency of the electronic equipment.
In one possible implementation, as shown in fig. 5, the detection process may include the following steps:
In step 411, echo signals matching the set window length are acquired from the plurality of echo signals based on the sliding window of the set window length.
The plurality of echo signals are formed by reflecting a plurality of radar signals emitted by the detection device in the positioning process through the target object.
Still taking a body sensor as an example, for a millimeter wave radar configured in the body sensor, a transmitting antenna of the millimeter wave radar may transmit a plurality of millimeter wave signals (i.e., radar signals) to a target object in each time period, and accordingly, the millimeter wave radar may receive a plurality of echo signals formed by reflecting the plurality of millimeter wave signals through the target object through a receiving antenna.
As described above, the plurality of echo signals in step 411 may be a plurality of echo signals that are continuous in time in the same time period, or may be a plurality of echo signals that are continuous in time in different time periods, and are not limited herein.
Fig. 6 shows a schematic view of the sliding window movement. In fig. 6, assuming that the set window length of the sliding window 401 is 3, 3 echo signals 403 matching in number can be acquired from the plurality of echo signals 402 based on the set window length of the sliding window 401 for the plurality of echo signals 402.
In step 413, spectrum analysis is performed on the acquired echo signals, so as to obtain a plurality of distance data.
Wherein the distance data is used to indicate a radial distance between the target object and the detection device in the detection area.
Specifically, the spectrum analysis performs processing such as mixing, fast Fourier transform and taking the modulus on the echo signal, from which the position data can be obtained.
As previously described, for a target object that maintains a pose, the position data includes, but is not limited to: the pose of the target object in the detection area and the radial distance between the target object in the detection area and the detection device. The distance data can therefore be extracted from the position data. For example, the position data of the target object in the detection area can be expressed as (ρ*, θ*, φ*), where ρ* represents the radial distance between the target object in the detection area and the detection device, θ* represents the horizontal angle used to indicate the gesture of the target object in the detection area, and φ* represents the pitch angle used to indicate that gesture. The distance data, denoted ρ*, can then be extracted from the position data.
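As an illustration of the "mixing, fast Fourier transform, taking the modulus" step, the following sketch assumes an FMCW-style chirp whose beat frequency after mixing is proportional to the radial distance; the waveform parameters are purely illustrative and are not given in this application.

```python
# Sketch of estimating the radial distance of the strongest reflector from one
# (already mixed / de-chirped) echo via FFT and modulus.
import numpy as np

C = 3e8        # speed of light (m/s)
S = 30e12      # chirp slope (Hz/s), assumed
FS = 2e6       # ADC sampling rate (Hz), assumed
N = 256        # samples per chirp

def radial_distance(beat_signal):
    spectrum = np.abs(np.fft.rfft(beat_signal * np.hanning(N)))
    k = np.argmax(spectrum[1:]) + 1          # strongest bin, skipping DC
    f_beat = k * FS / N
    return C * f_beat / (2 * S)              # beat frequency -> range

if __name__ == "__main__":
    t = np.arange(N) / FS
    true_range = 1.5                         # metres
    f_beat = 2 * S * true_range / C
    echo = np.cos(2 * np.pi * f_beat * t)    # ideal mixed echo from one reflector
    print(round(radial_distance(echo), 2))   # close to 1.5
```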
In step 415, it is determined whether the target object in the detection area is stationary based on the plurality of distance data.
Specifically, if the radial distances indicated by the plurality of distance data are identical, it is determined that the target object in the detection area is stationary.
In contrast, if at least one of the distance data indicates a radial distance inconsistent with the radial distance indicated by the other distance data, the target object in the detection area is determined not to be stationary, and at this time, the sliding window is controlled to continue sliding in the echo signals.
With continued reference to fig. 6, for the first 2 echo signals the radial distance indicated by the obtained distance data is a, and from the 3rd echo signal onwards the radial distance indicated by the obtained distance data is b. For the first 3 echo signals acquired by the sliding window 401, it is therefore determined that the target object in the detection area is not stationary, and the sliding window 401 is controlled to continue sliding from the echo signal 403 corresponding to the distance data with the inconsistent radial distance, that is, the sliding window 401 is updated to the sliding window 401'.
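A minimal sketch of this sliding-window stillness check is given below; the window length of 3 follows the example of fig. 6, and the strict equality comparison mirrors the text, whereas a practical implementation would likely allow a small tolerance on the distances.

```python
# Sliding-window stillness check over per-echo radial distances.
from collections import deque

def stillness_windows(distances, window_len=3):
    """Yield (start_index, is_stationary) for each full window of distance data."""
    window = deque(maxlen=window_len)
    for i, d in enumerate(distances):
        window.append(d)
        if len(window) == window_len:
            yield i - window_len + 1, len(set(window)) == 1

if __name__ == "__main__":
    # Distances a, a, b, b, b: the first window is not stationary, so the
    # window keeps sliding; the window covering the last three echoes is.
    print(list(stillness_windows(["a", "a", "b", "b", "b"])))
```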
Step 430, obtaining target data related to the posture of the target object in the detection area based on the positioning of the stationary target object in the detection area.
When the target object is detected to be stationary in the detection area, position data of the target object in the detection area can be obtained by locating the stationary target object, and the target data is then extracted from the position data. As previously described, the position data of the target object in the detection area is expressed as (ρ*, θ*, φ*), and the target data related to the gesture of the target object in the detection area is expressed as (θ*, φ*). In one possible implementation, the target data is described by the pitch angle used to indicate the gesture of the target object in the detection area, and can then be further expressed as φ*(n), where n represents the number of vectors in the target data.
And step 450, performing interference elimination preprocessing on the target data.
The interference elimination preprocessing comprises unwrapping processing, difference solving processing and the like.
Specifically, the unwrapping process means that when the phase value of the target data is greater than pi or less than-pi, the phase value of the target data is always maintained between-pi and pi by subtracting 2 pi or adding 2 pi from the phase value of the target data.
The difference processing calculates the difference between the current target data and the next target data, and takes that difference as the processed target data. For example, if the processed target data is denoted x(n), the current target data is denoted φ*(n) and the next target data is denoted φ*(n+1), then x(n) = φ*(n+1) - φ*(n).
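A minimal NumPy sketch of this interference-elimination preprocessing follows: the phase values are kept within (-pi, pi] by adding or subtracting 2*pi as described, and consecutive samples are then differenced; the synthetic input is illustrative only.

```python
# Interference-elimination preprocessing sketch: keep phases within (-pi, pi],
# then difference consecutive samples, x(n) = phi*(n+1) - phi*(n).
import numpy as np

def preprocess(phase):
    wrapped = np.mod(phase + np.pi, 2 * np.pi) - np.pi   # add/subtract 2*pi as needed
    return np.diff(wrapped)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = np.cumsum(rng.normal(0.0, 0.5, 200))           # a drifting phase track
    print(np.round(preprocess(raw)[:5], 3))
```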
Of course, in other embodiments, the interference cancellation preprocessing may also be implemented by algorithms such as a high-pass filter, wavelet transform, etc., which are not particularly limited herein.
Then, after the interference cancellation preprocessing is completed, time-frequency domain feature extraction can be performed on the processed target data to obtain a first target feature and a second target feature, i.e., step 350 is performed.
Therefore, the interference elimination preprocessing of the target data is realized, and the signal offset in the target data, such as multipath interference caused by echo signals formed by the reflection of non-target objects by the millimeter wave radar, can be eliminated, so that the accuracy rate of gesture recognition is further improved.
In an exemplary embodiment, the gesture recognition model used for gesture recognition is trained from a base model, which may be a machine learning model or a deep learning model, including but not limited to: a long short-term memory (LSTM) network, a gated recurrent unit (GRU) network, a long-term dependency (Long-Term Dependencies) network, an SVM (Support Vector Machine), a CNN (Convolutional Neural Network), an RNN (Recurrent Neural Network), or any combination thereof.
In one possible implementation manner, the gesture recognition model is obtained based on basic model training, specifically: and carrying out model training on the basic model according to the sample joint characteristics in the training set and the corresponding sample labels, and obtaining a gesture recognition model. The sample labels are used for indicating gesture categories marked for samples to which the corresponding sample joint features belong.
It is noted that the extraction process of the sample joint feature is substantially identical to the extraction process of the target joint feature described above, and the description thereof will not be repeated here.
Referring to fig. 7, in an exemplary embodiment, the model training process may include the steps of:
step 510, inputting the current sample joint feature in the training set into a basic model, and performing gesture recognition on the sample joint feature through the basic model to obtain a gesture prediction result.
In the process of carrying out gesture recognition on the sample joint features, forward propagation can be adopted to process the input sample joint features. The forward propagation is a process of predicting the posture category of the sample to which the sample joint feature belongs according to the sample joint feature, and accordingly, the output result is used for indicating the predicted posture category of the sample to which the sample joint feature belongs.
And step 530, obtaining a corresponding loss value based on the difference between the gesture prediction result and the sample label corresponding to the current sample joint feature.
Specifically, in the process of calculating the difference between the gesture prediction result and the sample label, the loss value of each layer can be obtained through back propagation calculation.
Wherein, back propagation is a process of calculating a loss value of each layer according to a difference between an output result of each layer and a sample tag.
In one possible implementation, the back propagation calculation refers to calculating the loss value of each layer according to the difference between the output result of each layer and the sample tag by using a loss function. The loss function may be a mean square error loss function, a cross entropy loss function, a regression loss function, etc., and is not limited herein.
Step 550, adjusting the gradient of each model parameter in the basic model according to the loss value and continuing training until the gradient of each model parameter meets the set convergence condition, at which point the basic model has been trained to obtain the gesture recognition model.
The model parameters are initialized when the basic model starts model training and are continuously updated during the subsequent training process. When the parameters are updated, the gradient of each layer's model parameters in the basic model can be calculated according to the loss value, and the model parameters are updated based on these gradients so as to obtain the trained gesture recognition model. The initialization may be any of random initialization, orthogonal initialization and zero initialization, and is not particularly limited herein.
And if the gradient of each model parameter meets the set convergence condition, obtaining a trained gesture recognition model based on each updated model parameter.
Otherwise, if the gradient of each model parameter does not meet the set convergence condition, the next sample joint feature in the training set is continuously input into the basic model for training until the gradient of each model parameter meets the set convergence condition, and the basic model is trained to obtain a trained gesture recognition model.
The set convergence condition may mean that the gradient of each model parameter no longer changes, that the number of iterations reaches an iteration threshold, or that all batches of samples have completed training; the convergence condition can be flexibly set according to the actual requirements of the application scenario and is not limited herein. It should be noted that, in order to balance training efficiency and increase the convergence stability of the model parameters, all samples can be divided into a plurality of batches, so that the parameters of the long short-term memory network are adjusted in a batch-by-batch training manner.
In one possible implementation, adam (Adaptive momentum, adaptive motion estimation) algorithm is used to update the model parameters.
After the model parameters are updated, the process returns to step 510, where the next sample joint feature in the training set is input into the long short-term memory network, and model training of the long short-term memory network continues.
After model training is completed, the gesture recognition model has gesture recognition capability, and can recognize the gesture of the target object according to the target joint characteristics.
In an exemplary embodiment, the gesture is a sleeping gesture, and the gesture recognition result is used to indicate the sleeping gesture of the target object in the detection area.
Because the sleeping posture is a continuous posture, using the long short-term memory network as the basic model for training and prediction of the gesture recognition model helps improve the accuracy of sleeping-posture recognition.
The long short-term memory network comprises multiple layers, each layer comprising an LSTM unit, and each LSTM unit comprises a forget gate, an input gate, a tanh layer and an output gate.
Specifically, the forget gate f_t determines, according to the input x_t of the t-th LSTM unit and the hidden state h_{t-1} of the (t-1)-th LSTM unit, which part of the information to discard from the cell state C_{t-1} of the (t-1)-th LSTM unit; the input gate i_t determines, according to the input x_t of the t-th LSTM unit and the hidden state h_{t-1} of the (t-1)-th LSTM unit, which part of the information to store into the cell state C_t of the t-th LSTM unit; and the tanh layer is used to determine the candidate value C̃_t that can be added to the cell state C_t of the t-th LSTM unit. Thereby, through the forget gate f_t, the input gate i_t and the tanh layer, the cell state C_{t-1} of the (t-1)-th LSTM unit is updated to the cell state C_t of the t-th LSTM unit. The calculation formulas are shown in (1):

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t        (1)

where σ(·) denotes the sigmoid function, W_f, W_i and W_C are the weights of the forget gate f_t, the input gate i_t and the cell-state update in the t-th LSTM unit, and b_f, b_i and b_C are the corresponding biases.
Further, the output gate is used to obtain the output o_t and the hidden state h_t of the t-th LSTM unit. The calculation formulas are shown in (2):

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t ⊙ tanh(C_t)        (2)

where σ(·) denotes the sigmoid function, and W_o and b_o are the weight and bias of the output gate in the t-th LSTM unit.
It should be noted that, in this embodiment, when the posture is a sleeping posture, the posture categories include at least three types: lying flat, lying on the left side and lying on the right side. Differently from a standard two-class long short-term memory network, this embodiment essentially adds a linear layer to the standard long short-term memory network: the output o_t of the last layer of the standard long short-term memory network (assuming the network contains t LSTM units, the last layer is the t-th LSTM unit) is input into the linear layer, and the output result of the linear layer is calculated as o_d = Softmax(W_d · o_t + b_d), where W_d and b_d are the weight and bias of the linear layer, respectively.
Thus, based on the class with the maximum probability in the linear-layer output o_d, it is possible to identify which sleeping posture the target object in the detection area is in.
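To make the structure above concrete, the following PyTorch sketch stacks LSTM layers and appends the linear layer with Softmax over the three sleeping-posture classes; the class name, layer sizes and the choice of PyTorch are assumptions made for illustration and not part of the embodiment.

import torch
import torch.nn as nn

class SleepPostureLSTM(nn.Module):
    # LSTM-based sleeping-posture classifier: stacked LSTM units + linear layer + Softmax.
    def __init__(self, input_dim, hidden_dim=64, num_layers=2, num_classes=3):
        super().__init__()
        # Stacked LSTM units implementing the gate equations (1) and (2).
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)
        # Linear layer producing W_d * o_t + b_d for the three posture classes.
        self.linear = nn.Linear(hidden_dim, num_classes)

    def forward(self, joint_features):
        # joint_features: (batch, time, input_dim) sequence of target joint features.
        outputs, _ = self.lstm(joint_features)
        o_t = outputs[:, -1, :]          # output of the last (t-th) LSTM unit
        return self.linear(o_t)          # unnormalised class scores

    def predict(self, joint_features):
        # o_d = Softmax(W_d * o_t + b_d); the class with the maximum probability is the
        # recognised sleeping posture (0 lying flat, 1 lying left, 2 lying right).
        return torch.softmax(self.forward(joint_features), dim=-1)

Keeping the Softmax outside forward() lets a training loop pair the raw linear output with a cross-entropy loss, while prediction uses predict().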
Referring now to fig. 8, the process of recognizing the sleeping posture using the gesture recognition model, trained with the long short-term memory network as the basic model, is detailed below:
training process:
First, a training set and a test set for the long short-term memory network are constructed.
Specifically, sample data under different sleeping postures are acquired, sample joint features are obtained according to steps 310 to 330, and the sample labels are annotated synchronously, so that a training set and a test set with a ratio of 4:1 are formed. In one possible implementation, the sample labels at least comprise lying flat (0), lying on the left side (1) and lying on the right side (2).
Second, the model parameters of each layer (also referred to as each LSTM unit) in the long short-term memory network are initialized. In one possible implementation, the initialization of the model parameters includes: the weights are orthogonally initialized and the biases are initialized to 0.
Third, forward propagation computes the output of each layer (also referred to as each LSTM unit) in the long short-term memory network.
Specifically, the output of the (t-1)-th LSTM unit is used as the input of the t-th LSTM unit, and the output of the t-th LSTM unit is calculated, until the final linear layer is calculated.
Fourth, cross entropy is selected as the loss function of the long short-term memory network, and the loss value of each layer is calculated through back propagation.
Fifth, the gradient of each model parameter in the long short-term memory network is calculated according to the loss value of each layer, and when the gradients of the model parameters do not meet the set convergence condition, the model parameters are updated through an optimization algorithm.
In one possible implementation, the optimization algorithm includes the Adam algorithm.
Sixth, when the gradients of the model parameters do not meet the set convergence condition, the first to fifth steps are iterated until the gradients of the model parameters in the long short-term memory network meet the set convergence condition, at which point the long short-term memory network has been trained into the gesture recognition model.
In order to balance training efficiency and increase the convergence stability of the model parameters, in one possible implementation a batch training mode is adopted: within one round of training (meaning that all samples are trained once), all samples are divided into batches, and whether the gradients of the model parameters meet the set convergence condition is judged before the next round of training is performed.
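The six steps above can be summarised in the following PyTorch sketch; the data loader, batch size, learning rate and the fixed number of epochs standing in for the gradient-based convergence condition are illustrative assumptions, and SleepPostureLSTM refers to the hypothetical class sketched earlier.

import torch
import torch.nn as nn

def train_sleep_posture_model(model, train_loader, epochs=50, lr=1e-3):
    # Second step: orthogonal initialisation of the weights, zero initialisation of the biases.
    for name, param in model.named_parameters():
        if "weight" in name and param.dim() >= 2:
            nn.init.orthogonal_(param)
        elif "bias" in name:
            nn.init.zeros_(param)

    criterion = nn.CrossEntropyLoss()                      # fourth step: cross-entropy loss
    optimizer = torch.optim.Adam(model.parameters(), lr)   # fifth step: Adam optimisation

    for epoch in range(epochs):            # one round of training = all samples trained once
        for joint_features, labels in train_loader:        # batch training mode
            logits = model(joint_features)                 # third step: forward propagation
            loss = criterion(logits, labels)               # labels: 0 flat, 1 left, 2 right
            optimizer.zero_grad()
            loss.backward()                                # back propagation of the loss
            optimizer.step()                               # model parameter update
    return model

In practice, the convergence check on the parameter gradients described in the sixth step would replace the fixed epoch count used here.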
The prediction process comprises the following steps:
as shown in fig. 8, the gesture recognition process based on the gesture recognition model trained by the long-term memory network may include the following steps:
step 3731, inputting the target joint features into the first layer of the gesture recognition model to obtain an output result of the first layer.
Step 3733, inputting the output result of the first layer into the second layer of the gesture recognition model to obtain the output result of the second layer until the output result of the last layer of the gesture recognition model is obtained.
The output result is used for indicating the gesture category of the target object to which the target joint feature belongs.
Step 3735, if the output result of the last layer meets the set decision condition, the output result of the last layer is used as the gesture recognition result.
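A short usage sketch of this prediction process, reusing the hypothetical SleepPostureLSTM class from above (the tensor shape, feature dimension and the 0.6 decision threshold are assumptions made for the example):

import torch

# Placeholder target joint features for one detection window: (batch, time, feature_dim).
target_joint_features = torch.randn(1, 100, 32)

model = SleepPostureLSTM(input_dim=32)
model.eval()
with torch.no_grad():
    probabilities = model.predict(target_joint_features)   # layer-by-layer forward pass
    confidence, posture = probabilities.max(dim=-1)
    # Decision condition: accept the output of the last layer only if the maximum
    # probability exceeds a threshold; otherwise the result is left undetermined.
    if confidence.item() >= 0.6:
        gesture_recognition_result = int(posture.item())   # 0 lying flat, 1 left, 2 right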
Under the action of the embodiment, the gesture recognition based on the long-short-term memory network is realized, and the gesture recognition method has better generalization capability for a continuous gesture detection process, so that the accuracy of gesture recognition can be fully ensured.
In an exemplary embodiment, after step 370, the method may further include the steps of:
and carrying out credibility configuration on vital signs of the target object under different postures according to the posture category, the first target feature and/or the second target feature indicated by the posture identification result.
As described above, if the gesture is a sleeping posture, the accuracy of the vital signs detected for the target object differs between sleeping postures, so that sleep state judgement, sleep stage analysis and the like performed based on these vital signs may deviate. Therefore, in this embodiment, after the gesture category indicated by the gesture recognition result is obtained, the reliability of the vital signs of the target object under different postures is configured by combining the first target feature and/or the second target feature, thereby assisting the user in performing more effective information analysis.
Taking as an example the case in which the gesture is a sleeping posture and the vital signs comprise the respiration rate and the heart rate, the process of configuring the reliability of the respiration rate and heart rate of the target object under different sleeping postures is described as follows:
in one possible implementation manner, according to the sleeping postures indicated by the posture identification result, reliability configuration is performed on the respiration rate and the heart rate of the target object under different sleeping postures.
TABLE 1 Credibility configured based on gesture category
As can be seen from table 1, the reliability of the respiration rate and the heart rate is highest when the sleeping posture is lying down.
In one possible implementation manner, according to the sleeping gesture indicated by the gesture recognition result and the peak signal-to-noise ratio (first target feature) of the set frequency band, reliability configuration is performed for the respiration rate and the heart rate of the target object under different sleeping gestures.
Firstly, the peak signal-to-noise ratio of the set frequency band is divided into a high signal-to-noise ratio, a medium signal-to-noise ratio and a low signal-to-noise ratio.
Let τ_{bl,SNR} be the lower signal-to-noise-ratio threshold of the respiratory frequency band and τ_{bh,SNR} the upper signal-to-noise-ratio threshold of the respiratory frequency band. Then, for the peak signal-to-noise ratio SNR_b of the respiratory frequency band in the first target feature, if SNR_b ≤ τ_{bl,SNR} it is considered a low signal-to-noise ratio; if τ_{bl,SNR} < SNR_b < τ_{bh,SNR} it is considered a medium signal-to-noise ratio; and if SNR_b ≥ τ_{bh,SNR} it is considered a high signal-to-noise ratio.
Similarly, let τ_{hl,SNR} be the lower signal-to-noise-ratio threshold of the heartbeat frequency band and τ_{hh,SNR} the upper signal-to-noise-ratio threshold of the heartbeat frequency band. Then, for the peak signal-to-noise ratio SNR_h of the heartbeat frequency band in the first target feature, if SNR_h ≤ τ_{hl,SNR} it is considered a low signal-to-noise ratio; if τ_{hl,SNR} < SNR_h < τ_{hh,SNR} it is considered a medium signal-to-noise ratio; and if SNR_h ≥ τ_{hh,SNR} it is considered a high signal-to-noise ratio.
Therefore, the reliability configuration of the respiration rate and the heart rate can be carried out by combining the sleeping posture with the high, medium and low signal-to-noise ratios.
TABLE 2 Respiration rate reliability configured based on sleeping posture and respiratory-band peak signal-to-noise ratio
As can be seen from table 2, when the sleeping posture is lying down, the reliability of the respiratory rate is highest when the peak signal-to-noise ratio of the respiratory frequency band is high.
TABLE 3 Heart rate reliability configured based on sleeping posture and heartbeat-band peak signal-to-noise ratio
As can be seen from table 3, when the sleeping posture is lying down, the reliability of the heart rate is highest when the peak signal-to-noise ratio of the heart beat frequency band is high.
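A minimal sketch of such a reliability configuration is given below; the threshold values and the reliability levels in the lookup table are purely illustrative assumptions standing in for the concrete contents of Tables 2 and 3.

def snr_level(snr, lower, upper):
    # Classify a peak signal-to-noise ratio as 'low', 'medium' or 'high'.
    if snr <= lower:
        return "low"
    if snr >= upper:
        return "high"
    return "medium"

# Hypothetical reliability lookup keyed by (sleeping posture, SNR level); the actual
# values would come from Tables 2 and 3 of the embodiment.
RESPIRATION_RELIABILITY = {
    ("flat", "high"): "high", ("flat", "medium"): "medium", ("flat", "low"): "low",
    ("left", "high"): "medium", ("left", "medium"): "medium", ("left", "low"): "low",
    ("right", "high"): "medium", ("right", "medium"): "medium", ("right", "low"): "low",
}

def configure_respiration_reliability(posture, snr_b, tau_bl=6.0, tau_bh=12.0):
    # Reliability of the respiration rate given the sleeping posture and SNR_b,
    # using assumed thresholds tau_bl and tau_bh for the respiratory band.
    return RESPIRATION_RELIABILITY[(posture, snr_level(snr_b, tau_bl, tau_bh))]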
Then, after determining the credibility of the vital signs of the target object under different sleeping postures, the sleeping state, the sleeping stage and the like of the user can be more accurately analyzed based on the credibility.
Specifically, in one possible implementation, the analysis process of the sleep stage may include the steps of: based on the sleeping gesture indicated by the gesture recognition result, vital sign detection is carried out on the target object, and vital sign data of the target object are obtained; determining a sleep stage result according to vital signs of a target object indicated by the vital sign data and the reliability configured for the vital signs of the target object in a sleeping posture; the notification device performs an action corresponding to the sleep stage indicated by the sleep stage result.
Wherein vital sign data is used to indicate vital signs of the target subject, such as respiration rate and heart rate. The sleep stage result is used to indicate the sleep stage of the target subject.
In one possible embodiment, the sleep stages include a deep sleep stage, a shallow sleep stage. It should be understood that the deep sleep stage indicates that the sleep state of the target subject is a deep sleep state; the light sleep stage indicates that the sleep state of the target subject is a light sleep state.
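As one hypothetical illustration of this analysis flow, the sketch below weights the detected vital signs by the configured reliability before deciding the sleep stage; the thresholds, weights and the notification action are assumptions made for the example and are not the method defined by the embodiment.

def determine_sleep_stage(respiration_rate, heart_rate,
                          respiration_reliability, heart_reliability):
    # Return 'deep' or 'light', trusting the more reliable vital sign more heavily.
    weights = {"high": 1.0, "medium": 0.5, "low": 0.1}
    # Lower respiration and heart rates are treated here as evidence of deep sleep.
    respiration_vote = 1.0 if respiration_rate <= 14 else 0.0   # breaths per minute
    heart_vote = 1.0 if heart_rate <= 60 else 0.0               # beats per minute
    score = (weights[respiration_reliability] * respiration_vote
             + weights[heart_reliability] * heart_vote)
    total = weights[respiration_reliability] + weights[heart_reliability]
    return "deep" if score / total >= 0.5 else "light"

def notify_device(sleep_stage):
    # Placeholder for the gateway notification described in the application scenario below.
    if sleep_stage == "light":
        print("dim the intelligent lamp / play relaxing music")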
After determining the sleep stage of the target object according to the indication of the sleep stage result, a corresponding intelligent service can be provided for the target object based on the sleep stage of the target object.
In one application scenario, the device is notified to perform an action corresponding to the sleep stage. For example, if the sleep stage of the user is the shallow sleep stage, the gateway sends a control instruction to the intelligent lamp or the intelligent sound box, controlling the intelligent lamp to adjust its brightness, or controlling the intelligent sound box to play relaxing music, so that the user can quickly enter the deep sleep stage and the sleep quality of the user is improved.
For example, before the alarm clock wakes the user, according to the different sleep stages of the user, the gateway sends corresponding control instructions to the alarm clock to control it to play different ring tones, and/or the gateway sends corresponding control instructions to the intelligent lamp to control it to turn on and adjust to different brightness levels, so that the user is woken up.
Through the cooperation of the above embodiments, the configuration of the credibility of the vital signs of the target object under different sleeping postures is realized, and a more accurate data basis can be provided for analyzing the sleep state and sleep stage of the user, so that the credibility of the intelligent service is greatly improved and the user experience is improved.
The following is an embodiment of the apparatus of the present application, which may be used to perform the gesture recognition method according to the present application. For details not disclosed in the embodiment of the apparatus of the present application, please refer to a method embodiment of the gesture recognition method related to the present application.
Referring to fig. 9, in an embodiment of the present application, a gesture recognition apparatus 900 is provided, including but not limited to: a data acquisition module 910, a feature extraction module 950, and a gesture recognition module 970.
The data acquisition module 910 is configured to obtain, based on the positioning of the target object, target data related to the posture of the target object in the detection area.
The feature extraction module 950 is configured to perform time-frequency domain feature extraction on the target data to obtain a first target feature and a second target feature; the first target feature is a frequency domain representation of the pose of the target object in the detection region and the second target feature is a time domain representation of the pose of the target object in the detection region.
The gesture recognition module 970 is configured to recognize the gesture of the target object according to the first target feature and the second target feature, so as to obtain a gesture recognition result.
In an exemplary embodiment, the feature extraction module includes: a frequency domain transformation unit, configured to perform frequency domain transformation on the target data to obtain a plurality of frequency domain signals of the target data in the frequency domain, wherein each frequency domain signal corresponds to one frequency point in the frequency domain; and a time domain transformation unit, configured to obtain the first target feature from the frequency domain signals corresponding to the frequency points in the set frequency band, and to obtain the second target feature by performing time domain transformation on the frequency domain signals corresponding to the frequency points in the set frequency band.
In an exemplary embodiment, the gesture recognition module includes: the feature fusion unit is used for fusing the first target feature and the second target feature to obtain a target joint feature; and the gesture prediction unit is used for inputting the target joint characteristics into a gesture recognition model to perform gesture category prediction, so as to obtain the gesture recognition result.
In an exemplary embodiment, the apparatus further comprises: the model training module is used for carrying out model training on the basic model according to the sample joint characteristics in the training set and the corresponding sample labels thereof to obtain the gesture recognition model, and the sample labels are used for indicating gesture categories marked for samples to which the corresponding sample joint characteristics belong; the model training module comprises: the sample input unit is used for inputting the current sample joint characteristic in the training set into the basic model, and carrying out gesture recognition on the sample joint characteristic through the basic model to obtain a gesture prediction result; the loss calculation unit is used for obtaining a corresponding loss value through back propagation calculation based on the sample label corresponding to the combined characteristic of the gesture prediction result and the current sample; and the convergence unit is used for adjusting the gradient of each model parameter in the basic model according to the loss value and continuing training until the gradient of each model parameter meets the set convergence condition, and training by the basic model to obtain the gesture recognition model.
In an exemplary embodiment, the apparatus further comprises: the signal acquisition module is used for acquiring echo signals matched with the set window length from a plurality of echo signals based on a sliding window with the set window length; the plurality of echo signals are formed by reflecting a plurality of radar signals emitted by the detection equipment in the positioning process through the target object; the frequency spectrum analysis module is used for carrying out frequency spectrum analysis on the acquired echo signals to obtain a plurality of distance data, wherein the distance data are used for indicating the radial distance between the target object and the detection equipment in the detection area; a stationary detection module for determining whether the target object in the detection area is stationary according to a plurality of the distance data; if not, the sliding window is controlled to continue sliding in the echo signals.
In an exemplary embodiment, the gesture is a sleeping gesture, and the gesture recognition result is used to indicate the sleeping gesture of the target object in the detection area.
In an exemplary embodiment, the apparatus further comprises: and the first automatic control module is used for notifying the equipment to execute the action corresponding to the sleeping gesture indicated by the gesture recognition result.
In an exemplary embodiment, the apparatus further comprises: the vital sign detection module is used for detecting vital signs of the target object based on the sleeping gesture indicated by the gesture recognition result to obtain vital sign data of the target object, wherein the vital sign data are used for indicating vital signs of the target object; the sleep stage module is used for determining a sleep stage result according to vital signs of the target object indicated by the vital sign data and the reliability configured for the vital signs of the target object in the sleeping posture; and the second automatic control module is used for notifying the equipment to execute actions corresponding to the sleep stage indicated by the sleep stage result.
In an exemplary embodiment, the apparatus further comprises: and the configuration module is used for carrying out credibility configuration on vital signs of the target object under the sleeping gesture according to the sleeping gesture indicated by the gesture recognition result, the first target feature and/or the second target feature.
It should be noted that, in the gesture recognition apparatus provided in the foregoing embodiment, only the division of the functional modules is used for illustration, and in practical application, the above-mentioned function allocation may be performed by different functional modules according to needs, that is, the internal structure of the gesture recognition apparatus is divided into different functional modules to perform all or part of the functions described above.
In addition, the gesture recognition apparatus and the gesture recognition method provided in the foregoing embodiments belong to the same concept, and the specific manner in which each module performs the operation has been described in detail in the method embodiment, which is not described herein again.
Fig. 10 shows a schematic structure of an electronic device according to an exemplary embodiment. The electronic device is suitable for use at the server side 170 of the implementation environment shown in fig. 1.
It should be noted that the electronic device is only an example adapted to the present application, and should not be construed as providing any limitation on the scope of use of the present application. Nor should the electronic device be construed as necessarily relying on or necessarily having one or more of the components of the exemplary electronic device 2000 illustrated in fig. 10.
The hardware structure of the electronic device 2000 may vary widely depending on configuration or performance. As shown in fig. 10, the electronic device 2000 includes: a power supply 210, an interface 230, at least one memory 250, and at least one central processing unit (CPU, Central Processing Unit) 270.
Specifically, the power supply 210 is configured to provide an operating voltage for each hardware device on the electronic device 2000.
The interface 230 includes at least one wired or wireless network interface 231 for interacting with external devices. For example, interactions between smart device 130 and server side 170 in the implementation environment shown in FIG. 1 are performed.
Of course, in other examples of the adaptation of the present application, the interface 230 may further include at least one serial-parallel conversion interface 233, at least one input-output interface 235, at least one USB interface 237, and the like, as shown in fig. 10, which is not particularly limited herein.
The memory 250 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, where the resources stored include an operating system 251, application programs 253, and data 255, and the storage mode may be transient storage or permanent storage.
The operating system 251 is used for managing and controlling the various hardware devices and the application programs 253 on the electronic device 2000, so as to implement the operation and processing of the massive data 255 in the memory 250 by the central processing unit 270; it may be Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The application 253 is a computer program that performs at least one specific task based on the operating system 251, and may include at least one module (not shown in fig. 10), each of which may respectively include a computer program for the electronic device 2000. For example, the gesture recognition apparatus may be considered as an application 253 deployed on the electronic device 2000.
The data 255 may be a photograph, a picture, or the like stored in a disk, or may be target data or the like, and is stored in the memory 250.
The central processor 270 may include one or more processors and is configured to communicate with the memory 250 via at least one communication bus to read the computer program stored in the memory 250, thereby implementing the operation and processing of the bulk data 255 in the memory 250. The gesture recognition method is accomplished, for example, by the central processor 270 reading a series of computer programs stored in the memory 250.
Furthermore, the present application can be realized by hardware circuitry or by a combination of hardware circuitry and software, and thus, the implementation of the present application is not limited to any specific hardware circuitry, software, or combination of the two.
Referring to fig. 11, in an embodiment of the present application, an electronic device 4000 is provided. As shown in fig. 11, the electronic device 4000 includes: at least one processor 4001, at least one communication bus 4002, and at least one memory 4003.
Wherein the processor 4001 is coupled to the memory 4003, such as via a communication bus 4002. Optionally, the electronic device 4000 may further comprise a transceiver 4004, the transceiver 4004 may be used for data interaction between the electronic device and other electronic devices, such as transmission of data and/or reception of data, etc. It should be noted that, in practical applications, the transceiver 4004 is not limited to one, and the structure of the electronic device 4000 is not limited to the embodiment of the present application.
The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various exemplary logic blocks, modules and circuits described in connection with this disclosure. The processor 4001 may also be a combination that implements computing functionality, for example a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and the like.
The communication bus 4002 may include a pathway to transfer information between the aforementioned components. The communication bus 4002 may be a PCI (Peripheral Component Interconnect, peripheral component interconnect standard) bus or an EISA (Extended Industry Standard Architecture ) bus, or the like. The communication bus 4002 can be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 11, but not only one bus or one type of bus.
The memory 4003 may be, but is not limited to, a ROM (Read-Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 4003 has stored thereon a computer program, and the processor 4001 reads the computer program stored in the memory 4003 through the communication bus 4002.
The computer program, when executed by the processor 4001, implements the gesture recognition method in the above embodiments.
Further, in an embodiment of the present application, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the gesture recognition method in each of the above embodiments.
In an embodiment of the application, a computer program product is provided, which comprises a computer program stored in a storage medium. The processor of the computer device reads the computer program from the storage medium, and the processor executes the computer program so that the computer device executes the gesture recognition method in the above embodiments.
Compared with the related art, the sleeping posture of the target object is comprehensively predicted by combining the time-frequency-domain expression of vital signs such as respiration and heartbeat, so that the sleeping-posture recognition accuracy is high, and different sleeping postures can be accurately recognized using only the millimeter wave radar without adding other sensors; recognizing the sleeping posture with the long short-term memory network provides better generalization capability, which further helps to improve the accuracy of sleeping-posture recognition; in addition, the reliability configured for the vital signs of the target object under different sleeping postures assists the user in analysing the sleep state, sleep stage and the like, improving the accuracy of the analysis.
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
The foregoing is only a partial embodiment of the present application, and it should be noted that it will be apparent to those skilled in the art that modifications and adaptations can be made without departing from the principles of the present application, and such modifications and adaptations are intended to be comprehended within the scope of the present application.

Claims (12)

1. A gesture recognition method, comprising:
based on the positioning of the target object, obtaining target data related to the gesture of the target object in a detection area;
extracting time-frequency domain characteristics of the target data to obtain a first target characteristic and a second target characteristic; the first target feature is a frequency domain representation of the pose of the target object in the detection region, and the second target feature is a time domain representation of the pose of the target object in the detection region;
and recognizing the gesture of the target object according to the first target feature and the second target feature to obtain a gesture recognition result.
2. The method of claim 1, wherein the performing time-frequency domain feature extraction on the target data to obtain a first target feature and a second target feature comprises:
performing frequency domain transformation on the target data to obtain a plurality of frequency domain signals of the target data in a frequency domain, wherein each frequency domain signal corresponds to each frequency point in the frequency domain;
Extracting frequency domain characteristics of frequency domain signals corresponding to all frequency points in a set frequency band to obtain first target characteristics;
and performing time domain transformation on the frequency domain signals corresponding to the frequency points in the set frequency band to obtain a second target feature.
3. The method of claim 1, wherein the identifying the pose of the target object based on the first target feature and the second target feature to obtain the pose identification result comprises:
fusing the first target feature and the second target feature to obtain a target joint feature;
and inputting the target combined features into a gesture recognition model to perform gesture category prediction, so as to obtain the gesture recognition result.
4. A method as claimed in claim 3, wherein the method further comprises: according to the sample joint characteristics in the training set and the corresponding sample labels, carrying out model training on the basic model to obtain the gesture recognition model, wherein the sample labels are used for indicating gesture categories marked for samples to which the corresponding sample joint characteristics belong;
model training is carried out on the basic model according to the combined characteristics of the samples in the training set and the corresponding sample labels to obtain the gesture recognition model, and the method comprises the following steps:
Inputting the current sample joint feature in the training set into the basic model, and carrying out gesture recognition on the sample joint feature through the basic model to obtain a gesture prediction result;
obtaining a corresponding loss value based on the difference between the gesture prediction result and a sample label corresponding to the current sample joint characteristic;
and adjusting the gradient of each model parameter in the basic model according to the loss value, and continuing training until the gradient of each model parameter meets the set convergence condition, and training by the basic model to obtain the gesture recognition model.
5. The method of claim 1, wherein prior to obtaining target data related to a pose of the target object in the detection zone based on the positioning of the target object, the method further comprises:
acquiring echo signals matched with the set window length from a plurality of echo signals based on a sliding window with the set window length; the plurality of echo signals are formed by reflecting a plurality of radar signals emitted by the detection equipment in the positioning process through the target object;
performing spectrum analysis on the acquired echo signals to obtain a plurality of distance data, wherein the distance data are used for indicating the radial distance between the target object and the detection equipment in the detection area;
Determining whether the target object in the detection area is stationary according to a plurality of the distance data;
if not, the sliding window is controlled to continue sliding in the echo signals.
6. The method of any one of claims 1 to 5, wherein the gesture is a sleeping gesture and the gesture recognition result is used to indicate the sleeping gesture of the target object in the detection area.
7. The method of claim 6, wherein the method further comprises:
and the notification device executes an action corresponding to the sleeping gesture indicated by the gesture recognition result.
8. The method of claim 6, wherein the method further comprises:
based on the sleeping gesture indicated by the gesture recognition result, vital sign detection is carried out on the target object to obtain vital sign data of the target object, wherein the vital sign data are used for indicating vital signs of the target object;
determining a sleep stage result according to vital signs of the target object indicated by the vital sign data and the reliability configured for the vital signs of the target object in the sleeping posture;
and notifying the equipment to execute the action corresponding to the sleep stage indicated by the sleep stage result.
9. The method of claim 8, wherein before the determining a sleep stage result according to the vital signs of the target object indicated by the vital sign data and the reliability configured for the vital signs of the target object in the sleeping posture, the method further comprises:
and carrying out credibility configuration on vital signs of the target object under the sleeping posture according to the sleeping posture indicated by the posture identification result, the first target characteristic and/or the second target characteristic.
10. A gesture recognition apparatus, comprising:
the data acquisition module is used for acquiring target data related to the gesture of the target object in the detection area based on the positioning of the target object;
the feature extraction module is used for extracting time-frequency domain features of the target data to obtain a first target feature and a second target feature; the first target feature is a frequency domain representation of the pose of the target object in the detection region, and the second target feature is a time domain representation of the pose of the target object in the detection region;
and the gesture recognition module is used for recognizing the gesture of the target object according to the first target feature and the second target feature to obtain a gesture recognition result.
11. An electronic device, comprising: at least one processor, at least one memory, and at least one communication bus, wherein,
the memory stores a computer program, and the processor reads the computer program in the memory through the communication bus;
the computer program, when executed by the processor, implements the gesture recognition method of any one of claims 1 to 9.
12. A storage medium having stored thereon a computer program, which when executed by a processor implements the gesture recognition method according to any one of claims 1 to 9.
CN202211092665.0A 2022-09-08 2022-09-08 Gesture recognition method and device, electronic equipment and storage medium Pending CN116616747A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211092665.0A CN116616747A (en) 2022-09-08 2022-09-08 Gesture recognition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211092665.0A CN116616747A (en) 2022-09-08 2022-09-08 Gesture recognition method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116616747A true CN116616747A (en) 2023-08-22

Family

ID=87637077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211092665.0A Pending CN116616747A (en) 2022-09-08 2022-09-08 Gesture recognition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116616747A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117331047A (en) * 2023-12-01 2024-01-02 德心智能科技(常州)有限公司 Human behavior data analysis method and system based on millimeter wave radar

Similar Documents

Publication Publication Date Title
WO2021218753A1 (en) Gesture recognition method and related apparatus
CN109076310B (en) Autonomous semantic tagging of physical locations
Yu et al. Noninvasive human activity recognition using millimeter-wave radar
US9922256B2 (en) Subject sensing in an environment
Li et al. Towards domain-independent and real-time gesture recognition using mmwave signal
US10341981B2 (en) User terminal device and method for recognizing user&#39;S location using sensor-based behavior recognition
US11516625B2 (en) Systems and methods for mapping a given environment
US20200105262A1 (en) Method and system for automatically managing operations of electronic device
Rizk et al. A ubiquitous and accurate floor estimation system using deep representational learning
CN115308734A (en) Respiratory data calculation method and related equipment
CN116616747A (en) Gesture recognition method and device, electronic equipment and storage medium
Luo et al. Gait recognition as a service for unobtrusive user identification in smart spaces
Mosleh et al. Monitoring respiratory motion with wi-fi csi: Characterizing performance and the breathesmart algorithm
Tong et al. A fine-grained channel state information-based deep learning system for dynamic gesture recognition
Zhao et al. Wear-free gesture recognition based on residual features of RFID signals
Luo et al. Spectro-temporal modelling for human activity recognition using a radar sensor network
CN111684535A (en) System and method for optimal sensor placement
Hayajneh et al. Channel state information based device free wireless sensing for iot devices employing tinyml
Zhang et al. Dynamic gesture recognition based on RF sensor and AE-LSTM neural network
Chen et al. Air‐CSL: Chinese Sign Language Recognition Based on the Commercial WiFi Devices
Hao et al. Wi-CAS: a contactless method for continuous indoor human activity sensing using Wi-Fi devices
CN113361450A (en) RFID-based activity sequence identification method, system, medium and terminal
Huang et al. Sparse representation for device-free human detection and localization with COTS RFID
CN115346271A (en) Motion detection method, motion detection device, electronic device, and storage medium
Liu et al. Indoor Fingerprinting Positioning System Using Deep Learning with Data Augmentation.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination