WO2022161026A1

WO2022161026A1 - Action recognition method and apparatus, and electronic device and storage medium

Info

Publication number: WO2022161026A1
Application number: PCT/CN2021/139746
Authority: WO
Inventors: Pei Xuan (裴璇); Guo Yandong (郭彦东)
Original assignee: Guangdong Oppo Mobile Telecommunications Corp., Ltd. (Oppo广东移动通信有限公司)
Priority date: 2021-01-28
Filing date: 2021-12-20
Publication date: 2022-08-04
Also published as: CN112817450A; CN114167984A; CN114167984B

Abstract

Disclosed are an action recognition method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring sensor data of a preset window length in a time series; inputting the sensor data into an action recognition model and acquiring an action classification result output by the action recognition model; and, if it is determined on the basis of the action classification result that an action has occurred, controlling an electronic device to respond to an operation corresponding to the action. Because the acquired sensor data of the preset window length is recognized by the action recognition model, whether an action has occurred can be recognized quickly and accurately. This improves the accuracy of action recognition, so that when an action is determined to have occurred, the electronic device can be controlled to respond in real time to the operation corresponding to the action.

Description

Action recognition method and apparatus, electronic device, and storage medium

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to Chinese Application No. 202110118213.4 filed on January 28, 2021, which is hereby incorporated by reference in its entirety for all purposes.

Technical Field

The present application relates to the technical field of human action recognition, and more particularly to an action recognition method and apparatus, an electronic device, and a storage medium.

Background

Human actions mainly refer to the ways in which the human body moves and how a person responds to the environment or to objects. Complex human actions are described or expressed through complex movements of the limbs, so most human actions are reflected in the movement of the body. Studying and exploring body movement is therefore a very effective way to analyze human actions.

SUMMARY OF THE INVENTION

In view of the above problems, the embodiments of the present application propose an action recognition method and apparatus, an electronic device, and a storage medium to address them.

In a first aspect, an embodiment of the present application provides an action recognition method, including: acquiring sensor data of a preset window length in a time series; inputting the sensor data into an action recognition model and acquiring an action classification result output by the action recognition model; and, if it is determined on the basis of the action classification result that an action has occurred, controlling an electronic device to respond to an operation corresponding to the action.

In a second aspect, an embodiment of the present application provides an action recognition apparatus, including: a data acquisition unit configured to acquire sensor data of a preset window length in a time series; a result output unit configured to input the sensor data into an action recognition model and obtain an action classification result output by the action recognition model; and an operation execution unit configured to control an electronic device to respond to an operation corresponding to an action if it is determined on the basis of the action classification result that the action has occurred.

In a third aspect, an embodiment of the present application provides an electronic device including one or more processors and a memory, where one or more programs are stored in the memory and configured to be executed by the one or more processors to implement the above method.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing program code that, when executed by a processor, performs the above method.

Brief Description of the Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings that are used in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application. For those skilled in the art, other drawings can also be obtained from these drawings without creative effort.

FIG. 1 shows a schematic diagram of an application environment of an action recognition method proposed by an embodiment of the present application;

FIG. 2 shows a flowchart of an action recognition method proposed by an embodiment of the present application;

FIG. 3 shows a schematic structural diagram of an action recognition model proposed by an embodiment of the present application;

FIG. 4 shows a schematic diagram of a head-shaking action according to an embodiment of the present application;

FIG. 5 shows a schematic diagram of a head movement that does not qualify as a head-shaking action according to an embodiment of the present application;

FIG. 6 shows a flowchart of an action recognition method proposed by another embodiment of the present application;

FIG. 7 shows a flowchart of an action recognition method proposed by another embodiment of the present application;

FIG. 8 shows a structural block diagram of a motion recognition device proposed by an embodiment of the present application;

FIG. 9 shows a structural block diagram of another motion recognition device proposed by an embodiment of the present application;

FIG. 10 shows a structural block diagram of still another action recognition device proposed by an embodiment of the present application;

FIG. 11 shows a structural block diagram of an electronic device for executing the action recognition method according to an embodiment of the present application;

FIG. 12 shows a storage unit, according to an embodiment of the present application, for storing or carrying program code that implements the action recognition method according to the embodiments of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, but not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.

The inventor found in the research on the related action recognition methods that the accuracy of the related action recognition methods for recognizing human actions still needs to be improved.

Therefore, the inventor proposes the action recognition method and apparatus, electronic device, and storage medium of the embodiments of the present application. Sensor data of a preset window length is acquired and input into the action recognition model, the action classification result output by the model is obtained, and, if it is determined on the basis of that result that an action has occurred, the electronic device is controlled to respond to the operation corresponding to the action. Because the sensor data of the preset window length in the time series is recognized by the action recognition model, whether an action has occurred can be identified quickly and accurately, which improves the accuracy of action recognition; once an action is determined to have occurred, the electronic device can be controlled to respond to the corresponding operation in real time.

The following introduces the application environment of the action recognition method provided by the embodiments of the present application.

Referring to FIG. 1, the action recognition method provided by the embodiments of the present application can be applied to a human-computer interaction system 100, which may include an electronic device 110 and a head-mounted device 120. As shown in FIG. 1, the electronic device 110 may be a smartphone, a tablet computer, a smart wearable device (such as a smart band or a smart watch), a smart screen, a gateway, an in-vehicle device, a laptop, or the like. The head-mounted device 120 may be a headset, such as a wireless Bluetooth headset or a wired headset. The electronic device 110 and the head-mounted device 120 may be connected wirelessly or through a physical cable, and may establish a communication link through a wireless communication protocol such as a WLAN protocol, a Bluetooth protocol, or a ZigBee protocol.

The electronic device 110 may include an action recognition module that performs action recognition on the sensor data collected by the head-mounted device 120, and the electronic device 110 may then be controlled according to the recognition result. The electronic device 110 may be connected to one head-mounted device 120 or to multiple head-mounted devices 120, for example two or more Bluetooth headsets.

In addition, the electronic device 110 may establish a network connection with other electronic devices through a wired or wireless network, for example via Wi-Fi or a mobile wireless network.

The embodiments of the present application will be described in detail below with reference to the accompanying drawings.

Referring to FIG. 2, an action recognition method provided by an embodiment of the present application is applied to an electronic device, and the method includes:

S110: Acquire sensor data of a preset window length on a time series.

In one implementation, the preset window length is a window of fixed length preset in the electronic device. The window length may be set to 64 ms, and the step by which the window slides along the time series each time may be set to 12 ms. The sensor data is collected by the head-mounted device and sent to the electronic device in real time.
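The sliding window described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the source does not state the sensor sampling rate, so `sample_rate_hz` is a hypothetical parameter used to convert the 64 ms window and 12 ms step into sample counts, and the function name `sliding_windows` is likewise illustrative.

```python
def sliding_windows(samples, sample_rate_hz, window_ms=64, step_ms=12):
    """Split a time-ordered list of samples into overlapping windows.

    sample_rate_hz is an assumption: the source gives the window and step
    only in milliseconds, so they are converted to sample counts here.
    """
    window = max(1, round(window_ms * sample_rate_hz / 1000))
    step = max(1, round(step_ms * sample_rate_hz / 1000))
    # Each window starts `step` samples after the previous one; a trailing
    # partial window is dropped until enough samples have arrived.
    return [samples[start:start + window]
            for start in range(0, len(samples) - window + 1, step)]


# At a hypothetical 1 kHz rate: 64-sample windows advancing by 12 samples.
windows = sliding_windows(list(range(100)), sample_rate_hz=1000)
```

With 100 samples at 1 kHz this yields windows starting at samples 0, 12, 24, and 36; consecutive windows overlap by 52 samples, so a gesture that straddles one window boundary is still seen whole in a neighboring window.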

In this embodiment of the present application, before the sensor data of the preset window length in the time series is acquired, it may also be detected whether the electronic device is connected to the head-mounted device; if it is, the sensor data of the preset window length in the time series is acquired.

Specifically, the connection state between the electronic device and the head-mounted device may be either a connected state or a disconnected state, where the disconnected state includes a not-connected state and a connection-interrupted state.

In one approach, the connection state between the electronic device and the head-mounted device can be judged from a state value of the electronic device. Specifically, two different state values can be defined for the electronic device in advance: the first state value is returned when the electronic device is connected to the head-mounted device, and the second state value is returned when it is not. Whether the two devices are connected can therefore be determined by detecting which of the two state values is returned. For example, the first state value is set to 1 and the second state value to 0 in advance. If the returned state value is 1, the electronic device is determined to be connected to the head-mounted device; if it is 0, the two devices are determined to be disconnected. Optionally, if the state value returned by the electronic device changes from 1 to 0 between adjacent moments in the time series, the connection between the electronic device and the head-mounted device is determined to have been interrupted.
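The state-value check in the example above can be expressed as a small classifier over two adjacent readings. This is a sketch under the example's own encoding (1 for connected, 0 for not connected); the function name and the returned labels are hypothetical, and how the device actually exposes the state value is not described in the source.

```python
CONNECTED = 1      # first state value, per the example
NOT_CONNECTED = 0  # second state value, per the example


def connection_state(previous_value, current_value):
    """Classify the link from two adjacent state-value readings.

    previous_value may be None for the very first reading.
    """
    if current_value == CONNECTED:
        return "connected"
    if previous_value == CONNECTED and current_value == NOT_CONNECTED:
        # A 1 -> 0 change at adjacent moments means the link was interrupted.
        return "interrupted"
    return "not connected"
```

Distinguishing "interrupted" from "not connected" lets the caller treat a dropped link (for example, to pause acquisition mid-gesture) differently from a headset that was never paired.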

In another approach, the electronic device sends a broadcast whenever the head-mounted device is connected or disconnected, so the electronic device can determine whether it is connected to the head-mounted device by listening for this broadcast.

In this embodiment of the present application, acquiring the sensor data of the preset window length in the time series means acquiring, in time order, the sensor data sent by the head-mounted device. The head-mounted device collects sensor data in real time and sends it to the electronic device in real time, and the electronic device stores the data as it arrives. When the length of the stored sensor data reaches the preset window length, the sensor data of that window length is sent to the action recognition module in the electronic device.
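On the electronic-device side, the store-until-full behavior described above can be sketched as a streaming buffer. This is an illustrative sketch only: `WindowBuffer` is a hypothetical name, and `window_len` and `step` are given in samples because the mapping from 64 ms / 12 ms to samples depends on a sampling rate the source does not state.

```python
class WindowBuffer:
    """Accumulate streamed samples and emit a window once it is full.

    Assumes step <= window_len so that consecutive windows overlap.
    """

    def __init__(self, window_len, step):
        self.window_len = window_len
        self.step = step
        self.buffer = []

    def push(self, sample):
        """Add one sample; return a full window when ready, else None."""
        self.buffer.append(sample)
        if len(self.buffer) < self.window_len:
            return None
        window = list(self.buffer[:self.window_len])
        # Slide forward by `step` samples; the rest is kept for the next window.
        del self.buffer[:self.step]
        return window


buf = WindowBuffer(window_len=4, step=2)
ready = [w for w in (buf.push(x) for x in range(6)) if w is not None]
```

Each window returned by `push` would be handed to the action recognition module; here `ready` holds `[0, 1, 2, 3]` and `[2, 3, 4, 5]`.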

S120: Input the sensor data into an action recognition model, and obtain an action classification result output by the action recognition model.

In this embodiment of the present application, the action recognition model includes, connected in sequence, a first convolutional layer, a second convolutional layer, a max pooling layer, a third convolutional layer, a fourth convolutional layer, a global average pooling layer, a fully connected layer, and a softmax layer. The first and second convolutional layers each have a kernel size of 7 and 64 output channels; the third and fourth convolutional layers each have a kernel size of 7 and 128 output channels. The structure of the action recognition model is shown in FIG. 3.
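To make the layer stack above concrete, the sketch below traces the output shape and parameter count of each layer. The kernel size 7 and the 64/64/128/128 channel widths come from the text. Everything else is an assumption: "same" padding (convolutions preserve the sequence length), a max-pooling size of 2, 3 input channels (X, Y, Z), and 3 output categories (for example nod, head shake, and no action). The function name `trace_model` is hypothetical.

```python
def trace_model(in_channels, window_len, num_classes, kernel=7, pool=2):
    """Return (layer, output_shape, parameter_count) rows for the FIG. 3 stack."""
    rows = []
    channels, length = in_channels, window_len

    def conv(name, out_ch):
        nonlocal channels
        params = channels * out_ch * kernel + out_ch  # weights + biases
        channels = out_ch
        # "same" padding (assumed) keeps the time length unchanged
        rows.append((name, (channels, length), params))

    conv("conv1", 64)
    conv("conv2", 64)
    length //= pool  # max pooling halves the time axis (pool size assumed)
    rows.append(("maxpool", (channels, length), 0))
    conv("conv3", 128)
    conv("conv4", 128)
    rows.append(("global_avg_pool", (channels,), 0))  # one value per channel
    rows.append(("fc", (num_classes,), channels * num_classes + num_classes))
    rows.append(("softmax", (num_classes,), 0))
    return rows


for row in trace_model(in_channels=3, window_len=64, num_classes=3):
    print(row)
```

Under these assumptions the model has about 203k parameters, almost all in the two 128-channel convolutions. Because global average pooling collapses the time axis, the fully connected layer's size does not depend on the window length.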

In FIG. 3, the first and second convolutional layers extract features from the acquired sensor data of the preset window length. The max pooling layer reduces the dimensionality of these features and, to a certain extent, makes them invariant to small translations in time. The third and fourth convolutional layers further extract higher-order features and increase the number of channels, so that the information stays rich after the feature scale has been reduced. The global average pooling layer aggregates the features detected at every position in the window, which strengthens translation invariance. The fully connected layer converts all features into a logit value for each action category, and the softmax layer converts the logits into probabilities that sum to 1.
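The last three stages described above (global average pooling, the fully connected layer, and softmax) can be sketched in plain Python. This is an illustration of the standard operations, not the applicant's code. The function names are hypothetical, and the feature map is assumed to be a list of channels, each holding that channel's values over time.

```python
import math


def global_average_pool(feature_map):
    """Average each channel over time: one value per channel."""
    return [sum(ch) / len(ch) for ch in feature_map]


def fully_connected(features, weights, biases):
    """Return one logit per action category; weights has one row per category."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Convert logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


pooled = global_average_pool([[1.0, 3.0], [2.0, 2.0]])          # [2.0, 2.0]
logits = fully_connected(pooled, [[1, 0], [0, 1], [1, 1]], [0, 0, 0])
probs = softmax(logits)
```

Subtracting the largest logit before exponentiating does not change the result, but it prevents overflow when logits are large.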

Although not shown in FIG. 3, a ReLU activation function follows each convolutional layer to strengthen the non-linear capability of the action recognition model. During training, a dropout layer with probability 0.5 is added before the fully connected layer. The dropout layer randomly sets the outputs of about half of the neurons to 0 (each neuron independently, with probability 0.5), and the result is predicted from the remaining neurons, which strengthens the generalization ability of the action recognition model.
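ReLU and a dropout layer with probability 0.5 can be sketched as follows. One assumption goes beyond the source: this uses *inverted* dropout, which scales the surviving values by 1/(1-p) during training so that no rescaling is needed at inference. That is the usual convention in deep-learning frameworks, but the source does not say which variant is used. The function names are illustrative.

```python
import random


def relu(values):
    """Element-wise ReLU: negative values become 0."""
    return [max(0.0, v) for v in values]


def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each value with probability p during training.

    Survivors are scaled by 1 / (1 - p) so expected activations match
    inference, when dropout is disabled (the scaling is an assumption).
    """
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


# With p = 0.5, each surviving activation is doubled.
out = dropout([1.0] * 10, p=0.5, rng=random.Random(0))
```

Dropout is applied only while training. At inference the layer passes values through unchanged (`training=False`).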

In this embodiment of the present application, the action recognition model uses a cross-entropy loss function, given by:

$$L = \frac{1}{N}\sum_{i=1}^{N} L_i = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\log\left(p_{ic}\right)$$

where $N$ is the number of observed samples; $M$ is the number of action categories; $y_{ic}$ is an indicator variable (0 or 1) that equals 1 if the true action category of observed sample $i$ is $c$, and 0 otherwise; and $p_{ic}$ is the predicted probability that observed sample $i$ belongs to action category $c$.
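A worked version of the cross-entropy formula, written as a sketch. Because $y_{ic}$ is 1 only for the true class of sample $i$, the inner sum over $c$ reduces to $-\log p_{ic}$ for that class, so the code stores labels as class indices rather than one-hot vectors. The small `eps` floor, which avoids `log(0)`, is an implementation assumption not found in the source.

```python
import math


def cross_entropy(labels, probs, eps=1e-12):
    """Mean cross-entropy over N samples.

    labels: true class index of each sample (y_ic = 1 only for that class).
    probs:  per-sample lists of predicted probabilities p_ic over M classes.
    """
    total = 0.0
    for c_true, p in zip(labels, probs):
        # Only the term with y_ic = 1 survives the inner sum over c.
        total -= math.log(max(p[c_true], eps))
    return total / len(labels)


loss = cross_entropy([1, 0], [[0.2, 0.8], [0.6, 0.4]])
```

A confident correct prediction ($p_{ic} = 1$) contributes 0, while a uniform guess over two classes contributes $\log 2 \approx 0.693$.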

Based on this cross-entropy loss, the action recognition model, which is a convolutional neural network, is trained iteratively. In each iteration, the gradient of the loss with respect to the model parameters is computed, and the parameters are updated by stochastic gradient descent. Training stops when the maximum number of iterations is reached, yielding the trained action recognition model.
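The training loop described above (compute the cross-entropy gradient, take a stochastic gradient descent step, stop at a maximum iteration count) is sketched below. To keep the example small and self-contained, a linear softmax classifier stands in for the CNN, so this illustrates only the update rule and not the applicant's network. For softmax with cross-entropy, the gradient with respect to logit $c$ is $p_c - y_c$. The learning rate, iteration count, and function names are hypothetical.

```python
import math
import random


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train_sgd(samples, labels, num_classes, lr=0.1, max_iters=500, seed=0):
    """Fit a linear softmax classifier with SGD on cross-entropy loss."""
    rng = random.Random(seed)
    dim = len(samples[0])
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(max_iters):                  # stop at the iteration limit
        i = rng.randrange(len(samples))         # one random sample per step
        x, y = samples[i], labels[i]
        p = softmax([sum(w * v for w, v in zip(row, x)) + bc
                     for row, bc in zip(W, b)])
        for c in range(num_classes):
            g = p[c] - (1.0 if c == y else 0.0)  # dLoss/dlogit_c
            for j in range(dim):
                W[c][j] -= lr * g * x[j]
            b[c] -= lr * g
    return W, b


def predict(W, b, x):
    """Return the index of the category with the largest logit."""
    logits = [sum(w * v for w, v in zip(row, x)) + bc for row, bc in zip(W, b)]
    return max(range(len(logits)), key=logits.__getitem__)
```

In a real CNN the same $p_c - y_c$ term is back-propagated through the fully connected and convolutional layers. A deep-learning framework's autograd would normally compute those gradients.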

Optionally, the action recognition model resides in the action recognition module of the electronic device. When the action recognition module receives sensor data of the preset window length, it inputs the data into the action recognition model. The model performs inference on the data and outputs the action classification result, which is returned to the electronic device.

S130: If it is determined that an action occurs based on the action classification result, control the electronic device to respond to an operation corresponding to the action.

In this embodiment of the present application, the action may be a nodding action or a head-shaking action, or another head action such as turning the head to the left or to the right, which is not specifically limited here. The operation corresponding to the action may be a preset operation that the electronic device can perform automatically, such as a page-turning operation or a return operation. Optionally, the operation corresponding to the action may also be user-defined, which is likewise not specifically limited here.

In one approach, the nodding action corresponds to the page-turning operation and the head-shaking action corresponds to the return operation. When a nodding action is determined to have occurred, the electronic device is controlled to respond with the page-turning operation, and when a head-shaking action is determined to have occurred, the electronic device is controlled to respond with the return operation. Optionally, the mapping may be reversed: when a nodding action is determined to have occurred, the electronic device may be controlled to respond with the return operation, and when a head-shaking action is determined to have occurred, with the page-turning operation.
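The configurable mapping described above can be sketched as a lookup table from recognized actions to operations. The action and operation labels (`"nod"`, `"shake"`, `"page_turn"`, `"return"`) and the `respond` function are hypothetical names. The default mapping follows the first example in the text, and the reversed or a user-defined mapping can be passed in instead.

```python
# Default mapping from the first example; labels are hypothetical.
DEFAULT_ACTIONS = {"nod": "page_turn", "shake": "return"}


def respond(action, mapping=DEFAULT_ACTIONS, handlers=None):
    """Look up the operation bound to an action and run its handler, if any."""
    operation = mapping.get(action)
    if operation is None:
        return None  # unmapped action: no response
    if handlers and operation in handlers:
        handlers[operation]()
    return operation


# Reversed mapping, as in the optional variant.
swapped = {"nod": "return", "shake": "page_turn"}
```

Keeping the mapping as data rather than code makes user-defined operations a configuration change. No change to the recognition logic is needed.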

Specifically, the electronic device is preconfigured with a method for judging from the action classification results whether an action has occurred, referred to here as a decision strategy. The electronic device combines the action classification result output by the action recognition model with the preset decision strategy to determine whether an action has occurred. If the decision strategy determines that a nodding action has occurred, the electronic device is controlled to respond with the page-turning or return operation corresponding to the nodding action. If the decision strategy determines that a head-shaking action has occurred, the electronic device is controlled to respond with the return or page-turning operation corresponding to the head-shaking action.

Further, the preset decision strategy can be adjusted according to the action classification results output by the action recognition model. For example, the strategy may determine that an action has occurred when the same action category appears a given number of times among the last n results, or when the same action category appears a given number of times in a row.
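Both variants of the decision strategy (k consecutive identical results, or at least k identical results among the last n) can be sketched as follows. The source gives no values for k or n, so the defaults are placeholders. Three further choices are assumptions not stated in the source: a `"none"` background category that never triggers an action, the class name `DecisionStrategy`, and clearing the history after a detection so that one gesture fires only once.

```python
from collections import deque


class DecisionStrategy:
    """Declare an action once the same category is seen k times in a row
    (consecutive=True) or at least k times in the last n results."""

    def __init__(self, k=3, n=5, consecutive=True, background="none"):
        self.k = k
        self.consecutive = consecutive
        self.background = background
        self.history = deque(maxlen=n)  # assumes n >= k

    def update(self, category):
        """Feed one classification result; return the action or None."""
        self.history.append(category)
        if category == self.background:
            return None
        if self.consecutive:
            recent = list(self.history)[-self.k:]
            hit = len(recent) == self.k and all(c == category for c in recent)
        else:
            hit = list(self.history).count(category) >= self.k
        if hit:
            self.history.clear()  # assumed: fire once per gesture
            return category
        return None
```

Requiring agreement across several overlapping windows suppresses single-window misclassifications. The cost is a short delay of a few 12 ms steps before the action is confirmed.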

Optionally, when determining whether an action has occurred, it may also be detected whether the amplitude of the action exceeds a preset amplitude; the action is determined to have occurred only if it does. For example, the preset amplitude of the nodding action may be a downward tilt of the head toward the ground of 10 degrees. If the user's head is detected to tilt downward by more than 10 degrees, a nodding action is determined to have occurred; otherwise, no nodding action is determined to have occurred. Correspondingly, to determine whether a head-shaking action has occurred, it can be detected whether the user's head moves to the left or right by more than a preset amplitude. If it does, a head-shaking action is determined to have occurred; if not, no head-shaking action is determined to have occurred. For example, the preset amplitude of the head-shaking action may be set in advance to 40 degrees to the left or right. When the user's head is detected to move more than 40 degrees to the left or right, a head-shaking action is determined to have occurred, as shown in FIG. 4; when the movement does not exceed 40 degrees, no head-shaking action is determined to have occurred, as shown in FIG. 5.
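The amplitude check can be sketched as a gate applied after classification, using the example thresholds from the text (10 degrees downward for a nod, 40 degrees left or right for a head shake). How the pitch and yaw angles are estimated from the accelerometer data is not described in the source, so the angles are taken here as given inputs. The function name and argument names are hypothetical, and "exceeds" is read as strictly greater than.

```python
NOD_THRESHOLD_DEG = 10.0    # downward tilt toward the ground, per the example
SHAKE_THRESHOLD_DEG = 40.0  # rotation to the left or right, per the example


def confirm_action(category, pitch_down_deg=0.0, yaw_deg=0.0):
    """Accept a classified action only if its amplitude exceeds the preset one.

    pitch_down_deg: downward head tilt in degrees.
    yaw_deg: signed left/right rotation in degrees (sign = direction).
    """
    if category == "nod":
        return pitch_down_deg > NOD_THRESHOLD_DEG
    if category == "shake":
        return abs(yaw_deg) > SHAKE_THRESHOLD_DEG
    return False
```

For example, a 45-degree turn to the left confirms a head shake (the FIG. 4 case), while a 30-degree turn does not (the FIG. 5 case).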

In the action recognition method provided in this embodiment, sensor data of a preset window length is first acquired and input into an action recognition model, and the action classification result output by the model is obtained. If it is determined on the basis of that result that an action has occurred, the electronic device is controlled to respond to the operation corresponding to the action. In this way, the acquired sensor data of the preset window length in the time series is recognized by the action recognition model, so whether an action has occurred can be identified quickly and accurately, improving recognition accuracy. Once an action is determined to have occurred, the electronic device can be controlled to respond to the corresponding operation in real time.

Referring to FIG. 6, an action recognition method provided by an embodiment of the present application is applied to an electronic device, and the method includes:

S210: When a specified event is detected, start acquiring sensor data collected by an acceleration sensor in the head-mounted device, so as to acquire sensor data of a preset window length in a time series.

In one implementation, the sensor data is collected by an acceleration sensor built into the head-mounted device while the head-mounted device is being worn. The head-mounted device is a wireless Bluetooth headset, and the acceleration sensor is a three-axis acceleration sensor.

Therefore, before the sensor data of the preset window length in the time series is acquired, it is necessary to detect whether the wireless Bluetooth headset is being worn. Optionally, this can be detected with an infrared sensor in the wireless Bluetooth headset. When the wireless Bluetooth headset is worn in the ear, part of its surface is covered. An infrared sensor can therefore be placed in the area that becomes covered when the headset is worn. Whether the infrared signal emitted by the sensor is blocked can then be determined from the state value the sensor returns, which in turn indicates whether the headset is being worn. When the returned state value indicates that the infrared signal is blocked, the wireless Bluetooth headset is determined to be worn; when it indicates that the signal is not blocked, the headset is determined not to be worn.

Further, after the wireless Bluetooth headset is determined to be worn, it can also be checked whether the headset is powered on. The wireless Bluetooth headset may have at least a locked state, an unlocked state, a powered-off state, a powered-on state, and a sleep state, or a combination of several of these, for example locked and powered off, locked and powered on, or unlocked and powered on, which is not limited here. Specifically, in the locked state, the function keys or buttons of the wireless Bluetooth headset cannot be operated. This prevents accidental touches and prevents others from using the headset without the user's permission. In the unlocked state, the function keys or buttons can be operated, so the user can adjust headset functions such as raising or lowering the volume. In the powered-on state the headset can currently be used, in the powered-off state it is currently unavailable, and in the sleep state it is currently on standby.

In this embodiment of the present application, whether the wireless Bluetooth headset is powered on can be determined by detecting whether its function keys or buttons are working normally, or whether the headset can currently be used. Specifically, if its function keys or buttons are detected to be working normally, or the headset is detected to be currently usable, the wireless Bluetooth headset is determined to be powered on.

In this way, once the wireless Bluetooth headset is determined to be powered on and worn, it can be detected whether a specified event occurs. When a specified event is detected, the acceleration sensor built into the headset starts collecting sensor data, so that sensor data of the preset window length in the time series can be acquired. The specified event may be any preset event that can trigger the wireless Bluetooth headset to send sensor data to the electronic device, such as an incoming call.

Optionally, whether the specified event is detected may be determined by detecting whether the electronic device receives a particular identifier. Specifically, a different identifier can be assigned to each event in advance. When an event occurs, the electronic device first receives the identifier corresponding to that event, so receiving an identifier indicates that an event has been detected. The electronic device can then parse the received identifier to determine whether the detected event is the specified event.

For example, different identifiers are assigned in advance to different events, which may include incoming call events, power-on events, and message receiving events. Specifically, the identifier of the incoming call event may be set to LDSJ, the identifier of the power-on event to KJSJ, and the identifier of the message receiving event to XXSJ. A specified event can also be set in the electronic device in advance, so that when the specified event is detected, the electronic device starts acquiring the sensor data collected by the acceleration sensor in the wireless Bluetooth headset. For example, if the specified event is set in advance to the incoming call event, then when the electronic device receives an identifier and determines that an event has occurred, it parses the identifier; if the identifier is LDSJ, the specified event is determined to have been detected.
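The preconditions described in this embodiment (headset worn, headset powered on, and the received identifier matching the specified event) can be combined into a single gating check. The identifiers LDSJ, KJSJ, and XXSJ come from the example. Several details are assumptions: the infrared state encoding (`IR_BLOCKED = 1`), the boolean inputs standing in for "function keys working normally" and "currently usable", and all function and event names.

```python
# Identifiers from the example in the text.
EVENT_IDS = {"LDSJ": "incoming_call", "KJSJ": "power_on", "XXSJ": "message_received"}

IR_BLOCKED = 1  # assumed state value meaning the infrared signal is blocked


def should_start_acquisition(identifier, ir_state, keys_working, usable,
                             specified_event="incoming_call"):
    """Start reading accelerometer data only when the headset is worn,
    powered on, and the identifier maps to the specified event."""
    worn = ir_state == IR_BLOCKED            # blocked IR signal => in the ear
    powered_on = keys_working or usable      # either criterion from the text
    event = EVENT_IDS.get(identifier)        # unknown identifiers map to None
    return worn and powered_on and event == specified_event
```

Gating acquisition on all three conditions means the accelerometer stream, and with it the model, runs only while a head gesture could plausibly be meant as input, for example to answer or reject a call.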

In this embodiment of the present application, the acceleration sensor is a three-axis acceleration sensor. Therefore, when the wireless Bluetooth headset is powered on and worn and a specified event is detected, its built-in three-axis acceleration sensor starts collecting sensor data along the X, Y, and Z axes.

S220: Acquire a value of the data to be input, where the value of the data to be input is an average value of the values of all channels of the sensor data of the preset window length.

After the sensor data of the preset window length in the time series has been acquired, the average of the X, Y, and Z channel values of each sample in the window is computed. These averages are used as the value of the data to be input, that is, the sensor data that is input into the action recognition model for action recognition.
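The channel averaging in S220 reduces each three-axis sample to a single value. The sketch below assumes a window is a list of `(x, y, z)` tuples, and the function name is hypothetical. Under this reading the model in this embodiment receives a single-channel sequence rather than three channels.

```python
def average_channels(window):
    """window: list of (x, y, z) accelerometer samples.

    Returns one value per sample: the mean of its X, Y and Z channels.
    """
    return [sum(sample) / len(sample) for sample in window]


to_input = average_channels([(1.0, 2.0, 3.0), (0.0, 0.0, 3.0)])  # [2.0, 1.0]
```

This cuts the input size by a factor of three. The trade-off is that the per-axis direction of the motion is no longer distinguishable.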

S230: Input the value of the data to be input into the action recognition model, and obtain the action classification result output by the action recognition model.

In one approach, the value of the data to be input obtained as above is input into the action recognition model, which recognizes it and outputs the action classification result.

S240: Reset the data to be input to zero.

In one approach, to prevent earlier sensor data from affecting sensor data that is input into the action recognition model later, the data to be input is reset to zero after its value has been input into the action recognition model.

The next time data to be input is obtained, its value is set to the average of all channels of the currently acquired sensor data of the preset window length. This new data to be input is then input into the action recognition model for action recognition, and these operations are repeated throughout the action recognition process.
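The S220 to S240 cycle (load the averaged window, run the model, reset the input to zero, repeat) can be sketched as follows. `InputSlot`, `recognize_stream`, and the `model` callable are hypothetical names. In the usage example, a stand-in function that sums its input takes the place of the trained action recognition model.

```python
class InputSlot:
    """Holds the data to be input; reset to zero after every inference
    so that values from one window cannot leak into the next."""

    def __init__(self, length):
        self.values = [0.0] * length

    def load(self, window):
        # S220: per-sample mean of the X, Y and Z channels.
        self.values = [sum(s) / len(s) for s in window]

    def reset(self):
        # S240: return the data to be input to zero.
        self.values = [0.0] * len(self.values)


def recognize_stream(windows, model):
    """Run S220-S240 for each window and collect the model outputs."""
    slot = InputSlot(len(windows[0]))
    results = []
    for window in windows:
        slot.load(window)
        results.append(model(slot.values))  # S230: classify
        slot.reset()
    return results


# Stand-in "model" for illustration: sums its input.
outputs = recognize_stream([[(1, 1, 1), (2, 2, 2)], [(0, 0, 3), (3, 0, 0)]], sum)
```

Here `outputs` is `[3.0, 2.0]`: each window is classified from its own averages only.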

In the action recognition method provided in this embodiment, when a specified event is detected, acquisition of the sensor data collected by the acceleration sensor in the head-mounted device starts, so as to obtain sensor data of a preset window length in a time series. The value of the data to be input is then computed as the average of the values of all channels of the sensor data of the preset window length. This value is input into the action recognition model, the action classification result output by the model is obtained, and the data to be input is reset to zero. Resetting the data to be input to zero after each recognition prevents the previous recognition from influencing the next one, which improves the accuracy of action recognition.

Referring to FIG. 7, an action recognition method provided by an embodiment of the present application is applied to an electronic device and includes the following steps:

S310: When a specified event is detected, start acquiring sensor data collected by an acceleration sensor in the head-mounted device, so as to acquire sensor data with a preset window length on a time series.

In one approach, the specified event is an incoming call event. The sensor data of the preset window length in the time series may be collected by the acceleration sensor in the head-mounted device over a full preset window length and then sent to the electronic device. Alternatively, the acceleration sensor in the headset collects sensor data in real time and sends it to the electronic device in real time, and once the data stored on the electronic device reaches the preset window length, the electronic device inputs the sensor data of that window length into the action recognition model.
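For the real-time path, the electronic device has to accumulate streamed samples until the preset window length is reached. A minimal sketch, assuming non-overlapping windows (the text does not say how consecutive windows relate, so a sliding window would be an equally valid reading); `WindowAssembler` is a hypothetical name.

```python
from typing import List, Optional, Sequence


class WindowAssembler:
    """Hypothetical phone-side buffer for samples streamed from a headset.

    Samples arrive one at a time; once window_length samples are stored,
    the full window is returned for the action recognition model and the
    buffer starts over (non-overlapping windows are an assumption).
    """

    def __init__(self, window_length: int) -> None:
        if window_length <= 0:
            raise ValueError("window_length must be positive")
        self.window_length = window_length
        self._samples: List[Sequence[float]] = []

    def push(self, sample: Sequence[float]) -> Optional[List[Sequence[float]]]:
        self._samples.append(sample)
        if len(self._samples) < self.window_length:
            return None  # not enough data for a full window yet
        window, self._samples = self._samples, []
        return window
```

Each call to `push` either returns `None` or a complete window that can be handed straight to the model.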

Optionally, in this embodiment of the present application, each moment may correspond to one sensor data point. When the electronic device stores the sensor data sent by the head-mounted device, it may reduce the dimension of the data by storing only one of every four time points of the time-series data, that is, by keeping one sensor data point out of every four. For example, if the head-mounted device sends, in chronological order, "data 1, data 2, data 3, data 4, data 5, data 6, data 7, data 8, data 9, data 10", and the electronic device stores one sensor data point out of every four, the stored sensor data consists of "data 1, data 5 and data 9".
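The one-in-four storage rule corresponds to slicing with a step. A minimal sketch; the function name `downsample` is hypothetical, and the step of 4 is taken from the example in the text.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def downsample(samples: Sequence[T], step: int = 4) -> List[T]:
    """Keep one sample out of every `step` samples, starting with the first."""
    if step <= 0:
        raise ValueError("step must be positive")
    return list(samples[::step])


stream = [f"data {k}" for k in range(1, 11)]
print(downsample(stream))  # ['data 1', 'data 5', 'data 9']
```

This reproduces the worked example: of "data 1" through "data 10", only "data 1", "data 5" and "data 9" are kept.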

S320: Input the sensor data into an action recognition model, and obtain an action classification result output by the action recognition model.

For a detailed explanation of the steps included in S320, reference may be made to the corresponding steps in the foregoing embodiments, which will not be repeated here.

S330: Save the action classification result in a result queue with a preset length to update the result queue.

In one approach, the result queue of preset length is a queue of preset length n that stores the action classification results output by the action recognition model; it holds the inference results for the sensor data of the previous n preset windows.

In this embodiment of the present application, the result queue is considered updated whenever a new action classification result is stored in it.

Further, because the result queue has a fixed length, once it holds n action classification results, storing a new result requires removing a previously stored one. In that case, the earliest stored action classification result is removed on a first-in, first-out basis, and the new action classification result is then stored in the result queue.
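A fixed-length first-in, first-out result queue maps directly onto Python's `collections.deque` with a `maxlen`, which discards the oldest entry when a new one is appended to a full queue. A minimal sketch; the queue length of 5 and the integer labels are hypothetical, since the text leaves n as a preset value.

```python
from collections import deque
from typing import Deque, List

# Hypothetical queue length n; the text leaves n as a preset value.
QUEUE_LENGTH = 5


def update_queue(queue: Deque[int], classification_result: int) -> List[int]:
    """Store a new result; a full deque(maxlen=n) silently drops its oldest entry."""
    queue.append(classification_result)
    return list(queue)


result_queue: Deque[int] = deque(maxlen=QUEUE_LENGTH)
for r in [0, 0, 1, 1, 1, 2]:
    update_queue(result_queue, r)
print(list(result_queue))  # [0, 1, 1, 1, 2]
```

The sixth result pushes out the first one, so the queue always holds the inference results for the most recent n windows.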

S340: Determine whether an action occurs based on the updated result queue.

In this embodiment of the present application, each time the result queue is updated in the above manner, whether an action has occurred is determined according to a decision policy. When the decision policy determines that an action has occurred, the electronic device applies a cooldown: it sets the value of the newly generated action classification result to a preset value, which prevents the electronic device from responding repeatedly to the same detected action.

Specifically, setting the value of the newly generated action classification result to a preset value may be done in either of two ways. If it is determined, based on multiple classification results in the updated result queue, that a specified action has occurred, the newly generated classification result is set to the preset value and then stored in the result queue; or, if it is so determined, the newly generated classification result is stored in the result queue and then set to the preset value. The preset value may be any configured value.
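The text does not specify the decision policy, so the following is only a sketch under explicit assumptions: an action is declared when at least `min_votes` entries of the queue carry the same action label (a hypothetical majority-vote rule), the labels 0 (no action), 1 (nod) and 2 (shake) and the preset value -1 are hypothetical, and "newly generated classification result" is read as the entry that triggered the detection. Both storage orders in the text leave the queue in the same state, so one path covers them.

```python
from collections import Counter, deque
from typing import Deque, Optional

NO_ACTION = 0      # hypothetical label for "no action"
PRESET_VALUE = -1  # hypothetical preset value written after a detection


def decide(queue: Deque[int], new_result: int, min_votes: int) -> Optional[int]:
    """Store new_result, apply a hypothetical majority rule, and cool down.

    Returns the detected action label, or None if no action is detected.
    """
    queue.append(new_result)  # deque(maxlen=n) drops the oldest entry
    label, votes = Counter(queue).most_common(1)[0]
    if label in (NO_ACTION, PRESET_VALUE) or votes < min_votes:
        return None
    # Overwrite the newest entry with the preset value so that the same
    # action is less likely to trigger again on the next update.
    queue[-1] = PRESET_VALUE
    return label
```

With a queue of `[0, 0, 1, 1]` (maxlen 5) and `min_votes=3`, a new nod result triggers a detection and the queue becomes `[0, 0, 1, 1, -1]`; a following "no action" result then leaves only two nod votes, so the same nod is not reported twice. Overwriting a single entry lowers the vote count by only one, so with a long queue or a low threshold a repeated detection is still possible; how strong the cooldown should be is not specified by the text.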

Further, if the decision policy determines that no action has occurred, the electronic device waits for a period of time before inputting the sensor data of the next preset window length into the action recognition model, so that the previous recognition result does not affect the next one.

S350: When it is determined that a nodding action has occurred, control the electronic device to answer the incoming call, which is the operation corresponding to the nodding action.

In one approach, when the decision policy determines that a nodding action has occurred, the electronic device is controlled to invoke an interface to answer the incoming call, which is the operation corresponding to the nodding action.

S360: When it is determined that a head-shaking action has occurred, control the electronic device to reject the incoming call, which is the operation corresponding to the head-shaking action.

In one approach, when the decision policy determines that a head-shaking action has occurred, the electronic device is controlled to reject the incoming call, which is the operation corresponding to the head-shaking action.
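The mapping from detected head actions to call operations in S350 and S360 can be expressed as a small dispatch table. A minimal sketch: the labels `NOD` and `SHAKE` are hypothetical, and `answer_call` and `reject_call` are placeholders, because the text says only that the electronic device invokes an interface without naming it.

```python
from typing import Callable, Dict, Optional

NOD, SHAKE = 1, 2  # hypothetical class labels


def answer_call() -> str:
    return "answered"  # placeholder for the platform's answer-call interface


def reject_call() -> str:
    return "rejected"  # placeholder for the platform's reject-call interface


HANDLERS: Dict[int, Callable[[], str]] = {NOD: answer_call, SHAKE: reject_call}


def respond(action: Optional[int], incoming_call: bool) -> Optional[str]:
    """Map a detected head action to a call operation during an incoming call."""
    if not incoming_call or action not in HANDLERS:
        return None
    return HANDLERS[action]()
```

Keeping the mapping in a table keeps the decision logic independent of the specific operations, so other specified events could reuse the same recognition pipeline with a different table.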

With the above method, the user answers or rejects calls by nodding or shaking the head, which addresses scenarios in which touching the screen is inconvenient. Such scenarios include application scenarios involving device operation, scenarios in which the user's hands are occupied with other tasks, and scenarios calling for private interaction when voice operation is unavailable or inconvenient.

In the action recognition method provided in this embodiment, sensor data of a preset window length in a time series is acquired and input into the action recognition model, and the action classification result output by the model is obtained. The action classification result is stored in a result queue of preset length to update the result queue, and whether an action has occurred is then determined based on the updated result queue. When a nodding action is determined to have occurred, the electronic device is controlled to answer the incoming call, the operation corresponding to the nodding action; when a head-shaking action is determined to have occurred, the electronic device is controlled to reject the incoming call, the operation corresponding to the head-shaking action. With this method, in interactions between the user and various electronic devices, the user can answer a call by nodding and reject it by shaking the head, which addresses scenarios in which touch-screen operation by hand is inconvenient. It gives users a new basic interaction method and an additional interaction option, which also helps improve the expressiveness of electronic devices and the user experience.

Referring to FIG. 8, the present application provides a motion recognition apparatus 400 that runs on an electronic device and includes a data acquisition unit 410, a result output unit 420 and an operation execution unit 430.

The data acquisition unit 410 is configured to acquire sensor data with a preset window length on a time series.

Specifically, the data acquisition unit 410 is configured to start acquiring sensor data collected by an acceleration sensor in the headset when a specified event is detected, so as to acquire sensor data with a preset window length on a time series.

In one approach, the data acquisition unit 410 is further configured to acquire the connection state between the electronic device and the head-mounted device and, if the two are connected, to acquire sensor data of a preset window length in the time series.

Optionally, the data acquisition unit 410 is further configured to acquire a state value of the electronic device; if the state value is a first state value, it determines that the electronic device and the head-mounted device are connected, and if the state value is a second state value, it determines that they are disconnected.

Optionally, the data acquisition unit 410 is further configured to determine the connection state between the electronic device and the head-mounted device by monitoring broadcasts.

As another method, the data acquisition unit 410 is configured to detect whether the wireless Bluetooth headset is in a wearing state; if the wireless Bluetooth headset is in a wearing state, acquire sensor data of a preset window length on the time series.

Optionally, the data acquisition unit 410 is further configured to acquire the status value returned by an infrared sensor provided in the wireless Bluetooth headset. If the status value indicates that the infrared signal emitted by the infrared sensor is blocked, the wireless Bluetooth headset is determined to be worn; if the status value indicates that the signal is not blocked, the headset is determined not to be worn.

Optionally, the data acquisition unit 410 is further configured, if the wireless Bluetooth headset is being worn, to detect whether it is powered on, and to acquire sensor data of a preset window length in the time series when the wireless Bluetooth headset is determined to be both worn and powered on.

Optionally, the data acquisition unit 410 is further configured to detect whether multiple function keys or function buttons of the wireless Bluetooth headset are working normally; if they are, the wireless Bluetooth headset is determined to be powered on.

Optionally, the data acquisition unit 410 is further configured to detect whether the wireless Bluetooth headset can currently be used; if it is detected that the wireless Bluetooth headset can currently be used, it is determined that the wireless Bluetooth headset is in a powered-on state.
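The acquisition preconditions above can be combined into a single gate. This is a sketch under assumptions: requiring connection, wearing and power-on all together is one possible combination (the text presents the connection check and the wearing check as separate approaches, with the power-on check layered on the wearing check), and the state values 1 and 0 are hypothetical stand-ins for the "first" and "second" state values. `HeadsetStatus`, `is_connected` and `should_acquire` are hypothetical names.

```python
from dataclasses import dataclass

CONNECTED_STATE = 1     # hypothetical "first state value"
DISCONNECTED_STATE = 0  # hypothetical "second state value"


def is_connected(state_value: int) -> bool:
    """Map the electronic device's state value to a connection state."""
    if state_value == CONNECTED_STATE:
        return True
    if state_value == DISCONNECTED_STATE:
        return False
    raise ValueError(f"unknown state value: {state_value}")


@dataclass
class HeadsetStatus:
    state_value: int   # connection state value read on the electronic device
    ir_blocked: bool   # infrared signal blocked, i.e. the headset is worn
    powered_on: bool   # e.g. function keys work normally / headset usable


def should_acquire(status: HeadsetStatus) -> bool:
    """Start acquiring sensor windows only if connected, worn and powered on."""
    return is_connected(status.state_value) and status.ir_blocked and status.powered_on
```

Checking these conditions before acquisition avoids running the model on data from a headset that is idle, off or not on the user's head.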

The result output unit 420 is configured to input the sensor data into the action recognition model, and obtain the action classification result output by the action recognition model.

The action recognition model includes a first convolutional layer, a second convolutional layer, a max pooling layer, a third convolutional layer, a fourth convolutional layer, a global average pooling layer, a fully connected layer and a softmax layer. The first and second convolutional layers each have a kernel size of 7 and a dimension (number of filters) of 64; the third and fourth convolutional layers each have a kernel size of 7 and a dimension of 128.
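To make the layer stack concrete, the following sketch traces output shapes and trainable parameter counts through it, treating each convolution as a 1D convolution over time. Several details are assumptions not stated in the text: 3 input channels (tri-axial acceleration), a 64-sample window, "same" padding, a pooling size of 2, 3 output classes, and reading "dimension" as the number of filters.

```python
from typing import List, Tuple

# Assumptions (not stated in the text): 3 input channels (x/y/z acceleration),
# a 64-sample window, "same" padding, pooling size 2, and 3 output classes.
IN_CHANNELS, WINDOW, NUM_CLASSES, POOL = 3, 64, 3, 2

LAYERS = [
    ("conv1", 7, 64), ("conv2", 7, 64), ("maxpool", POOL, None),
    ("conv3", 7, 128), ("conv4", 7, 128), ("gap", None, None),
    ("fc", None, NUM_CLASSES), ("softmax", None, None),
]


def trace(in_channels: int, length: int) -> Tuple[List[str], int]:
    """Return per-layer (channels, length) shapes and the trainable parameter count."""
    rows: List[str] = []
    total, channels = 0, in_channels
    for name, k, out in LAYERS:
        if name.startswith("conv"):
            total += k * channels * out + out  # kernel weights + biases
            channels = out                     # "same" padding keeps the length
        elif name == "maxpool":
            length //= k
        elif name == "gap":
            length = 1  # averaging over time leaves one value per channel
        elif name == "fc":
            total += channels * out + out
            channels = out
        rows.append(f"{name}: ({channels}, {length})")
    return rows, total


shapes, params = trace(IN_CHANNELS, WINDOW)
print("\n".join(shapes))
print("parameters:", params)  # parameters: 202819
```

Global average pooling collapses the time axis, so the fully connected layer sees a fixed 128-dimensional vector regardless of the window length; under these assumptions the fourth convolutional layer alone accounts for over half of the parameters.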

The convolutional neural network model is trained iteratively using a cross-entropy loss function until the number of iterations reaches a maximum, and the model obtained at that point is used as the action recognition model.

The cross-entropy loss function is:

$$L = -\sum_{c=1}^{M} y_{ic}\log\left(p_{ic}\right)$$

where M is the number of action categories; y_{ic} is an indicator variable equal to 1 if the true category of sample i is c and 0 otherwise; and p_{ic} is the predicted probability that observed sample i belongs to action category c.
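The per-sample cross-entropy loss, -Σ_c y_ic · log(p_ic), together with the softmax that produces p_ic, can be computed in a few lines. A minimal sketch in plain Python; the clamp `eps` guards against log(0), and averaging over a batch in `mean_cross_entropy` is an assumed batching choice, not something the text specifies.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(y: Sequence[int], p: Sequence[float], eps: float = 1e-12) -> float:
    """Per-sample loss -sum_c y_ic * log(p_ic) over the M action categories."""
    if len(y) != len(p):
        raise ValueError("y and p must both have M entries")
    return -sum(yc * math.log(max(pc, eps)) for yc, pc in zip(y, p))


def mean_cross_entropy(ys: Sequence[Sequence[int]],
                       ps: Sequence[Sequence[float]]) -> float:
    """Average the per-sample loss over a batch (an assumed batching choice)."""
    return sum(cross_entropy(y, p) for y, p in zip(ys, ps)) / len(ys)
```

Because y_ic is one-hot, only the true category contributes: for y = [0, 1, 0] and p = [0.2, 0.7, 0.1] the loss is -log(0.7), about 0.357.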

Specifically, the result output unit 420 is configured to obtain the value of the data to be input, this value being the average of the values of all channels of the sensor data of the preset window length; to input this value into the action recognition model and obtain the action classification result output by the model; and then to reset the data to be input to zero.

The operation execution unit 430 is configured to control the electronic device to respond to an operation corresponding to the action if it is determined that an action occurs based on the action classification result.

Specifically, the operation execution unit 430 is configured to control the electronic device to answer the incoming call, the operation corresponding to the nodding action, when a nodding action is determined to have occurred, and to control the electronic device to reject the incoming call, the operation corresponding to the head-shaking action, when a head-shaking action is determined to have occurred.

Referring to FIG. 9, the apparatus 400 further includes:

The action judging unit 440 is configured to save the action classification result in a result queue of preset length, so as to update the result queue; and determine whether an action occurs based on the updated result queue.

Referring to FIG. 10, the apparatus 400 further includes:

The result saving unit 450 is configured to: if it is determined, based on multiple classification results in the updated result queue, that a specified action has occurred, set the newly generated classification result to a preset value and then store it in the result queue; or, if it is so determined, store the newly generated classification result in the result queue and then set it to the preset value.

It should be noted that the apparatus embodiments in the present application correspond to the foregoing method embodiments, and the specific principles in the apparatus embodiments may refer to the content in the foregoing method embodiments, which will not be repeated here.

An electronic device provided by the present application is described below with reference to FIG. 11.

Referring to FIG. 11, based on the above action recognition method and apparatus, an embodiment of the present application further provides an electronic device 100 that can execute the foregoing action recognition method. The electronic device 100 includes one or more processors 102 (only one is shown in the figure), a memory 104 and a network module 106, which are coupled to one another. The memory 104 stores a program that can execute the content of the foregoing embodiments, and the processor 102 can execute the program stored in the memory 104.

The processor 102 may include one or more processing cores. The processor 102 connects the various parts of the electronic device 100 through various interfaces and lines, and performs the various functions of the electronic device 100 and processes data by running or executing instructions, programs, code sets or instruction sets stored in the memory 104 and by calling data stored in the memory 104. Optionally, the processor 102 may be implemented in hardware in at least one of the following forms: digital signal processing (DSP), field-programmable gate array (FPGA) and programmable logic array (PLA). The processor 102 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem and the like. The CPU mainly handles the operating system, user interface and applications; the GPU renders and draws display content; and the modem handles wireless communication. It can be understood that the modem may instead be left out of the processor 102 and implemented by a separate communication chip.

The memory 104 may include random access memory (RAM) or read-only memory (ROM). The memory 104 may be used to store instructions, programs, code, code sets or instruction sets. The memory 104 may include a program storage area and a data storage area. The program storage area may store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playback function or an image playback function), instructions for implementing the method embodiments described above, and the like. For example, the memory 104 may store a motion recognition apparatus, which may be the aforementioned apparatus 400. The data storage area may also store data created by the electronic device 100 during use, such as a phone book, audio and video data, and chat records.

The network module 106 receives and transmits electromagnetic waves and converts between electromagnetic waves and electrical signals, so as to communicate with a communication network or other devices, for example an audio playback device. The network module 106 may include various existing circuit elements for performing these functions, such as an antenna, a radio frequency transceiver, a digital signal processor, an encryption/decryption chip, a subscriber identity module (SIM) card and memory. The network module 106 can communicate with various networks, such as the Internet, an intranet or a wireless network, or communicate with other devices over a wireless network. The wireless network may include a cellular telephone network, a wireless local area network or a metropolitan area network. For example, the network module 106 may exchange information with a base station.

Please refer to FIG. 12, which shows a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application. The computer-readable storage medium 800 stores program code that can be invoked by a processor to execute the methods described in the above method embodiments.

The computer-readable storage medium 800 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk or a ROM. Optionally, the computer-readable storage medium 800 includes a non-transitory computer-readable storage medium. The computer-readable storage medium 800 has storage space for program code 810 that performs any of the method steps of the above methods. The program code can be read from, or written into, one or more computer program products. The program code 810 may, for example, be compressed in a suitable form.

In the action recognition method, apparatus, electronic device and storage medium provided by this embodiment, sensor data of a preset window length is first acquired and then input into an action recognition model to obtain the action classification result output by the model; finally, if it is determined based on the action classification result that an action has occurred, the electronic device is controlled to respond to an operation corresponding to the action. In this way, the acquired sensor data of the preset window length in the time series is recognized by the action recognition model, so that whether an action has occurred can be identified quickly and accurately, improving the accuracy of action recognition, and when an action is determined to have occurred, the electronic device can be controlled to respond in real time to the operation corresponding to the action.

Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the present application. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. An action recognition method, characterized in that it is applied to an electronic device, the method comprising:

acquiring sensor data of a preset window length in a time series;

inputting the sensor data into an action recognition model to obtain an action classification result output by the action recognition model; and

if it is determined, based on the action classification result, that an action has occurred, controlling the electronic device to respond to an operation corresponding to the action.

2. The method according to claim 1, wherein the acquiring sensor data of a preset window length in a time series comprises:

when a specified event is detected, starting to acquire sensor data collected by an acceleration sensor in a head-mounted device, so as to acquire sensor data of the preset window length in the time series.

3. The method according to claim 2, wherein the inputting the sensor data into an action recognition model to obtain an action classification result output by the action recognition model comprises:

obtaining a value of data to be input, the value of the data to be input being the average of the values of all channels of the sensor data of the preset window length;

inputting the value of the data to be input into the action recognition model to obtain the action classification result output by the action recognition model; and

resetting the data to be input to zero.

4. The method according to claim 2, wherein the specified event is an incoming call event, and the controlling the electronic device to respond to an operation corresponding to the action if it is determined, based on the action classification result, that an action has occurred comprises:

when it is determined that a nodding action has occurred, controlling the electronic device to answer the incoming call, the operation corresponding to the nodding action; and

when it is determined that a head-shaking action has occurred, controlling the electronic device to reject the incoming call, the operation corresponding to the head-shaking action.

5. The method according to claim 1, wherein, before the controlling the electronic device to respond to an operation corresponding to the action if it is determined, based on the action classification result, that an action has occurred, the method further comprises:

storing the action classification result in a result queue of preset length to update the result queue; and

determining, based on the updated result queue, whether an action has occurred.

6. The method according to claim 5, further comprising:

if it is determined, based on multiple classification results in the updated result queue, that a specified action has occurred, setting the newly generated classification result to a preset value and then storing it in the result queue; or

if it is determined, based on multiple classification results in the updated result queue, that a specified action has occurred, storing the newly generated classification result in the result queue and then setting it to the preset value.

7. The method according to claim 1, wherein the acquiring sensor data of a preset window length in a time series comprises:

acquiring a connection state between the electronic device and a head-mounted device; and

if the electronic device is connected to the head-mounted device, acquiring sensor data of the preset window length in the time series.

8. The method according to claim 7, wherein the acquiring a connection state between the electronic device and the head-mounted device comprises:

acquiring a state value of the electronic device;

if the state value of the electronic device is a first state value, determining that the electronic device and the head-mounted device are connected; and

if the state value of the electronic device is a second state value, determining that the electronic device and the head-mounted device are disconnected.

9. The method according to claim 8, further comprising:

determining the connection state between the electronic device and the head-mounted device by monitoring broadcasts.

10. The method according to any one of claims 7-9, wherein the head-mounted device is a wireless Bluetooth headset, and the acquiring sensor data of a preset window length in a time series comprises:

detecting whether the wireless Bluetooth headset is being worn; and

if the wireless Bluetooth headset is being worn, acquiring sensor data of the preset window length in the time series.

11. The method according to claim 10, wherein the detecting whether the wireless Bluetooth headset is being worn comprises:

acquiring a status value returned by an infrared sensor provided in the wireless Bluetooth headset;

if the status value indicates that the infrared signal emitted by the infrared sensor is blocked, determining that the wireless Bluetooth headset is being worn; and

if the status value indicates that the infrared signal emitted by the infrared sensor is not blocked, determining that the wireless Bluetooth headset is not being worn.

12. The method according to any one of claims 7-9, wherein the head-mounted device is a wireless Bluetooth headset, and the acquiring sensor data of a preset window length in a time series comprises:

if the wireless Bluetooth headset is being worn, detecting whether the wireless Bluetooth headset is powered on; and

when it is determined that the wireless Bluetooth headset is both worn and powered on, acquiring sensor data of the preset window length in the time series.

13. The method according to claim 12, wherein the detecting whether the wireless Bluetooth headset is powered on comprises:

detecting whether multiple function keys or function buttons of the wireless Bluetooth headset are working normally; and

if it is detected that the multiple function keys or function buttons of the wireless Bluetooth headset are working normally, determining that the wireless Bluetooth headset is powered on.

14. The method according to claim 12, wherein the detecting whether the wireless Bluetooth headset is powered on comprises:

detecting whether the wireless Bluetooth headset is currently usable; and

if it is detected that the wireless Bluetooth headset is currently usable, determining that the wireless Bluetooth headset is powered on.

15. The method according to any one of claims 1-14, wherein the action recognition model comprises a first convolutional layer, a second convolutional layer, a max pooling layer, a third convolutional layer, a fourth convolutional layer, a global average pooling layer, a fully connected layer and a softmax layer; the first convolutional layer and the second convolutional layer are convolutional layers with a kernel size of 7 and a dimension of 64; and the third convolutional layer and the fourth convolutional layer are convolutional layers with a kernel size of 7 and a dimension of 128.

16. The method according to any one of claims 1-14, further comprising:

iteratively training a convolutional neural network model based on a cross-entropy loss function until the number of iterations reaches a maximum number of iterations, and using the convolutional neural network model obtained when the maximum number of iterations is reached as the action recognition model.

17. The method according to claim 16, wherein the cross-entropy loss function is

$$L = -\sum_{c=1}^{M} y_{ic}\log\left(p_{ic}\right)$$

where M represents the number of action categories; y_{ic} represents an indicator variable; and p_{ic} represents the predicted probability that observed sample i belongs to action category c.

18. A motion recognition apparatus, characterized in that it runs on an electronic device and comprises:

a data acquisition unit, configured to acquire sensor data of a preset window length in a time series;

a result output unit, configured to input the sensor data into an action recognition model and obtain an action classification result output by the action recognition model; and

an operation execution unit, configured to control the electronic device to respond to an operation corresponding to the action if it is determined, based on the action classification result, that an action has occurred.

19. An electronic device, characterized by comprising one or more processors and a memory, wherein one or more programs are stored in the memory and configured to be executed by the one or more processors to perform the method according to any one of claims 1-17.

20. A computer-readable storage medium, characterized in that program code is stored in the computer-readable storage medium, and the method according to any one of claims 1-17 is performed when the program code is executed by a processor.