WO2018076371A1 - Gesture recognition method, network training method, apparatus and equipment - Google Patents


Info

Publication number
WO2018076371A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
motion
action
image
frequency domain
Prior art date
Application number
PCT/CN2016/104121
Other languages
French (fr)
Chinese (zh)
Inventor
崔健
Original Assignee
深圳市大疆创新科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to PCT/CN2016/104121 priority Critical patent/WO2018076371A1/en
Priority to CN201680029871.3A priority patent/CN107735796A/en
Publication of WO2018076371A1 publication Critical patent/WO2018076371A1/en

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/0011Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots associated with a remote control arrangement
    • G05D1/0016Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots associated with a remote control arrangement characterised by the operator's input device
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/0011Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots associated with a remote control arrangement
    • G05D1/0033Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots associated with a remote control arrangement by having the operator tracking the vehicle either by direct line of sight or via one or more cameras located remotely from the vehicle
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10Simultaneous control of position or course in three dimensions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • the present application relates to the field of communications technologies, and in particular, to a motion recognition method, a network training method, an apparatus, and a device.
  • current gesture recognition works by identifying the start point and the end point of the data waveform. In this mode, when the user moves without making a gesture, the various motions are difficult to distinguish from gestures; the start point and the end point are then mislocated, making gesture recognition erroneous or impossible, so the accuracy and reliability of gesture recognition are low.
  • the embodiment of the invention provides a motion recognition method, a network training method, an apparatus, and a device, which can improve the accuracy and reliability of recognizing a user's gesture actions.
  • an embodiment of the present invention provides a motion recognition apparatus, including: an acquisition module and a processing module;
  • An acquiring module configured to acquire motion data detected by an external device for the current motion
  • a processing module configured to convert the motion data acquired by the acquiring module into frequency domain data, and use the frequency domain data to identify an action corresponding to the frequency domain data.
  • the embodiment of the present invention further provides a motion recognition apparatus, including: a first acquisition module, a second acquisition module, a fusion module, and a processing module;
  • a first acquiring module configured to acquire motion data detected by the external device for the current motion, and acquire feature data corresponding to the current motion according to the motion data
  • a second acquiring module configured to acquire an image collected for the current action, and process the image to obtain image recognition data corresponding to the current action
  • a fusion module configured to fuse feature data corresponding to the current action and image recognition data to obtain fusion data
  • a processing module configured to identify an action corresponding to the merged data.
  • the embodiment of the present invention further provides a network training device based on motion recognition, including: a first acquiring module, a second acquiring module, a determining module, and a processing module;
  • a first acquiring module configured to acquire motion data detected by an external device for a preset motion
  • a second acquiring module configured to acquire an image collected for the preset action, and process the image to obtain image recognition data corresponding to the preset action
  • a determining module configured to identify an action corresponding to the motion data
  • a processing module configured to perform supervised learning on the image recognition data acquired by the second acquiring module by using the action identified by the determining module, and to train the preset network model based on the image recognition data after the supervised learning.
  • the embodiment of the present invention further provides a motion recognition method, including:
  • an embodiment of the present invention further provides a motion recognition method, including:
  • the embodiment of the present invention further provides a network training method based on motion recognition, including:
  • supervised learning is performed on the image recognition data by using the identified action, and the preset network model is trained based on the image recognition data after the supervised learning.
  • the embodiment of the present invention further provides a motion recognition device, including: a processor and a communication interface, where the processor is connected to the communication interface;
  • the communication interface is configured to acquire motion data detected by an external device for a current motion
  • the processor is configured to convert the motion data into frequency domain data, and use the frequency domain data to identify an action corresponding to the frequency domain data.
  • the embodiment of the present invention further provides a motion recognition device, including: a processor, a communication interface, and an image acquisition device, where the processor is respectively connected to the image acquisition device and the communication interface, where
  • the communication interface is configured to acquire motion data detected by an external device for a current motion
  • the image obtaining device is configured to collect an image for the current motion
  • the processor is configured to acquire the image collected by the image acquiring device for the current action, process the image to obtain image recognition data corresponding to the current action, acquire feature data corresponding to the current action according to the motion data, fuse the feature data corresponding to the current action with the image recognition data to obtain fused data, and identify the action corresponding to the fused data.
  • the embodiment of the present invention further provides a network training device based on motion recognition, comprising: an image acquiring device, a processor, and a communication interface, wherein the processor is respectively connected to the image acquiring device and the communication interface, and wherein:
  • the communication interface is configured to acquire motion data detected by an external device for a preset motion
  • the image obtaining device is configured to collect an image for the preset action
  • the processor is configured to acquire the image collected by the image acquiring device for the preset action, and process the image to obtain image recognition data corresponding to the preset action; identify the action corresponding to the motion data; perform supervised learning on the image recognition data by using the identified action; and train the preset network model based on the image recognition data after the supervised learning.
  • an embodiment of the present invention further provides an aircraft, including the motion recognition device, for identifying an action.
  • an embodiment of the present invention further provides an aircraft, including the motion recognition device, for identifying an action.
  • an embodiment of the present invention further provides an aircraft, including the motion recognition-based network training device, for training a network model for motion recognition.
  • in the embodiments of the present invention, the motion data of the external device can be acquired and converted into frequency domain data, and the action corresponding to the frequency domain data identified; or the motion data and the image recognition data can be fused to obtain fused data, and the fused data used to identify the corresponding action; or the action corresponding to the motion data can be determined and used to perform supervised learning on the image recognition data. Each approach enhances the accuracy and reliability of motion recognition and has good robustness.
  • FIG. 1 is a schematic diagram of an aircraft control system according to an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of a motion recognition method according to an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of another motion recognition method according to an embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of a network training method based on motion recognition according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a motion recognition apparatus according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of another motion recognition apparatus according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a network training apparatus based on motion recognition according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a motion recognition device according to an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of another motion recognition apparatus according to an embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of a network training device based on motion recognition according to an embodiment of the present invention.
  • FIG. 1 provides a schematic diagram of an aircraft control system including an aircraft such as a drone 110 and a wearable device 120, wherein the drone 110 includes a flight body, a pan/tilt head, and an imaging device 130.
  • the flying body includes a plurality of rotors and a rotor motor that drives the rotor to rotate, thereby providing the power required for the drone 110 to fly.
  • the imaging device 130 is mounted on the flying body through the pan/tilt.
  • the imaging device 130 is used for image or video capture during flight of the drone 110, including but not limited to multi-spectral imagers, hyperspectral imagers, visible light cameras, infrared cameras, and the like.
  • the pan/tilt is a multi-axis transmission and stabilization system, including multiple rotating shafts and pan/tilt motors.
  • the pan/tilt motor compensates for the shooting angle of the imaging device 130 by adjusting the rotation angle of the rotating shaft, and prevents or reduces shake of the imaging device 130 through an appropriate buffer mechanism.
  • imaging device 130 can be mounted on the flying body either directly or by other means.
  • the wearable device 120 is worn by the operator and communicates with the drone 110 through wireless communication, thereby controlling the flight process of the drone 110 and the photographing process of the imaging device 130.
  • the wearable device 120 has a built-in motion sensor.
  • the motion sensor senses the movement of the hand and outputs corresponding motion data, and the drone is controlled accordingly based on the motion data.
  • the imaging device on the drone can also capture image data of the motion of a human body; the limb motion is recognized according to the image data, and the aircraft can be controlled accordingly based on the recognized limb motion.
  • the embodiment of the invention discloses a motion recognition method, a network training method based on the motion recognition method, a related device and related equipment, which can improve the accuracy and reliability of the motion recognition, and has good robustness. The details are explained below.
  • FIG. 2 is a schematic flowchart diagram of a motion recognition method according to an embodiment of the present invention. Specifically, as shown in FIG. 2, the motion recognition method in the embodiment of the present invention may include the following steps:
  • the technical solution of the embodiment of the present invention may be applied to an external device, to a controlled device corresponding to the motion recognition, such as an aircraft, or to another independent motion recognition device, which is not limited in the embodiment of the present invention.
  • the external device may be a wearable device or a handheld device, such as a wristband, a watch, or a smart ring. The external device is configured with a motion sensor, such as an Inertial Measurement Unit (IMU). When the external device moves or makes an action such as a gesture, the motion sensor outputs corresponding motion data, which the external device detects; the motion data may be one or both of angular acceleration and acceleration.
  • using the frequency domain data to identify the action corresponding to the frequency domain data may specifically be: inputting the frequency domain data into a network model, so as to identify the action corresponding to the frequency domain data by using the network model.
  • features of the frequency domain data may be further extracted, for example by extraction and superposition, and the extracted features input into a network model to identify the action corresponding to the features, improving the efficiency of action recognition.
  • the frequency domain data may be obtained by subjecting the motion data to Fourier transform.
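For illustration only (this sketch is not part of the original disclosure), the Fourier-transform step can be written in Python with numpy; the sampling rate, window length, and function name are assumed values, not taken from the source:

```python
import numpy as np

def to_frequency_domain(motion_data, fs=100.0):
    """Convert a window of motion data (e.g. IMU acceleration samples)
    into frequency domain magnitudes via a real FFT.

    motion_data: 1-D array of samples; fs: assumed sampling rate in Hz.
    Returns (freqs, magnitudes)."""
    x = np.asarray(motion_data, dtype=float)
    x = x - x.mean()                      # remove the DC (gravity/offset) component
    spectrum = np.fft.rfft(x)             # one-sided FFT of a real-valued signal
    magnitudes = np.abs(spectrum) / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, magnitudes

# Example: a 2 Hz oscillation (a synthetic "gesture") sampled at 100 Hz for 5 s
t = np.arange(0, 5.0, 0.01)
accel = np.sin(2 * np.pi * 2.0 * t)
freqs, mags = to_frequency_domain(accel, fs=100.0)
print(freqs[np.argmax(mags)])  # dominant frequency: 2.0
```

Because the spectrum discards the time axis, two executions of the same gesture at slightly different moments yield near-identical feature vectors, which is the alignment property the description relies on.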
  • the network model may be a neural network, or other network models. The embodiment of the present invention uses a neural network as an example for description.
  • the acquired motion data and/or the frequency domain data may also be normalized.
  • current gesture recognition is performed on time domain data. Because gesture actions performed over a period of time cannot be aligned to the same length of time, it is difficult to distinguish the various gestures from their time domain waveforms, and there are large differences in the data when different people make the same gesture. The recognition effect is therefore poor when gesture recognition is performed directly in the time domain.
  • the embodiment of the present invention performs motion recognition in the frequency domain. Since the frequency domain has no time axis, the gestures are naturally aligned, and the frequency components of the same gesture made by different people are very similar, which greatly improves the recognition of gesture actions.
  • the motion data acquired by the external device tends to be noisy, while the useful frequency range of the motion data is very low, which can make recognition of the gesture actions corresponding to the motion data inaccurate.
  • the obtained raw motion data can therefore be subjected to low-pass filtering with a relatively low pass band, and the amount of acquired motion data can be reduced by lowering its sampling rate, thereby reducing the computational cost of the algorithm.
  • the acquired frequency domain data may be normalized to facilitate data processing by the network model.
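A minimal preprocessing sketch, for illustration only: a moving average stands in for the low-pass filter (the source does not specify the filter design), and the decimation factor, kernel width, and function name are assumptions:

```python
import numpy as np

def preprocess(motion_data, decimate_by=4, kernel=8):
    """Low-pass filter, downsample, and normalize raw sensor samples.

    decimate_by and kernel are illustrative values, not from the source."""
    x = np.asarray(motion_data, dtype=float)
    # crude low-pass: a moving average suppresses high-frequency noise
    smoothed = np.convolve(x, np.ones(kernel) / kernel, mode="same")
    # reduce the sampling rate to cut the data volume and compute cost
    downsampled = smoothed[::decimate_by]
    # normalize to zero mean / unit variance for the network model
    return (downsampled - downsampled.mean()) / (downsampled.std() + 1e-8)

rng = np.random.default_rng(0)
raw = np.sin(np.linspace(0, 20, 400)) + 0.1 * rng.standard_normal(400)
features = preprocess(raw)
print(features.shape)  # (100,)
```

In practice a proper filter (e.g. Butterworth) with anti-aliasing before decimation would replace the moving average; the shape of the pipeline is what this sketch shows.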
  • the motion data may include data obtained by sampling the data output by the motion sensor of the external device within a preset time period; or the motion data may include data obtained by sampling the data output by the motion sensor of the external device a preset number of times.
  • the data output by the motion sensor can be collected over a preset time period. For example, a large number of tests found that a person's gesture generally does not exceed 5 s, so the network model can be continuously trained on, or gesture recognition performed on, the motion data collected every 5 s.
  • the embodiment of the invention performs gesture recognition by converting the motion data into frequency domain data, that is, the gesture is recognized from the entire segment of data, without identifying the start point and end point of the gesture within the 5 s, which improves the accuracy of motion recognition.
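The whole-segment approach can be sketched as slicing the continuous sensor stream into fixed-length windows and classifying each window as a unit; the sampling rate and function name below are assumptions (only the 5 s window length comes from the source):

```python
import numpy as np

def segment_windows(stream, fs=100, window_s=5.0):
    """Split a continuous sensor stream into fixed-length windows.

    Each window is classified as a whole, so no gesture start/end point
    needs to be detected inside it."""
    win = int(fs * window_s)
    n = len(stream) // win
    return np.asarray(stream[: n * win]).reshape(n, win)

rng = np.random.default_rng(0)
stream = rng.standard_normal(1730)      # ~17 s of samples at 100 Hz
windows = segment_windows(stream)
print(windows.shape)  # (3, 500)
```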
  • the motion data detected by the external device for a preset action may also be acquired; the motion data corresponding to the preset action is converted into frequency domain data, and the frequency domain data corresponding to the preset action, together with the preset action, is used to train the network model.
  • the network model is trained before using the network model to identify the action.
  • several stable gesture actions can be defined in advance, different users can perform these gesture actions, and the motion data obtained from a large number of different users performing them (such as the data output by a watch IMU) can be collected, converted into frequency domain data, and used to train a network model such as a neural network.
  • features of the frequency domain data may be further extracted, for example by extraction and superposition, and the extracted features taken as inputs with the preset action as the output to train the network model, such as a neural network, thereby improving the stability and reliability of the network model through a large amount of network training and improving the reliability of motion recognition based on the network model.
  • a regularization process may be employed on the motion data and/or the frequency domain data during training to reduce overfitting.
  • the network model is trained by various gesture actions performed by the user using an external device, thereby improving the recognition rate of the gesture action and reducing the false detection rate of the action.
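The training setup described above (frequency-domain features in, predefined gesture class out) can be illustrated with a deliberately tiny model; the softmax regression, synthetic spectra, and all parameter values here are stand-ins for the neural network and real user data the source describes:

```python
import numpy as np

rng = np.random.default_rng(0)

def gesture_spectrum(dominant_bin, n_bins=32):
    """Synthetic frequency-domain feature vector for one gesture sample:
    energy concentrated at a dominant bin plus noise (illustrative data,
    not from the source)."""
    x = 0.1 * rng.random(n_bins)
    x[dominant_bin] += 1.0
    return x

# Two predefined gestures, distinguished by their dominant frequency bin
X = np.array([gesture_spectrum(b) for b in [5] * 50 + [12] * 50])
y = np.array([0] * 50 + [1] * 50)

# Minimal softmax-regression "network model" trained by gradient descent
W = np.zeros((32, 2))
for _ in range(300):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = X.T @ (p - np.eye(2)[y]) / len(X)
    W -= 1.0 * grad

pred = (X @ W).argmax(axis=1)
print((pred == y).mean())  # training accuracy on this toy data
```

A real deployment would substitute a neural network and data from many users, with regularization as the description notes, but the input/output contract is the same.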
  • the aircraft may also be controlled according to the identified action.
  • the control instruction corresponding to the gesture action may be generated, and the control instruction is sent to the controlled device, such as an aircraft, to enable the aircraft to perform an operation corresponding to the gesture action.
  • a plurality of gestures such as a gesture action 1, a gesture action 2, a gesture action 3, and the like, may be predefined, and a control function corresponding to each gesture action may be further preset to control the corresponding controlled device.
  • taking a watch as the external device and aircraft control as an example: when the user performs gesture action 1 with the hand wearing the watch, the aircraft can automatically start and take off; when the aircraft is in flight, the user can perform gesture action 2 and the aircraft enters the surround selfie function.
  • switching among the plurality of selfie modes may be controlled by gesture action 1, and while in a selfie mode, the user may perform gesture action 2 again to exit the selfie mode, and so on.
  • the user simply raises the hand wearing the watch toward the aircraft and performs gesture action 1, at which point the aircraft can enter this mode of flight: the aircraft flies on a spherical surface centered on the user, and flies according to the position pointed to by the user's finger.
  • the aircraft can adjust the flight radius, and it can automatically detect whether the user is pointing at the ground and safely control its height to prevent a crash.
  • the user can exit this mode by a preset gesture action, such as pointing at the aircraft and performing gesture action 1.
  • the user can also perform gesture action 3 to control the aircraft to make a safe landing. This enables flight control of the aircraft without needing a remote controller, enhancing the user experience.
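The gesture-to-command dispatch described above amounts to a lookup table. As a sketch only: the gesture numbers follow the source's example, while the command names and function are assumptions:

```python
# Hypothetical mapping from recognized gesture actions to aircraft
# control instructions (command names are illustrative, not from the source).
GESTURE_COMMANDS = {
    1: "takeoff_or_toggle_mode",   # gesture action 1: start/take off, switch modes
    2: "orbit_selfie",             # gesture action 2: enter/exit surround selfie
    3: "safe_landing",             # gesture action 3: land safely
}

def dispatch(gesture_id):
    """Return the control instruction for a recognized gesture, or None
    for an unrecognized one, so spurious motions are ignored."""
    return GESTURE_COMMANDS.get(gesture_id)

print(dispatch(3))  # safe_landing
print(dispatch(9))  # None
```

Returning None for unknown gestures reflects the description's emphasis on a low false detection rate: an unclassified motion triggers no command.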
  • the motion data of the external device is acquired and converted into frequency domain data, and the frequency domain data is used to identify the corresponding action. This effectively avoids false detections, keeps the false detection rate low, and further improves the accuracy and reliability of motion recognition, with good robustness.
  • FIG. 3 is a schematic flowchart diagram of another motion recognition method according to an embodiment of the present invention. Specifically, as shown in FIG. 3, the motion recognition method in the embodiment of the present invention may include the following steps:
  • the technical solution of the embodiment of the present invention may be specifically applied to the controlled device corresponding to the motion recognition, such as an aircraft, or may be specifically applied to other independent motion recognition devices, which is not limited in the embodiment of the present invention.
  • the external device may be a wearable device or a handheld device such as a wristband, a watch, a smart ring, or the like.
  • the acquired motion data may be data collected by a motion sensor such as an IMU disposed in the external device.
  • the feature data may be the frequency domain data converted from the acquired motion data; or it may be obtained by converting the acquired motion data into frequency domain data and further extracting features of the frequency domain data, for example obtaining the features by extraction, superposition, and the like.
  • the image recognition data may be obtained by an image acquisition device, such as a camera (specifically, a camera disposed on the aircraft), detecting the current motion (i.e., capturing an image corresponding to the motion), and processing the resulting image data.
  • the fused data includes features of the motion data and features of the image recognition data, which increases the features available for motion recognition. This avoids errors that arise when an action (such as a gesture) is recognized from the image alone, for example when the user performing the action is not in the image acquired by the image acquisition device, or appears too small in the image, causing the motion recognition to be erroneous or even impossible, as well as cases where the action cannot be recognized from the external device alone.
  • the identifying the action corresponding to the fused data may be specifically: inputting the fused data into a network model, to identify an action corresponding to the fused data by using the network model.
  • the network model may be a neural network, or may be another network model.
  • the embodiment of the present invention uses a neural network as an example for description.
  • the motion data in the embodiment of the present invention may include data obtained by sampling the data output by the motion sensor of the external device within a preset time period, or data obtained by sampling the data output by the motion sensor of the external device a preset number of times.
  • acquiring the feature data corresponding to the current action according to the motion data may specifically be: converting the motion data into frequency domain data, and acquiring the feature data corresponding to the current action according to the frequency domain data.
  • the merging the feature data and the image identification data corresponding to the current action to obtain the fused data may be specifically: merging the frequency domain data and the image identification data to obtain fused data.
  • the obtained motion data, such as IMU data, may be converted into frequency domain data, and the feature data determined based on the frequency domain data, for example by directly using the frequency domain data as the feature data, or by performing feature extraction on the frequency domain data, such as processing by extraction, superposition, and the like, and using the result as the feature data.
  • fused data including the feature data and the image recognition data may be generated, which is equivalent to fusing the features of the two data sources (image recognition data and motion data), so that gesture recognition can be performed based on the fused data, and device control performed based on the recognized gesture action.
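The simplest reading of this fusion step is feature concatenation. The sketch below is illustrative only (the source ultimately inserts the IMU features into an intermediate network layer rather than concatenating raw vectors); the function name and dimensions are assumptions:

```python
import numpy as np

def fuse(imu_features, image_features):
    """Fuse frequency-domain IMU features with image-derived features by
    concatenating them into one feature vector for the recognizer."""
    a = np.asarray(imu_features, dtype=float).ravel()
    b = np.asarray(image_features, dtype=float).ravel()
    return np.concatenate([a, b])

rng = np.random.default_rng(0)
fused = fuse(rng.standard_normal(32), rng.standard_normal(128))
print(fused.shape)  # (160,)
```

Either data source alone can fail (user out of frame, or no wearable signal); the fused vector carries both, which is the robustness argument made in the text.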
  • the motion data detected by the external device for the preset action may be acquired, and feature data corresponding to the preset action acquired according to the motion data; image recognition data obtained by detecting the preset action may be acquired; the image recognition data corresponding to the preset action may be fused with the feature data corresponding to the preset action to obtain fusion data corresponding to the preset action; and the network model may be trained using the fusion data corresponding to the preset action together with the preset action.
  • the obtained motion data, such as the IMU data, is converted to obtain frequency domain data, and the feature data is determined based on the frequency domain data: the frequency domain data may be used directly as the feature data, or the frequency domain data may be subjected to feature extraction, such as processing by extraction, superposition, and the like, and the result used as the feature data.
  • gestures are learned from the image through deep learning; an intermediate layer holds the learned image recognition data, and the feature data from the IMU can be inserted into that intermediate layer to obtain the fused data of the two, after which learning continues, which is equivalent to fusing the features of the two data sources (image recognition data and motion data). Training in this way means that the gesture action can still be recognized from the fused motion data even when the gesture is difficult to recognize from the image, and device control can then be performed based on the gesture action.
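The mid-layer insertion described above can be sketched as a forward pass in which the IMU feature vector joins the network at an intermediate layer; all layer sizes are illustrative and the weights are random (untrained), since only the wiring is the point here:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative layer sizes (not from the source):
# image branch: 128-d input -> 64-d intermediate layer
# fused head:   (64 + 32)-d -> 4 gesture classes
W_img = rng.standard_normal((128, 64))
W_out = rng.standard_normal((64 + 32, 4))

def forward(image_vec, imu_features):
    mid = np.tanh(image_vec @ W_img)             # learned image recognition data
    fused = np.concatenate([mid, imu_features])  # insert IMU features mid-network
    return fused @ W_out                         # continue to the gesture logits

logits = forward(rng.standard_normal(128), rng.standard_normal(32))
print(logits.shape)  # (4,)
```

Because the output head sees both branches, gradient updates during continued training flow into the image pathway and the IMU pathway alike, which is what lets the model fall back on motion features when the image is uninformative.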
  • the aircraft may also be controlled according to the identified action.
  • a plurality of gestures may be predefined, and a control function corresponding to each gesture action may be further preset to control a corresponding controlled device such as an aircraft.
  • taking aircraft control as an example, the operation of the aircraft can be controlled through gesture recognition, for example controlling takeoff, landing, selfie shooting, and directed flight, thereby enhancing the user experience.
  • the acquired motion data of the external device and the image recognition data may be fused; specifically, the motion data may be converted into frequency domain data, the feature data of the current action determined based on the frequency domain data, and the fused data obtained by fusing the feature data with the image recognition data, so that the fused data can be used to identify the corresponding action, thereby improving the accuracy and reliability of motion recognition, with good robustness.
  • this avoids the problems of low recognition accuracy, or even failure to recognize, that arise when only the motion data of the external device, or only image recognition, is used.
  • FIG. 4 is a schematic flowchart diagram of a network training method based on motion recognition according to an embodiment of the present invention.
  • the network training method in the embodiment of the present invention may include the following steps:
  • the technical solution of the embodiment of the present invention may be specifically applied to the controlled device corresponding to the action identification, such as an aircraft, or may be specifically applied to other independent network training devices, which is not limited in the embodiment of the present invention.
  • the external device may be a wearable device or a handheld device such as a wristband, a watch, a smart ring, or the like.
  • the acquired motion data may be data collected by a motion sensor, such as an IMU, disposed in the external device. Further, in the embodiment of the present invention, the feature data acquired according to the motion data and corresponding to the preset action is also referred to as the feature data corresponding to the motion data; the feature data is used to determine a specific gesture action.
  • the motion data in the embodiment of the present invention may include data obtained by sampling the data acquired by the motion sensor of the external device within a preset time period, or data obtained by sampling the data output by the motion sensor a preset number of times.
  • the image recognition data may be obtained by an image acquisition device, such as a camera (specifically, a camera disposed on the aircraft), detecting the current motion (i.e., capturing an image corresponding to the motion), and processing the resulting image data.
  • the network model may be a neural network, or may be another network model.
  • the embodiment of the present invention uses a neural network as an example for description.
• the performing of supervised learning on the image identification data by using the identified action may specifically be: using deep learning, taking the image recognition data as the input and the identified action as the target output for supervised learning. Since the feature dimension of the collected images is large, deep learning can reduce the dimension of the image recognition features, thereby improving the stability and reliability of the network model obtained by training on the learned image recognition data.
• in an optional embodiment, the network model may be a network model corresponding to recognizing actions by image recognition, and the network model may be trained by using the action corresponding to the identified motion data together with the image recognition data.
• for example, taking a wristband as the external device, the correspondence between the motion data features acquired by the wristband (which may be data obtained by processing the motion data, such as filtering, or frequency domain data obtained by applying a Fourier transform to the motion data) and gesture actions can be pre-trained.
• when the user performs a gesture action, the motion data collected by the wristband and the image recognition data are acquired synchronously, the features of the motion data are extracted, and the action corresponding to the currently collected wristband motion data features is recognized based on the pre-trained correspondence between the wristband motion data features and gesture actions.
• further, deep learning can be used, taking the image recognition data as the input and the recognized action as the target output, to perform supervised learning on the image recognition data.
• the network model can then be trained by using the dimension-reduced image recognition data, determining the correspondence between the image recognition data and gesture actions for subsequent gesture recognition.
• subsequently, the current image recognition data can be obtained from the acquired current image, and the current gesture action can be quickly and accurately recognized by the network model, so that device control can be performed based on the gesture action.
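Once a gesture action is recognized, dispatching it to a device operation is a simple lookup. In this sketch, both the gesture labels and the command names are hypothetical, chosen only to illustrate the control functions (take off, land, self-timer, fly) mentioned in the embodiment:

```python
# Hypothetical mapping from recognized gesture labels to aircraft
# operations; neither the labels nor the command names are taken
# from the embodiment.
GESTURE_COMMANDS = {
    "wave_up": "take_off",
    "wave_down": "land",
    "circle": "self_timer",
    "swipe": "fly",
}

def control_aircraft(gesture):
    # Look up the control operation for a recognized gesture; gestures
    # without a defined command are ignored rather than acted on.
    return GESTURE_COMMANDS.get(gesture, "no_op")
```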
• in an optional embodiment, the network model may be a network model corresponding to recognizing actions based on both the motion data and the image recognition data, and the network model may be trained by using the fusion data of the acquired motion data and the image recognition data.
  • the feature data corresponding to the preset action may be acquired according to the motion data.
• the image recognition data after the supervised learning is used to train the preset network model, which may specifically be: fusing the feature data with the image recognition data after the supervised learning to obtain fusion data, and training the preset network model by using the fusion data.
  • the external device is still used as an example of a wristband, and the corresponding relationship between the motion data feature and the gesture motion collected by the wristband can be pre-trained.
• when the user performs a gesture action, the motion data collected by the wristband and the image recognition data are acquired synchronously, the features of the motion data are extracted, and the action corresponding to the currently collected wristband motion data features is recognized based on the pre-trained correspondence. Further, deep learning is used, taking the image recognition data as the input and the recognized action as the target output, to perform supervised learning on the image recognition data.
• the network model can then be trained by using the dimension-reduced image identification data together with the feature data of the motion data, that is, by using the fusion data of the two. Subsequently, for a given gesture action, the fusion data of the learned current image recognition data and the feature data of the current motion data can be input to the network model, so that the current gesture action is quickly and accurately recognized, and device control can be further performed based on the gesture action.
• the acquiring of the feature data corresponding to the current action according to the motion data may specifically be: converting the motion data into frequency domain data, and acquiring the feature data of the current action according to the frequency domain data.
• the feature data is determined based on the frequency domain data; for example, the converted frequency domain data may be used directly as the feature data, or feature extraction may be performed on the frequency domain data, such as taking the data obtained after extraction, superposition, and similar processing as the feature data.
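A sketch of these two choices, direct use of the frequency domain data versus feature extraction by truncation and superposition (the bin count and 6-axis layout are assumptions for illustration):

```python
import numpy as np

def to_frequency_domain(motion_data):
    # Fourier-transform the time-domain motion data; the magnitude
    # spectrum is insensitive to where the gesture starts in the window.
    return np.abs(np.fft.rfft(motion_data, axis=0))

def extract_feature_data(motion_data, n_bins=None, superimpose=False):
    # Option 1: use the converted frequency domain data directly.
    # Option 2: perform feature extraction, e.g. keep only the first
    # n_bins low-frequency bins (extraction) and/or sum across the
    # sensor axes (superposition).
    f = to_frequency_domain(motion_data)
    if n_bins is not None:
        f = f[:n_bins]
    if superimpose:
        f = f.sum(axis=1)
    return f.flatten()

motion = np.random.randn(128, 6)  # assumed: 128 samples of 6-axis data
direct = extract_feature_data(motion)                                # (65 * 6,)
compact = extract_feature_data(motion, n_bins=16, superimpose=True)  # (16,)
```

The compact variant trades spectral detail for a much smaller feature vector fed to the network model.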
• subsequently, the current image identification data may also be obtained from images collected for the current action, and the gesture action corresponding to the current image recognition data may be recognized by the neural network trained with the above training method. The operation that the controlled device, such as an aircraft, needs to perform is then determined based on the pre-defined gesture actions and the control function corresponding to each gesture action; for example, gesture recognition can be used to control the aircraft to take off, land, take a self-timer photo, and fly, thereby enhancing the user experience.
• in the embodiment of the present invention, the motion data and the image recognition data are collected simultaneously when the user performs a gesture action, and the current gesture action is recognized based on the correspondence between the motion data and gesture actions; the recognized gesture action is then used, via deep learning, to perform supervised learning on the image recognition data, which improves the accuracy, reliability, and robustness of motion recognition.
  • FIG. 5 is a schematic structural diagram of a motion recognition apparatus according to an embodiment of the present invention.
• the motion recognition device in the embodiment of the present invention may be configured in an external device, in a controlled device corresponding to the motion recognition, such as an aircraft, or in another independent motion recognition device, and so on.
• the motion recognition apparatus 10 of the embodiment of the present invention may include an acquisition module 11 and a processing module 12, wherein:
  • the obtaining module 11 is configured to acquire motion data detected by an external device for the current motion
  • the processing module 12 is configured to convert the motion data acquired by the acquiring module 11 into frequency domain data, and use the frequency domain data to identify an action corresponding to the frequency domain data.
  • the external device may be a wearable device or a handheld device such as a wristband, a watch, a smart ring, or the like.
  • the external device is configured with a motion sensor such as an IMU.
  • the motion sensor configured by the external device outputs corresponding motion data, and the external device detects the motion data.
  • the processing module 12 is specifically configured to input the frequency domain data into a network model to identify an action corresponding to the frequency domain data by using the network model.
• the processing module 12 may further extract features of the frequency domain data, for example by extraction or superposition, and input the extracted features into the network model to identify the action corresponding to the features, thereby improving the efficiency of action recognition.
  • the frequency domain data may be obtained by subjecting the motion data to Fourier transform.
  • the network model may be a neural network, or other network models, which are not limited in the embodiment of the present invention.
  • the obtaining module 11 is further configured to acquire motion data detected by the external device for the preset motion
• the processing module 12 is further configured to convert the motion data corresponding to the preset action into frequency domain data, and to train the network model by using the frequency domain data corresponding to the preset action and the preset action.
  • the processing module 12 is further configured to perform regularization processing on the frequency domain data to reduce over-fitting of the frequency domain data.
  • the processing module 12 is further configured to perform normalization processing on the motion data.
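The normalization step can be sketched as follows; per-axis z-score normalization is one common choice here, since the embodiment does not specify the exact formula:

```python
import numpy as np

def normalize(motion_data, eps=1e-8):
    # Per-axis z-score normalization of raw motion data: zero mean and
    # unit variance per sensor axis, so axes with larger raw ranges do
    # not dominate the frequency-domain features.
    mean = motion_data.mean(axis=0)
    std = motion_data.std(axis=0)
    return (motion_data - mean) / (std + eps)
```

The small `eps` guards against division by zero on an axis whose readings are constant.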
• the motion data may include data obtained by sampling, within a preset time period, the data acquired by the motion sensor of the external device, or data obtained by taking a preset number of samples of the data output by the motion sensor of the external device.
  • the processing module 12 is further configured to control the aircraft according to the identified action.
• the processing module 12 may further generate a control instruction corresponding to the gesture action and send the control instruction to the controlled device, such as an aircraft, so that the aircraft performs the operation corresponding to the gesture action.
• in the embodiment of the present invention, the motion data of the external device is acquired and converted into frequency domain data, so that the frequency domain data is used to identify the corresponding action. This avoids having to detect the start and end points of the gesture data waveform, improves the recognition rate of motion recognition, reduces the false detection rate, and further improves the accuracy, reliability, and robustness of motion recognition.
  • FIG. 6 is a schematic structural diagram of another motion recognition apparatus according to an embodiment of the present invention.
  • the motion recognition device in the embodiment of the present invention may be specifically disposed in a controlled device corresponding to the motion recognition, such as an aircraft, or may be specifically disposed in another independent motion recognition device, etc., specifically, as shown in FIG. 6.
• the action recognition apparatus 20 of the embodiment of the present invention may include a first acquisition module 21, a second acquisition module 22, a fusion module 23, and a processing module 24, wherein:
  • the first obtaining module 21 is configured to acquire motion data detected by the external device for the current motion, and acquire feature data corresponding to the current motion according to the motion data;
  • the second acquiring module 22 is further configured to acquire an image collected for the current action, and process the image to obtain image recognition data corresponding to the current action;
  • the merging module 23 is configured to combine feature data corresponding to the current action and image identification data to obtain fused data.
  • the processing module 24 is configured to identify an action corresponding to the fused data.
  • the external device may be a wearable device or a handheld device such as a wristband, a watch, a smart ring, or the like.
  • the acquired motion data may be data collected by a motion sensor such as an IMU disposed in the external device.
• the feature data may refer to frequency domain data obtained by converting the acquired motion data (for example, by a Fourier transform), or may refer to features obtained by converting the acquired motion data into frequency domain data and then extracting features of the frequency domain data, for example by extraction or superposition.
  • the processing module 24 is specifically configured to input the merged data into a network model to identify an action corresponding to the merged data by using the network model.
  • the network model may be a neural network, or other network models, which are not limited in the embodiment of the present invention.
  • the first obtaining module 21 may be specifically configured to convert the motion data into frequency domain data when acquiring feature data corresponding to the current motion according to the motion data, to use the frequency domain data as Feature data corresponding to the current action;
  • the merging module 23 may be specifically configured to combine the frequency domain data and the image identification data acquired by the second acquiring module 22 to obtain fused data.
  • the first obtaining module 21 is further configured to acquire motion data detected by the external device for the preset action, and acquire feature data corresponding to the preset action according to the motion data;
  • the second acquiring module 22 is further configured to acquire an image collected for the preset action, and process the collected image to obtain image recognition data corresponding to the preset action;
  • the merging module 23 is further configured to combine the image identification data corresponding to the preset action with the feature data corresponding to the preset action to obtain fused data corresponding to the preset action;
  • the processing module 24 is further configured to train the network model by using the fused data corresponding to the preset action and the preset action.
• the motion data may include data obtained by sampling, within a preset time period, the data acquired by the motion sensor of the external device, or data obtained by taking a preset number of samples of the data output by the motion sensor of the external device.
  • the processing module 24 is further configured to control the aircraft according to the identified action.
• the motion data acquired from the external device and the image identification data may be fused. Specifically, the motion data may be converted into frequency domain data, the feature data of the current action may be determined based on the frequency domain data, and the fusion data may be obtained by fusing the feature data with the image identification data, so that the fusion data can be used to identify the corresponding action, thereby improving the accuracy, reliability, and robustness of motion recognition.
• thus, the problem of low recognition accuracy, or even failed recognition, caused by using only the motion data of the external device or only image recognition is avoided.
  • FIG. 7 is a schematic structural diagram of a network training apparatus based on motion recognition according to an embodiment of the present invention.
  • the motion recognition device in the embodiment of the present invention may be specifically disposed in a controlled device, such as an aircraft, corresponding to the motion recognition, or may be specifically configured in another independent network training device.
• the network training device 30 of the embodiment of the present invention may include a first obtaining module 31, a second obtaining module 32, a determining module 33, and a processing module 34, wherein:
  • the first obtaining module 31 is configured to acquire motion data detected by an external device for a preset motion
  • the second acquiring module 32 is further configured to acquire an image collected for the preset action, and process the image to obtain image recognition data corresponding to the preset action;
  • the determining module 33 is configured to identify an action corresponding to the motion data
• the processing module 34 is configured to perform supervised learning on the image identification data acquired by the second obtaining module 32 by using the action identified by the determining module 33, and to train a preset network model based on the image recognition data after the supervised learning.
  • the external device may be a wearable device or a handheld device such as a wristband, a watch, a smart ring, or the like.
  • the acquired motion data may be data collected by a motion sensor such as an IMU disposed in the external device.
  • the network model can be a neural network or other network models. The embodiment of the present invention uses a neural network as an example for description.
• the processing module 34 is specifically configured to use deep learning, taking the image recognition data acquired by the second acquiring module 32 as the input and the identified action as the target output, to perform the supervised learning.
  • the processing module 34 can reduce the image recognition feature dimension through deep learning, so as to improve the stability and reliability of the network model obtained based on the learned image recognition network training.
  • the first obtaining module 31 is further configured to acquire feature data corresponding to the preset action according to the motion data;
• the processing module 34 is further configured to fuse the feature data with the image recognition data after the supervised learning to obtain fusion data, and to train the preset network model by using the fusion data.
• the first obtaining module 31 is specifically configured to, when acquiring the feature data corresponding to the current action according to the motion data, convert the motion data into frequency domain data and use the frequency domain data as the feature data corresponding to the current action.
• the feature data is determined based on the frequency domain data; the obtaining module 31 may directly use the converted frequency domain data as the feature data, or may perform feature extraction on the frequency domain data, such as extraction and superposition, and use the resulting data as the feature data.
• in the embodiment of the present invention, the motion data and the image recognition data are collected simultaneously when the user performs a gesture action, and the current gesture action is recognized based on the correspondence between the motion data and gesture actions; the recognized gesture action is then used, via deep learning, to perform supervised learning on the image recognition data, which improves the accuracy, reliability, and robustness of motion recognition.
• the embodiment of the present invention further provides a computer storage medium, wherein the computer storage medium stores program instructions, and the program, when executed, may perform some or all of the steps of the motion recognition method in the embodiment corresponding to FIG. 2.
• the embodiment of the present invention further provides a computer storage medium, wherein the computer storage medium stores program instructions, and the program, when executed, may perform some or all of the steps of the motion recognition method in the embodiment corresponding to FIG. 3.
• the embodiment of the present invention further provides a computer storage medium, wherein the computer storage medium stores program instructions, and the program, when executed, may perform some or all of the steps of the motion-recognition-based network training method in the embodiment corresponding to FIG. 4.
  • FIG. 8 is a schematic structural diagram of a motion recognition device according to an embodiment of the present invention.
  • the motion recognition device in the embodiment of the present invention may be an external device such as a wristband, a watch, a ring, or the like, or may be a controlled device such as an aircraft, or may be another independent motion recognition device, or the like.
  • the motion recognition device 1 in the embodiment of the present invention may include: a communication interface 300, a memory 200, and a processor 100, and the processor 100 may be respectively connected to the communication interface 300 and the memory 200.
  • the motion recognition device 1 may further include a motion sensor, a camera, and the like.
• the communication interface 300 can include a wired interface, a wireless interface, and the like, and can be used to receive data transmitted by an external device, such as motion data collected by the external device for a certain gesture action of the user, or to transmit motion data acquired by the external device, and so on.
• the memory 200 may include a volatile memory, such as a random-access memory (RAM); the memory 200 may also include a non-volatile memory, such as a flash memory; the memory 200 may further include a combination of the above types of memories.
  • the processor 100 may be a central processing unit (CPU), a graphics processing unit (GPU), or the like.
  • the processor may further include a hardware chip.
  • the hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or a combination thereof.
  • the PLD may be a complex programmable logic device (CPLD) or a field-programmable gate array (FPGA).
  • the memory 200 is further configured to store program instructions.
  • the processor 100 can invoke the program instructions to implement the motion recognition method as shown in the embodiment of FIG. 2 of the present application.
• the communication interface 300 can be configured to acquire motion data detected by an external device for the current action;
• the processor 100 can invoke the program instructions stored in the memory 200 to execute the following: converting the motion data into frequency domain data, and identifying, by using the frequency domain data, the action corresponding to the frequency domain data.
  • the processor 100 is specifically configured to input the frequency domain data into a network model to identify an action corresponding to the frequency domain data by using the network model.
  • the network model may comprise a neural network.
  • the communication interface 300 is further configured to acquire motion data detected by the external device for the preset action
• the processor 100 is further configured to convert the motion data corresponding to the preset action into frequency domain data, and to train the network model by using the frequency domain data corresponding to the preset action and the preset action.
  • the processor 100 is further configured to perform regularization processing on the frequency domain data to reduce over-fitting of the frequency domain data.
  • the processor 100 is further configured to perform normalization processing on the motion data.
• the motion data may include data obtained by sampling, within a preset time period, the data acquired by the motion sensor of the external device, or data obtained by taking a preset number of samples of the data output by the motion sensor of the external device.
  • the processor 100 is further configured to control the aircraft according to the identified action.
• An embodiment of the present invention further provides an aircraft, including the motion recognition device according to any of the above embodiments of FIG. 8, which is configured to recognize an action.
  • FIG. 9 is a schematic structural diagram of another motion recognition device according to an embodiment of the present invention.
  • the motion recognition device in the embodiment of the present invention may be a controlled device such as an aircraft, or may be another independent motion recognition device, and the like.
• the motion recognition device 2 in the embodiment of the present invention may include: a communication interface 700, an image acquisition device 600, a memory 500, and a processor 400, and the processor 400 may be respectively connected to the communication interface 700, the image acquisition device 600, and the memory 500.
  • the motion recognition device 2 may further include a motion sensor.
  • the image acquisition device 600 may include a camera for acquiring an image, such as an image when the user performs a gesture.
  • the communication interface 700 can include a wired interface, a wireless interface, and the like, and can be used to receive data transmitted by the external device, such as receiving motion data collected by the external device for a certain gesture action of the user.
• the memory 500 may include a volatile memory, such as a random-access memory (RAM); the memory 500 may also include a non-volatile memory, such as a flash memory; the memory 500 may further include a combination of the above types of memories.
  • the processor 400 may be a central processing unit (CPU), a graphics processing unit (GPU), or the like.
  • the processor may further include a hardware chip.
  • the hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or a combination thereof.
  • the PLD may be a complex programmable logic device (CPLD) or a field-programmable gate array (FPGA).
  • the memory 500 is further configured to store program instructions.
  • the processor 400 can invoke the program instructions to implement the motion recognition method as shown in the embodiment of FIG. 3 of the present application.
  • the communication interface 700 is configured to acquire motion data detected by an external device for a current motion.
  • the image obtaining device 600 is configured to collect an image for the current motion
• the processor 400 is configured to process the image acquired by the image acquisition device to obtain image recognition data, acquire the feature data corresponding to the current action according to the motion data, fuse the feature data corresponding to the current action with the image identification data to obtain fusion data, and identify the action corresponding to the fusion data.
  • the processor 400 is specifically configured to input the merged data into a network model to identify an action corresponding to the merged data by using the network model.
  • the network model comprises a neural network.
  • the processor 400 is specifically configured to convert the motion data into frequency domain data, and fuse the frequency domain data and the image identification data to obtain fused data.
• specifically, the obtained motion data, such as the IMU data, can be transformed from the time domain to the frequency domain by a Fourier transform to obtain the frequency domain data, and the feature data is determined based on the frequency domain data; for example, the converted frequency domain data is used directly as the feature data, or feature extraction is performed on the frequency domain data, such as taking the data obtained after extraction, superposition, and similar processing as the feature data.
  • the communication interface 700 is further configured to acquire motion data detected by the external device for the preset action
  • the image obtaining device 600 is further configured to collect an image for the preset action
  • the processor 400 is further configured to process an image acquired by the image acquiring device to obtain image recognition data corresponding to the preset action, and acquire feature data corresponding to the preset action according to the motion data; Combining the image identification data corresponding to the preset action with the feature data corresponding to the preset action to obtain fusion data corresponding to the preset action; using the fusion data corresponding to the preset action and The preset action trains the network model.
• the motion data may include data obtained by sampling, within a preset time period, the data acquired by the motion sensor of the external device, or data obtained by taking a preset number of samples of the data output by the motion sensor of the external device.
  • the processor 400 is further configured to control the aircraft according to the identified action.
• An embodiment of the present invention further provides an aircraft, including the motion recognition device described above, which is configured to identify an action.
  • FIG. 10 is a schematic structural diagram of a network training device based on motion recognition according to an embodiment of the present invention.
  • the network training device in the embodiment of the present invention may be a controlled device such as an aircraft, or may be other independent network training devices, and the like.
• the network training device 3 in the embodiment of the present invention may include: a communication interface 1100, an image acquisition device 1000, a memory 900, and a processor 800, and the processor 800 may be respectively connected to the communication interface 1100, the image acquisition device 1000, and the memory 900.
• the network training device 3 may further include a motion sensor or the like.
  • the image acquisition device 1000 may include a camera for acquiring an image, such as an image when the user performs a gesture.
  • the communication interface 1100 can include a wired interface, a wireless interface, and the like, and can be used to receive data transmitted by the external device, such as receiving motion data collected by the external device for a certain gesture action of the user.
• the memory 900 may include a volatile memory, such as a random-access memory (RAM); the memory 900 may also include a non-volatile memory, such as a flash memory; the memory 900 may further include a combination of the above types of memories.
  • the processor 800 can be a central processing unit (CPU), a graphics processing unit (GPU), and the like.
  • the processor may further include a hardware chip.
  • the hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or a combination thereof.
  • the PLD may be a complex programmable logic device (CPLD) or a field-programmable gate array (FPGA).
  • the memory 900 is further configured to store program instructions.
  • the processor 800 can invoke the program instructions to implement a motion recognition based network training method as shown in the embodiment of FIG. 4 of the present application.
  • the communication interface 1100 is configured to acquire motion data detected by an external device for a preset action
  • the image obtaining device 1000 is configured to collect an image for the preset action
• the processor 800 is configured to process the image acquired by the image acquisition device to obtain image recognition data, identify the action corresponding to the motion data, perform supervised learning on the image identification data by using the identified action, and train the preset network model based on the image recognition data after the supervised learning.
  • the processor 800 is specifically configured to perform the supervised learning by using the image recognition data as an input and using the action as a target output.
• the processor 800 is further configured to acquire the feature data corresponding to the preset action according to the motion data, fuse the feature data with the image recognition data after the supervised learning to obtain fusion data, and train the preset network model by using the fusion data.
  • the processor 800 is specifically configured to convert the motion data into frequency domain data to use the frequency domain data as feature data corresponding to the current action.
  • the obtained motion data can be transformed from the time domain to the frequency domain by Fourier transform, and the feature data is determined based on the frequency domain data, for example by directly using the frequency domain data as the feature data, or by performing feature extraction on the frequency domain data, such as processing by extraction, superposition, and the like.
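As a concrete illustration of the bullet above, the step "transform the motion data to the frequency domain and use the magnitudes as feature data" can be sketched in pure Python. This is a minimal sketch under stated assumptions: the motion data is taken as a single-axis list of samples, and a naive DFT stands in for whatever Fourier implementation an actual embodiment would use.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive discrete Fourier transform; returns the magnitude of each
    frequency bin. Only the first half is kept, since real input yields
    a symmetric spectrum."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        s = sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
        mags.append(abs(s) / n)
    return mags

# A toy one-axis acceleration trace: a 2 Hz oscillation sampled at 16 Hz for 1 s.
trace = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
features = dft_magnitudes(trace)

# The energy concentrates in bin 2 (the 2 Hz component), so the feature
# vector cleanly encodes the dominant frequency of the motion.
peak_bin = max(range(len(features)), key=lambda k: features[k])
```

In a real system the feature vector would be computed per IMU axis and the per-axis spectra concatenated before further processing.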
  • the network model may comprise a neural network.
  • the motion data of the external device may be acquired and converted into frequency domain data, so that the frequency domain data is used to identify the corresponding action; or the motion data and the image recognition data may be fused to obtain fusion data, so that the fusion data is used to identify the corresponding action; or the action corresponding to the motion data may be determined and used to perform supervised learning on the image recognition data. This improves the accuracy and reliability of motion recognition, with better robustness.
An embodiment of the present invention further provides an aircraft, including
  • the motion recognition-based network training device according to any one of the foregoing embodiments described with reference to FIG. 10, configured to train the network model for motion recognition.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the modules is only a logical function division.
  • there may be another division manner in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or module, and may be electrical, mechanical or otherwise.
  • the modules described as separate components may or may not be physically separated.
  • the components displayed as modules may or may not be physical modules, that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist physically separately, or two or more modules may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or in the form of hardware plus software function modules.
  • the above-described integrated modules implemented in the form of software function modules can be stored in a computer readable storage medium.
  • the software function modules described above are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to perform part of the steps of the methods of the various embodiments of the present invention.
  • the foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A gesture recognition method, network training method, apparatus and equipment, the method comprising: acquiring movement data detected by an external device for a current gesture (101); and converting the movement data into frequency domain data, and using the frequency domain data to recognize a gesture corresponding to the frequency domain data (102). Therefore, the accuracy and reliability of gesture recognition may be improved.

Description

Motion Recognition Method, Network Training Method, Apparatus and Device

The disclosure of this patent document contains material that is subject to copyright protection. The copyright is owned by the copyright holder. The copyright owner has no objection to the reproduction by anyone of the patent document or the patent disclosure as it appears in the official records and files of the Patent and Trademark Office.

Technical Field

The present application relates to the field of communications technologies, and in particular to a motion recognition method, a network training method, an apparatus, and a device.

Background

With the continuous development of terminal technologies, more and more functions can be implemented on terminals, which brings great convenience to terminal users. For example, various wearable devices have emerged: a user can implement various functions by wearing a wearable device such as a wristband, for example viewing the time, recording motion data, or making calls, and can control other devices by having gesture actions made with the wearable device recognized, for example controlling an aircraft to take off, land, or change its flight path.

However, current gesture recognition identifies a gesture from the start point and end point of the recognition-data waveform. In this approach, the user may move in various ways even when not making a gesture, making the start and end points hard to distinguish, so that gesture actions are misrecognized or not recognized at all; the accuracy and reliability of such gesture recognition are low.
Summary of the Invention

Embodiments of the present invention provide a motion recognition method, a network training method, an apparatus, and a device, which can improve the accuracy and reliability of recognizing a user's gesture actions.
In a first aspect, an embodiment of the present invention provides a motion recognition apparatus, including an acquisition module and a processing module, where:
the acquisition module is configured to acquire motion data detected by an external device for a current action; and
the processing module is configured to convert the motion data acquired by the acquisition module into frequency domain data, and use the frequency domain data to identify an action corresponding to the frequency domain data.
In a second aspect, an embodiment of the present invention further provides a motion recognition apparatus, including a first acquisition module, a second acquisition module, a fusion module, and a processing module, where:
the first acquisition module is configured to acquire motion data detected by an external device for a current action, and acquire feature data corresponding to the current action according to the motion data;
the second acquisition module is configured to acquire an image collected for the current action, and process the image to obtain image recognition data corresponding to the current action;
the fusion module is configured to fuse the feature data corresponding to the current action with the image recognition data to obtain fusion data; and
the processing module is configured to identify an action corresponding to the fusion data.
In a third aspect, an embodiment of the present invention further provides a motion-recognition-based network training apparatus, including a first acquisition module, a second acquisition module, a determining module, and a processing module, where:
the first acquisition module is configured to acquire motion data detected by an external device for a preset action;
the second acquisition module is configured to acquire an image collected for the preset action, and process the image to obtain image recognition data corresponding to the preset action;
the determining module is configured to identify an action corresponding to the motion data; and
the processing module is configured to perform supervised learning on the image recognition data acquired by the second acquisition module by using the action identified by the determining module, and train a preset network model based on the image recognition data after the supervised learning.
In a fourth aspect, an embodiment of the present invention further provides a motion recognition method, including:
acquiring motion data detected by an external device for a current action; and
converting the motion data into frequency domain data, and using the frequency domain data to identify an action corresponding to the frequency domain data.
In a fifth aspect, an embodiment of the present invention further provides a motion recognition method, including:
acquiring motion data detected by an external device for a current action, and acquiring feature data corresponding to the current action according to the motion data;
acquiring an image collected for the current action, and processing the image to obtain image recognition data corresponding to the current action;
fusing the feature data corresponding to the current action with the image recognition data to obtain fusion data; and
identifying an action corresponding to the fusion data.
In a sixth aspect, an embodiment of the present invention further provides a motion-recognition-based network training method, including:
acquiring motion data detected by an external device for a preset action;
acquiring an image collected for the preset action, and processing the image to obtain image recognition data corresponding to the preset action;
identifying an action corresponding to the motion data; and
performing supervised learning on the image recognition data by using the identified action, and training a preset network model based on the image recognition data after the supervised learning.
In a seventh aspect, an embodiment of the present invention further provides a motion recognition device, including a processor and a communication interface, the processor being connected to the communication interface, where:
the communication interface is configured to acquire motion data detected by an external device for a current action; and
the processor is configured to convert the motion data into frequency domain data, and use the frequency domain data to identify an action corresponding to the frequency domain data.
In an eighth aspect, an embodiment of the present invention further provides a motion recognition device, including a processor, a communication interface, and an image acquisition apparatus, the processor being connected to the image acquisition apparatus and the communication interface respectively, where:
the communication interface is configured to acquire motion data detected by an external device for a current action;
the image acquisition apparatus is configured to collect an image for the current action; and
the processor is configured to acquire the image collected by the image acquisition apparatus for the current action, process the image to obtain image recognition data corresponding to the current action, acquire feature data corresponding to the current action according to the motion data, fuse the feature data corresponding to the current action with the image recognition data to obtain fusion data, and identify an action corresponding to the fusion data.
In a ninth aspect, an embodiment of the present invention further provides a motion-recognition-based network training device, including an image acquisition apparatus, a processor, and a communication interface, the processor being connected to the image acquisition apparatus and the communication interface respectively, where:
the communication interface is configured to acquire motion data detected by an external device for a preset action;
the image acquisition apparatus is configured to collect an image for the preset action; and
the processor is configured to acquire the image collected by the image acquisition apparatus for the preset action, process the image to obtain image recognition data corresponding to the preset action, identify an action corresponding to the motion data, perform supervised learning on the image recognition data by using the identified action, and train a preset network model based on the image recognition data after the supervised learning.
In a tenth aspect, an embodiment of the present invention further provides an aircraft, including:
a power system, configured to provide flight power for the aircraft; and
the motion recognition device according to any implementation of the seventh aspect, configured to recognize actions.
In an eleventh aspect, an embodiment of the present invention further provides an aircraft, including:
a power system, configured to provide flight power for the aircraft; and
the motion recognition device according to any implementation of the eighth aspect, configured to recognize actions.
In a twelfth aspect, an embodiment of the present invention further provides an aircraft, including:
a power system, configured to provide flight power for the aircraft; and
the motion-recognition-based network training device according to any implementation of the ninth aspect, configured to train a network model for motion recognition.
Implementing the embodiments of the present invention has the following beneficial effects:
The embodiments of the present invention may acquire motion data of an external device and convert the motion data into frequency domain data, so that the frequency domain data is used to identify the corresponding action; or fuse the motion data with image recognition data to obtain fusion data, so that the fusion data is used to identify the corresponding action; or determine the action corresponding to the motion data and use the determined action to perform supervised learning on the image recognition data. This improves the accuracy and reliability of motion recognition, with better robustness.
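The embodiments above do not fix a particular fusion operator for combining motion-derived feature data with image recognition data. A common minimal choice, sketched below purely as an assumption, is to normalize each modality's feature vector and concatenate them; the variable names and example vectors are illustrative, not part of the disclosure.

```python
def fuse(motion_features, image_features):
    """Minimal feature-level fusion: scale each vector to unit L2 norm
    so that neither modality dominates, then concatenate."""
    def l2_normalize(v):
        norm = sum(x * x for x in v) ** 0.5
        return [x / norm for x in v] if norm > 0 else list(v)
    return l2_normalize(motion_features) + l2_normalize(image_features)

motion_feat = [0.1, 0.8, 0.05, 0.05]   # e.g. DFT magnitudes from the IMU data
image_feat = [0.2, 0.7, 0.1]           # e.g. per-gesture scores from the image branch
fused = fuse(motion_feat, image_feat)
# `fused` would then be fed to the recognition model in place of either
# single-modality feature vector.
```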
Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative efforts.

FIG. 1 is a schematic diagram of an aircraft control system according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a motion recognition method according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of another motion recognition method according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a motion-recognition-based network training method according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a motion recognition apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of another motion recognition apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a motion-recognition-based network training apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a motion recognition device according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of another motion recognition device according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a motion-recognition-based network training device according to an embodiment of the present invention.
Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
FIG. 1 is a schematic diagram of an aircraft control system. The system includes an aircraft such as a drone 110 and a wearable device 120, where the drone 110 includes a flight body, a gimbal, and an imaging device 130. In this embodiment, the flight body includes a plurality of rotors and rotor motors that drive the rotors to rotate, thereby providing the power required for the drone 110 to fly. The imaging device 130 is mounted on the flight body through the gimbal and is used for image or video capture during the flight of the drone 110; it includes, but is not limited to, a multispectral imager, a hyperspectral imager, a visible-light camera, an infrared camera, and the like. The gimbal is a multi-axis transmission and stabilization system that includes a plurality of rotating shafts and gimbal motors. A gimbal motor compensates for the shooting angle of the imaging device 130 by adjusting the rotation angle of a rotating shaft, and prevents or reduces jitter of the imaging device 130 through a suitable damping mechanism. Of course, in other embodiments, the imaging device 130 may be mounted on the flight body directly or in other manners. The wearable device 120 is worn by an operator and communicates with the drone 110 wirelessly to control the flight of the drone 110 and the shooting process of the imaging device 130. Specifically, the wearable device 120 has a built-in motion sensor: when the wearable device moves with the operator's hand, the motion sensor senses the hand movement and outputs corresponding motion data, according to which the drone is controlled. In addition, the imaging device mounted on the drone may also capture image data of a person's body movements, recognize the body movements from the image data, and control the aircraft according to the recognized body movements.
Embodiments of the present invention disclose a motion recognition method, a network training method based on the motion recognition method, related apparatuses, and related devices, which can improve the accuracy and reliability of motion recognition with good robustness. They are described in detail below.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a motion recognition method according to an embodiment of the present invention. Specifically, as shown in FIG. 2, the motion recognition method in this embodiment may include the following steps:
101. Acquire motion data detected by an external device for the current action.
Optionally, the technical solution of this embodiment may be applied in the external device, in a controlled device corresponding to the motion recognition such as an aircraft, or in another independent motion recognition device; this is not limited in the embodiments of the present invention.
Optionally, the external device may be a wearable or handheld device such as a wristband, a watch, or a smart ring. The external device is configured with a motion sensor such as an inertial measurement unit (IMU). When the external device moves or an action such as a gesture is made, the motion sensor outputs corresponding motion data, which the external device detects; the motion data may be one or both of angular acceleration and acceleration.
102. Convert the motion data into frequency domain data, and use the frequency domain data to identify an action corresponding to the frequency domain data.
Optionally, using the frequency domain data to identify the corresponding action may specifically be: inputting the frequency domain data into a network model, so that the action corresponding to the frequency domain data is identified by the network model. Specifically, after the frequency domain data is obtained, features of the frequency domain data may be further extracted, for example by extraction and superposition, and the extracted features may be input into the network model to identify the corresponding action, improving recognition efficiency. The frequency domain data may be obtained by applying a Fourier transform to the motion data. The network model may be a neural network or another network model; the embodiments of the present invention take a neural network as an example.
Optionally, the acquired motion data and/or the frequency domain data may be regularized to reduce overfitting. Further optionally, the acquired motion data and/or the frequency domain data may also be normalized.
Specifically, current gesture recognition operates on time domain data. Because gestures performed within a certain period cannot be aligned to the same time length, it is difficult to distinguish gestures from time domain waveforms, and the data for the same gesture differs considerably from person to person; recognition performed directly in the time domain therefore works poorly. The embodiments of the present invention perform recognition in the frequency domain: since the frequency domain has no time axis, gestures are aligned, and the frequency components of the same gesture are very similar across different people, which greatly improves the recognition of gesture actions. In addition, the motion data acquired by the external device (such as the data output by an IMU) tends to be noisy, and its frequency range is very low, which makes gesture recognition based on that data inaccurate. Therefore, the acquired raw motion data may be low-pass filtered with a fairly low cut-off band, and the sampling rate of the motion data may be reduced to shrink the amount of data and lower the computational cost of the algorithm. Further, the acquired frequency domain data may be normalized to facilitate processing by the network model.
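The alignment argument above can be checked numerically: the magnitude spectrum of a signal is unchanged under a circular time shift, which idealizes the same gesture occurring at a different moment within the window. The sketch below uses a naive stdlib DFT; real gestures are of course not exact circular shifts, so this is a demonstration of the principle rather than of the disclosed method.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Magnitude of every DFT bin of a real-valued signal (naive O(n^2) DFT)."""
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

# The "same gesture" occurring at two different moments inside the window:
n = 32
gesture = [math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.sin(2 * math.pi * 5 * t / n)
           for t in range(n)]
shifted = gesture[7:] + gesture[:7]   # circular shift by 7 samples

spec_a = magnitude_spectrum(gesture)
spec_b = magnitude_spectrum(shifted)

# The time-domain waveforms differ sample by sample, yet the magnitude
# spectra coincide up to floating-point error.
max_diff = max(abs(a - b) for a, b in zip(spec_a, spec_b))
```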
Optionally, the motion data may include data obtained by sampling the output of the motion sensor of the external device within a preset period, or data obtained by sampling that output a preset number of times. For example, the sensor output may be collected over a preset period: extensive testing shows that a human gesture generally lasts no more than 5 s, so the motion data collected in each 5 s window can be used continuously to train the network model or to recognize gesture actions. The embodiments of the present invention recognize gestures by converting the motion data into frequency domain data, that is, the whole segment of data is used to recognize the gesture, without identifying the start and end points of a gesture within the 5 s window, which improves the accuracy of motion recognition.
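The preprocessing pipeline described above (low-pass filtering, sampling-rate reduction, and whole 5 s windows with no start/end detection) can be sketched as follows. This is a hedged illustration: the moving-average filter stands in for whatever low-pass filter an implementation uses, and the 100 Hz input rate and decimation factor of 5 are assumed numbers, not values from the disclosure.

```python
def moving_average(samples, width=5):
    """Crude low-pass filter: average each sample with its neighbours."""
    half = width // 2
    out = []
    for i in range(len(samples)):
        lo, hi = max(0, i - half), min(len(samples), i + half + 1)
        out.append(sum(samples[lo:hi]) / (hi - lo))
    return out

def decimate(samples, factor):
    """Keep every `factor`-th sample to reduce the data volume."""
    return samples[::factor]

def fixed_windows(samples, rate_hz, seconds=5):
    """Cut the stream into whole windows of `seconds` length; each window is
    classified as a unit, so no start/end points need to be detected."""
    size = rate_hz * seconds
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]

raw = list(range(1000))            # placeholder for 10 s of raw IMU samples at 100 Hz
smooth = moving_average(raw)       # suppress sensor noise
low_rate = decimate(smooth, 5)     # 100 Hz -> 20 Hz, one fifth of the data
windows = fixed_windows(low_rate, rate_hz=20, seconds=5)
```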
Optionally, motion data detected by the external device for a preset action may also be acquired; the motion data corresponding to the preset action is converted into frequency domain data, and the network model is trained by using that frequency domain data together with the preset action. The network model is trained before it is used to recognize actions. Specifically, several highly stable gesture actions can be defined in advance and performed by different users; the motion data obtained when a large number of different users make these gestures (such as the data output by a watch IMU) is collected and converted into frequency domain data, which is used to train the network model such as a neural network. Specifically, after the frequency domain data is obtained, its features may be further extracted, for example by extraction, superposition, and similar processing; the extracted features serve as the input and the preset action as the target output for training the network model, so that extensive training improves the stability and reliability of the network model and of the motion recognition based on it. Further optionally, regularization may be applied to the motion data and/or the frequency domain data during training to reduce overfitting. Training the network model with the various gesture actions users make with the external device increases the gesture recognition rate and lowers the false detection rate.
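The supervised training pass described above (frequency features as input, preset gesture label as target output) can be sketched with a deliberately minimal model. A single softmax layer trained by gradient descent stands in for the patent's neural network; the toy feature vectors, class count, and hyperparameters are all assumptions for illustration only.

```python
import math

def train_softmax(features, labels, n_classes, epochs=200, lr=0.5):
    """Minimal stand-in for the network model: one softmax layer trained by
    gradient descent, with frequency features as input and the preset
    gesture label as the supervised target."""
    dim = len(features[0])
    w = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(epochs):
        for x, y in zip(features, labels):
            scores = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
            m = max(scores)                     # stabilize the exponentials
            exps = [math.exp(s - m) for s in scores]
            z = sum(exps)
            probs = [e / z for e in exps]
            for c in range(n_classes):
                grad = probs[c] - (1.0 if c == y else 0.0)
                for i in range(dim):
                    w[c][i] -= lr * grad * x[i]
    return w

def predict(w, x):
    scores = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
    return scores.index(max(scores))

# Toy training set: two "gestures" whose energy sits in different frequency bins.
gesture_a = [[1.0, 0.1], [0.9, 0.2], [1.1, 0.0]]   # energy in bin 0
gesture_b = [[0.1, 1.0], [0.0, 0.9], [0.2, 1.1]]   # energy in bin 1
feats = gesture_a + gesture_b
labels = [0, 0, 0, 1, 1, 1]
w = train_softmax(feats, labels, n_classes=2)
```

A production system would use a deeper network and add the regularization the text mentions; the sketch only shows the input/target wiring.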
Further optionally, after the frequency domain data is used to identify the corresponding action, the aircraft may be controlled according to the identified action.
Specifically, after the external device such as a watch recognizes the current gesture action, it can generate the control instruction corresponding to that gesture and send it to the controlled device such as an aircraft, so that the aircraft performs the corresponding operation.
Specifically, a plurality of gestures may be predefined, for example gesture action 1, gesture action 2, gesture action 3, and so on, and a control function may be preset for each gesture action to control the corresponding controlled device. Taking a watch as the external device and an aircraft as the controlled device: when the user performs gesture action 1 with the hand wearing the watch, the aircraft may automatically start and take off; when the aircraft is in flight and the user performs gesture action 2, the aircraft may enter an orbiting selfie mode. Further, gesture action 1 may be set to switch among several selfie modes, and in selfie mode the user may perform gesture action 2 again to exit, and so on. As another example, suppose the aircraft is already in flight and not in selfie mode; the user simply raises the hand wearing the watch, points a finger at the aircraft, and performs gesture action 1, whereupon the aircraft enters a "fly where I point" mode: the aircraft flies on a sphere centered on the user with a radius equal to its current distance from the user, moving to wherever the user's finger points.
When it is recognized that the user is pointing at the aircraft and rotating the arm, the aircraft may adjust its flight radius, and it may automatically detect whether the user is pointing at the ground and limit its altitude to avoid crashing. In this mode, the user may exit by performing a preset gesture, such as pointing at the aircraft and performing gesture action 1. As yet another example, when the aircraft must make an emergency landing, the user may perform gesture action 3 to command a safe landing. Flight control is thus achieved without a remote controller, which improves the user experience.
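The gesture-to-command mapping described above can be sketched as a simple state-aware dispatch. The gesture numbering follows the examples in the text; the command names and state flags are hypothetical, since the disclosure does not fix a concrete instruction set:

```python
def command_for(gesture, in_flight, in_selfie_mode):
    """Map a recognized gesture number to an aircraft command, given the
    current flight state. All command names are illustrative."""
    if gesture == 1 and not in_flight:
        return "TAKEOFF"
    if gesture == 2:
        return "EXIT_SELFIE" if in_selfie_mode else "ENTER_ORBIT_SELFIE"
    if gesture == 3:
        return "EMERGENCY_LAND"
    return "IGNORE"  # unrecognized gesture: do nothing

cmd = command_for(gesture=2, in_flight=True, in_selfie_mode=False)
```

The controlled device would receive the resulting instruction over its control link and perform the corresponding operation.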
In the embodiment of the present invention, motion data of the external device is acquired and converted into frequency-domain data, and the frequency-domain data is used to recognize the corresponding action. This neatly avoids having to locate the start and end points of the gesture-data waveform, raises the recognition rate, lowers the false-detection rate, and further improves the accuracy, reliability, and robustness of action recognition.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of another action recognition method according to an embodiment of the present invention. Specifically, as shown in FIG. 3, the action recognition method of the embodiment of the present invention may include the following steps:
201. Acquire motion data detected by an external device for a current action, and obtain feature data corresponding to the current action from the motion data.
Optionally, the technical solution of this embodiment may be applied in the controlled device corresponding to the action recognition, such as an aircraft, or in another standalone action recognition device; this embodiment imposes no limitation.
Optionally, the external device may be a wearable or handheld device, such as a wristband, a watch, or a smart ring. The acquired motion data may be data collected by a motion sensor, such as an IMU, provided in the external device. Further, in this embodiment the feature data may be the frequency-domain data converted from the acquired motion data, or features further extracted from that frequency-domain data, for example by decimation, superposition, or similar processing.
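The extraction and superposition processing mentioned above is not specified in detail by the disclosure; one plausible reading, sketched below, decimates the frequency bins and also sums adjacent groups of bins, then concatenates the two reduced views:

```python
import numpy as np

def extract_features(freq_data, step=4):
    """Reduce a frequency-domain vector two ways and concatenate the results:
    decimation keeps the first bin of every group of `step` bins, and
    superposition sums each group. `step` is an illustrative choice."""
    freq_data = np.asarray(freq_data, dtype=float)
    usable = len(freq_data) - len(freq_data) % step   # drop any ragged tail
    groups = freq_data[:usable].reshape(-1, step)
    decimated = groups[:, 0]         # decimation: one bin per group
    superposed = groups.sum(axis=1)  # superposition: group sums
    return np.concatenate([decimated, superposed])

feats = extract_features(np.arange(8.0))
```

Either reduced view (or both together, as here) could serve as the feature data fed to the network model.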
202. Acquire an image captured for the current action, and process the image to obtain image recognition data corresponding to the current action.
Optionally, the image recognition data may be obtained by an image acquisition apparatus such as a camera, in particular a camera mounted on the aircraft, which captures image data of the current action (that is, captures the image corresponding to the action); the image recognition data is the result of processing that image data.
203. Fuse the feature data corresponding to the current action with the image recognition data to obtain fused data.
The fused data includes both features of the motion data and features of the image recognition data, which enlarges the feature set available for action recognition. This avoids the failures of image-only recognition, where the gesturing user is outside the image captured by the image acquisition apparatus, or appears so small in it that the action is misrecognized or not recognized at all, and of device-only recognition, where a static gesture cannot be recognized from the external device's motion data alone. Fusing the features of both sources improves the action recognition rate and accuracy.
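The simplest realization of the fusion in step 203, assuming both branches already produce flat feature vectors (the disclosure does not fix the fusion operator), is concatenation:

```python
import numpy as np

def fuse(motion_features, image_features):
    """Fuse the two feature sources into one vector by concatenation, so the
    downstream recognizer sees both even when one source is weak (e.g. the
    user is outside or tiny in the captured image)."""
    return np.concatenate([np.ravel(motion_features), np.ravel(image_features)])

fused = fuse(np.zeros(96), np.ones(256))  # vector sizes are illustrative
```

The fused vector is what step 204 passes to the network model for recognition.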
204. Recognize the action corresponding to the fused data.
Optionally, recognizing the action corresponding to the fused data may specifically be: inputting the fused data into a network model so that the network model recognizes the corresponding action. The network model may be a neural network or another network model; this embodiment takes a neural network as an example.
Optionally, the motion data in this embodiment may include data obtained by sampling the output of the external device's motion sensor over a preset time period, or by sampling that output a preset number of times.
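The two windowing options just described, a fixed sample count or a fixed time span, might be collected as follows, where `read_imu` is a hypothetical callable returning one sensor reading:

```python
import time

def sample_fixed_count(read_imu, count=128):
    """Window the sensor output by a preset number of samples."""
    return [read_imu() for _ in range(count)]

def sample_fixed_duration(read_imu, seconds=0.05, poll_hz=1000):
    """Window the sensor output by a preset time span, polling at an
    assumed rate; the real sampling rate is device-dependent."""
    samples, deadline = [], time.monotonic() + seconds
    while time.monotonic() < deadline:
        samples.append(read_imu())
        time.sleep(1.0 / poll_hz)
    return samples

window = sample_fixed_count(lambda: (0.0, 0.0, 9.8), count=8)
```

Either windowing choice yields the fixed-shape motion-data block that the Fourier transform and feature extraction operate on.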
Further optionally, obtaining the feature data corresponding to the current action from the motion data may specifically be: converting the motion data into frequency-domain data and obtaining the feature data from the frequency-domain data. Further, fusing the feature data corresponding to the current action with the image recognition data may specifically be: fusing the frequency-domain data with the image recognition data to obtain the fused data. Specifically, the acquired motion data, such as IMU data, may be transformed from the time domain to the frequency domain by a Fourier transform, and the feature data determined from the result: the frequency-domain data may be used directly as the feature data, or features may be extracted from it, for example by decimation, superposition, or similar processing. Once the feature data and the image recognition data for the current action are obtained, fused data containing both can be generated, which amounts to merging the features of two data sources (image recognition data and motion data); gesture recognition can then be performed on the fused data, and device control can be performed based on the recognized gesture.
Further, motion data detected by the external device for a preset action may also be acquired, and feature data corresponding to the preset action obtained from it; image recognition data detected for the preset action is acquired; the image recognition data corresponding to the preset action and the feature data corresponding to the preset action are fused into fused data corresponding to the preset action; and the network model is trained using that fused data together with the preset action.
Specifically, the acquired motion data, such as IMU data, may be transformed from the time domain to the frequency domain by a Fourier transform to obtain frequency-domain data, from which the feature data is determined, either directly or by feature extraction such as decimation or superposition. The image branch learns gestures through deep learning; during deep learning, an intermediate layer holds the learned image recognition data, and the feature data derived from the IMU data can be inserted into such an intermediate layer to obtain the fused data of the two, after which learning continues. This amounts to merging the features of the two data sources (image recognition data and motion data). With this fusion scheme and extensive training, including training on image recognition data for gestures that are hard to recognize from images alone (for example when the gesturing user is outside the image or appears very small in it), gesture actions can still be recognized by drawing on the motion data even when the image alone is insufficient, and device control can then be performed based on the recognized gesture.
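Inserting the IMU-derived features into an intermediate layer, as described, can be sketched with plain NumPy. The layer sizes, the five-gesture output, and the random untrained weights are all illustrative assumptions; a real system would learn the weights as described in the text:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Illustrative sizes: 1024 raw image features, a 64-unit intermediate layer,
# 96 IMU-derived frequency features, 5 gesture classes.
W1 = rng.standard_normal((64, 1024)) * 0.01
W2 = rng.standard_normal((5, 64 + 96)) * 0.01

def forward(image_features, imu_features):
    """Run the image branch up to the intermediate layer, then inject the
    IMU features there before the final classification layer."""
    hidden = np.maximum(W1 @ image_features, 0.0)   # ReLU intermediate layer
    fused = np.concatenate([hidden, imu_features])  # mid-network fusion point
    logits = W2 @ fused                             # classify the fused vector
    return int(np.argmax(logits))

gesture = forward(rng.standard_normal(1024), rng.standard_normal(96))
```

Because the fusion happens before the final layer, the classifier can lean on the motion features whenever the image-derived hidden activations are uninformative.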
Further optionally, after the action corresponding to the fused data is recognized, an aircraft may be controlled according to the recognized action.
Specifically, a plurality of gestures may be predefined, and a control function preset for each gesture action to control the corresponding controlled device, such as an aircraft. Taking aircraft control as an example, gesture recognition can command takeoff, landing, selfie capture, "fly where I point," and similar operations, which improves the user experience.
In the embodiment of the present invention, the acquired motion data of the external device is fused with the image recognition data: specifically, the motion data may be converted into frequency-domain data, the feature data of the current action determined from it, and that feature data fused with the image recognition data to obtain fused data, from which the corresponding action can be recognized. This improves the accuracy, reliability, and robustness of action recognition, and avoids the prior-art problem that recognition based only on the external device's motion data, or only on images, has low accuracy or fails entirely.
Referring to FIG. 4, FIG. 4 is a schematic flowchart of a network training method based on action recognition according to an embodiment of the present invention. Specifically, as shown in FIG. 4, the network training method of the embodiment of the present invention may include the following steps:
301. Acquire motion data detected by an external device for a preset action.
Optionally, the technical solution of this embodiment may be applied in the controlled device corresponding to the action recognition, such as an aircraft, or in another standalone network training device; this embodiment imposes no limitation.
Optionally, the external device may be a wearable or handheld device, such as a wristband, a watch, or a smart ring. The acquired motion data may be data collected by a motion sensor, such as an IMU, provided in the external device. Further, in this embodiment, the feature data corresponding to the preset action that is obtained from the motion data is also called the feature data corresponding to the motion data; it is used to determine the specific gesture action.
Optionally, the motion data in this embodiment may include data obtained by sampling the output of the external device's motion sensor over a preset time period, or by sampling that output a preset number of times.
302. Acquire an image captured for the preset action, and process the image to obtain image recognition data corresponding to the preset action.
Optionally, the image recognition data may be obtained by an image acquisition apparatus such as a camera, in particular a camera mounted on the aircraft, which captures image data of the action (that is, captures the image corresponding to the action); the image recognition data is the result of processing that image data.
303. Recognize the action corresponding to the motion data.
304. Perform supervised learning on the image recognition data using the recognized action, and train a preset network model based on the image recognition data after the supervised learning.
The network model may be a neural network or another network model; this embodiment takes a neural network as an example.
Optionally, performing supervised learning on the image recognition data using the recognized action may specifically be: using deep learning with the image recognition data as the input and the recognized action as the target output. Specifically, because images yield a large number of feature dimensions, deep learning can reduce the dimensionality of the image recognition features, which improves the stability and reliability of the network model trained on the learned image recognition data.
Optionally, the network model may be one that recognizes actions from images, in which case it can be trained using the action corresponding to the recognized motion data together with the image recognition data. Specifically, taking a wristband as the external device, the correspondence between the motion-data features collected by the wristband (which may be data obtained by sampling, filtering, or similar processing of the motion data, or the frequency-domain data obtained by a Fourier transform of it, and so on) and gesture actions can be trained in advance. When the user performs a gesture, the motion data collected by the wristband and the image recognition data are captured synchronously, the features of the motion data are obtained, and, from the trained correspondence, the action matching the currently collected motion-data features of the wristband is recognized. Deep learning can further be used to perform supervised learning on the image recognition data, with the image recognition data as the input and the recognized action as the target output.
Once the learned, dimensionality-reduced image recognition data is obtained, it can be used to train the network model and establish the correspondence between image recognition data and gesture actions, so that during subsequent gesture recognition the current gesture can be recognized quickly and accurately from the acquired current image recognition data through the network model, and device control can be performed based on that gesture.
Optionally, the network model may be one that recognizes actions from both images and motion data, in which case it can be trained using the fused data of the acquired motion data and the image recognition data. Feature data corresponding to the preset action may then also be obtained from the motion data, and training the preset network model based on the image recognition data after the supervised learning may specifically be: fusing the feature data with the supervised-learned image recognition data to obtain fused data, and training the preset network model using the fused data. Specifically, again taking a wristband as the external device, the correspondence between the motion-data features collected by the wristband and gesture actions can be trained in advance. When the user performs a gesture, the wristband's motion data and the image recognition data are captured synchronously, the features of the motion data are obtained, and the action matching the current motion-data features is recognized from that correspondence. Deep learning can further be used to perform supervised learning on the image recognition data, with the image recognition data as the input and the recognized action as the target output.
Once the learned, dimensionality-reduced image recognition data is obtained, the network model can be trained using it together with the feature data of the motion data, that is, using the fused data of the two, so that during subsequent gesture recognition the fused data of the learned current image recognition data and the feature data of the current motion data, acquired for a given gesture, can be fed into the network model to recognize the current gesture quickly and accurately, and device control can then be performed based on that gesture.
Optionally, obtaining the feature data corresponding to the current action from the motion data may specifically be: converting the motion data into frequency-domain data and obtaining the feature data from it. Specifically, the feature data is determined from the frequency-domain data: the converted frequency-domain data may be used directly as the feature data, or features may be extracted from it, for example by decimation, superposition, or similar processing.
Further optionally, current image recognition data may also be obtained for a current action, and the gesture corresponding to that data recognized by the neural network trained as described above, so that the operation the controlled device, such as an aircraft, should perform can be determined from the predefined gestures and the control function assigned to each gesture; for example, gesture recognition can command the aircraft to take off, land, take a selfie, or "fly where I point," which improves the user experience.
In the embodiment of the present invention, motion data and image recognition data are collected simultaneously while the user performs a gesture, and the current gesture is recognized from the correspondence between motion data and gestures; the recognized gesture can then be used to supervise the image recognition data through deep learning, which improves the accuracy, reliability, and robustness of action recognition.
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of an action recognition apparatus according to an embodiment of the present invention. Optionally, the action recognition apparatus of this embodiment may be provided in an external device, in the controlled device corresponding to the action recognition such as an aircraft, in another standalone action recognition device, and so on. Specifically, as shown in FIG. 5, the action recognition apparatus 10 of the embodiment of the present invention may include an acquisition module 11 and a processing module 12, wherein:
the acquisition module 11 is configured to acquire motion data detected by an external device for a current action; and
the processing module 12 is configured to convert the motion data acquired by the acquisition module 11 into frequency-domain data, and to use the frequency-domain data to recognize the action corresponding to it.
Optionally, the external device may be a wearable or handheld device, such as a wristband, a watch, or a smart ring. The external device is provided with a motion sensor such as an IMU; when the external device moves or an action such as a gesture is performed, the motion sensor outputs corresponding motion data, which the external device detects.
Optionally, in some embodiments,
the processing module 12 may be specifically configured to input the frequency-domain data into a network model so that the network model recognizes the action corresponding to the frequency-domain data.
Specifically, after obtaining the frequency-domain data, the processing module 12 may further extract its features, for example by decimation, superposition, or similar processing, and may input the extracted features into the network model to recognize the corresponding action, improving recognition efficiency. The frequency-domain data may be obtained by applying a Fourier transform to the motion data. The network model may be a neural network or another network model; this embodiment imposes no limitation.
Further, the acquisition module 11 may also be configured to acquire motion data detected by the external device for a preset action; and
the processing module 12 may also be configured to convert the motion data corresponding to the preset action into frequency-domain data, and to train the network model using that frequency-domain data together with the preset action.
Further optionally, in some embodiments,
the processing module 12 may also be configured to apply regularization to the frequency-domain data to reduce over-fitting to the frequency-domain data.
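The disclosure leaves the regularization scheme open. One standard technique for curbing over-fitting during training on such data is an L2 penalty added to the loss; the penalty target (here the model weights) and the hyper-parameter `lam` are illustrative assumptions, not details fixed by the text:

```python
import numpy as np

def regularized_loss(predictions, targets, weights, lam=1e-3):
    """Mean-squared error plus an L2 penalty, which discourages the large
    parameter values characteristic of an over-fitted model."""
    mse = np.mean((np.asarray(predictions) - np.asarray(targets)) ** 2)
    return mse + lam * np.sum(np.asarray(weights) ** 2)

loss = regularized_loss([0.0, 0.0], [0.0, 0.0], np.ones(4), lam=0.5)
```

Minimizing this combined loss instead of the raw error is what makes the trained model generalize better to users outside the training set.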
Further optionally, in some embodiments,
the processing module 12 may also be configured to normalize the motion data.
Optionally, the motion data may include data obtained by sampling the output of the external device's motion sensor over a preset time period, or by sampling that output a preset number of times.
Further optionally, in some embodiments,
the processing module 12 may also be configured to control an aircraft according to the recognized action.
Specifically, after the current gesture is recognized, the processing module 12 may further generate the control instruction corresponding to the gesture and send it to the controlled device, such as an aircraft, so that the aircraft performs the corresponding operation.
In the embodiment of the present invention, motion data of the external device is acquired and converted into frequency-domain data, and the frequency-domain data is used to recognize the corresponding action. This neatly avoids having to locate the start and end points of the gesture-data waveform, raises the recognition rate, lowers the false-detection rate, and further improves the accuracy, reliability, and robustness of action recognition.
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of another action recognition apparatus according to an embodiment of the present invention. Optionally, the action recognition apparatus of this embodiment may be disposed in a controlled device corresponding to the action recognition, such as an aircraft, or in another independent action recognition device. Specifically, as shown in FIG. 6, the action recognition apparatus 20 of this embodiment may include a first acquisition module 21, a second acquisition module 22, a fusion module 23, and a processing module 24, where:
the first acquisition module 21 is configured to acquire motion data detected by an external device for a current action, and to acquire, according to the motion data, feature data corresponding to the current action;
the second acquisition module 22 is configured to acquire an image captured for the current action, and to process the image to obtain image recognition data corresponding to the current action;
the fusion module 23 is configured to fuse the feature data corresponding to the current action with the image recognition data to obtain fused data; and
the processing module 24 is configured to identify the action corresponding to the fused data.
Optionally, the external device may be a wearable or handheld device, such as a wristband, a watch, or a smart ring. The acquired motion data may be data collected by a motion sensor, such as an IMU, disposed in the external device. Further, in this embodiment, the feature data may refer to frequency domain data obtained by converting the acquired motion data (for example, by a Fourier transform), or to features further extracted from that frequency domain data, for example by decimation, superposition, or similar processing.
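As a concrete illustration of the conversion just described, the sketch below derives frequency-domain features from a window of motion-sensor samples using a plain discrete Fourier transform. It is a hypothetical sketch only: the embodiment states that the motion data is Fourier-transformed and that features may be further extracted, but the bin count and the use of raw DFT magnitudes here are illustrative assumptions, not the claimed implementation.

```python
import cmath
import math

def frequency_features(samples, n_bins=8):
    """Map a time-domain window of motion-sensor samples to
    frequency-domain features via a discrete Fourier transform.

    Illustrative sketch: `n_bins` and the choice of raw DFT
    magnitudes are assumptions; the embodiment leaves the exact
    feature choice open.
    """
    n = len(samples)
    mags = []
    for k in range(min(n_bins, n)):
        # k-th DFT coefficient of the window.
        coeff = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        mags.append(abs(coeff))
    return mags

# A gesture-like oscillation with 2 cycles per 16-sample window
# concentrates its energy in frequency bin 2.
window = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
feats = frequency_features(window)
peak_bin = max(range(len(feats)), key=lambda k: feats[k])
```

Because such features describe the frequency content of the whole window, no start or end point of the gesture waveform needs to be located, which is the benefit stated above.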
Optionally, in some embodiments,
the processing module 24 may be specifically configured to input the fused data into a network model, so as to identify, by means of the network model, the action corresponding to the fused data.
The network model may be a neural network or another network model, which is not limited in this embodiment of the present invention.
Further optionally, in some embodiments,
the first acquisition module 21 may be specifically configured to, when acquiring the feature data corresponding to the current action according to the motion data, convert the motion data into frequency domain data and use the frequency domain data as the feature data corresponding to the current action; and
the fusion module 23 may be specifically configured to fuse the frequency domain data with the image recognition data acquired by the second acquisition module 22 to obtain the fused data.
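The fusion operator itself is not specified by the embodiment; concatenating the frequency-domain features with the image recognition data into a single input vector is one minimal, commonly used choice, assumed here purely for illustration:

```python
def fuse(freq_features, image_features):
    """Concatenate frequency-domain motion features with image
    recognition features into one vector for the network model.

    Assumption: plain concatenation; the embodiment does not fix
    the fusion operator.
    """
    return list(freq_features) + list(image_features)

# e.g. 2 motion-derived features fused with 3 image-derived features
fused = fuse([0.1, 0.9], [0.3, 0.2, 0.5])
```

The fused vector would then be fed to the network model as a single input, as described above.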
Further optionally, in some embodiments,
the first acquisition module 21 is further configured to acquire motion data detected by the external device for a preset action, and to acquire, according to the motion data, feature data corresponding to the preset action;
the second acquisition module 22 is further configured to acquire an image captured for the preset action, and to process the captured image to obtain image recognition data corresponding to the preset action;
the fusion module 23 is further configured to fuse the image recognition data corresponding to the preset action with the feature data corresponding to the preset action to obtain fused data corresponding to the preset action; and
the processing module 24 is further configured to train the network model by using the fused data corresponding to the preset action together with the preset action.
Optionally, the motion data may include data obtained by sampling the output of the motion sensor of the external device over a preset time period, or by sampling that output a preset number of times.
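The two sampling strategies just described (a preset time period versus a preset number of samples) can be sketched as follows; the function names and the iterator-based sensor interface are hypothetical:

```python
def sample_fixed_count(sensor_stream, count):
    """Collect a preset number of motion-sensor readings."""
    window = []
    for reading in sensor_stream:
        window.append(reading)
        if len(window) == count:
            break
    return window

def sample_fixed_duration(timed_readings, duration):
    """Collect readings whose timestamps fall within a preset time
    period, measured from the first reading."""
    window = []
    start = None
    for ts, value in timed_readings:
        if start is None:
            start = ts
        if ts - start > duration:
            break
        window.append(value)
    return window

window = sample_fixed_count(iter([0.0, 0.1, 0.4, 0.2, 0.0, -0.1]), 4)
```

Either window would then be converted to the frequency domain as described above.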
Further optionally, in some embodiments,
the processing module 24 may be further configured to control an aircraft according to the identified action.
In this embodiment of the present invention, the acquired motion data of the external device and the image recognition data may be fused. Specifically, the motion data may be converted into frequency domain data, the feature data of the current action determined based on the frequency domain data, and the feature data fused with the image recognition data to obtain fused data, so that the fused data can be used to identify the corresponding action. This improves the accuracy, reliability, and robustness of action recognition, and avoids the prior-art problem of low or even failed recognition that arises when relying on the external device alone or on image recognition alone.
Referring to FIG. 7, FIG. 7 is a schematic structural diagram of a network training apparatus based on action recognition according to an embodiment of the present invention. Optionally, the apparatus of this embodiment may be disposed in a controlled device corresponding to the action recognition, such as an aircraft, or in another independent network training device. Specifically, as shown in FIG. 7, the network training apparatus 30 of this embodiment may include a first acquisition module 31, a second acquisition module 32, a determining module 33, and a processing module 34, where:
the first acquisition module 31 is configured to acquire motion data detected by an external device for a preset action;
the second acquisition module 32 is configured to acquire an image captured for the preset action, and to process the image to obtain image recognition data corresponding to the preset action;
the determining module 33 is configured to identify the action corresponding to the motion data; and
the processing module 34 is configured to perform supervised learning on the image recognition data acquired by the second acquisition module 32 by using the action identified by the determining module 33, and to train a preset network model based on the image recognition data after the supervised learning.
Optionally, the external device may be a wearable or handheld device, such as a wristband, a watch, or a smart ring. The acquired motion data may be data collected by a motion sensor, such as an IMU, disposed in the external device. The network model may be a neural network or another network model; this embodiment of the present invention takes a neural network as an example for description.
Further optionally, in some embodiments,
the processing module 34 may be specifically configured to perform supervised learning by means of deep learning, taking the image recognition data acquired by the second acquisition module 32 as the input and the identified action as the target output.
Specifically, because the features collected from images are of high dimensionality, the processing module 34 may reduce the dimensionality of the image recognition features through deep learning, so as to improve the stability and reliability of the network model trained on the learned image recognition data.
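The labeling scheme behind this supervised-learning step can be illustrated with a deliberately simplified stand-in for the deep network: the action identified from the motion data serves as the label for each image feature vector, and a nearest-centroid model is fit to those labeled vectors. This is a hypothetical sketch of the labeling idea only; the embodiment itself uses deep learning, which this toy model does not reproduce.

```python
def train_supervised(image_feature_vectors, action_labels):
    """Fit one centroid per action label over image feature vectors.

    The labels come from motion-data-based recognition, as described
    above; the centroid model is an illustrative stand-in for the
    deep network.
    """
    sums, counts = {}, {}
    for feats, label in zip(image_feature_vectors, action_labels):
        if label not in sums:
            sums[label] = [0.0] * len(feats)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], feats)]
        counts[label] += 1
    return {lbl: [s / counts[lbl] for s in vec] for lbl, vec in sums.items()}

def predict(centroids, feats):
    """Return the label whose centroid is closest to `feats`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lbl: sq_dist(centroids[lbl], feats))

# Image feature vectors labeled by the action recognized from motion data.
X = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]
y = ["wave", "wave", "circle", "circle"]
model = train_supervised(X, y)
```

The point of the sketch is that no manual annotation is needed: the motion-data channel supplies the target outputs for the image channel.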
Further optionally, in some embodiments,
the first acquisition module 31 is further configured to acquire, according to the motion data, feature data corresponding to the preset action; and
the processing module 34 is further configured to fuse the feature data with the image recognition data after the supervised learning to obtain fused data, and to train the preset network model by using the fused data.
Further optionally, in some embodiments,
the first acquisition module 31 is specifically configured to, when acquiring the feature data corresponding to the current action according to the motion data, convert the motion data into frequency domain data and use the frequency domain data as the feature data corresponding to the current action.
Specifically, the feature data is determined based on the frequency domain data. For example, the acquisition module 31 may directly use the converted frequency domain data as the feature data, or may perform feature extraction on the frequency domain data, for example by decimation, superposition, or similar processing, and use the resulting data as the feature data.
In this embodiment of the present invention, motion data and image recognition data may be collected simultaneously while the user performs a gesture, and the current gesture may be identified based on the correspondence between the motion data and gestures. The identified gesture can then be used, in a deep learning manner, to perform supervised learning on the image recognition data, which improves the accuracy, reliability, and robustness of action recognition.
An embodiment of the present invention further provides a computer storage medium storing program instructions which, when executed, may perform some or all of the steps of the action recognition method in the embodiment corresponding to FIG. 2.
An embodiment of the present invention further provides a computer storage medium storing program instructions which, when executed, may perform some or all of the steps of the action recognition method in the embodiment corresponding to FIG. 3.
An embodiment of the present invention further provides a computer storage medium storing program instructions which, when executed, may perform some or all of the steps of the network training method based on action recognition in the embodiment corresponding to FIG. 4.
Referring now to FIG. 8, FIG. 8 is a schematic structural diagram of an action recognition device according to an embodiment of the present invention. The action recognition device of this embodiment may be an external device such as a wristband, a watch, or a ring; a controlled device such as an aircraft; or another independent action recognition device. Specifically, the action recognition device 1 of this embodiment may include a communication interface 300, a memory 200, and a processor 100, where the processor 100 may be connected to the communication interface 300 and the memory 200, respectively. Optionally, the action recognition device 1 may further include a motion sensor, a camera, and the like.
The communication interface 300 may include a wired interface, a wireless interface, and the like, and may be used to receive data transmitted by an external device, for example motion data collected by the external device for a gesture of the user, or to transmit motion data acquired by the external device.
The memory 200 may include a volatile memory such as a random-access memory (RAM); the memory 200 may also include a non-volatile memory such as a flash memory; the memory 200 may further include a combination of the above types of memory.
The processor 100 may be a central processing unit (CPU), a graphics processing unit (GPU), or the like. The processor may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), or the like.
Optionally, the memory 200 is further configured to store program instructions. The processor 100 may invoke the program instructions to implement the action recognition method shown in the embodiment of FIG. 2 of the present application.
Specifically, the communication interface 300 may be configured to acquire motion data detected by an external device for a current action; and
the processor 100 may invoke the program instructions stored in the memory 200 to perform the following:
converting the motion data into frequency domain data, and using the frequency domain data to identify the action corresponding to the frequency domain data.
Optionally, the processor 100 is specifically configured to input the frequency domain data into a network model, so as to identify, by means of the network model, the action corresponding to the frequency domain data.
Optionally, the network model may include a neural network.
Optionally, the communication interface 300 is further configured to acquire motion data detected by the external device for a preset action; and
the processor 100 is further configured to convert the motion data corresponding to the preset action into frequency domain data, and to train the network model by using the frequency domain data corresponding to the preset action together with the preset action.
Optionally, the processor 100 is further configured to regularize the frequency domain data so as to reduce overfitting to the frequency domain data.
Optionally, the processor 100 is further configured to normalize the motion data.
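For illustration, min-max scaling is one simple way the normalization mentioned here could be realized; the embodiment states only that the motion data is normalized, so the specific scaling rule below is an assumption:

```python
def normalize(motion_data):
    """Min-max normalize raw motion samples into [0, 1].

    Assumed normalization rule; applied before converting the
    motion data to the frequency domain.
    """
    lo, hi = min(motion_data), max(motion_data)
    if hi == lo:  # constant signal: avoid division by zero
        return [0.0] * len(motion_data)
    return [(x - lo) / (hi - lo) for x in motion_data]

scaled = normalize([2.0, 4.0, 6.0])
```

Normalization of this kind makes windows collected from different sensors or sessions comparable before the frequency-domain transform.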
Optionally, the motion data may include data obtained by sampling the output of the motion sensor of the external device over a preset time period, or by sampling that output a preset number of times.
Optionally, the processor 100 is further configured to control an aircraft according to the identified action.
An embodiment of the present invention further provides an aircraft, including:
a power system configured to provide flight power for the aircraft; and
the action recognition device according to any one of the embodiments corresponding to FIG. 8 above, configured to recognize actions.
Referring now to FIG. 9, FIG. 9 is a schematic structural diagram of another action recognition device according to an embodiment of the present invention. The action recognition device of this embodiment may be a controlled device such as an aircraft, or another independent action recognition device. Specifically, the action recognition device 2 of this embodiment may include a communication interface 700, an image acquisition apparatus 600, a memory 500, and a processor 400, where the processor 400 may be connected to the communication interface 700, the image acquisition apparatus 600, and the memory 500, respectively. Optionally, the action recognition device 2 may further include a motion sensor.
The image acquisition apparatus 600 may include a camera configured to capture images, for example images of the user performing a gesture.
The communication interface 700 may include a wired interface, a wireless interface, and the like, and may be used to receive data transmitted by an external device, for example motion data collected by the external device for a gesture of the user.
The memory 500 may include a volatile memory such as a random-access memory (RAM); the memory 500 may also include a non-volatile memory such as a flash memory; the memory 500 may further include a combination of the above types of memory.
The processor 400 may be a central processing unit (CPU), a graphics processing unit (GPU), or the like. The processor may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), or the like.
Optionally, the memory 500 is further configured to store program instructions. The processor 400 may invoke the program instructions to implement the action recognition method shown in the embodiment of FIG. 3 of the present application.
Specifically, the communication interface 700 is configured to acquire motion data detected by an external device for a current action;
the image acquisition apparatus 600 is configured to capture an image for the current action; and
the processor 400 is configured to process the image captured by the image acquisition apparatus to obtain image recognition data, acquire feature data corresponding to the current action according to the motion data, fuse the feature data corresponding to the current action with the image recognition data to obtain fused data, and identify the action corresponding to the fused data.
Optionally, the processor 400 is specifically configured to input the fused data into a network model, so as to identify, by means of the network model, the action corresponding to the fused data.
Optionally, the network model includes a neural network.
Optionally, the processor 400 is specifically configured to convert the motion data into frequency domain data, and to fuse the frequency domain data with the image recognition data to obtain the fused data.
Specifically, the acquired motion data, such as IMU data, may be transformed from the time domain to the frequency domain by a Fourier transform to obtain the frequency domain data, and the feature data may be determined based on the frequency domain data, for example by directly using the converted frequency domain data as the feature data, or by performing feature extraction on the frequency domain data, for example by decimation, superposition, or similar processing, and using the resulting data as the feature data.
Optionally, the communication interface 700 is further configured to acquire motion data detected by the external device for a preset action;
the image acquisition apparatus 600 is further configured to capture an image for the preset action; and
the processor 400 is further configured to process the image captured by the image acquisition apparatus to obtain image recognition data corresponding to the preset action; acquire, according to the motion data, feature data corresponding to the preset action; fuse the image recognition data corresponding to the preset action with the feature data corresponding to the preset action to obtain fused data corresponding to the preset action; and train the network model by using the fused data corresponding to the preset action together with the preset action.
Optionally, the motion data may include data obtained by sampling the output of the motion sensor of the external device over a preset time period, or by sampling that output a preset number of times.
Optionally, the processor 400 is further configured to control an aircraft according to the identified action.
An embodiment of the present invention further provides an aircraft, including:
a power system configured to provide flight power for the aircraft; and
the action recognition device according to any one of the embodiments corresponding to FIG. 9 above, configured to recognize actions.
Referring now to FIG. 10, FIG. 10 is a schematic structural diagram of a network training device based on action recognition according to an embodiment of the present invention. The network training device of this embodiment may be a controlled device such as an aircraft, or another independent network training device. Specifically, the network training device 3 of this embodiment may include a communication interface 1100, an image acquisition apparatus 1000, a memory 900, and a processor 800, where the processor 800 may be connected to the communication interface 1100, the image acquisition apparatus 1000, and the memory 900, respectively. Optionally, the network training device 3 may further include a motion sensor and the like.
The image acquisition apparatus 1000 may include a camera configured to capture images, for example images of the user performing a gesture.
The communication interface 1100 may include a wired interface, a wireless interface, and the like, and may be used to receive data transmitted by an external device, for example motion data collected by the external device for a gesture of the user.
The memory 900 may include a volatile memory such as a random-access memory (RAM); the memory 900 may also include a non-volatile memory such as a flash memory; the memory 900 may further include a combination of the above types of memory.
The processor 800 may be a central processing unit (CPU), a graphics processing unit (GPU), or the like. The processor may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), or the like.
Optionally, the memory 900 is further configured to store program instructions. The processor 800 may invoke the program instructions to implement the network training method based on action recognition shown in the embodiment of FIG. 4 of the present application.
Specifically, the communication interface 1100 is configured to acquire motion data detected by an external device for a preset action;
the image acquisition apparatus 1000 is configured to capture an image for the preset action; and
the processor 800 is configured to process the image captured by the image acquisition apparatus to obtain image recognition data; identify the action corresponding to the motion data; perform supervised learning on the image recognition data by using the identified action; and train a preset network model based on the image recognition data after the supervised learning.
Optionally, the processor 800 is specifically configured to perform supervised learning by means of deep learning, taking the image recognition data as the input and the action as the target output.
Optionally, the processor 800 is further configured to acquire, according to the motion data, feature data corresponding to the preset action; fuse the feature data with the image recognition data after the supervised learning to obtain fused data; and train the preset network model by using the fused data.
Optionally, the processor 800 is specifically configured to convert the motion data into frequency domain data, so as to use the frequency domain data as the feature data corresponding to the current action.
Specifically, the acquired motion data may be transformed from the time domain to the frequency domain by a Fourier transform to obtain the frequency domain data, and the feature data may be determined based on the frequency domain data, for example by directly using the frequency domain data as the feature data, or by performing feature extraction on it, for example by decimation, superposition, or similar processing, and using the resulting data as the feature data.
Optionally, the network model may include a neural network.
In the embodiments of the present invention, motion data of an external device may be acquired and converted into frequency domain data, and the frequency domain data used to identify the corresponding action; or the motion data and image recognition data may be fused to obtain fused data, and the fused data used to identify the corresponding action; or the action corresponding to the motion data may be determined, and the determined action used to perform supervised learning on the image recognition data. Each approach improves the accuracy, reliability, and robustness of action recognition.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, reference may be made to the related descriptions of the other embodiments.
An embodiment of the present invention further provides an aircraft, including:
a power system configured to provide flight power for the aircraft; and
the network training device based on action recognition according to any one of the embodiments corresponding to FIG. 10 above, configured to train the network model for action recognition.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses or modules, and may be electrical, mechanical or of other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing module, or each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
An integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
A person skilled in the art can clearly understand that, for convenience and brevity of description, only the division into the functional modules described above is given as an example; in practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to perform all or some of the functions described above. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiments, and details are not repeated here.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present disclosure.

Claims (63)

  1. A motion recognition apparatus, comprising:
    an acquiring module, configured to acquire motion data detected by an external device for a current motion; and
    a processing module, configured to convert the motion data acquired by the acquiring module into frequency domain data, and to use the frequency domain data to identify a motion corresponding to the frequency domain data.
  2. The apparatus according to claim 1, wherein
    the processing module is specifically configured to input the frequency domain data into a network model, so as to identify, by means of the network model, a motion corresponding to the frequency domain data.
  3. The apparatus according to claim 2, wherein
    the network model comprises a neural network.
  4. The apparatus according to claim 2 or 3, wherein
    the acquiring module is further configured to acquire motion data detected by the external device for a preset motion; and
    the processing module is further configured to convert the motion data corresponding to the preset motion into frequency domain data, and to train the network model with the frequency domain data corresponding to the preset motion and the preset motion.
  5. The apparatus according to any one of claims 1 to 4, wherein
    the processing module is further configured to regularize the frequency domain data so as to reduce overfitting to the frequency domain data.
  6. The apparatus according to any one of claims 1 to 5, wherein
    the processing module is further configured to normalize the motion data.
  7. The apparatus according to any one of claims 1 to 6, wherein
    the motion data comprises data obtained by sampling the output of a motion sensor of the external device over a preset time period, or data obtained by sampling the output of the motion sensor of the external device a preset number of times.
  8. The apparatus according to any one of claims 1 to 7, wherein
    the processing module is further configured to control an aircraft according to the identified motion.
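Claim 7 describes two ways of obtaining the motion data: sampling the motion sensor's output over a preset time period, or taking a preset number of samples. As an illustrative, non-limiting sketch (not part of the claims), the two modes might look like the following; `read_sensor` and the sampling rate are hypothetical stand-ins for the external device's sensor interface:

```python
# Sketch of claim 7's two sampling modes. `read_sensor` is a hypothetical
# callable standing in for the external device's motion-sensor output.
import time

def sample_for_duration(read_sensor, seconds, rate_hz=100):
    """Collect samples over a preset time period."""
    samples, deadline = [], time.monotonic() + seconds
    while time.monotonic() < deadline:
        samples.append(read_sensor())
        time.sleep(1.0 / rate_hz)  # pace reads at roughly rate_hz
    return samples

def sample_fixed_count(read_sensor, count):
    """Collect a preset number of samples."""
    return [read_sensor() for _ in range(count)]

fake_sensor = iter(range(1000)).__next__   # hypothetical sensor stub
window = sample_fixed_count(fake_sensor, 8)
print(window)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Either mode yields a fixed-size or fixed-duration window that the later claims convert to frequency domain data.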
  9. A motion recognition apparatus, comprising:
    a first acquiring module, configured to acquire motion data detected by an external device for a current motion, and to acquire, according to the motion data, feature data corresponding to the current motion;
    a second acquiring module, configured to acquire an image captured for the current motion, and to process the image to obtain image recognition data corresponding to the current motion;
    a fusion module, configured to fuse the feature data corresponding to the current motion with the image recognition data to obtain fused data; and
    a processing module, configured to identify a motion corresponding to the fused data.
  10. The apparatus according to claim 9, wherein
    the processing module is specifically configured to input the fused data into a network model, so as to identify, by means of the network model, a motion corresponding to the fused data.
  11. The apparatus according to claim 10, wherein
    the network model comprises a neural network.
  12. The apparatus according to any one of claims 9 to 11, wherein
    the first acquiring module is specifically configured to, when acquiring the feature data corresponding to the current motion according to the motion data, convert the motion data into frequency domain data so as to use the frequency domain data as the feature data corresponding to the current motion; and
    the fusion module is specifically configured to fuse the frequency domain data with the image recognition data acquired by the second acquiring module to obtain the fused data.
  13. The apparatus according to claim 10 or 11, wherein
    the first acquiring module is further configured to acquire motion data detected by the external device for a preset motion, and to acquire, according to the motion data, feature data corresponding to the preset motion;
    the second acquiring module is further configured to acquire an image captured for the preset motion, and to process the captured image to obtain image recognition data corresponding to the preset motion;
    the fusion module is further configured to fuse the image recognition data corresponding to the preset motion with the feature data corresponding to the preset motion to obtain fused data corresponding to the preset motion; and
    the processing module is further configured to train the network model with the fused data corresponding to the preset motion and the preset motion.
  14. The apparatus according to any one of claims 9 to 13, wherein
    the motion data comprises data obtained by sampling the output of a motion sensor of the external device over a preset time period, or data obtained by sampling the output of the motion sensor of the external device a preset number of times.
  15. The apparatus according to any one of claims 9 to 14, wherein
    the processing module is further configured to control an aircraft according to the identified motion.
  16. A network training apparatus based on motion recognition, comprising:
    a first acquiring module, configured to acquire motion data detected by an external device for a preset motion;
    a second acquiring module, configured to acquire an image captured for the preset motion, and to process the image to obtain image recognition data corresponding to the preset motion;
    a determining module, configured to identify a motion corresponding to the motion data; and
    a processing module, configured to perform supervised learning on the image recognition data acquired by the second acquiring module using the motion identified by the determining module, and to train a preset network model based on the image recognition data after the supervised learning.
  17. The apparatus according to claim 16, wherein
    the processing module is specifically configured to perform the supervised learning by deep learning, taking the image recognition data acquired by the second acquiring module as the input and the motion as the target output.
  18. The apparatus according to claim 16 or 17, wherein
    the first acquiring module is further configured to acquire, according to the motion data, feature data corresponding to the preset motion; and
    the processing module is further configured to fuse the feature data with the image recognition data after the supervised learning to obtain fused data, and to train the preset network model with the fused data.
  19. The apparatus according to claim 18, wherein
    the first acquiring module is specifically configured to, when acquiring the feature data corresponding to the preset motion according to the motion data, convert the motion data into frequency domain data so as to use the frequency domain data as the feature data corresponding to the preset motion.
  20. The apparatus according to any one of claims 16 to 19, wherein
    the network model comprises a neural network.
  21. A motion recognition method, comprising:
    acquiring motion data detected by an external device for a current motion; and
    converting the motion data into frequency domain data, and using the frequency domain data to identify a motion corresponding to the frequency domain data.
  22. The method according to claim 21, wherein the using the frequency domain data to identify a motion corresponding to the frequency domain data comprises:
    inputting the frequency domain data into a network model, so as to identify, by means of the network model, a motion corresponding to the frequency domain data.
  23. The method according to claim 22, wherein
    the network model comprises a neural network.
  24. The method according to claim 22 or 23, further comprising:
    acquiring motion data detected by the external device for a preset motion; and
    converting the motion data corresponding to the preset motion into frequency domain data, and training the network model with the frequency domain data corresponding to the preset motion and the preset motion.
  25. The method according to any one of claims 21 to 24, further comprising:
    regularizing the frequency domain data so as to reduce overfitting to the frequency domain data.
  26. The method according to any one of claims 21 to 25, further comprising:
    normalizing the motion data.
  27. The method according to any one of claims 21 to 26, wherein
    the motion data comprises data obtained by sampling the output of a motion sensor of the external device over a preset time period, or data obtained by sampling the output of the motion sensor of the external device a preset number of times.
  28. The method according to any one of claims 21 to 27, further comprising, after the using the frequency domain data to identify a motion corresponding to the frequency domain data:
    controlling an aircraft according to the identified motion.
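Claims 21, 26 and 27 together describe taking a sampled window of motion-sensor data, normalizing it, and converting it into frequency domain data. The claims do not fix a particular transform; a minimal illustrative sketch, assuming the frequency domain data is the magnitude spectrum of a real FFT:

```python
# Sketch of claims 21/26: normalize a sampled window of motion data, then
# convert it to frequency domain data. The choice of rFFT magnitudes as the
# "frequency domain data" is an assumption for illustration.
import numpy as np

def to_frequency_features(samples, normalize=True):
    """Convert a 1-D window of motion-sensor samples into frequency domain data."""
    x = np.asarray(samples, dtype=float)
    if normalize:
        # Normalization of the motion data (claim 26): zero mean, unit variance.
        x = (x - x.mean()) / (x.std() + 1e-8)
    # Magnitude spectrum of the real FFT as the frequency domain data.
    return np.abs(np.fft.rfft(x))

# A 64-sample window containing exactly 5 cycles of a sinusoid:
t = np.arange(64)
window = np.sin(2 * np.pi * 5 * t / 64)
spectrum = to_frequency_features(window)
print(int(np.argmax(spectrum)))  # energy concentrates in frequency bin 5
```

A periodic wrist or arm motion shows up as a peak at its repetition frequency, which is what makes the frequency domain representation a useful input to the network model of claim 22.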
  29. A motion recognition method, comprising:
    acquiring motion data detected by an external device for a current motion, and acquiring, according to the motion data, feature data corresponding to the current motion;
    acquiring an image captured for the current motion, and processing the image to obtain image recognition data corresponding to the current motion;
    fusing the feature data corresponding to the current motion with the image recognition data to obtain fused data; and
    identifying a motion corresponding to the fused data.
  30. The method according to claim 29, wherein the identifying a motion corresponding to the fused data comprises:
    inputting the fused data into a network model, so as to identify, by means of the network model, a motion corresponding to the fused data.
  31. The method according to claim 30, wherein
    the network model comprises a neural network.
  32. The method according to any one of claims 29 to 31, wherein the acquiring, according to the motion data, feature data corresponding to the current motion comprises:
    converting the motion data into frequency domain data so as to use the frequency domain data as the feature data corresponding to the current motion; and
    the fusing the feature data corresponding to the current motion with the image recognition data to obtain fused data comprises:
    fusing the frequency domain data with the image recognition data to obtain the fused data.
  33. The method according to claim 30 or 31, further comprising:
    acquiring motion data detected by the external device for a preset motion, and acquiring, according to the motion data, feature data corresponding to the preset motion;
    acquiring an image captured for the preset motion, and processing the captured image to obtain image recognition data corresponding to the preset motion;
    fusing the image recognition data corresponding to the preset motion with the feature data corresponding to the preset motion to obtain fused data corresponding to the preset motion; and
    training the network model with the fused data corresponding to the preset motion and the preset motion.
  34. The method according to any one of claims 29 to 33, wherein
    the motion data comprises data obtained by sampling the output of a motion sensor of the external device over a preset time period, or data obtained by sampling the output of the motion sensor of the external device a preset number of times.
  35. The method according to any one of claims 29 to 34, further comprising, after the identifying a motion corresponding to the fused data:
    controlling an aircraft according to the identified motion.
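Claims 29 and 32 describe fusing the motion-derived feature data (frequency domain data) with the image recognition data to obtain fused data. The claims do not specify a fusion operator; a simple concatenation sketch, for illustration only:

```python
# Sketch of claims 29/32: fuse motion-derived features with image-recognition
# data. Concatenation is an assumed fusion operator, not mandated by the claims.
import numpy as np

def fuse(frequency_features, image_features):
    """Fuse motion-derived and image-derived features into one fused vector."""
    f = np.asarray(frequency_features, dtype=float).ravel()
    g = np.asarray(image_features, dtype=float).ravel()
    return np.concatenate([f, g])

freq = np.array([0.1, 0.9, 0.2])   # e.g. FFT magnitudes of the motion data
img = np.array([0.7, 0.3])         # e.g. image-recognition scores
fused = fuse(freq, img)
print(fused.shape)                 # (5,)
```

The fused vector then serves as a single input to the network model of claim 30, letting one classifier weigh both sensing modalities.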
  36. A network training method based on motion recognition, comprising:
    acquiring motion data detected by an external device for a preset motion;
    acquiring an image captured for the preset motion, and processing the image to obtain image recognition data corresponding to the preset motion;
    identifying a motion corresponding to the motion data; and
    performing supervised learning on the image recognition data using the identified motion, and training a preset network model based on the image recognition data after the supervised learning.
  37. The method according to claim 36, wherein the performing supervised learning on the image recognition data using the identified motion comprises:
    performing the supervised learning by deep learning, taking the image recognition data as the input and the motion as the target output.
  38. The method according to claim 36 or 37, further comprising:
    acquiring, according to the motion data, feature data corresponding to the preset motion;
    wherein the training a preset network model based on the image recognition data after the supervised learning comprises:
    fusing the feature data with the image recognition data after the supervised learning to obtain fused data; and
    training the preset network model with the fused data.
  39. The method according to claim 38, wherein the acquiring, according to the motion data, feature data corresponding to the preset motion comprises:
    converting the motion data into frequency domain data so as to use the frequency domain data as the feature data corresponding to the preset motion.
  40. The method according to any one of claims 36 to 39, wherein
    the network model comprises a neural network.
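Claims 36 to 38 describe using the known preset motion as the label to supervise training of a network model on the recognition data. As a hedged illustration only, the loop below uses a minimal logistic-regression stand-in for the "network model", trained by gradient descent on toy fused features; the patent itself contemplates a neural network and deep learning:

```python
# Sketch of supervised training (claims 36-38): the preset motion supplies the
# label y; a logistic-regression model stands in for the network model.
import numpy as np

def train_model(features, labels, lr=0.5, epochs=500):
    """Train a minimal logistic-regression 'network model' by gradient descent."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels, dtype=float)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid output
        grad = p - y                             # cross-entropy gradient
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

# Toy fused-feature data: two preset motions, separable in feature space.
X = np.array([[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]])
y = np.array([0, 0, 1, 1])
w, b = train_model(X, y)
preds = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)
print(preds.tolist())  # [0, 0, 1, 1]
```

The same supervision pattern applies unchanged when the model is a deep neural network: the identified preset motion is the target output, the fused data is the input.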
  41. A motion recognition device, comprising a processor and a communication interface, the processor being connected to the communication interface, wherein
    the communication interface is configured to acquire motion data detected by an external device for a current motion; and
    the processor is configured to convert the motion data into frequency domain data, and to use the frequency domain data to identify a motion corresponding to the frequency domain data.
  42. The device according to claim 41, wherein
    the processor is specifically configured to input the frequency domain data into a network model, so as to identify, by means of the network model, a motion corresponding to the frequency domain data.
  43. The device according to claim 42, wherein
    the network model comprises a neural network.
  44. The device according to claim 42 or 43, wherein
    the communication interface is further configured to acquire motion data detected by the external device for a preset motion; and
    the processor is further configured to convert the motion data corresponding to the preset motion into frequency domain data, and to train the network model with the frequency domain data corresponding to the preset motion and the preset motion.
  45. The device according to any one of claims 41 to 44, wherein
    the processor is further configured to regularize the frequency domain data so as to reduce overfitting to the frequency domain data.
  46. The device according to any one of claims 41 to 45, wherein
    the processor is further configured to normalize the motion data.
  47. The device according to any one of claims 41 to 46, wherein
    the motion data comprises data obtained by sampling the output of a motion sensor of the external device over a preset time period, or data obtained by sampling the output of the motion sensor of the external device a preset number of times.
  48. The device according to any one of claims 41 to 47, wherein
    the processor is further configured to control an aircraft according to the identified motion.
  49. A motion recognition device, comprising a processor, a communication interface and an image capture apparatus, the processor being connected to the image capture apparatus and to the communication interface respectively, wherein
    the communication interface is configured to acquire motion data detected by an external device for a current motion;
    the image capture apparatus is configured to capture an image for the current motion; and
    the processor is configured to acquire the image captured by the image capture apparatus for the current motion, process the image captured by the image capture apparatus to obtain image recognition data corresponding to the current motion, acquire, according to the motion data, feature data corresponding to the current motion, fuse the feature data corresponding to the current motion with the image recognition data to obtain fused data, and identify a motion corresponding to the fused data.
  50. The device according to claim 49, wherein
    the processor is specifically configured to input the fused data into a network model, so as to identify, by means of the network model, a motion corresponding to the fused data.
  51. The device according to claim 50, wherein
    the network model comprises a neural network.
  52. The device according to any one of claims 49 to 51, wherein
    the processor is specifically configured to convert the motion data into frequency domain data, and to fuse the frequency domain data with the image recognition data to obtain the fused data.
  53. The device according to claim 50 or 51, wherein
    the communication interface is further configured to acquire motion data detected by the external device for a preset motion;
    the image capture apparatus is further configured to capture an image for the preset motion; and
    the processor is further configured to acquire the image captured by the image capture apparatus for the preset motion, and process the image captured by the image capture apparatus to obtain image recognition data corresponding to the preset motion; acquire, according to the motion data, feature data corresponding to the preset motion; fuse the image recognition data corresponding to the preset motion with the feature data corresponding to the preset motion to obtain fused data corresponding to the preset motion; and train the network model with the fused data corresponding to the preset motion and the preset motion.
  54. The device according to any one of claims 49 to 53, wherein
    the motion data comprises data obtained by sampling the output of a motion sensor of the external device over a preset time period, or data obtained by sampling the output of the motion sensor of the external device a preset number of times.
  55. The device according to any one of claims 49 to 54, wherein
    the processor is further configured to control an aircraft according to the identified motion.
  56. A network training device based on motion recognition, comprising an image capture apparatus, a processor and a communication interface, the processor being connected to the image capture apparatus and to the communication interface respectively, wherein
    the communication interface is configured to acquire motion data detected by an external device for a preset motion;
    the image capture apparatus is configured to capture an image for the preset motion; and
    the processor is configured to acquire the image captured by the image capture apparatus for the preset motion, process the image captured by the image capture apparatus to obtain image recognition data corresponding to the preset motion, identify a motion corresponding to the motion data, perform supervised learning on the image recognition data using the identified motion, and train a preset network model based on the image recognition data after the supervised learning.
  57. The device according to claim 56, wherein
    the processor is specifically configured to perform the supervised learning by deep learning, taking the image recognition data as the input and the motion as the target output.
  58. The device according to claim 56 or 57, wherein
    the processor is further configured to acquire, according to the motion data, feature data corresponding to the preset motion; fuse the feature data with the image recognition data after the supervised learning to obtain fused data; and train the preset network model with the fused data.
  59. The device according to any one of claims 56 to 58, wherein
    the processor is specifically configured to convert the motion data into frequency domain data so as to use the frequency domain data as the feature data corresponding to the preset motion.
  60. The device according to any one of claims 56 to 59, wherein
    the network model comprises a neural network.
  61. An aircraft, comprising:
    a power system, configured to power the aircraft; and
    the motion recognition device according to any one of claims 41 to 48, configured to identify a motion.
  62. An aircraft, comprising:
    a power system, configured to power the aircraft; and
    the motion recognition device according to any one of claims 49 to 55, configured to identify a motion.
  63. An aircraft, comprising:
    a power system, configured to power the aircraft; and
    the network training device based on motion recognition according to any one of claims 56 to 60, configured to train a network model for motion recognition.

Priority Applications (2)

- PCT/CN2016/104121 (WO2018076371A1), priority date 2016-10-31, filed 2016-10-31: Gesture recognition method, network training method, apparatus and equipment
- CN201680029871.3A (CN107735796A), priority date 2016-10-31, filed 2016-10-31: Action identification method, network training method, device and equipment


Publications (1)

- WO2018076371A1, published 2018-05-03

Family ID: 61201295

Country Status (2)

- CN: CN107735796A
- WO: WO2018076371A1

Families Citing this family (3)

* Cited by examiner, † Cited by third party

- CN109196518B* (priority 2018-08-23, published 2022-06-07), 合刃科技(深圳)有限公司: Gesture recognition method and device based on hyperspectral imaging
- CN111050266B* (priority 2019-12-20, published 2021-07-30), 朱凤邹: Method and system for performing function control based on earphone detection action
- CN116311539B* (priority 2023-05-19, published 2023-07-28), 亿慧云智能科技(深圳)股份有限公司: Sleep motion capturing method, device, equipment and storage medium based on millimeter waves

Citations (6)

* Cited by examiner, † Cited by third party

- CN101807245A* (priority 2010-03-02, published 2010-08-18), 天津大学: Artificial neural network-based multi-source gait feature extraction and identification method
- CN104484037A* (priority 2014-12-12, published 2015-04-01), 三星电子(中国)研发中心: Method for intelligent control by virtue of wearable device and wearable device
- CN204945794U* (priority 2015-08-24, published 2016-01-06), 武汉理工大学: Gesture-recognition-based wireless remote control car
- CN105817037A* (priority 2016-05-19, published 2016-08-03), 深圳大学: Toy air vehicle based on myoelectric control and control method thereof
- CN105867355A* (priority 2016-06-03, published 2016-08-17), 深圳市迪瑞特科技有限公司: Intelligent vehicle-mounted device system
- CN105955306A* (priority 2016-07-20, published 2016-09-21), 西安中科比奇创新科技有限责任公司: Wearable device and unmanned aerial vehicle control method and system based on wearable device
Also Published As

Publication number Publication date
CN107735796A (en) 2018-02-23

Similar Documents

Publication Publication Date Title
US11892859B2 (en) Remoteless control of drone behavior
US11914370B2 (en) System and method for providing easy-to-use release and auto-positioning for drone applications
US11861069B2 (en) Gesture operated wrist mounted camera system
US20230136669A1 (en) Event camera-based gaze tracking using neural networks
JP6732317B2 (en) Face activity detection method and apparatus, and electronic device
CN107239728B (en) Unmanned aerial vehicle interaction device and method based on deep learning attitude estimation
JP6503070B2 (en) Method for determining the position of a portable device
US11715231B2 (en) Head pose estimation from local eye region
CN110083202A (en) With the multi-module interactive of near-eye display
WO2018001245A1 (en) Robot control using gestures
EP3188128A1 (en) Information-processing device, information processing method, and program
US10971152B2 (en) Imaging control method and apparatus, control device, and imaging device
WO2018076371A1 (en) Gesture recognition method, network training method, apparatus and equipment
Abate et al. Remote 3D face reconstruction by means of autonomous unmanned aerial vehicles
WO2022082440A1 (en) Method, apparatus and system for determining target following strategy, and device and storage medium
Tu et al. Face and gesture based human computer interaction
CN115393962A (en) Motion recognition method, head-mounted display device, and storage medium
US20220265168A1 (en) Real-time limb motion tracking
WO2023151551A1 (en) Video image processing method and apparatus, and electronic device and storage medium
US20230143443A1 (en) Systems and methods of fusing computer-generated predicted image frames with captured images frames to create a high-dynamic-range video having a high number of frames per second
US20200341556A1 (en) Pattern embeddable recognition engine and method
Martins A human-machine interface using augmented reality glasses for applications in assistive robotics.
Mishra A review on learning-based algorithms for human activity recognition
KR20230168094A (en) Method, system and non-transitory computer-readable recording medium for processing image for analysis of nail
Bauer et al. User independent, multi-modal spotting of subtle arm actions with minimal training data

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 16919618

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry into the European phase

Ref document number: 16919618

Country of ref document: EP

Kind code of ref document: A1