CN117312935A - Action category identification method, device, computer equipment and storage medium

Action category identification method, device, computer equipment and storage medium

Info

Publication number
CN117312935A
Authority
CN
China
Prior art keywords
data
action
sequence
sub
radar data
Prior art date
Legal status
Pending
Application number
CN202210704904.7A
Other languages
Chinese (zh)
Inventor
杨伟明
王少鸣
郭润增
唐惠忠
张菁芸
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210704904.7A
Publication of CN117312935A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The application relates to an action category recognition method, an action category recognition apparatus, a computer device, a storage medium, and a computer program product. The method comprises the following steps: acquiring an action radar data sequence, wherein the action radar data sequence comprises a plurality of radar data arranged according to the acquisition time sequence; dividing the action radar data sequence into data subsequences with the same time span, and performing feature extraction on each data subsequence to obtain the subsequence feature of each data subsequence; performing action category identification processing on each subsequence feature, and determining the action category sub-label matched with each subsequence feature; and determining the action category indicated by the target sub-label, among the action category sub-labels, that meets the screening condition as the action category identification result of the action radar data sequence. The method can be applied to the field of automatic driving, and adopting it can improve the accuracy of the action category recognition result.

Description

Action category identification method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technology, and in particular, to a method, an apparatus, a computer device, a storage medium, and a computer program product for identifying an action category.
Background
With the development of artificial intelligence technology, specific actions can be used directly as a human-computer interaction channel to convey, in specific situations, the information a person wants to express, thereby enabling communication between humans and computers. Action recognition may be used to confirm behavior produced by a user, or, combined with virtual reality technology, to enable entertainment applications such as intelligent sightseeing and virtual reality games.
However, current action category recognition methods are based on recognizing action image data and have weak generalization capability, so the accuracy of their action category recognition results is low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an action category recognition method, apparatus, computer device, computer-readable storage medium, and computer program product that can improve the accuracy of action category recognition.
In a first aspect, the present application provides a method for identifying an action category. The method comprises the following steps:
acquiring an action radar data sequence, wherein the action radar data sequence comprises a plurality of radar data arranged according to acquisition time sequence;
dividing the action radar data sequence into data subsequences with the same time span, and respectively carrying out feature extraction on each data subsequence to obtain respective subsequence features of each data subsequence;
performing action category identification processing on each subsequence feature, and determining the action category sub-label matched with each subsequence feature;
and determining the action category indicated by the target sub-label, among the action category sub-labels, that meets the screening condition as an action category identification result of the action radar data sequence.
In a second aspect, the present application further provides an action category recognition device. The device comprises:
the radar data acquisition module is used for acquiring an action radar data sequence, wherein the action radar data sequence comprises a plurality of radar data arranged according to acquisition time sequence;
the characteristic extraction module is used for dividing the action radar data sequence into data subsequences with the same time span, and respectively extracting the characteristics of each data subsequence to obtain the respective subsequence characteristics of each data subsequence;
the sub-tag identification module is used for respectively carrying out action category identification processing on each sub-sequence feature and determining action category sub-tags matched with each sub-sequence feature;
and the action category determining module is used for determining the action category indicated by the target sub-tag, among the action category sub-tags, that meets the screening condition as an action category identification result of the action radar data sequence.
In one embodiment, the feature extraction module is further configured to extract, for each data subsequence, the time domain features of the data subsequence, and perform Fourier transform processing on the data subsequence to obtain the frequency domain features of the data subsequence; and to perform feature fusion on the time domain features and the frequency domain features to obtain the subsequence feature corresponding to each data subsequence.
In one embodiment, the feature extraction module is further configured to perform time-series feature learning on each data subsequence based on the recurrent neural network of a convolutional recurrent neural network model to obtain the time-series feature of each data subsequence, and to perform deep feature learning on each time-series feature based on the convolutional neural network of the convolutional recurrent neural network model to obtain the subsequence feature of each data subsequence.
In one embodiment, the recurrent neural network is composed of at least two bidirectional recurrent neural network layers, and the number of the recurrent units in each bidirectional recurrent neural network layer is equal to the number of the data subsequences.
In one embodiment, the action category recognition device further includes a model training module, configured to obtain sample radar data sequences carrying action category labels, where the sample radar data sequences include positive samples carrying target action labels and negative samples carrying non-target action labels; divide each sample radar data sequence into a plurality of sample data subsequences of the same time span; mark the action category label of the sample radar data sequence as the action category label of each sample data subsequence; and perform parameter training on the convolutional recurrent neural network model based on the sample subsequence feature of each sample data subsequence and the action category label it carries, until the training termination condition of the convolutional recurrent neural network model is met.
In one embodiment, the action category recognition device further includes a data smoothing module, configured to perform data smoothing on the action radar data sequence according to an arrangement relationship of radar data in the action radar data sequence to obtain a smoothed radar data sequence, and the feature extraction module is further configured to divide the smoothed radar data sequence into data subsequences with the same time span.
In one embodiment, the feature extraction module is further configured to determine a window size and a sliding step size of a sliding window that matches the dividing parameter based on the dividing parameter of the action radar data sequence; and sliding the sliding window in the action radar data sequence according to the sliding step length, and determining radar data contained in the sliding window after each sliding process as a data subsequence.
In one embodiment, the action category determining module is further configured to screen, based on an accumulated result of each action category sub-tag, a target sub-tag whose accumulated result meets a screening condition from the action category sub-tags; and determining the action category indicated by the target sub-tag as an action category identification result of the action radar data sequence.
In one embodiment, the action category determining module is further configured to perform accumulation processing on the weight data of the same action category sub-tag to obtain an accumulated weight of each action category sub-tag; and screening the target sub-label with the largest accumulated weight from the action category sub-labels.
In one embodiment, the action radar data sequence comprises a gesture radar data sequence; the action category recognition device further comprises a gesture detection signal sending module, which is used for responding to the gesture detection event and sending a gesture detection signal to the radar sensor; the radar data acquisition module is also used for acquiring a gesture radar data sequence acquired after the radar sensor receives the gesture detection signal.
In one embodiment, the action category recognition result includes a gesture category recognition result; the action category recognition device further comprises a prompt information display module, which is used for: responding to a data interaction trigger event of a target user, determining the data interaction confirmation gesture matched with the data interaction trigger event, and triggering a gesture detection event of the target user; displaying prompt information for the data interaction confirmation gesture; and, when the gesture category recognition result is the same as the data interaction confirmation gesture, executing the data interaction flow for the target user.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring an action radar data sequence, wherein the action radar data sequence comprises a plurality of radar data arranged according to acquisition time sequence;
dividing the action radar data sequence into data subsequences with the same time span, and respectively carrying out feature extraction on each data subsequence to obtain respective subsequence features of each data subsequence;
performing action category identification processing on each subsequence feature, and determining the action category sub-label matched with each subsequence feature;
and determining the action category indicated by the target sub-label, among the action category sub-labels, that meets the screening condition as an action category identification result of the action radar data sequence.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring an action radar data sequence, wherein the action radar data sequence comprises a plurality of radar data arranged according to acquisition time sequence;
dividing the action radar data sequence into data subsequences with the same time span, and respectively carrying out feature extraction on each data subsequence to obtain respective subsequence features of each data subsequence;
performing action category identification processing on each subsequence feature, and determining the action category sub-label matched with each subsequence feature;
and determining the action category indicated by the target sub-label, among the action category sub-labels, that meets the screening condition as an action category identification result of the action radar data sequence.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:
acquiring an action radar data sequence, wherein the action radar data sequence comprises a plurality of radar data arranged according to acquisition time sequence;
dividing the action radar data sequence into data subsequences with the same time span, and respectively carrying out feature extraction on each data subsequence to obtain respective subsequence features of each data subsequence;
performing action category identification processing on each subsequence feature, and determining the action category sub-label matched with each subsequence feature;
and determining the action category indicated by the target sub-label, among the action category sub-labels, that meets the screening condition as an action category identification result of the action radar data sequence.
According to the above action category recognition method, device, computer equipment, storage medium, and computer program product, the acquired action radar data sequence comprises a plurality of radar data arranged according to the acquisition time sequence, which makes it convenient to divide the sequence quickly and accurately into data subsequences with the same time span and thus to obtain the subsequence feature of each data subsequence efficiently. By performing action category identification processing on each subsequence feature separately, the action category sub-label matched with each subsequence feature is determined, so that action category recognition is carried out separately on radar data of the same time span but different time periods, yielding the action category sub-labels corresponding to the different time periods. The action category identification result of the action radar data sequence is then determined by screening for target sub-labels that meet the screening condition. By converting the recognition of the whole sequence into the processing of scattered sub-labels followed by label screening, the accuracy of the action category recognition result can be effectively improved.
Drawings
FIG. 1 is a diagram of an application environment for a method of action class identification in one embodiment;
FIG. 2 is a flow chart of a method for identifying action categories in one embodiment;
FIG. 3 is a schematic diagram of different types of gestures in one embodiment;
FIG. 4 is a schematic diagram of a convolutional recurrent neural network model in one embodiment;
FIG. 5 is a schematic diagram of a circulation unit of a recurrent neural network in one embodiment;
FIG. 6 is a schematic diagram of a convolutional recurrent neural network model in another embodiment;
FIG. 7 is a schematic diagram of different action categories defined in one embodiment;
FIG. 8 is a flow chart of a method for identifying action categories in one embodiment;
FIG. 9 is a flowchart of a method for identifying action categories according to another embodiment;
FIG. 10 is a flow chart of a method of action class identification in yet another embodiment;
FIG. 11 is a schematic diagram of a scenario in which a game task is controlled in a game scenario in one embodiment;
FIG. 12 is a schematic view of a scenario in which motion of a vehicle model is controlled in one embodiment;
FIG. 13 is a block diagram of an action class recognition device in one embodiment;
FIG. 14 is an internal block diagram of a computer device in one embodiment;
FIG. 15 is an internal structural view of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, machine learning/deep learning, automatic driving, intelligent transportation, and other directions.
Machine learning (ML) is a multi-field interdiscipline involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills and reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental approach to giving computers intelligence; it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With the research and progress of artificial intelligence technology, it has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, the Internet of Vehicles, and intelligent transportation. It is believed that, with the development of technology, artificial intelligence technology will be applied in more fields and become increasingly important.
The action category identification method provided by the embodiments of the application relates to artificial intelligence machine learning technology and can be applied to the application environment shown in FIG. 1, in which the radar data acquisition device 102 communicates with the processor 104 via a network. The data storage system may store the data that the processor 104 needs to process; it may be integrated on the processor 104 or reside on the cloud or on another processor or server. The processor 104 acquires an action radar data sequence, uploaded by the radar data acquisition device 102, composed of a plurality of radar data arranged according to the acquisition time sequence; divides the action radar data sequence into data subsequences with the same time span and performs feature extraction on each data subsequence to obtain the subsequence feature of each; performs action category identification processing on each subsequence feature and determines the action category sub-label matched with each subsequence feature; and determines the action category indicated by the target sub-label that meets the screening condition as the action category identification result of the action radar data sequence.
The radar data acquisition device 102 may be, but is not limited to, a device with a radar echo data acquisition function, such as a radar sensor, or various desktop computers, notebook computers, smart phones, tablet computers, internet of things devices, portable wearable devices, and the like integrated with the device, where the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The processor 104 may be integrated with the radar data collection device 102 in the same device, or may be implemented as a stand-alone server or a server cluster comprising a plurality of servers.
In a possible implementation, the action category identification method is applied to an intelligent payment platform, where the radar data acquisition device 102 can be integrated in a self-service checkout device. When a user uses the self-service checkout device, the radar data acquisition device 102 on one side of the device collects the user's action radar data sequence over a period of time, the sequence comprising a plurality of radar data arranged according to the acquisition time sequence, and uploads it to the processor 104 through a transmission network. The processor 104 can perform feature extraction and computation on each data subsequence in the action radar data sequence through a machine learning model, identify the action category label corresponding to each data subsequence, finally determine the action category identification result of the action radar data sequence, and execute the data processing flow corresponding to that result. For example, when the action category recognition result matches a preset confirmation action characterizing the user's payment operation, a data processing flow for confirming the payment is performed.
In another possible implementation, the action category identification method is applied to a touch-free trigger scenario that replaces conventional touch-key confirmation, where the radar data acquisition device 102 may be a millimeter wave radar sensor. Taking riding an elevator as an example, a millimeter wave radar sensor can be arranged at the position of the elevator key panel. The sensor collects the user's gesture radar data sequence over a period of time, the sequence comprising a plurality of gesture radar data arranged according to the acquisition time sequence, and uploads it to the processor 104 through a transmission network. The processor 104 can perform feature extraction and computation on each data subsequence in the gesture radar data sequence through a machine learning model, identify the gesture category label corresponding to each data subsequence, finally determine the recognition result of the gesture radar data sequence, and execute the elevator key operation represented by the gesture recognition result.
In one embodiment, as shown in fig. 2, an action category recognition method is provided, and the method is applied to the processor in fig. 1 for illustration, and includes the following steps:
step 202, an action radar data sequence is acquired, wherein the action radar data sequence comprises a plurality of radar data arranged according to acquisition time sequence.
Radar data is radar echo data acquired by a device with a radar echo data acquisition function, which may be a radar sensor or a radar receiver. A radar sensor continuously emits a preset waveform, receives the echo signals reflected by an object, and sends the echo signals to an acquisition board, which samples them to obtain radar echo data carrying acquisition times. A plurality of radar echo data arranged according to the acquisition time sequence constitutes an action radar data sequence. The radar sensor comprises a transmitter, a transmitting antenna, a receiver, a receiving antenna, a processing part, a display, a power supply device, a data input device, anti-interference auxiliary devices, and the like. Its working principle is that the transmitter emits electromagnetic wave energy through the antenna in a certain direction of space; an object in that direction reflects the electromagnetic wave; the radar antenna receives the reflected wave and sends it to the receiving equipment for processing, extracting information such as the object's distance, distance change rate, azimuth, and altitude relative to the radar. The radar sensor has strong capability to penetrate fog, smoke, and dust and strong anti-interference capability. The radar sensor may be a centimeter wave radar sensor, which works in the centimeter wave band for detection, centimeter waves being electromagnetic waves in the 3-30 GHz frequency range (wavelength 1-10 cm); or a millimeter wave radar sensor, which works in the millimeter wave band for detection, millimeter waves being electromagnetic waves in the 30-300 GHz frequency range (wavelength 1-10 mm). Compared with a centimeter wave radar sensor, a millimeter wave radar sensor has the characteristics of small volume, easy integration, and high spatial resolution. For convenience of description, the following embodiments take a radar sensor as the example of the radar data acquisition device.
In particular, the radar data acquired by the radar sensor may be time series data acquired for actions made on the target object. The actions performed by the target object refer to a series of consecutive actions performed by a specified part of the human body, such as a hand, a head, etc. Wherein the target object may be determined based on a specific application scenario, for example, in a shopping scenario, the target object may be a shopper; in a game scenario, the target object may be a game user; in a live scenario, the target object may be a host. In different application scenarios, the actions for performing category recognition may be the same or different. For example, the action to be subjected to category recognition may be a head action such as opening mouth, blinking, shaking head, or the like, or a hand action such as a gesture of a gesture, or the like. Specifically, as shown in fig. 3, the gesture to be recognized may be a gesture that is maintained statically or may be a gesture that changes dynamically.
The action radar data sequence can be radar data which is acquired by the radar sensor after being started and reported to the processor, or radar data which is acquired by the radar sensor after receiving an action detection instruction issued by the processor and reported to the processor.
In a specific application, the radar sensor is in a non-started state, such as an off state or a sleep state; when the processor issues an action detection instruction to the sensor, the sensor is awakened to collect radar data and report it to the processor, and when the processor finishes identifying the action category, it issues a sensor-off instruction so that the sensor returns to the off or sleep state. This shortens the sensor's active time and reduces the power consumption of the terminal device where the sensor is located.
In another specific application, the radar sensor is in a started state after being powered on and collects radar data in real time; after receiving an action detection instruction issued by the processor, it reports the radar data collected in real time to the processor. The processor thus receives only radar data collected after the action detection instruction was issued, which avoids the invalid occupation of storage space caused by receiving excessive unnecessary data and improves storage space utilization.
Step 204, dividing the action radar data sequence into data subsequences with the same time span, and respectively extracting features of each data subsequence to obtain respective subsequence features of each data subsequence.
A data subsequence is a sequence of a portion of consecutive radar data in the action radar data sequence. In the case where the time intervals between the radar data in the action radar data sequence are the same, data subsequences having the same time span may be sequences composed of the same number of consecutive radar data selected from the action radar data sequence. To ensure that every radar data in the action radar data sequence is analyzed, the action radar data sequence is the union of the data subsequences obtained by division. For example, if the action radar data sequence acquired by the sensor is 50 frames of continuous radar data with a time interval of 0.1 s, the processor may divide it into 5 or more data subsequences with a time span of 1 s, each composed of 10 consecutive radar data frames.
Specifically, the division into data subsequences covers two cases. The first is that, when the data subsequences are arranged in acquisition-time order, there is no repeated radar data between the former data subsequence and the latter data subsequence. The last radar data in the former subsequence and the first radar data in the latter subsequence may be adjacent radar data of the action radar data sequence; that is, the acquisition time of the last radar data in the former subsequence is earlier than the acquisition time of the first radar data in the latter subsequence, and the time interval between them is the time interval between any two adjacent radar data of the action radar data sequence. For example, if the sequence numbers of the 50 frames of radar data in the action radar data sequence are 1 to 50, the processor may divide frames 1 to 10 into a first data subsequence, frames 11 to 20 into a second data subsequence, and so on, resulting in 5 data subsequences.
The second is that, when the data subsequences are arranged in acquisition-time order, the former data subsequence and the latter data subsequence have partially repeated radar data; that is, the acquisition time of the last radar data in the former subsequence is later than the acquisition time of the first radar data in the latter subsequence. For example, with frames numbered 1 to 50, the processor may divide frames 1 to 10 into a first data subsequence, frames 5 to 15 into a second data subsequence, frames 10 to 20 into a third data subsequence, and so on, resulting in 10 data subsequences.
Under the condition that the time intervals among radar data in the action radar data sequence are different, the processor acquires data subsequences with the same time span from the action radar data sequence according to the time stamp carried by each radar data.
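As an illustration of the two division schemes above, the following Python sketch segments a sequence of radar frames with a sliding window. It is a minimal sketch assuming the sequence is a NumPy array of shape (num_frames, frame_dim), and the window and step values are illustrative rather than prescribed by the application.

```python
import numpy as np

def split_subsequences(seq: np.ndarray, window: int, step: int) -> list:
    """Slide a window of `window` frames over `seq` with stride `step`.

    step == window -> non-overlapping subsequences (first case above);
    step <  window -> overlapping subsequences (second case above).
    """
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, step)]

frames = np.random.randn(50, 64)   # e.g. 50 radar frames sampled every 0.1 s
non_overlap = split_subsequences(frames, window=10, step=10)  # 5 subsequences
overlap = split_subsequences(frames, window=10, step=5)       # 9 subsequences
```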
A subsequence feature is a data representation describing the characteristics of a data subsequence. Feature extraction on a data subsequence can adopt an extraction mode matched to the data type of the radar data. For example, when the radar data is reflected electromagnetic wave data, the data subsequence is a signal waveform diagram, and the processor may perform waveform feature extraction on the signal waveform diagram to obtain the subsequence feature. For another example, when the radar data is infrared data, the data subsequence is an infrared spectrogram, and the processor can perform feature extraction on the infrared spectrogram to obtain the subsequence feature.
Specifically, the processor acquires the time distribution intervals of the radar data in the action radar data sequence, divides the action radar data sequence into data subsequences with the same time span according to those intervals, and performs feature extraction on each data subsequence to obtain the subsequence feature of each. The feature extraction of the different data subsequences may be performed synchronously or asynchronously. In a specific implementation, the processor can perform feature extraction on the data subsequences one after another in a single thread, reducing the data processing resources occupied at any one time; or it can perform feature extraction on the data subsequences in parallel in at least two threads, improving extraction speed. The number of threads may equal the number of data subsequences, so that feature extraction on all data subsequences completes in one pass, or it may be smaller than the number of data subsequences, so that the data subsequences are processed in batches, balancing data processing efficiency against the data processing resources occupied at the same time.
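A hedged sketch of the batched, multi-threaded variant described above; `extract_features` is a stand-in placeholder for any per-subsequence extractor, not an API defined by the application, and the worker count is an illustrative choice.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def extract_features(subseq: np.ndarray) -> np.ndarray:
    # Placeholder extractor: per-channel mean as a trivial feature vector.
    return subseq.mean(axis=0)

def extract_all(subsequences, max_workers: int = 4):
    # Fewer workers than subsequences: extraction proceeds in batches,
    # trading peak resource use against throughput, as the text notes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(extract_features, subsequences))
```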
In a specific application, the processor can input the action radar data sequence into a machine learning model, divide the action radar data sequence into data subsequences with the same time span based on the machine learning model, and then respectively perform feature extraction on each data subsequence based on a network structure of the machine learning model to obtain respective subsequence features of each data subsequence so as to realize rapid and accurate extraction of the subsequence features, wherein the machine learning model can be a model obtained by performing supervised training based on sample data.
Step 206, performing action category recognition processing on each sub-sequence feature, and determining action category sub-tags matched with each sub-sequence feature.
The action category recognition processing refers to a data processing process of analyzing the characteristics of radar data corresponding to a specified action to recognize the action category to which the specified action belongs. The action category recognition processing can be implemented based on a machine learning model, and specifically, the machine learning model for performing the action category recognition processing can be obtained by acquiring a training sample corresponding to a target action category and performing parameter training on the model based on the target action category required to be recognized in a specific application scene. Machine learning models used in different application scenarios may be implemented based on training samples with different action category labels. Specifically, the action category identification process may be implemented by a convolutional recurrent neural network model (Convolutional Recurrent Neural Network, abbreviated as CRNN), or may be implemented by a recurrent neural network (Recurrent Neural Network, abbreviated as RNN) or a convolutional neural network (Convolutional Neural Network, abbreviated as CNN).
Specifically, the processor determines a machine learning model matched with a scene identifier based on the scene identifier matched with a sensor for acquiring the action radar data sequence, and performs action category identification processing on sub-sequence features through the machine learning model, so as to obtain action category sub-tags matched with each sub-sequence feature.
In a specific application, take a shopping settlement scenario as an example: the payment confirmation action matched in this scenario is a first gesture, and the payment cancellation action is a second gesture. The machine learning model is parameter-trained with positive sample data carrying the first gesture category identifier or the second gesture category identifier and negative sample data carrying other gesture categories, so that in application the trained machine learning model can perform gesture category recognition processing and determine the gesture category sub-label matched with each subsequence feature.
Step 208, determining the action category indicated by the target sub-label, among the action category sub-labels, that meets the screening condition as an action category identification result of the action radar data sequence.
The action category sub-labels are the labels matched with the individual data subsequences. Because a user may make more than one action within the action radar data sequence, sub-label prediction is carried out separately on the segmented data subsequences; by setting screening conditions, the main action category in the action radar data sequence can be identified accurately, improving the accuracy of the action category identification result.
Specifically, the screening condition for selecting the target sub-label may be to select the action category sub-label with the largest accumulated result, or the action category sub-label with the largest value after its number of occurrences is weighted; the specific screening condition can be set according to the actual application scenario. The weight data used in the weighting can be obtained from the prediction probability of the machine learning model, or from the distribution position, within the whole action radar data sequence, of the data subsequence represented by the subsequence feature.
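The screening step can be illustrated with a small weighted-vote sketch; the (label, weight) pairs and example values below are assumptions for illustration, where each weight may come from the model's predicted probability or from a position-based weighting as described above.

```python
from collections import defaultdict

def screen_target_label(predictions):
    """predictions: list of (action_label, weight) pairs, one per subsequence."""
    accumulated = defaultdict(float)
    for label, weight in predictions:
        accumulated[label] += weight            # accumulate weight per sub-label
    return max(accumulated, key=accumulated.get)  # largest accumulated weight wins

# e.g. five sub-label predictions over one action radar data sequence
preds = [("wave_left", 0.9), ("wave_left", 0.8), ("other", 0.6),
         ("wave_left", 0.7), ("other", 0.5)]
print(screen_target_label(preds))  # -> "wave_left"
```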
According to the above action category recognition method, the acquired action radar data sequence comprises a plurality of radar data arranged according to the acquisition time sequence, which makes it convenient to divide the sequence quickly and accurately into data subsequences with the same time span and thus to obtain the subsequence feature of each data subsequence efficiently. By performing action category identification processing on each subsequence feature separately, the action category sub-label matched with each subsequence feature is determined, so that action category recognition is carried out separately on radar data of the same time span but different time periods, yielding the action category sub-labels corresponding to the different time periods. The action category identification result of the action radar data sequence is then determined by screening for target sub-labels that meet the screening condition. By converting the recognition of the whole sequence into the processing of scattered sub-labels followed by label screening, the accuracy of the action category recognition result can be effectively improved.
In one embodiment, performing feature extraction on each data subsequence to obtain the subsequence feature of each data subsequence includes: for each data subsequence, extracting the time domain features of the data subsequence, and performing fast Fourier transform processing on the data subsequence to obtain the frequency domain features of the data subsequence; and performing feature fusion on the time domain features and the frequency domain features to obtain the subsequence feature corresponding to each data subsequence.
The data subsequence may be waveform signal data, the time domain feature refers to a feature of a change of the waveform signal along with a change of time, and the frequency domain feature refers to a feature of a change of the amplitude of the waveform signal along with a change of frequency. In a specific application, the waveform signal may be transformed into frequency spectrum and amplitude by a fast fourier transform, thereby transforming time domain data into frequency domain data.
In particular, the time domain features include a zero-crossing rate feature, which characterizes the rate of sign changes of the signal, for example, the waveform signal changing from positive to negative or from negative to positive. It can be used to describe the frequency characteristics of the signal in the time domain: a greater zero-crossing rate indicates that the waveform data approximates a higher frequency. In a specific application, the processor may extract zero-crossing rate data from the waveform signal data and binary-encode the zero-crossing rate data to obtain the zero-crossing rate feature.
The frequency domain features include at least one of a spectral flatness feature and a spectral centroid feature. In a specific implementation, the processor obtains them as follows: with the fast Fourier transform size (fft size) set to 1024, it extracts the flatness data of the data subsequence and performs feature normalization on the flatness data to obtain the flatness feature of the data subsequence; likewise, with the fast Fourier transform size of 1024, it extracts the spectral centroid data of the data subsequence and performs feature normalization on the spectral centroid data to obtain the spectral centroid feature of the data subsequence.
In a specific application, the processor performs subsequence feature extraction on each data subsequence as follows: based on the waveform diagram of the data subsequence, it extracts the time domain features; it performs fast Fourier transform processing on the data subsequence to obtain the spectrogram of the data subsequence and, based on the spectrogram, extracts the frequency domain features; it then performs feature fusion on the time domain features and the frequency domain features to obtain the subsequence feature corresponding to each data subsequence. The feature fusion can be realized by feature concatenation or by arithmetic operations on the feature data.
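A minimal sketch of the fused time/frequency features for one waveform subsequence, assuming the subsequence is a one-dimensional NumPy signal; the FFT size of 1024 follows the text above, while the exact normalization constants and the choice of concatenation for fusion are illustrative.

```python
import numpy as np

def subsequence_features(x: np.ndarray, n_fft: int = 1024) -> np.ndarray:
    # Time domain: zero-crossing rate (rate of sign changes of the signal).
    zcr = np.mean(np.abs(np.diff(np.sign(x)))) / 2.0
    # Frequency domain: magnitude spectrum via the fast Fourier transform.
    mag = np.abs(np.fft.rfft(x, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0)
    centroid = np.sum(freqs * mag) / (np.sum(mag) + 1e-12)        # spectral centroid
    flatness = np.exp(np.mean(np.log(mag + 1e-12))) / (np.mean(mag) + 1e-12)
    # Feature fusion by concatenation (the text also allows arithmetic fusion).
    return np.array([zcr, centroid, flatness])
```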
In this embodiment, the subsequence feature of a data subsequence is determined from both its time domain features and its frequency domain features, which gives good noise immunity and allows the features to be represented along different data dimensions, achieving an accurate expression of the features and improving the accuracy of the feature expression.
In one embodiment, performing feature extraction on each data subsequence to obtain the subsequence feature of each data subsequence includes: performing time-series feature learning on each data subsequence based on the recurrent neural network of a convolutional recurrent neural network model to obtain the time-series feature of each data subsequence; and performing deep feature learning on each time-series feature based on the convolutional neural network of the convolutional recurrent neural network model to obtain the subsequence feature of each data subsequence.
A convolutional recurrent neural network model (Convolutional Recurrent Neural Network, CRNN for short) is a neural network architecture that integrates feature extraction, sequence modeling, and transcription into a unified framework. In one specific application, the convolutional recurrent neural network model comprises an input layer, a recurrent neural network, a convolutional neural network, a global average pooling layer, a fully connected layer, and an output layer connected in sequence. A convolutional recurrent neural network model trained on sample data can recognize the action category represented by an action radar data sequence.
Further, a recurrent neural network (Recurrent Neural Network, RNN for short) is a neural network that takes sequence data as input, performs recursion in the evolution direction of the sequence, and connects its recurrent units in a chain. Specifically, the recurrent neural network may be a bidirectional recurrent neural network (Bidirectional RNN, Bi-RNN for short) or a long short-term memory network (Long Short-Term Memory network, LSTM for short). The main structure of a bidirectional recurrent neural network is the combination of two unidirectional recurrent neural networks: at each moment, the input data is provided simultaneously to the two recurrent networks with opposite directions, each of which produces its own state and output at that moment; the output of the bidirectional network is simply the concatenation of the two unidirectional networks' outputs, and apart from the opposite directions the two structures are identical. The two network structures are symmetrical, and any recurrent unit of either network may adopt any recurrent structure such as an RNN or LSTM cell.
In a specific embodiment, the recurrent neural network is composed of at least two bidirectional recurrent neural network layers, and the number of recurrent units in each bidirectional layer is equal to the number of data subsequences. Specifically, taking the case where the network includes a first bidirectional recurrent layer and a second bidirectional recurrent layer as an example, the recurrent units in the first layer correspond one-to-one to the recurrent units in the second layer, and the input data of each recurrent unit in the second layer is the output data of the corresponding recurrent unit in the first layer. In this embodiment, time-series feature extraction is performed through two or more stacked bidirectional recurrent layers, which suits action category recognition over continuous time and extracts more accurate and reliable time-series features, facilitating accurate judgment of the action category.
Further, a convolutional neural network (Convolutional Neural Network, CNN for short) is a feedforward neural network with a deep structure that includes convolution computations. A convolutional neural network has representation learning capability and can perform translation-invariant classification of input information according to its hierarchical structure. It contains hidden layers, in which convolution kernel parameter sharing and the sparsity of inter-layer connections allow the convolutional neural network to learn grid-like features with a smaller amount of computation.
In a specific application, the input layer of the convolutional recurrent neural network model divides the action radar data sequence into a plurality of data subsequences and inputs the data subsequences one by one into the recurrent units of the recurrent neural network, which performs time-series feature processing on the data subsequences to obtain the time-series features. The time-series features are input into the deep convolutional neural network, which performs deep feature learning on them to obtain deep features. The deep features undergo global average pooling in the global average pooling layer to obtain pooled data; the fully connected layer classifies the pooled data and determines the action category sub-label matched with each data subsequence; finally, the action category sub-labels matched with the data subsequences are aggregated to determine the action category matched with the action radar data sequence.
In a specific application, the model structure of the convolutional recurrent neural network model is shown in FIG. 4: it includes an input layer, a recurrent neural network, a convolutional neural network, a fully connected layer, and an output layer connected in sequence. The input layer divides the action radar data sequence into a plurality of data subsequences based on the data subsequence division parameter, where the number of data subsequences obtained by division equals the number of recurrent units in each layer of the recurrent neural network; as shown in FIG. 5, both numbers are 256. Each layer of the recurrent neural network is a bidirectional recurrent neural network; the input data of the next bidirectional layer is the output of the previous bidirectional layer, and the output data of each recurrent unit in each bidirectional layer is the concatenation of its forward feature data and reverse feature data. The output data of the last bidirectional layer of the recurrent neural network is the output result of that layer's last recurrent unit, and this output result is the input data of the convolutional neural network.
In one specific application, as shown in FIG. 6, the convolutional neural network is a one-dimensional deep convolutional neural network comprising, connected in sequence, a first Conv1D (one-dimensional convolution) layer, a first BatchNorm layer, a first PReLU (activation function) layer, a Dropout (random deactivation) layer, a second Conv1D layer, a second BatchNorm layer, and a second PReLU layer. The first Conv1D layer includes 64 convolution kernels, each of size 3×1, with a stride (convolution step) of 1 and a pad (feature map filling width) of 1. The second Conv1D layer comprises 32 convolution kernels with the same kernel size and convolution parameters as the first Conv1D layer. The ratio of the Dropout layer may be set to 0.5: for example, if a layer has 256 neurons with activation outputs y1, y2, y3, ..., y256 and the dropout ratio is 0.5, then after dropout processing the values of 128 of the 256 neurons are set to 0. Specifically, the input data of the first Conv1D layer is the output data of the last bidirectional layer of the recurrent neural network; within the convolutional neural network, the output data of each layer is the input data of the next, and the output data of the second PReLU layer is the input data of the fully connected layer. The fully connected layer comprises a plurality of nodes, each connected to all nodes of the previous layer, and integrates the features extracted by the preceding layers; its output value is passed to the softmax layer, which performs classification using softmax logistic regression (softmax regression).
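For concreteness, the following hedged Keras sketch assembles the described CRNN: stacked bidirectional recurrent layers, the one-dimensional deep CNN, global average pooling, and a softmax fully connected layer. The kernel counts, kernel size, stride, padding, and dropout ratio follow the text above; the sequence length, feature dimension, recurrent unit size, and class count are assumptions, not values fixed by the application.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_SUBSEQS, FEAT_DIM, NUM_CLASSES = 256, 3, 9   # illustrative shapes

inputs = layers.Input(shape=(NUM_SUBSEQS, FEAT_DIM))
# Two stacked bidirectional recurrent layers; each step's output is the
# concatenation of forward and backward features, as described above.
x = layers.Bidirectional(layers.LSTM(32, return_sequences=True))(inputs)
x = layers.Bidirectional(layers.LSTM(32, return_sequences=True))(x)
# One-dimensional deep CNN: Conv1D -> BatchNorm -> PReLU -> Dropout -> Conv1D ...
# (kernel size 3, stride 1; padding "same" matches pad = 1 for a size-3 kernel)
x = layers.Conv1D(64, 3, strides=1, padding="same")(x)
x = layers.BatchNormalization()(x)
x = layers.PReLU()(x)
x = layers.Dropout(0.5)(x)
x = layers.Conv1D(32, 3, strides=1, padding="same")(x)
x = layers.BatchNormalization()(x)
x = layers.PReLU()(x)
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```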
In this embodiment, feature extraction is performed first by the recurrent neural network, exploiting its sensitivity to time-series data to extract accurate time-series features effectively; its one-dimensional output data then serves as the input data of the convolutional neural network. Compared with using the action radar data sequence directly as input, this effectively simplifies the overall data processing load of the convolutional recurrent neural network model, reduces the demands on the terminal, and broadens the application scenarios of the convolutional recurrent neural network model.
In one embodiment, the training process of the convolutional recurrent neural network model comprises: acquiring sample radar data sequences carrying action category labels, the sample radar data sequences including positive samples carrying target action labels and negative samples carrying a non-target action label; dividing each sample radar data sequence into a plurality of sample data subsequences of the same time span; marking the action category label of the sample radar data sequence as the action category label of each of its sample data subsequences; and performing parameter training on the convolutional recurrent neural network model based on the sample subsequence feature of each sample data subsequence and the action category label it carries, until the training termination condition of the convolutional recurrent neural network model is met.
The number of label categories of the target action labels may be one or more. The number of label categories of the non-target action label is one; in a specific implementation, all actions that do not belong to a target action may be marked with the non-target action label. Taking hand gestures as an example, as shown in FIG. 7, the target actions may include eight types: G1 waving left, G2 waving right, G3 moving upward, G4 moving downward, G5 waving upward, G6 waving downward, G7 turning from palm to back, and G8 turning from back to palm. The radar data sequences corresponding to these 8 gestures are all positive sample radar data sequences, and the radar data sequences corresponding to any other gestures are negative sample radar data sequences. By collecting positive sample radar data sequences of the 8 different gestures and marking a category label on each collected sample radar data sequence, the convolutional recurrent neural network model can, after parameter training, identify the different gesture categories.
The same target action may or may not distinguish between the left and right hands, depending on the actual scene requirements. For example, in a scene with high control precision requirements, the actions of the left and right hands can be distinguished; in a specific implementation, when sample data is acquired, the label of an action made by the left hand and the label of the same action made by the right hand can be given different representations, so that the convolutional recurrent neural network model performs supervised learning on the different labels and can then identify the specific action categories of the left and right hands separately. As another example, in other scenarios the left and right hands need not be distinguished; in a specific implementation, when sample data is collected, actions made by the left hand and by the right hand can be represented by the same label, so that the model learns the same gesture category for both hands without distinguishing them, and can then identify the specific gesture category made by either hand.
Specifically, in the sample data acquisition stage, the processor acquires sample radar data sequences carrying action category tags and determines the sample tag of each sample radar data sequence according to the action it represents. The sample tags include target action tags representing target action categories and non-target action tags representing non-target action categories; a sample radar data sequence marked with a target action tag is a positive sample, and one marked with a non-target action tag is a negative sample. Because the convolutional recurrent neural network model first identifies the categories of the divided data subsequences, the action category label of each sample data subsequence must be determined before model training. The action category label of every sample data subsequence belonging to the same sample radar data sequence is the same as the action category label of that sequence; that is, once an action category label is marked for a sample radar data sequence, the action category label of each data subsequence obtained by dividing it is determined. By training the model parameters on sample data subsequences carrying action category labels, the trained convolutional recurrent neural network model can, in application, perform action category identification on the subsequence features of each data subsequence to obtain the action category sub-label of each data subsequence, which improves the accuracy of the trained model's action category predictions for each data subsequence.
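A minimal sketch of this label propagation and training step follows; the window parameters, the optimizer settings and the assumption that each sequence is a tensor fed through a standard DataLoader are choices made for illustration, not values prescribed by the embodiment.

```python
import torch
import torch.nn as nn

def make_subsequence_dataset(sequences, labels, win, step):
    """Each sample radar data sequence (a 1-D tensor) passes its action
    category label on to every subsequence sliced from it, so all
    subsequences share the same time span and the parent's label."""
    xs, ys = [], []
    for seq, label in zip(sequences, labels):
        for start in range(0, len(seq) - win + 1, step):
            xs.append(seq[start:start + win])
            ys.append(label)   # subsequence label = sequence label
    return torch.stack(xs), torch.tensor(ys)

def train(model, loader, epochs=10):
    """Hypothetical parameter training loop: positive samples carry
    target-action labels, negatives a single non-target label."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
```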
In one embodiment, the action category identification method further comprises: performing data smoothing processing on the action radar data sequence according to the arrangement relation of the radar data in the action radar data sequence, to obtain a smoothed radar data sequence.
Further, dividing the action radar data sequence into data subsequences of the same time span includes: dividing the smoothed radar data sequence into data subsequences of the same time span.
The purpose of the data smoothing process is to remove noise from the action radar data sequence. Specifically, the smoothing may be performed on time domain data in the time domain, or on frequency domain data in the frequency domain. Smoothing in the time domain may be implemented by gray-scale transformation or time-domain filtering, and smoothing in the frequency domain may be implemented by frequency-domain filtering, such as one or more of high-pass filtering, low-pass filtering, band-stop filtering, and the like.
Specifically, the data smoothing process may be implemented by identifying repeated values, missing values, outliers and similar data, and then performing data cleaning on them. In a specific implementation, the processor identifies repeated values in the action radar data sequence by comparing the value differences of adjacent time points and checking whether the difference is zero, identifies outliers by comparing the value difference with a preset threshold, and can traverse the action radar data sequence to determine whether missing values exist based on the time intervals of the time stamps carried by each radar datum. After the processor identifies the repeated, missing and abnormal values, the action radar data sequence is optimized through data cleaning, so that in subsequent processing the data subsequences of the same time span are divided from the smoothed radar data sequence, which improves the data validity of the data subsequences and the accuracy of the subsequence features extracted from them.
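The cleaning rules described above might look as follows; the outlier threshold and the expected sampling interval are assumptions for the example, and resampling onto a uniform time axis is one way to fill the detected gaps.

```python
import numpy as np

def clean_radar_sequence(values, times, outlier_thresh=5.0, dt=0.01):
    """Drop repeated values (zero difference between adjacent samples) and
    outliers (difference above a preset threshold), then resample onto a
    uniform time axis, which also fills in any missing samples."""
    values = np.asarray(values, dtype=float)
    times = np.asarray(times, dtype=float)

    diffs = np.diff(values)
    keep = np.ones(len(values), dtype=bool)
    keep[np.where(diffs == 0)[0] + 1] = False                      # repeated values
    keep[np.where(np.abs(diffs) > outlier_thresh)[0] + 1] = False  # outliers

    uniform_t = np.arange(times[0], times[-1] + dt / 2, dt)
    smoothed = np.interp(uniform_t, times[keep], values[keep])
    return uniform_t, smoothed
```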
In one embodiment, dividing the action radar data sequence into data sub-sequences of the same time span comprises: determining a window size and a sliding step length of a sliding window matched with the dividing parameters based on the dividing parameters of the action radar data sequence; and sliding the sliding window in the action radar data sequence according to the sliding step length, and determining radar data contained in the sliding window after each sliding process as a data subsequence.
The dividing parameters of the action radar data sequence include the number of divided data subsequences and the number of radar data contained in each data subsequence. Specifically, when action categories are identified by the trained convolutional recurrent neural network model, the dividing parameters of the action radar data sequence need to correspond to the input data of the model. Thus, the dividing parameters of the action radar data sequence may be determined based on the model parameters of the convolutional recurrent neural network model.
The sliding window is a tool for extracting data subsequences from the action radar data sequence. The window size of the sliding window corresponds to the number of radar data contained in a data subsequence, and the sliding step length of the sliding window determines whether repeated radar data exist in two adjacent data subsequences and, if so, how many.
In one specific application, the individual radar data in the action radar data sequence are distributed over an equidistant time axis. The dividing parameters of the action radar data sequence specify extracting 256 data subsequences, where the time span of the radar data contained in each data subsequence is 0.1 seconds and two adjacent data subsequences share 0.03 seconds of repeated radar data. Based on these dividing parameters and a time interval of 0.01 seconds between adjacent radar data, the window size of the sliding window is determined to span 0.1 seconds on the time axis, and the sliding step length on the time axis is 0.1 − 0.03 = 0.07 seconds; the sliding window slides along the time axis with this step length, and the radar data contained in the window after each slide is determined to be one data subsequence. In this embodiment, dividing the action radar data sequence into data subsequences by a sliding window or a similar mechanism effectively speeds up obtaining the division result and improves data processing efficiency.
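A sketch of this sliding-window division follows; the parameter values mirror the example above, and in practice they would be derived from the model's input configuration rather than hard-coded.

```python
import numpy as np

def divide_by_sliding_window(seq, dt=0.01, window_s=0.1, overlap_s=0.03):
    """Slice a uniformly sampled radar sequence into fixed-span subsequences.
    Window spans 0.1 s, adjacent windows overlap by 0.03 s -> step 0.07 s."""
    win = int(round(window_s / dt))                  # samples per subsequence (10)
    step = int(round((window_s - overlap_s) / dt))   # samples per slide (7)
    return np.array([seq[i:i + win]
                     for i in range(0, len(seq) - win + 1, step)])

# Example: 256 subsequences require len(seq) >= win + 255 * step samples.
seq = np.random.randn(10 + 255 * 7)
subsequences = divide_by_sliding_window(seq)
print(subsequences.shape)   # (256, 10)
```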
In one embodiment, determining the action category indicated by the target sub-tag satisfying the screening condition in the action category sub-tag as the action category identification result of the action radar data sequence includes: screening target sub-labels with accumulated results meeting screening conditions from all the action category sub-labels based on the accumulated results of all the action category sub-labels; and determining the action type indicated by the target sub-label as an action type identification result of the action radar data sequence.
Specifically, each data sub-sequence has an action category sub-tag, and the action category sub-tags of different data sub-sequences may represent the same action category or different action categories. The accumulated result of the action category sub-tags may specifically be the accumulated number of the action category sub-tags, or may be the accumulated result of the action category sub-tags according to weights.
In a specific application, the processor counts the number of identical action category sub-tags, determines the accumulated number of each type of action category sub-tag, screens out the action category sub-tag with the largest accumulated number from the different action category sub-tags as the target sub-tag, and then determines the action category indicated by the target sub-tag as the action category identification result of the action radar data sequence. Determining the target sub-tag directly from the accumulated counts keeps the processing simple and effectively improves the data processing speed.
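A sketch of this count-based screening is shown below; the gesture labels in the example are illustrative placeholders.

```python
from collections import Counter

def majority_vote(sub_tags):
    """Tally identical action category sub-tags and return the one
    with the largest accumulated count as the target sub-tag."""
    (target, count), = Counter(sub_tags).most_common(1)
    return target

print(majority_vote(["G1", "G2", "G1", "G1", "G3"]))   # "G1"
```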
In one embodiment, based on the accumulated result of each action category sub-label, the target sub-label with the accumulated result meeting the screening condition is screened from the action category sub-labels, which comprises the following steps: and accumulating the weight data of the sub-labels of the same action category to obtain the accumulated weight of each sub-label of the action category, and screening the target sub-label with the largest accumulated weight from the sub-labels of the action category.
The weight data of an action category sub-tag is the probability given by the convolutional recurrent neural network model when classifying the data subsequence into that action category sub-tag. The value range of the weight data of a single action category sub-tag is [0,1]; the larger the probability value, the more likely the action represented by the data subsequence is the action indicated by the action category sub-tag. The processor obtains the accumulated weight of each type of action category sub-tag by accumulating the weight data of identical action category sub-tags.
Specifically, suppose the action category sub-tag identification performed on the data subsequences yields 5 sub-tags of a first action category, 4 sub-tags of a second action category and 1 sub-tag of a third action category. The weight data of the 5 first-category sub-tags are 0.8, 0.7, 0.8, 0.7 and 0.7, giving an accumulated weight of 3.7; the weight data of the 4 second-category sub-tags are 0.9, 1.0, 1.0 and 1.0, giving an accumulated weight of 3.9. Although the first category has more sub-tags than the second (5 versus 4), its accumulated weight of 3.7 is smaller than the second category's 3.9, so the processor determines the second-category action sub-tag as the target sub-tag.
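A compact sketch of this weighted vote is given below; the sub-tag and probability pairs reproduce the worked example above, except the third sub-tag's weight, which the example does not state and is therefore an assumption.

```python
from collections import defaultdict

def weighted_vote(predictions):
    """predictions: (sub_tag, probability) pairs, one per data subsequence.
    The target sub-tag is the one with the largest accumulated weight."""
    weights = defaultdict(float)
    for tag, prob in predictions:
        weights[tag] += prob
    return max(weights, key=weights.get)

preds = [("first", 0.8), ("first", 0.7), ("first", 0.8), ("first", 0.7),
         ("first", 0.7), ("second", 0.9), ("second", 1.0), ("second", 1.0),
         ("second", 1.0), ("third", 0.6)]
print(weighted_vote(preds))   # "second": 3.9 beats "first": 3.7
```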
In this embodiment, the processor determines the action type recognition result of the action radar data sequence by using the action type sub-tag with the largest accumulated weight as the target sub-tag, so that the probability value of the action type sub-tag corresponding to each data sub-sequence can be fully considered, and the screening of the target sub-tag can be performed by combining the accumulated result, thereby effectively improving the accuracy of the target sub-tag.
In one embodiment, the sequence of action radar data comprises a sequence of gesture radar data; the action category identification method further comprises the following steps: responsive to the gesture detection event, sending a gesture detection signal to the radar sensor;
further, acquiring an action radar data sequence, comprising: and acquiring a gesture radar data sequence acquired after the radar sensor receives the gesture detection signal.
The gesture detection event refers to a triggered event indicating that a gesture radar data sequence of the hand needs to be acquired. The gesture detection event can be triggered automatically based on a preset trigger condition, for example upon entering a designated data processing flow: when a payment flow is entered after the goods to be purchased have been entered in an autonomous shopping scenario, or when the elevator doors close and the floor selection flow begins in an elevator-riding scenario. A gesture radar data sequence of the user is then collected and the gesture category is identified to determine whether the subsequent processing flow is triggered, for example whether payment confirmation is completed in the autonomous shopping scenario, or whether a stop floor is confirmed in the elevator-riding scenario.
Specifically, the processor responds to the gesture detection event and sends a gesture detection signal to the radar sensor, so that the radar sensor can respond to the gesture detection signal and upload a gesture radar data sequence, and further the processor can acquire the gesture radar data sequence after the radar sensor receives the gesture detection signal.
In one particular application, the radar sensor may continuously transmit radar waves and, only upon receipt of the gesture detection signal, collect the echo data of the radar waves reflected from the target object, so as to reduce the amount of echo data received by the radar sensor. Alternatively, the radar sensor may continuously transmit radar waves and continuously collect the echo data reflected from the target object, but upload the echo data to the processor only when the gesture detection signal is received, so as to reduce the amount of echo data received by the processor. The radar sensor may also transmit radar waves and receive the reflected echo data only after receiving the gesture detection signal, which shortens the sensor's working time and extends its service life. In other application scenarios, the specific working mode of the radar sensor may be set according to the actual scenario requirements.
In one embodiment, the action category recognition result includes a gesture category recognition result; the action category identification method further comprises the following steps: responding to a data interaction triggering event aiming at a target user, determining a data interaction confirmation gesture matched with the data interaction triggering event, and triggering a gesture detection event aiming at the target user; displaying prompt information aiming at the data interaction confirmation gesture; and when the gesture type recognition result is the same as the data interaction confirmation gesture, executing a data interaction flow aiming at the target user.
The data interaction triggering event refers to a triggering event entering a data interaction flow, and the data interaction triggering event can be specifically triggered by a target user or by a processor under the condition that the current data meets the set triggering condition. For example, when the information input of the commodity to be purchased is judged to be completed in the autonomous shopping scene, the data interaction triggering event for confirming payment is triggered, and for example, when the elevator is closed in the elevator riding scene, the data interaction triggering event for confirming the stop floor is triggered. The data interaction confirming gesture refers to a designated gesture for triggering and executing a data interaction flow for a target user, and when the result of gesture type recognition on a gesture radar data sequence for the target user is the same as the data interaction confirming gesture, the data interaction flow for the target user is triggered and executed.
In this embodiment, after the data interaction confirmation gesture is determined, prompt information for the gesture is displayed so that the target user can make the corresponding gesture based on the prompt, and the gesture detection event for the target user is triggered so that the radar sensor uploads the collected gesture radar data sequence to the processor for gesture category identification. When the gesture category identification result is the same as the data interaction confirmation gesture, the data interaction flow for the target user is triggered and executed. The data interaction flow with the target user can thus be triggered in a contactless manner, which improves interaction convenience; moreover, contactless interaction in public places effectively avoids unnecessary contact and improves hygiene and safety.
An application scenario of gesture recognition is also provided, which applies the above action category identification method. Specifically, the application of the action category identification method in the gesture recognition scenario is as follows:
Gesture recognition is a solution in which, after the user makes a corresponding gesture, the gesture signal is captured by a millimeter-wave sensor and transmitted to an algorithm for prediction. In a shopping settlement scenario, gesture category recognition is integrated into the shopping settlement device, and a more convenient payment procedure is carried out after the corresponding gesture is recognized. The action category identification method of this scheme aims to improve the gesture recognition speed; unlike conventional algorithms, it introduces a convolutional recurrent neural network model, which improves the recognition precision of the algorithm, helps improve the generalization capability of gesture recognition, and achieves accurate identification of gesture categories.
In a specific application, the data processing flow of gesture category recognition is shown in fig. 8 and fig. 9 and includes the following five parts:
the first part is gesture definition and annotation.
In particular, the predefined target gestures may include a left swipe gesture and a right swipe gesture, where the left swipe gesture represents confirming payment and the right swipe gesture represents cancelling payment. Different scenes can define different gestures and different numbers of target gesture categories, set according to actual scene requirements; for scenes where more flows need to be triggered, more target gesture categories can be defined.
The second part is data collection.
The data collection process mainly involves acquiring gesture data at the device end. A millimeter-wave radar sensor may be used for the acquisition; the specific model may be a TI AWR63X, an Infineon sensor, or the like. As shown in fig. 10, the millimeter-wave radar sensor includes a radar wave transmitter and a radar echo receiver, and transmits the acquired analog signals to the processor for further data processing.
After determining the target gesture category, a certain amount of positive sample gesture data is collected for each target gesture category, and target gesture category labeling is performed. Based on the target gesture category, a negative sample gesture, such as a side-to-side shake, is determined, and the number of negative sample gestures may be the same as or less than the number of positive sample gesture data. The sample data is a radar data sequence, and in order to improve the accuracy of the sample data, the sample data is collected and subjected to data cleaning by searching for repeated values, searching for missing values, searching for abnormal values and the like, so that an optimized radar data sequence is obtained.
The third part is feature engineering.
The feature engineering mainly involves configuring the model and using signal data processing techniques to acquire, process and extract meaningful features and attributes from the collected sample data, to facilitate effective training of the model. Specifically, the feature engineering may include median filtering and cleaning of the gesture data, then dividing the original sample data into a number of blocks of a certain duration (adjustable, e.g. 5 ms) by a block segmentation mechanism, and labeling each block with the label of its sample data; the segmented and labeled blocks are then used for model parameter training. Time-domain and frequency-domain feature extraction is performed on the blocks until all sample data have been processed, and a voting mechanism is introduced to determine the finally output gesture label.
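As an illustration of this time-frequency feature extraction and fusion, a sketch is given below; the specific time-domain statistics chosen (mean, standard deviation, peak) are assumptions, since the scheme does not enumerate them.

```python
import numpy as np

def subsequence_features(sub):
    """Fuse time-domain statistics with FFT-based frequency-domain features
    for one data subsequence (a 1-D array of radar samples)."""
    time_feats = np.array([sub.mean(), sub.std(), np.abs(sub).max()])
    freq_feats = np.abs(np.fft.rfft(sub))              # Fourier transform magnitudes
    return np.concatenate([time_feats, freq_feats])    # feature fusion

sub = np.random.randn(10)                  # one 0.1 s subsequence at 100 Hz (assumed)
print(subsequence_features(sub).shape)     # (3 + 6,) = (9,)
```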
A convolutional recurrent neural network model is configured, comprising an input layer, a recurrent neural network, a convolutional neural network, a fully-connected layer and an output layer, connected in sequence. The input layer divides the action radar data sequence into a plurality of data subsequences based on the data subsequence dividing parameters, where the number of data subsequences and the number of recurrent units in each layer of the recurrent neural network are both 256. Each layer of the recurrent neural network is a bidirectional recurrent neural network; the input data of the next bidirectional layer is the output of the previous bidirectional layer, and the output data of each recurrent unit in each bidirectional layer is the concatenation of its forward feature data and reverse feature data. The output data of the last bidirectional layer of the recurrent neural network is the output result of the last recurrent unit of that layer, and this output result is the input data of the convolutional neural network. The convolutional neural network is the one-dimensional deep convolutional neural network described above: a first Conv1D layer (64 convolution kernels of size 3×1, stride 1, pad 1), a first BatchNorm layer, a first PReLU layer, a Dropout layer (ratio 0.5), a second Conv1D layer (32 kernels, with the same size and convolution parameters), a second BatchNorm layer and a second PReLU layer, connected in sequence. The output data of the second PReLU layer is the input data of the fully-connected layer, whose output values are passed to the softmax layer for classification by softmax regression.
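Putting the pieces together, a minimal end-to-end sketch of this convolutional recurrent model might look as follows; the choice of LSTM cells, the hidden size, the per-subsequence feature dimension and the number of classes are assumptions, as the description fixes only the 256 recurrent units per layer, the bidirectional stacking and the Conv1D head (reusing the Conv1DHead sketch given earlier).

```python
import torch
import torch.nn as nn

class ConvRecurrentNet(nn.Module):
    def __init__(self, feat_dim=20, hidden=128, num_classes=9):
        super().__init__()
        # Two stacked bidirectional recurrent layers, 256 time steps (one per
        # data subsequence); forward and reverse features are concatenated.
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.cnn = Conv1DHead(in_channels=1)   # sketch defined earlier
        self.fc = nn.Linear(32 * 2 * hidden, num_classes)

    def forward(self, x):
        # x: (batch, 256, feat_dim) -- one feature vector per data subsequence
        out, _ = self.rnn(x)                   # (batch, 256, 2 * hidden)
        last = out[:, -1, :]                   # output of the last recurrent unit
        conv = self.cnn(last.unsqueeze(1))     # (batch, 32, 2 * hidden)
        return self.fc(conv.flatten(1))        # logits for the softmax layer

model = ConvRecurrentNet()
logits = model(torch.randn(4, 256, 20))
probs = torch.softmax(logits, dim=-1)          # (4, 9) class probabilities
```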
The fourth part is model training.
Specifically, sample radar data sequences carrying action category tags are acquired, including positive samples carrying target action tags and negative samples carrying non-target action tags; each sample radar data sequence is divided into a plurality of sample data subsequences of the same time span; the action category label of the sample radar data sequence is marked as the action category label of each of its sample data subsequences; and parameter training is performed on the convolutional recurrent neural network model based on the sample subsequence features of each sample data subsequence and the action category labels they carry, until the training termination condition of the model is met.
The fifth part is the model application.
The trained convolutional recurrent neural network model can be applied to different application scenarios. An application scenario in which a game character is controlled by gestures is shown in fig. 11, and an application scenario in which the movement of a car model is controlled by gestures is shown in fig. 12. For different application scenarios, by defining the control signals expressed by specific gestures, collecting the user's gestures, and identifying and matching the gesture categories, the control flow corresponding to the gesture can be executed.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily executed at the same moment but may be executed at different moments; the execution order of these sub-steps or stages is not necessarily sequential, and they may be executed in turn or alternately with at least part of the other steps or of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the application also provides an action category recognition device for realizing the above related action category recognition method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in the embodiments of the one or more action category recognition devices provided below may be referred to the limitation of the action category recognition method hereinabove, and will not be repeated here.
In one embodiment, as shown in fig. 13, there is provided an action category recognition apparatus including: a radar data acquisition module 1302, a feature extraction module 1304, a sub-tag identification module 1306, and an action category determination module 1308, wherein:
a radar data acquisition module 1302, configured to acquire an action radar data sequence, where the action radar data sequence includes a plurality of radar data arranged according to an acquisition time sequence;
the feature extraction module 1304 is configured to divide the action radar data sequence into data subsequences with the same time span, and perform feature extraction on each data subsequence to obtain a subsequence feature of each data subsequence;
a sub-tag identification module 1306, configured to perform an action category identification process for each sub-sequence feature, and determine an action category sub-tag that each sub-sequence feature matches;
an action category determining module 1308 is configured to determine, as an action category identification result of the action radar data sequence, an action category indicated by a target sub-tag that satisfies a screening condition in the action category sub-tags.
In one embodiment, the feature extraction module is further configured to extract, for each of the data subsequences, a time domain feature of the data subsequence, and perform fourier transform processing on the data subsequence to obtain a frequency domain feature of the data subsequence; and carrying out feature fusion on the time domain features and the frequency domain features to obtain the respective corresponding subsequence features of each data subsequence.
In one embodiment, the feature extraction module is further configured to perform time sequence feature learning on each data subsequence based on a recurrent neural network of a convolutional recurrent neural network model, so as to obtain respective time sequence features of each data subsequence; and respectively performing deep feature learning on each time sequence feature based on a convolutional neural network of the convolutional cyclic neural network model to obtain each subsequence feature of each data subsequence.
In one embodiment, the recurrent neural network is composed of at least two bi-directional recurrent neural network layers, and the number of recurrent units in each bi-directional recurrent neural network layer is equal to the number of data subsequences.
In one embodiment, the action category recognition device further includes a model training module for acquiring sample radar data sequences carrying action category tags, the sample radar data sequences including positive samples carrying target action tags and negative samples carrying non-target action tags; dividing each sample radar data sequence into a plurality of sample data subsequences of the same time span; marking the action category label of the sample radar data sequence as the action category label of each of its sample data subsequences; and performing parameter training on the convolutional recurrent neural network model based on the sample subsequence features of each sample data subsequence and the action category labels they carry, until the training termination condition of the convolutional recurrent neural network model is met.
In one embodiment, the action category recognition device further includes a data smoothing module, configured to perform data smoothing on the action radar data sequence according to an arrangement relationship of radar data in the action radar data sequence to obtain a smoothed radar data sequence, and the feature extraction module is further configured to divide the smoothed radar data sequence into data subsequences with the same time span.
In one embodiment, the feature extraction module is further configured to determine a window size and a sliding step size of a sliding window that matches the dividing parameter based on the dividing parameter of the action radar data sequence; and sliding the sliding window in the action radar data sequence according to the sliding step length, and determining radar data contained in the sliding window after each sliding process as a data subsequence.
In one embodiment, the action category determining module is further configured to screen, based on an accumulated result of each action category sub-tag, a target sub-tag whose accumulated result meets a screening condition from the action category sub-tags; and determining the action category indicated by the target sub-tag as an action category identification result of the action radar data sequence.
In one embodiment, the action category determining module is further configured to perform accumulation processing on the weight data of the same action category sub-tag to obtain an accumulated weight of each action category sub-tag; and screening the target sub-label with the largest accumulated weight from the action category sub-labels.
In one embodiment, the sequence of action radar data comprises a sequence of gesture radar data; the action category recognition device further comprises a gesture detection signal sending module, which is used for responding to the gesture detection event and sending a gesture detection signal to the radar sensor; the radar data acquisition module is also used for acquiring a gesture radar data sequence acquired after the radar sensor receives the gesture detection signal.
In one embodiment, the action category recognition result includes a gesture category recognition result. The action category recognition device further comprises a prompt information display module configured to: respond to a data interaction trigger event for a target user, determine a data interaction confirmation gesture matched with the data interaction trigger event, and trigger a gesture detection event for the target user; display prompt information for the data interaction confirmation gesture; and, when the gesture category recognition result is the same as the data interaction confirmation gesture, execute the data interaction flow for the target user.
The respective modules in the above-described action category identification device may be implemented in whole or in part by software, hardware, and a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 14. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is for storing radar data. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of action class identification.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure may be as shown in fig. 15. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit and an input device. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless mode may be implemented through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program, when executed by the processor, implements an action category identification method. The display unit of the computer device is used to form a visual picture and may be a display screen, a projection device or a virtual reality imaging device; the display screen may be a liquid crystal display or an electronic ink display. The input device of the computer device may be a touch layer covering the display screen, a key, a trackball or a touchpad provided on the housing of the computer device, or an external keyboard, touchpad, mouse or the like.
It will be appreciated by those skilled in the art that the structures shown in fig. 14 and 15 are block diagrams of only some of the structures associated with the present application and are not intended to limit the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to comply with the related laws and regulations and standards of the related countries and regions.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered within the scope of this description.
The above embodiments represent only a few implementations of the present application; their descriptions are relatively specific and detailed, but they are not to be construed as limiting the scope of the application. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (15)

1. A method of action category identification, the method comprising:
acquiring an action radar data sequence, wherein the action radar data sequence comprises a plurality of radar data arranged according to acquisition time sequence;
dividing the action radar data sequence into data subsequences with the same time span, and respectively carrying out feature extraction on each data subsequence to obtain respective subsequence features of each data subsequence;
Performing action category identification processing on each sub-sequence feature respectively, and determining action category sub-tags matched with each sub-sequence feature respectively;
and determining the action category indicated by the target sub-label meeting the screening condition in the action category sub-label as an action category identification result of the action radar data sequence.
2. The method according to claim 1, wherein the feature extraction is performed on each of the data subsequences to obtain a respective subsequence feature of each of the data subsequences, and the method comprises:
extracting time domain features of the data subsequences aiming at each data subsequence, and carrying out Fourier transform processing on the data subsequences to obtain frequency domain features of the data subsequences;
and carrying out feature fusion on the time domain features and the frequency domain features to obtain the respective corresponding subsequence features of each data subsequence.
3. The method according to claim 1, wherein the feature extraction is performed on each of the data subsequences to obtain a respective subsequence feature of each of the data subsequences, and the method comprises:
performing temporal feature learning on each data subsequence based on a recurrent neural network of a convolutional recurrent neural network model, to obtain the respective temporal features of each data subsequence;
And respectively performing deep feature learning on each time sequence feature based on a convolutional neural network of the convolutional cyclic neural network model to obtain each subsequence feature of each data subsequence.
4. A method according to claim 3, wherein the recurrent neural network is formed by at least two bi-directional recurrent neural network layers, the number of recurrent units in each of the bi-directional recurrent neural network layers being equal to the number of data subsequences.
5. A method according to claim 3, wherein the training process of the convolutional recurrent neural network model comprises:
acquiring a sample radar data sequence carrying an action category label, wherein the sample radar data sequence comprises a positive sample carrying a target action label and a negative sample carrying a non-target action label;
dividing the sample radar data sequence into a plurality of sample data subsequences according to the same time span for each sample radar data sequence;
marking the action category label of the sample radar data sequence as the action category label of the sample data subsequence;
and performing parameter training on the convolutional recurrent neural network model based on the sample subsequence features of each sample data subsequence and the action category labels they carry, until the training termination condition of the convolutional recurrent neural network model is met.
6. The method according to claim 1, wherein the method further comprises:
performing data smoothing processing on the action radar data sequence according to the arrangement relation of radar data in the action radar data sequence to obtain a smooth radar data sequence;
the dividing the action radar data sequence into data subsequences with the same time span comprises the following steps:
dividing the smooth radar data sequence into data subsequences with the same time span.
7. The method of claim 1, wherein the dividing the action radar data sequence into data sub-sequences of the same time span comprises:
determining the window size and the sliding step length of a sliding window matched with the dividing parameters based on the dividing parameters of the action radar data sequence;
and sliding the sliding window in the action radar data sequence according to the sliding step length, and determining radar data contained in the sliding window after each sliding process as a data subsequence.
8. The method according to claim 1, wherein determining the action category indicated by the target sub-tag satisfying the screening condition in the action category sub-tag as the action category identification result of the action radar data sequence includes:
Screening target sub-labels of which the accumulated results meet screening conditions from the action category sub-labels based on the accumulated results of the action category sub-labels;
and determining the action category indicated by the target sub-tag as an action category identification result of the action radar data sequence.
9. The method of claim 8, wherein the selecting a target sub-label from the action category sub-labels that satisfies a selection condition based on the accumulated result of each of the action category sub-labels comprises:
accumulating the weight data of the same action category sub-label to obtain the accumulated weight of each action category sub-label;
and screening the target sub-label with the largest accumulated weight from the action category sub-labels.
10. The method of any one of claims 1 to 9, wherein the sequence of action radar data comprises a sequence of gesture radar data;
the method further comprises the steps of:
responsive to the gesture detection event, sending a gesture detection signal to the radar sensor;
the acquiring the action radar data sequence comprises the following steps:
and acquiring a gesture radar data sequence acquired after the radar sensor receives the gesture detection signal.
11. The method of claim 10, wherein the action category recognition result comprises a gesture category recognition result; the method further comprises the steps of:
responding to a data interaction triggering event aiming at a target user, determining a data interaction confirmation gesture matched with the data interaction triggering event, and triggering a gesture detection event aiming at the target user;
displaying prompt information aiming at the data interaction confirmation gesture;
and when the gesture type recognition result is the same as the data interaction confirmation gesture, executing a data interaction flow aiming at the target user.
12. An action category recognition device, the device comprising:
the radar data acquisition module is used for acquiring an action radar data sequence, wherein the action radar data sequence comprises a plurality of radar data arranged according to acquisition time sequence;
the characteristic extraction module is used for dividing the action radar data sequence into data subsequences with the same time span, and respectively extracting the characteristics of each data subsequence to obtain the respective subsequence characteristics of each data subsequence;
the sub-tag identification module is used for respectively carrying out action category identification processing on each sub-sequence feature and determining action category sub-tags matched with each sub-sequence feature;
And the action category determining module is used for determining the action category indicated by the target sub-tag meeting the screening condition in the action category sub-tag as an action category identification result of the action radar data sequence.
13. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 11 when the computer program is executed.
14. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 11.
15. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 11.
CN202210704904.7A 2022-06-21 2022-06-21 Action category identification method, device, computer equipment and storage medium Pending CN117312935A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210704904.7A CN117312935A (en) 2022-06-21 2022-06-21 Action category identification method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210704904.7A CN117312935A (en) 2022-06-21 2022-06-21 Action category identification method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117312935A true CN117312935A (en) 2023-12-29

Family

ID=89260870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210704904.7A Pending CN117312935A (en) 2022-06-21 2022-06-21 Action category identification method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117312935A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118094475A (en) * 2024-04-19 2024-05-28 华南理工大学 Gesture recognition system based on multi-sensor fusion
CN118094475B (en) * 2024-04-19 2024-07-23 华南理工大学 Gesture recognition system based on multi-sensor fusion


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination