CN117762250A - Virtual reality action recognition method and system based on an interactive device

Virtual reality action recognition method and system based on an interactive device

Info

Publication number
CN117762250A
CN117762250A (application CN202311647127.8A)
Authority
CN
China
Prior art keywords
action
data
motion
feature
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311647127.8A
Other languages
Chinese (zh)
Other versions
CN117762250B (en)
Inventor
王新国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
4u Beijing Technology Co ltd
Original Assignee
4u Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 4u Beijing Technology Co ltd filed Critical 4u Beijing Technology Co ltd
Priority to CN202311647127.8A priority Critical patent/CN117762250B/en
Publication of CN117762250A publication Critical patent/CN117762250A/en
Application granted granted Critical
Publication of CN117762250B publication Critical patent/CN117762250B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a virtual reality action recognition method and system based on an interactive device. When a user enters the action acquisition area designated by the interactive device, the system recognizes trigger actions through a pre-trained trigger action recognition model. If the recognition result shows that the user performed a trigger action, the system acquires the action data corresponding to that action and determines an action execution result, which is then displayed in the virtual reality interaction area designated by the interactive device. With this design, the various actions a user performs in the virtual reality environment can be accurately recognized and analyzed by the pre-trained trigger action recognition model, making interaction smoother and more natural; and because the system determines the action execution result from the user's action data and displays it in the virtual reality interaction area in real time, it can offer the user a more intuitive and realistic virtual reality experience.

Description

Virtual reality action recognition method and system based on an interactive device
Technical Field
The invention relates to the technical field of virtual reality interaction, and in particular to a virtual reality action recognition method and system based on an interactive device.
Background
With the continuous development of virtual reality technology, accurately recognizing and interpreting a user's actions to deliver a high-quality interactive experience has become an important problem. Conventional action recognition methods rely primarily on physical devices, such as handles and trackers, to capture the user's motion. However, these methods often require the user to wear special equipment and can be cumbersome to operate. Moreover, they may fail to capture all of the user's actions accurately, which degrades the interactive effect of the virtual reality. A more efficient and accurate virtual reality action recognition method is therefore needed.
Disclosure of Invention
The invention aims to provide a virtual reality action recognition method and system based on an interactive device.
In a first aspect, an embodiment of the present invention provides a virtual reality action recognition method based on an interactive device, including:
in response to the current user entering the action acquisition area designated by the interactive device, recognizing the current user's trigger action based on a pre-trained trigger action recognition model;
when the target recognition result for the current user indicates that a trigger action exists, acquiring the action data corresponding to the current user's action and determining the corresponding action execution result;
and displaying the action execution result in the virtual reality interaction area designated by the interactive device.
In a second aspect, an embodiment of the present invention provides a server system, including a server configured to perform the method of the first aspect.
Compared with the prior art, the invention has the following beneficial effects. The invention discloses a virtual reality action recognition method and system based on an interactive device. When a user enters the action acquisition area designated by the interactive device, the system recognizes trigger actions through a pre-trained trigger action recognition model. If the recognition result shows that the user performed a trigger action, the system acquires the action data corresponding to that action and determines an action execution result, which is finally displayed in the virtual reality interaction area designated by the interactive device. With this design, the various actions a user performs in the virtual reality environment can be accurately recognized and analyzed, making interaction smoother and more natural; and because the system determines the action execution result from the user's action data and displays it in the virtual reality interaction area in real time, it can offer the user a more intuitive and realistic virtual reality experience.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required by the embodiments are briefly described below. It should be appreciated that the following drawings depict only certain embodiments of the invention and are therefore not to be considered limiting of its scope. Those of ordinary skill in the art may derive other relevant drawings from them without inventive effort.
Fig. 1 is a flow diagram of the steps of a virtual reality action recognition method based on an interactive device according to an embodiment of the present invention;
fig. 2 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described below clearly and completely with reference to the accompanying drawings. It will be apparent that the described embodiments are some, but not all, of the embodiments of the invention. The components of the embodiments of the present invention, as generally described and illustrated in the figures, may be arranged and designed in a wide variety of configurations.
The following describes specific embodiments of the present invention in detail with reference to the drawings.
In order to solve the technical problems noted in the background art, fig. 1 shows a flow diagram of the steps of the virtual reality action recognition method based on an interactive device according to an embodiment of the present invention; the method is described in detail below.
Step S201: in response to the current user entering the action acquisition area designated by the interactive device, recognize the current user's trigger action based on a pre-trained trigger action recognition model;
step S202, on the basis that the target recognition result aiming at the current user is characterized as the existence of a trigger action, action data corresponding to the current user action is obtained, and an action execution result corresponding to the current user action is determined;
step S203, the action execution result is displayed to the virtual reality interaction area appointed by the interaction equipment.
In an exemplary embodiment of the present invention, in a virtual reality fitness application, a user enters the designated motion capture area in preparation for a series of gymnastics exercises. The user wears a virtual reality headset and gloves and stands in an area equipped with multiple infrared sensors. Through the pre-trained trigger action recognition model, the system recognizes the user's current trigger action for subsequent data processing and analysis. When the user lifts an arm and pushes it forward, the system captures this motion through the sensors and uses the trigger action recognition model to determine whether it is a push action. Based on the target recognition result, the system can determine whether the user's action meets the expected training requirements. If the system determines that the action is a valid push, it treats the target recognition result as indicating that a trigger action exists. The system then acquires the action data corresponding to the user's current action for further analysis and feedback; once it confirms a valid push, it records key data such as the angle, speed, and posture of the movement. The action execution result is displayed to the user in virtual reality, so the user can observe the effect of the action intuitively. For example, when the user successfully completes a push, the system may show a glowing ball rising from the user's arm in the virtual environment, indicating that the action was performed well; at the same time, the user's score and training progress are shown on screen to encourage continued exercise.
In another implementation of the embodiment of the present invention, suppose that in a virtual reality game a player must enter a designated area to perform a certain task. This area may be a specific room or location in the game. After the player enters the area, the system begins to recognize the user's actions. If an action in the game must be triggered by the player, such as using a gesture to release a magic attack, the system recognizes the trigger action through the pre-trained model to determine whether the user performed the correct gesture. The player may need to complete a series of tasks; if, after recognizing the trigger action, the system determines that the user has successfully completed a task, it treats the target recognition result as indicating that a trigger action exists. The user's actions may include walking, jumping, attacking, and so on. When the system recognizes that a user has triggered a particular action, it records the specific data of that action, such as start time, end time, and speed. After the user completes the action required by a task, the system determines from the action data whether the action was fully executed; if the data meets the preset criteria, the system judges that the action was performed successfully. The system then displays the action execution result, which may be a text prompt, an animation effect, or other feedback in the game scene. This design facilitates the recognition of, and feedback on, specific actions in the virtual reality environment and improves the interactive experience.
In the embodiment of the present invention, the aforementioned step S201 may be implemented in the following manner.
(1) Acquiring, based on the pre-trained trigger action recognition model, the spectral feature vector of the interference-containing action video as the first action recognition feature;
(2) Performing a cyclic update operation on the first action recognition feature, where each cycle proceeds as follows: determining an interference action feature vector from the first action recognition feature; obtaining the interference-elimination task details for the current cycle from the past interference action feature vectors determined for past interference-containing action videos; and performing an interference elimination operation on the interference action feature vector according to those details to obtain the pending feature vector for the current cycle. In the first cycle, the interference action feature vector is the first action recognition feature itself; in every later cycle, it is obtained by merging the first action recognition feature with the pending feature vectors already determined;
(3) Taking the pending feature vector determined in the final cycle as the demand feature vector, determining the interference elimination parameters by extracting structured information from the demand feature vector, and performing an interference elimination operation on the first action recognition feature to obtain a second action recognition feature;
(4) Recognizing the trigger action according to the second action recognition feature to obtain the target recognition result for the interference-containing action video.
In an embodiment of the invention, for example, a user performs exercise training such as rope skipping with the interactive device in a virtual reality fitness application. The system acquires video data containing interfering actions (e.g., irrelevant movements of people nearby), converts it into a spectral feature vector, and uses that vector as the first action recognition feature for the pre-trained trigger action recognition model. In a virtual reality game where a player must perform a specific boxing action, the system runs the cyclic update operation: in each cycle it determines the likely interference action feature vector from the first action recognition feature, combines it with past interference action feature vectors to obtain the interference-elimination task details for the current cycle, and then performs the interference elimination operation on the interference action feature vector to obtain a pending feature vector. The cycles continue until the final one ends, after which the system takes the pending feature vector determined by the last cycle as the demand feature vector. The system also extracts structured information from the demand feature vector to obtain the interference elimination parameters, and then performs the interference elimination operation on the first action recognition feature to obtain the second action recognition feature. In the fitness application, trigger actions are recognized according to the second action recognition feature; for example, after a set of boxing moves is completed, the system can determine from the second action recognition feature whether the user performed the correct motion. By analyzing the target recognition result of the interference-containing video, the system can also ensure that recognition is not disturbed by irrelevant actions. With this design, the cyclic update and interference elimination operations improve recognition accuracy and allow the target action to be recognized within a video that contains interference, providing more precise feedback and guidance for applications such as virtual reality fitness.
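As a further illustration, the sketch below composes the four steps above into a single skeleton. Every stage here is a toy placeholder lambda standing in for the concrete operations detailed in the following sections; all names and the norm threshold are hypothetical.

```python
import numpy as np

# Skeleton of steps (1)-(4), composed end to end with placeholder stages.
extract_spectral = lambda video: np.asarray(video, dtype=float)      # step (1)
cyclic_update    = lambda feat: feat - feat.mean()                   # step (2)
de_interfere     = lambda feat, demand: feat * (demand != 0)         # step (3)
recognize        = lambda feat: float(np.linalg.norm(feat)) > 1.0    # step (4)

first_feature = extract_spectral([0.4, 1.2, 0.9])   # toy per-frame signal
demand_vector = cyclic_update(first_feature)        # stands in for the loop
second_feature = de_interfere(first_feature, demand_vector)
print("trigger action present:", recognize(second_feature))
```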
In the embodiment of the present invention, the step of obtaining the spectral feature vector of the interference-containing action video may be performed in the following manner.
(1) Performing a wavelet transform on the interference-containing action video to obtain its frequency distribution;
(2) Filtering the frequency distribution of the interference-containing action video to obtain the corresponding spectral feature vector.
In an embodiment of the present invention, for example, a user performs hand action recognition with the interactive device in a virtual reality interactive application. The system captures a video of the user's hand movements through a camera and applies a wavelet transform to it. The wavelet transform converts the signal from the time domain to the frequency domain, from which frequency-related features can be extracted. Once the frequency distribution of the interference-containing video is obtained, the system filters it: the filtering operation removes the interfering portion of the spectrum according to a predefined threshold or other rule, retaining only features related to the target action. The result is a spectral feature vector that reflects the frequency characteristics of the target motion. With this design, wavelet transformation and frequency-distribution filtering extract the target-related spectral features from a video containing interference; these features feed subsequent processes such as action recognition and interference elimination, improving recognition accuracy and reducing the impact of interference on the user experience.
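A minimal sketch of this extraction step, assuming PyWavelets is available; the 'db4' wavelet, the decomposition level, and the 75th-percentile threshold are illustrative choices, and a 1-D motion trace stands in for the per-frame video signal.

```python
import numpy as np
import pywt  # PyWavelets

def spectral_feature_vector(signal: np.ndarray, level: int = 3) -> np.ndarray:
    # 1) Wavelet decomposition: frequency distribution across sub-bands.
    coeffs = pywt.wavedec(signal, "db4", level=level)
    # 2) Filtering: zero out small coefficients presumed to be interference.
    filtered = []
    for band in coeffs:
        threshold = np.percentile(np.abs(band), 75)
        filtered.append(np.where(np.abs(band) >= threshold, band, 0.0))
    # One energy value per sub-band serves as the spectral feature vector.
    return np.array([np.sum(band ** 2) for band in filtered])

# Toy 1-D motion trace: a periodic movement plus noise-like interference.
t = np.linspace(0, 1, 256)
trace = np.sin(2 * np.pi * 4 * t) + 0.3 * np.random.randn(256)
print(spectral_feature_vector(trace))
```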
In an embodiment of the present invention, the aforementioned cyclic update operation may proceed as follows.
(1) In the first cycle, taking the first action recognition feature as the interference action feature vector, obtaining the corresponding interference-elimination task details from the past interference action feature vectors determined for past interference-containing action videos, and performing the interference elimination operation on the interference action feature vector according to those details to obtain the pending feature vector for the first cycle;
(2) In the second cycle, merging the first action recognition feature with the pending feature vector obtained in the first cycle to form the interference action feature vector, obtaining the corresponding interference-elimination task details from the past interference action feature vectors determined for past interference-containing action videos, and performing the interference elimination operation on the interference action feature vector according to those details to obtain the pending feature vector for the second cycle;
(3) In the third cycle, merging the first action recognition feature with the two pending feature vectors obtained in the previous two cycles to form the interference action feature vector, obtaining the corresponding interference-elimination task details from the past interference action feature vectors determined for past interference-containing action videos, and performing the interference elimination operation on the interference action feature vector according to those details to obtain the pending feature vector for the third cycle.
In an embodiment of the present invention, for example, a user may need to perform a series of yoga movements in a gesture recognition application. The system acquires the first action recognition feature through the pre-trained trigger action recognition model and takes it as the interference action feature vector. It then obtains the interference-elimination task details from the past interference action feature vectors determined for past interference-containing videos and performs the interference elimination operation to obtain the pending feature vector for the first cycle. In the second cycle, the system merges the first action recognition feature with the pending feature vector from the first cycle into a new interference action feature vector, obtains the task details in the same way, and performs the interference elimination operation to obtain the pending feature vector for the second cycle. In the third cycle, it merges the first action recognition feature with the two pending feature vectors from the previous cycles, again obtains the task details, and performs the interference elimination operation to obtain the pending feature vector for the third cycle. With this design, the pending feature vectors accumulate and are refined in every cycle, minimizing the influence of interfering actions; for fields such as gesture recognition, this improves accuracy and the user experience.
In a more detailed embodiment, for example, a user performs side-plank training in a virtual reality fitness application. The system captures the user's action video through a camera and extracts the interference action feature vector from the first action recognition feature. Using the past interference action feature vectors determined from past interference-containing videos, the system obtains the interference-elimination task details, performs the interference elimination operation on the interference action feature vector, and obtains the pending feature vector for the first cycle. After the first cycle ends, the second cycle begins: the system merges the first action recognition feature with the pending feature vector from the previous cycle into an interference action feature vector, obtains the task details from the past interference action feature vectors, performs the interference elimination operation, and obtains the pending feature vector for the second cycle. After the second cycle ends, the third cycle begins: the system merges the first action recognition feature with the two pending feature vectors from the previous two cycles into an interference action feature vector, obtains the task details in the same way, performs the interference elimination operation, and obtains the pending feature vector for the third cycle. Through this cyclic updating, each cycle adjusts the interference action feature vector based on the result of the previous one and extracts a more accurate pending feature vector via the task details and the interference elimination operation. The influence of interfering actions is thereby reduced step by step and recognition accuracy improves; in fields such as virtual reality fitness, this yields more precise action guidance and feedback for users.
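The loop below sketches this cyclic update under simplifying assumptions: merging is modeled as averaging, and the interference elimination step as removing the projection onto a past-interference direction. Neither choice is prescribed by the patent text.

```python
import numpy as np

def eliminate(vec: np.ndarray, past_interference: np.ndarray) -> np.ndarray:
    """Remove the component of `vec` lying along the past interference."""
    direction = past_interference / (np.linalg.norm(past_interference) + 1e-9)
    return vec - np.dot(vec, direction) * direction

def cyclic_update(first_feature: np.ndarray,
                  past_interference: np.ndarray,
                  rounds: int = 3) -> np.ndarray:
    pending: list[np.ndarray] = []
    for _ in range(rounds):
        if not pending:                       # first cycle: feature itself
            interference_vec = first_feature
        else:                                 # later cycles: merge (mean)
            interference_vec = np.mean([first_feature, *pending], axis=0)
        pending.append(eliminate(interference_vec, past_interference))
    return pending[-1]                        # demand feature vector

demand = cyclic_update(np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0, 0.0]))
print(demand)
```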
In the embodiment of the present invention, the interference action feature vectors above may be merged in at least one of the following ways.
(1) Integrating, in a preset arrangement order, each determined pending feature vector with the first action recognition feature to obtain the interference action feature vector;
(2) Weighting and merging the pending feature vectors and the first action recognition feature according to the merge adjustment factor associated with each pending feature vector and the merge adjustment factor associated with the first action recognition feature, to obtain the interference action feature vector; each merge adjustment factor characterizes the contribution of the corresponding pending feature vector, or of the first action recognition feature, to the interference action feature vector.
In a more detailed embodiment, for example, a user trains squat movements in front of a camera in an athletic monitoring application. From the action video captured by the camera, the system extracts the interference action feature vector using the first action recognition feature. Following the preset arrangement order, the system performs the integration operation, merging the first action recognition feature with the determined pending feature vectors in sequence; for example, the first action recognition feature is combined, in order, with the pending feature vector from the first cycle. The resulting interference action feature vector contains the information of both the first action recognition feature and the pending feature from the first cycle. Alternatively, the system can weight and merge each pending feature vector with the first action recognition feature via the associated merge adjustment factors, obtaining another interference action feature vector; each factor characterizes the contribution of the corresponding vector. For example, in the second cycle the system weights and merges the first action recognition feature with the pending feature vector from the previous cycle according to their merge adjustment factors. The resulting interference action feature vector reflects each vector's contribution to interference elimination and integrates the information of multiple feature vectors. In this way, the system can merge the pending feature vectors and the first action recognition feature either in a preset order or via the associated merge adjustment factors, accounting for the importance and contribution of different feature vectors and improving the accuracy and effect of interference elimination. In athletic monitoring applications, this yields more accurate and reliable recognition results, helping users train with correct posture and refine motor skills.
In a more detailed embodiment, for example, a user trains squats in a posture assessment application. The system captures the user's action video through a camera and extracts the interference action feature vector using the first action recognition feature. Then, following the preset arrangement order, the system merges the first action recognition feature with the determined pending feature vectors in a specific sequence.
For example, assume the arrangement order is to place the first action recognition feature first, followed by the pending feature vector from the first cycle. The system concatenates these features in sequence to form the interference action feature vector.
In an exercise-assistance application, the user is training squats. The system captures the user's action video through a camera and extracts the first action recognition feature as part of the interference action feature vector. The system then weights and merges the feature vectors according to the merge adjustment factor associated with each pending feature vector and the factor associated with the first action recognition feature.
For example, assume the determined pending feature vectors and their merge adjustment factors are feature A with weight a and feature B with weight b, while the first action recognition feature carries weight c. The system weights and merges feature A, feature B, and the first action recognition feature by their respective weights to obtain the interference action feature vector.
Through these two modes, the system merges the pending feature vectors and the first action recognition feature either in a preset arrangement order or via the associated merge adjustment factors, obtaining the interference action feature vector. The merge accounts for the importance and contribution of the different feature vectors, improving the accuracy and effect of interference elimination. In applications such as posture assessment and exercise assistance, this yields more accurate and reliable merged feature vectors, and hence more precise action recognition and posture assessment.
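The two merge modes might be realized as follows; the concatenation order and the example weights (the a, b, c of the text) are illustrative assumptions.

```python
import numpy as np

def merge_by_order(first_feature, pending_vectors):
    """Mode (1): integrate vectors in a preset arrangement order."""
    return np.concatenate([first_feature, *pending_vectors])

def merge_weighted(first_feature, pending_vectors, weights, first_weight):
    """Mode (2): weighted merge; each weight is a merge adjustment factor
    expressing that vector's contribution to the interference feature."""
    merged = first_weight * first_feature
    for w, vec in zip(weights, pending_vectors):
        merged = merged + w * vec
    return merged

feature = np.array([0.5, 1.0])                       # first recognition feature
pend = [np.array([0.4, 0.9]), np.array([0.6, 1.1])]  # features A and B
print(merge_by_order(feature, pend))
print(merge_weighted(feature, pend, weights=[0.3, 0.2], first_weight=0.5))
```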
In the embodiment of the present invention, the step of performing the interference elimination operation on the first action recognition feature, using the interference elimination parameters determined by extracting structured information from the demand feature vector, to obtain the second action recognition feature may be implemented in the following manner.
(1) Performing a feature-space mapping on the demand feature vector to obtain a process feature vector whose feature space is adapted to the first action recognition feature;
(2) Performing a normalization operation on the process feature vector and taking the output of the normalization as the interference elimination parameter;
(3) Performing a scalar multiplication on the first action recognition feature according to the interference elimination parameter to obtain the second action recognition feature.
In an exemplary embodiment of the present invention, a user trains push-ups in front of a camera in a motion analysis application. From the captured action video, the system performs the interference elimination operation on the first action recognition feature to obtain the second action recognition feature. First, the system extracts interference-related structured information from the push-up video, for example a curvature metric of the user's back and the angle of the arms; this structured information forms the demand feature vector from which the parameters needed for interference elimination are extracted. The system converts the demand feature vector into a feature space through a specific mapping algorithm, for instance mapping the detected back-curvature values and arm angles into a unified feature space; the resulting process feature vector contains information adapted to the first action recognition feature. The system then normalizes the process feature vector into a form with zero mean and unit variance, and the normalized vector serves as the interference elimination parameter. Finally, the system performs a scalar multiplication of the first action recognition feature with this parameter, for example computing an element-wise product with the normalized process feature vector, to obtain the second action recognition feature after interference elimination. Through these steps, information related to the interference is extracted and exploited so the user's motion can be recognized more accurately; in a push-up training application, this helps the system judge the posture and quality of the completed action and provide more precise feedback and advice.
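A sketch of the three elimination steps under stated assumptions: a seeded random projection stands in for the learned feature-space mapping, and the scalar multiplication is taken element-wise.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for a reproducible toy mapping

def de_interfere(first_feature: np.ndarray, demand_vector: np.ndarray) -> np.ndarray:
    # (1) Feature-space mapping to the dimensionality of first_feature.
    projection = rng.standard_normal((first_feature.size, demand_vector.size))
    process_vec = projection @ demand_vector
    # (2) Normalization: zero mean, unit variance -> elimination parameter.
    params = (process_vec - process_vec.mean()) / (process_vec.std() + 1e-9)
    # (3) Element-wise scalar multiplication with the first feature.
    return first_feature * params

second_feature = de_interfere(np.array([1.0, 2.0, 3.0]), np.array([0.2, 0.7]))
print(second_feature)
```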
In the embodiment of the present invention, the step of recognizing the trigger action according to the second action recognition feature to obtain the target recognition result of the interference-containing action video may be performed in the following manner.
(1) Performing action recognition feature analysis on the second action recognition feature to obtain a recognition confidence that a trigger action exists in it;
(2) Determining that a trigger action exists in the interference-containing action video when the recognition confidence reaches a preset trigger threshold.
In an exemplary embodiment of the present invention, in a motion monitoring application, the system recognizes trigger actions from the interference-eliminated second action recognition feature to obtain the target recognition result for the interference-containing video. Suppose the user is training squats. The system analyzes the second action recognition feature for patterns, postures, or other characteristics associated with the trigger action and computes a recognition confidence. A recognition threshold for the trigger action is preset; when the confidence produced by the analysis exceeds this threshold, the system judges that a trigger action exists in the video. For example, when the analysis shows that the user's squat posture reaches a certain standard or matches a particular pattern, the system determines that a trigger action is present. Because the features have already been cleaned of interference, the actions can be recognized accurately and the system can judge whether the action the user intended to trigger appears in the video; in motion monitoring, this helps the system promptly recognize the target action the user completed and provide accurate feedback and analysis.
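A minimal sketch of the confidence check, assuming a logistic scoring head as the feature-analysis step and an illustrative 0.8 trigger threshold; the weights are hypothetical learned parameters.

```python
import numpy as np

def trigger_present(second_feature: np.ndarray,
                    weights: np.ndarray,
                    threshold: float = 0.8) -> bool:
    logit = float(np.dot(weights, second_feature))
    confidence = 1.0 / (1.0 + np.exp(-logit))   # recognition confidence
    return confidence >= threshold              # preset trigger threshold

print(trigger_present(np.array([0.9, 1.2]), np.array([1.5, 1.0])))
```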
In the embodiment of the present invention, the trigger action recognition model is obtained in the following manner.
(1) Acquiring a training instance array, where each training instance includes a first sample action recognition feature extracted from one sample interference-containing action video and a corresponding target value; the target value includes at least a sample trigger action marker indicating whether a trigger action actually exists in the corresponding sample video;
(2) Selecting a training instance from the array and inputting its first sample action recognition feature into the trigger action recognition model to obtain an observed trigger action marker;
(3) Optimizing the model coefficients of the trigger action recognition model at least according to the error between the observed trigger action marker and the corresponding actual trigger action marker.
In the embodiment of the invention, suppose the system trains a trigger action recognition model to recognize squats. The system collects a series of sample videos containing interfering actions and extracts first sample action recognition features from them, such as postures and motion trajectories. Each training instance is also annotated with a target value indicating whether the sample contains the trigger action. The system selects a training instance from the array and feeds its first sample action recognition feature into the model for prediction; for example, the extracted squat posture features are input to judge whether the sample contains a squat. The system then compares the observed trigger action marker with the actual one; if the prediction disagrees with reality, it adjusts and optimizes the model coefficients. By iterating and updating the coefficients, the system gradually improves the accuracy and performance of the model. Training on sample data and optimizing against the marker error in this way improves the accuracy and robustness of trigger action recognition, helping the system recognize the user's specific trigger actions and provide precise feedback and analysis.
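The training procedure might look like the following sketch, using logistic regression as a stand-in for the trigger action recognition model; the learning rate, epoch count, and toy data are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(instances, dim, lr=0.1, epochs=50):
    w = np.zeros(dim)                                 # model coefficients
    for _ in range(epochs):
        for features, true_marker in instances:       # sample + target value
            observed = sigmoid(np.dot(w, features))   # observed trigger marker
            error = observed - true_marker            # marker prediction error
            w -= lr * error * features                # optimize coefficients
    return w

# Toy training instance array: (first sample feature, trigger marker 1/0).
data = [(np.array([1.0, 0.8]), 1.0), (np.array([0.1, 0.2]), 0.0)]
print(train(data, dim=2))
```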
In the embodiment of the invention, the model coefficients include a cancellation factor for the interference elimination operation and a detection parameter for expected-probability detection; the aforementioned observed trigger action marker is acquired in the following manner.
(1) Performing the interference elimination operation on the first sample action recognition feature according to the cancellation factor to obtain the corresponding second sample action recognition feature;
(2) Performing action recognition feature analysis on the second sample action recognition feature according to the detection parameter to obtain an expected-probability evaluation value for the trigger action, and obtaining the observed trigger action marker from the check of that value against a preset trigger action threshold.
In an embodiment of the present invention, for example, the system must recognize squats while eliminating interference from other movements. It uses model coefficients that include a cancellation factor for interference elimination and a detection parameter for expected-probability detection; these coefficients help the system suppress interference for a particular action and evaluate the probability that a trigger action occurred. From the previously extracted first sample action recognition feature, the system uses the cancellation factor to remove the interference, for instance eliminating feature signals from non-squat movements and retaining only the information related to the squat, which yields the second sample action recognition feature. The system then analyzes this cleaned feature with the detection parameter; by examining patterns, postures, and similar information, it computes an expected-probability evaluation value for the presence of a trigger action and compares it with the preset trigger action threshold. If the expected probability exceeds the threshold, the system outputs the corresponding observed trigger action marker. In this way, interference elimination via the cancellation factor and expected-probability detection via the detection parameter together improve the accuracy and robustness of the action recognition system, allowing it to identify specific trigger actions and provide accurate feedback and analysis.
In a more detailed embodiment, assume a camera-based action recognition system intended to count the push-ups a user performs. The model coefficients comprise the cancellation factor, used in the interference elimination operation to attenuate the effects of other movements or noise, and the detection parameter, used in expected-probability detection to evaluate the probability that the trigger action occurred. The observed trigger action marker is acquired as follows. Interference elimination: the system applies the cancellation factor to the first sample action recognition feature to remove signals unrelated to push-ups, for example filtering out background noise and the motion trajectories of other actions while retaining only push-up-related information; the resulting second sample action recognition feature highlights the user's push-up motion. Action recognition feature analysis: based on the cleaned second sample feature, the system uses the detection parameter to analyze movement patterns, body posture, and similar information, evaluating the expected probability that the trigger action (a push-up) is present; for instance, it may examine arm bending and extension and the chest approaching the ground to decide whether a complete push-up occurred. Expected-probability evaluation and threshold check: by comparing the evaluation value with the preset trigger action threshold, the system determines whether to emit an observed trigger action marker; if the value exceeds the threshold, the marker is produced, indicating that the user completed one push-up. In summary, interference elimination through the cancellation factor together with expected-probability detection through the detection parameter improves the accuracy and robustness of the recognition system in complex environments; in this scenario the system can more reliably recognize and count push-ups and provide accurate feedback and analysis.
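A compact sketch of this marker-acquisition path; modeling the cancellation factor as an element-wise scale and the detection step as a logistic score are assumptions of the sketch, as are all the numbers.

```python
import numpy as np

def observe_trigger_marker(first_sample_feature: np.ndarray,
                           cancellation_factor: np.ndarray,
                           detection_params: np.ndarray,
                           threshold: float = 0.5) -> int:
    # (1) Interference elimination via the cancellation factor.
    second_feature = first_sample_feature * cancellation_factor
    # (2) Expected-probability evaluation via the detection parameters.
    prob = 1.0 / (1.0 + np.exp(-np.dot(detection_params, second_feature)))
    return int(prob >= threshold)   # observed trigger action marker

print(observe_trigger_marker(np.array([0.6, 1.4]),
                             np.array([1.0, 0.5]),
                             np.array([0.8, 1.2])))
```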
In the embodiment of the invention, the target value further includes a standard action recognition feature extracted from the corresponding sample interference-containing action video, where the standard action recognition feature is an interference-free spectral feature vector, and the model coefficients include the cancellation factor for the interference elimination operation and the detection parameter for expected-probability detection. The step of optimizing the model coefficients at least according to the error between the observed trigger action marker and the corresponding actual trigger action marker may be implemented as follows.
(1) Optimizing the cancellation factor according to a first error between the second sample action recognition feature and the corresponding standard action recognition feature;
(2) Obtaining a second error between the observed trigger action marker and the corresponding actual trigger action marker, and deriving an association error from the correlation coefficient between the first error and the second error;
(3) Optimizing the cancellation factor and the detection parameter respectively according to the association error.
In an embodiment of the present invention, the standard action recognition features are the interference-free spectral feature vectors used to train the model, i.e., action features containing no interference, and the model coefficients include the cancellation factor for interference elimination and the detection parameter for expected-probability detection. Optimizing the cancellation factor: the factor is optimized against the first error between the second sample action recognition feature and the corresponding standard feature; the system analyzes their differences to improve the factor's effect. Computing the association error: the system calculates the second error between the observed and actual trigger action markers and derives the association error from the correlation coefficient between the first and second errors; the correlation coefficient reflects how related the two errors are and weighs their respective contributions to coefficient optimization. Optimizing both coefficients: based on the association error, the cancellation factor and the detection parameter are each adjusted so that the error between observed and actual trigger markers is minimized. Through these steps, the system performs action recognition with the standard features, cancellation factor, and detection parameter, and improves accuracy by optimizing the model coefficients; for example, it can recognize push-ups while reducing recognition errors caused by interference, and continual optimization steadily improves recognition accuracy and performance.
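One possible reading of the association-error computation is sketched below; treating the two errors as per-instance series and using a simple scalar update rule are assumptions, since the patent does not fix the functional form.

```python
import numpy as np

def association_error(first_errors: np.ndarray, second_errors: np.ndarray) -> float:
    # Correlation coefficient between the feature-error series (first error)
    # and the marker-error series (second error) weighs their contributions.
    corr = np.corrcoef(first_errors, second_errors)[0, 1]
    return corr * (first_errors.mean() + second_errors.mean())

# Toy batch: per-instance feature error norms and marker mismatches.
first_errs = np.array([0.20, 0.35, 0.10, 0.40])
second_errs = np.array([0.0, 1.0, 0.0, 1.0])
assoc = association_error(first_errs, second_errs)

cancellation_factor, detection_param, lr = 1.0, 0.8, 0.05
cancellation_factor -= lr * assoc   # optimize each coefficient against
detection_param -= lr * assoc       # the shared association error
print(assoc, cancellation_factor, detection_param)
```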
A more detailed implementation follows, assuming a virtual reality (VR) game scenario in which a player uses VR devices such as a head-mounted display and handheld controllers; the "sword swing" action serves as the trigger action. When the player enters the device-designated action collection area, for example by standing at a particular location in the game, the system begins checking, via the pre-trained trigger action recognition model, whether the player has performed a sword swing. If the player waves the handheld controller to simulate a sword stroke, the model detects this action. Upon confirming the sword swing, the system obtains the relevant action data and determines the outcome; for example, it may judge the effect of the stroke, such as attack strength and whether the target was hit, from the speed, force, and direction of the player's gesture. After determining the execution result, the system displays it in the virtual reality interaction area designated by the VR device; if the player hits the target, the device can present corresponding visual and audio effects, such as the target being struck and an accompanying sound. To do this, the pre-trained model first needs to recognize the user's action, which involves obtaining a spectral feature vector by applying a Fourier or wavelet transform to the user's action data; these transforms extract time and frequency information so the action can be analyzed and recognized more effectively.
Then, to ensure accurate results, an interference elimination operation is required. A virtual reality environment can contain many forms of interference, such as ambient noise and device error. Signal processing techniques (e.g., filters) or machine learning methods (e.g., neural networks) can be employed to remove these disturbances and improve recognition accuracy.
The model then performs action recognition based on the extracted features. A sword swing, for example, has characteristic cadence and speed, from which the model can decide whether the user performed it. Pattern matching with deep learning and similar methods further allows the model to recognize and interpret complex action sequences accurately.
Once the action is recognized and the execution result determined, the system displays the effect in the virtual reality interaction area. In a fencing game, if the user's action is recognized as a valid sword attack, the virtual character performs the corresponding attack animation and inflicts damage on the enemy according to the attack's effect.
In general, this method accurately recognizes and responds to the user's actions within the virtual reality environment, improving the naturalness and immersion of the interaction.
To recognize the sword-swing action accurately, the pre-trained trigger action recognition model must obtain the spectral feature vector of a video that may contain interference. While performing the sword swing, the player may make unrelated gestures, such as adjusting glasses or swaying the body; these could be mistaken for part of the sword swing, so the interference must be identified and removed by extracting the spectral feature vector.
Extracting the spectral feature vector involves applying a wavelet transform to the interference-containing video to obtain its frequency distribution, and then filtering that distribution. The wavelet transform is a common signal processing technique that converts time-series data into frequency-domain data, in which important patterns or features are easier to identify.
With the spectral feature vector in hand, the cyclic update operation can begin. In this process, the pending feature vector of the current cycle is determined from the current and past interference action feature vectors, and the interference elimination operation is applied. In the first cycle, the interference action feature vector is determined from the current spectral feature vector (i.e., the first action recognition feature), and the interference-elimination task details for that cycle are obtained from the feature vectors determined for past interference-containing videos; the elimination operation is then performed according to those details, yielding the first cycle's pending feature vector. The following describes how the user's actions can be recognized more finely by the pre-trained model: the interference action feature vector is processed through multiple cycles of updating, and the demand feature vector needed for action recognition finally emerges. First, the wavelet transform of the interference-containing video yields its frequency distribution; when a player performs a sword swing, for example, the frequency distribution of that action may exhibit a specific pattern that the model can capture and use for recognition. In the first cycle, the model uses the spectral feature vector from the previous step as the first action recognition feature, i.e., the initial interference action feature vector; from this vector and past data it generates the interference-elimination task details, then performs the elimination operation to obtain the pending feature vector after the first cycle. In the second and subsequent cycles, the model merges the previous cycle's pending feature vector with the first action recognition feature and repeats the same elimination operation; this continues until all cycles are complete, and the pending feature vector from the last cycle becomes the demand feature vector used in the final recognition step. Each cycle refines and optimizes the original recognition features so they reflect the user's actual action more faithfully, and across cycles the model progressively removes interference, improving recognition accuracy. When merging in later cycles, the system combines the first action recognition feature with the pending feature vectors from earlier cycles, either integrated in a preset arrangement order or weighted by each vector's contribution, and then performs the elimination operation again to obtain the current cycle's pending feature vector.
After all rounds of the cyclic update operation are completed, the pending feature vector determined in the last round is taken as the demand feature vector. Then, interference cancellation parameters are determined by extracting structural information from the demand feature vector, and the interference cancellation operation is performed on the first action recognition feature with these parameters, yielding the second action recognition feature. From the second action recognition feature, the model can recognize the trigger action and obtain a target recognition result for the video containing the interference action. For example, the system may analyze the feature vectors to compute a recognition confidence for a "sword swing" action; if this confidence reaches a preset threshold, it can be determined that the user did make a "sword swing" action.

The following describes how the extracted demand feature vector is used for action recognition. First, a feature space mapping is applied to the demand feature vector to obtain a process feature vector matched to the first action recognition feature. This is a transformation step that maps the original feature vector into a new feature space in which the various actions can be better distinguished. The process feature vector is then normalized, and the output of the normalization operation is used as the interference cancellation parameters. Normalization is a common data preprocessing operation: it removes the dimensional influence of the data so that different features become comparable, and it can also make the data satisfy the assumptions of some algorithms, for example machine learning algorithms that require zero-mean, unit-variance input.

With the interference cancellation parameters, the interference cancellation operation can be performed on the first action recognition feature to obtain the second action recognition feature; this processing further improves the accuracy of action recognition. Finally, the trigger action is recognized from the second action recognition feature. Specifically, the model analyzes the second action recognition feature to obtain a recognition confidence that the trigger action is present; if this confidence reaches a preset threshold, the user is deemed to have actually performed the trigger action. This is the whole process of action recognition using the demand feature vector: feature extraction, feature transformation, normalization, interference cancellation, and action recognition, with each step contributing to the final result.

To obtain the trigger action recognition model, a series of training instances first needs to be collected. These instances include sample videos containing user actions and their corresponding target values, which indicate whether the action is present in the video. For example, a series of video clips may be collected from a player performing a "sword swing" action, with a corresponding target value assigned according to the performance of the action in each clip. The model then selects part of the data from the training instances and inputs the corresponding action recognition features to obtain a predicted action label.
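To make the above flow concrete, here is a minimal, non-authoritative Python sketch of the cyclic update and interference cancellation pipeline described here; the weighted merge rule, the linear feature-space mapping, the tanh stand-in for the task details, and the z-score normalization used to derive the cancellation parameters are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, ROUNDS = 16, 3
W_map = rng.normal(size=(DIM, DIM))   # assumed learned feature-space mapping

def cancel_interference(features, params):
    # Assumed form of the interference cancellation operation:
    # subtract the component explained by the cancellation parameters.
    return features - params * features

def cyclic_update(first_feature, weights=(0.7, 0.3)):
    pending = first_feature.copy()
    for _ in range(ROUNDS):
        # Later rounds merge the previous pending vector with the
        # first action recognition feature (weighted merge variant).
        merged = weights[0] * pending + weights[1] * first_feature
        task_params = np.tanh(merged)          # stand-in for task details
        pending = cancel_interference(merged, 0.1 * task_params)
    return pending                              # demand feature vector

def second_recognition_feature(first_feature):
    demand = cyclic_update(first_feature)
    process = W_map @ demand                    # feature space mapping
    # The normalization output serves as the cancellation parameters.
    params = (process - process.mean()) / (process.std() + 1e-12)
    return cancel_interference(first_feature, 0.05 * params)

first = rng.normal(size=DIM)                    # spectrum feature vector
confidence = 1 / (1 + np.exp(-second_recognition_feature(first).sum()))
print("sword-swing confidence:", confidence)    # compare to preset threshold
```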
At the same time, the model optimizes its parameters based on the error between the predicted label and the actual label. For example, if the model predicts that a "sword swing" is present in a certain video segment but the user did not in fact perform one, the model parameters need to be adjusted to reduce this error. In addition, the model includes cancellation factors for the interference cancellation operation and detection parameters for expected probability detection; both are learned from the training data. The cancellation factors help the model reduce the impact of various disturbances on action recognition, while the detection parameters are used to evaluate the expected probability of an action. Finally, the model performs the interference cancellation operation on the action recognition feature according to the cancellation factors to obtain the second action recognition feature, and then analyzes this feature according to the detection parameters to obtain an expected probability evaluation value for the trigger action. If this evaluation value exceeds a preset threshold, a trigger action is considered to be present in the video. Throughout this process, by continuously optimizing the model parameters and adjusting the action recognition strategy, the model can recognize the user's actions more accurately, improving the interaction performance of the virtual reality system.

Next, how the trigger action recognition model is obtained through training is discussed. This is a very important step, since the quality of the model directly influences the effect of the final action recognition. First, a training instance array needs to be acquired. Each training instance includes a first sample action recognition feature extracted from one sample video containing an interference action, together with the corresponding target value. For example, data may be collected from actual user operations in a game, and corresponding training instances generated from those operations; the instances may cover a variety of actions, such as fencing and evading, with their corresponding target values. Then, an instance is selected from the training instance array, and the corresponding first sample action recognition feature is input into the trigger action recognition model, which outputs the corresponding observed trigger action marker. For example, if a user makes a fencing action, the model should be able to output the correct marker. The error between the observed trigger action marker and the actual trigger action marker is then calculated, and the parameters of the model are optimized according to this error. This is a typical supervised learning process, and continuous training and optimization gradually improve the model's performance. Finally, a trained trigger action recognition model is obtained, which can be used to recognize new user actions and determine whether the user has performed a trigger action.
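A compact sketch of the supervised training loop just described, assuming PyTorch; the network shape, the binary trigger/no-trigger labels, the optimizer, and the 0.8 confidence threshold are illustrative choices, not prescribed by this embodiment.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the training instance array: first sample action
# recognition features plus target values (1 = trigger action present).
features = torch.randn(256, 16)
targets = torch.randint(0, 2, (256,)).float()

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()  # error between observed and actual markers

for epoch in range(20):
    optimizer.zero_grad()
    observed = model(features).squeeze(1)  # observed trigger action markers
    loss = loss_fn(observed, targets)
    loss.backward()                        # optimize parameters by the error
    optimizer.step()

with torch.no_grad():
    confidence = torch.sigmoid(model(features[:1])).item()
print("trigger action present" if confidence >= 0.8 else "no trigger action")
```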
In the embodiment of the present invention, the aforementioned step S202 may be implemented in the following manner.
(1) Acquiring action data corresponding to the current user action;
(2) Inputting the action data corresponding to the current user action into a target instruction content action instruction judgment model to obtain a predicted instruction content category corresponding to the current user action; the target instruction content action instruction judgment model is obtained by training a basic action instruction judgment model on supervised action data and unsupervised action data sets, where the supervised action data are action data corresponding to user actions with pre-configured instruction content, the action data in the unsupervised action data sets are action data corresponding to user actions without configured instruction content, and the pieces of action data within each unsupervised action data set are mutually similar;
(3) When the predicted instruction content category corresponding to the current user action indicates that the current user action represents the target instruction content, matching the action data corresponding to the current user action with a service scenario database to obtain the current service scenario associated with the current user action;
(4) Determining an action execution result corresponding to the current user action according to the current service scenario, as illustrated in the sketch below.
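The following Python sketch strings the four steps together; the model stub, the scenario lookup table, and the rule check are hypothetical placeholders meant only to show the control flow.

```python
from dataclasses import dataclass

@dataclass
class ActionData:
    acceleration: list
    angle: float

SCENARIO_DB = {"attack": "monster_battle", "shoot": "basketball_court"}
RULES = {"monster_battle": lambda a: a.acceleration[0] > 1.5}

def predict_category(action: ActionData) -> str:
    # Stand-in for the target instruction content action instruction
    # judgment model; a real system would run the trained classifier here.
    return "attack" if max(action.acceleration) > 1.0 else "idle"

def handle_user_action(action: ActionData):
    category = predict_category(action)                 # step (2)
    if category == "idle":
        return None                                     # no target content
    scenario = SCENARIO_DB.get(category)                # step (3)
    if scenario is None:
        return None
    rule = RULES.get(scenario, lambda a: True)          # step (4)
    return {"scenario": scenario, "success": rule(action)}

print(handle_user_action(ActionData([2.0, 0.4, 0.1], 30.0)))
```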
In an embodiment of the present invention, the user performs a series of actions, such as jumping and rotating, through the interaction device, which acquires data related to the user action, such as acceleration and angle. For example, in a somatosensory dance game, a user may perform various dance steps; the interaction device acquires the user's dance motion data and passes it to the action instruction judgment model, which, after training, can predict the instruction content category of the current user's dance action, such as dancing or rotating. As another example, in an athletic training application, a user may perform a shooting action; the action instruction judgment model predicts that the user's action is "shooting", and the interaction device matches the current user action data against the service scenario database to find the scenarios related to "shooting", such as a basketball court or shooting training equipment. Likewise, in a somatosensory game, a user may need to defeat monsters in the virtual reality environment: when the action instruction judgment model predicts that the user's action is "attack" and the current service scenario is a monster combat scene, the interaction device judges, from the user's action data and the service rules, whether the attack succeeded, and updates the monster's state in the virtual reality environment accordingly. Through these steps, the user's actions can be classified and associated using the action instruction judgment model and the service scenario database according to the user's action data, and the execution result of the action determined according to the service scenario, providing a more intelligent and realistic virtual reality training experience.
In one detailed embodiment, assume that a user is participating in a somatosensory dance game and wears sensor devices that can acquire user motion data such as acceleration, angle, and position information. When the user starts dancing, the sensor devices collect and record the corresponding motion data. In this dance game, there is a previously trained action instruction judgment model, trained using supervised action data and unsupervised action data sets: the supervised action data are action data with instruction content configured in advance by a dancer, and the unsupervised action data are dance motion data for which the user has not configured instruction content. When the user dances, the interaction device inputs the motion data into the action instruction judgment model, which predicts the instruction content category corresponding to the current user action, such as "left turn", "right turn", or "forward jump", based on the existing training data. In the virtual reality dance game, there is a service scenario database storing scenario information associated with different instruction content categories. After the action instruction judgment model predicts the instruction content category corresponding to the current user action, the interaction device matches that instruction content with the service scenario database. For example, if the prediction result is "forward jump", the interaction device finds the scenes related to "forward jump" in the service scenario database, such as a specific location on the stage or a special-effect show. Once the service scenario associated with the current user action is determined, the interaction device can determine the execution result of the action based on the rules and conditions of that scenario. Continuing with the dance game example, assume the user has completed a jump that is recognized as "forward jump"; the interaction device can determine whether the user's jump height and posture are correct according to the rules and conditions of the dance game, and give corresponding feedback, such as bonus points or a displayed dance rating. Through these steps, the virtual reality action recognition method based on the interaction device can accurately capture the user's actions, predict the instruction content category, associate it with the corresponding service scenario, and determine the execution result of the action according to the scenario rules, providing a more accurate and personalized virtual reality experience.
In the embodiment of the invention, the method further comprises the following steps:
(1) Acquiring a plurality of pieces of action data to be supervised, and extracting the action feature vector corresponding to each piece of action data to be supervised;
(2) Determining the action data to be processed from the pieces of action data to be supervised;
(3) Matching target similar action data corresponding to the action data to be processed from the first action data set according to the action feature vector corresponding to the action data to be processed, to obtain a matching output;
(4) When the matching output indicates that target similar action data corresponding to the action data to be processed has been matched, adding the action data to be processed to the first action data set; and when the matching output indicates that no target similar action data has been matched, adding the action data to be processed to the first action data set and to the second action data set respectively;
(5) Taking the next piece of action data to be supervised as the action data to be processed, and repeatedly executing the step of matching target similar action data corresponding to the action data to be processed from the first action data set according to its action feature vector to obtain a matching output, until all action data to be supervised have been matched;
(6) Configuring instruction content for the user actions corresponding to the pieces of action data to be supervised in the finally determined second action data set, to obtain a plurality of pieces of supervised action data. A minimal sketch of this matching loop follows.
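A minimal Python sketch of the loop in steps (2) to (6), under the assumption that feature similarity is a plain Euclidean distance against the first action data set; the threshold value and the feature layout are placeholders.

```python
import numpy as np

def match_similar(feature, first_set, threshold=0.5):
    """Return True if some entry of first_set is close enough to feature."""
    return any(np.linalg.norm(feature - f) < threshold for f in first_set)

def split_supervision_candidates(features):
    first_set, second_set = [], []
    for feature in features:                # each piece of to-be-supervised data
        if match_similar(feature, first_set):
            first_set.append(feature)       # matched: goes to first set only
        else:
            first_set.append(feature)       # unmatched: goes to both sets and
            second_set.append(feature)      # will later get instruction content
    return first_set, second_set

rng = np.random.default_rng(1)
feats = rng.normal(size=(10, 4))
_, to_label = split_supervision_candidates(feats)
print(f"{len(to_label)} representative actions to configure with instructions")
```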
In an embodiment of the present invention, illustratively, the user performs a series of actions, such as jumping and rotating, through the interaction device. The system collects a plurality of pieces of action data to be supervised and extracts the corresponding action feature vector from each piece; the feature vector may include information such as acceleration, angle, and timing of the action. The system determines the action data to be processed from the action feature vectors extracted from the action data to be supervised, and then matches target similar action data corresponding to the action data to be processed from the first action data set to obtain a matching output. For example, a player performs a spin action, and this action data is taken as the action data to be processed; the system looks in the first action data set for target similar spin action data matching it. If the matching output indicates that target similar action data has been matched, the action data to be processed is added to the first action data set; if not, the action data to be processed is added to both the first action data set and the second action data set. Continuing with the somatosensory game example: when a player performs a spin and matching target similar spin data is found, the system adds the spin data to the first action data set, whereas if no match is found, the system adds the pending spin data to both the first and the second action data sets. The system then takes the next piece of action data to be supervised as the action data to be processed and repeats the previous steps until all action data to be supervised have been matched. In the finally determined second action data set, each piece of action data to be supervised corresponds to a user action; the system can configure instruction content for these pieces, i.e., mark each action with its corresponding label or category for model training. Through these steps, the system matches and classifies using the action data to be supervised and the action feature vectors, adds the pending data to the appropriate action data sets, and finally generates a plurality of pieces of supervised action data for instruction content configuration and model training, improving the recognition accuracy and training effect of the somatosensory game on user actions.
In the embodiment of the present invention, the step of matching target similar action data corresponding to the action data to be processed from the first action data set according to the action feature vector corresponding to the action data to be processed, to obtain the matching output, may be implemented as in the following example.
(1) According to the vector distance between the action feature vector corresponding to the action data to be processed and the action feature vector of each piece of action data in the first action data set, determining the initial similar action data corresponding to the action data to be processed from the pieces of action data contained in the first action data set;
(2) When the similarity measure of the associated service feature sets between the action data to be processed and the corresponding initial similar action data is not lower than a preset metric threshold, taking the initial similar action data corresponding to the action data to be processed as the target similar action data corresponding to the action data to be processed, and obtaining the matching output;
(3) When the similarity measure of the associated service feature sets between the action data to be processed and the corresponding initial similar action data is lower than the preset metric threshold, or the action data to be processed has no corresponding initial similar action data, determining that the matching output indicates that no target similar action data corresponding to the action data to be processed has been matched, as illustrated in the sketch below.
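The two-stage check, vector distance first and then a similarity measure over the associated service feature sets, might look like the following sketch; the Euclidean distance, the Jaccard overlap for the service feature sets, and both thresholds are assumptions.

```python
import numpy as np

def nearest_candidate(query_vec, candidates, max_dist=0.8):
    """Stage 1: pick initial similar action data by vector distance."""
    if not candidates:
        return None
    dists = [np.linalg.norm(query_vec - c["vec"]) for c in candidates]
    best = int(np.argmin(dists))
    return candidates[best] if dists[best] <= max_dist else None

def service_similarity(set_a, set_b):
    """Stage 2: similarity measure over associated service feature sets
    (Jaccard overlap is an illustrative choice)."""
    return len(set_a & set_b) / max(len(set_a | set_b), 1)

def match_target_similar(query, first_set, metric_threshold=0.6):
    initial = nearest_candidate(query["vec"], first_set)
    if initial is None:
        return None                       # no initial similar action data
    if service_similarity(query["svc"], initial["svc"]) < metric_threshold:
        return None                       # below preset metric threshold
    return initial                        # target similar action data

jump = {"vec": np.array([1.0, 0.2]), "svc": {"jump", "dance_stage"}}
pool = [{"vec": np.array([0.9, 0.25]), "svc": {"jump", "dance_stage", "combo"}}]
print(match_target_similar(jump, pool))
```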
In an embodiment of the present invention, a player performs a jump action, and the system calculates the vector distance between the action feature vector corresponding to the jump action data and the action feature vector of each piece of action data in the first action data set. For example, in a sports training application, when the action feature vector of the jump has a small vector distance to the action feature vector of a certain piece of action data in the first action data set, the system takes that piece as the initial similar action data corresponding to the action data to be processed. If the similarity measure between the associated service feature set of the jump and that of the corresponding initial similar action data is not lower than the preset metric threshold, the system takes the initial similar action data as the target similar action data corresponding to the action data to be processed. Conversely, if the similarity measure is below the preset metric threshold, or no initial similar action data exists in the first action data set for the jump, the system determines that the matching output indicates no target similar action data was matched. Through these steps, the technical solution calculates vector distances and similarity measures from the action data to be processed and the corresponding action feature vectors to determine the target similar action data and generate the matching output, which can improve the accuracy of matching and guidance of user actions in the sports training application.
In the embodiment of the invention, the method further comprises the following steps:
acquiring a supervised action data set and an unsupervised action data set;
training the to-be-determined action instruction judgment model according to the supervised action data set to obtain a transitional action instruction judgment model;
inputting each piece of unsupervised action data in the unsupervised action data set into the transitional action instruction judgment model to obtain the predicted instruction content category corresponding to each piece of unsupervised action data; the predicted instruction content category corresponding to each piece of unsupervised action data is used as the generation category corresponding to that piece;
screening each piece of unsupervised action data according to the generation categories to obtain a target unsupervised action data set;
performing same-class conversion on each piece of target unsupervised action data in the target unsupervised action data set to obtain the similar action data corresponding to each piece of target unsupervised action data;
and forming each piece of target unsupervised action data and its corresponding similar action data into an unsupervised action data set, to obtain a plurality of unsupervised action data sets.
In an exemplary embodiment of the present invention, the system obtains two different sets of data: a supervised action data set that has been annotated with instruction content, and an unsupervised action data set that has not. The system trains the to-be-determined action instruction judgment model using the supervised action data set to obtain a transitional action instruction judgment model capable of classifying instruction content. The system then inputs each piece of unsupervised action data into the trained transitional action instruction judgment model to predict its instruction content category; these predicted categories serve as the generation categories of the unsupervised action data. Next, the system screens each piece of unsupervised action data according to its generation category to obtain the target unsupervised action data set, i.e., data with specific instruction content categories selected from the unsupervised action data. The system then performs same-class conversion on each piece of target unsupervised action data; in other words, it generates action data similar to each piece in the action feature space. Finally, each piece of target unsupervised action data and its corresponding similar action data form an unsupervised action data set. Through these steps, the technical solution trains a model with the supervised action data, uses it to predict and screen the unsupervised action data, and finally generates a plurality of unsupervised action data sets, enabling the classification and generation of unsupervised action data in motion auto-generation applications and providing users with richer action choices.
In a detailed embodiment, the supervised action data set includes action data that has been annotated with instruction content, for example a data set containing different actions, each with corresponding instruction content, while the unsupervised action data set carries no instruction content annotations. The system uses the supervised action data set to train a transitional action instruction judgment model, which learns the relationship between the actions in the supervised set and their instruction content so that it can make predictions for unannotated actions. The system inputs each piece of unsupervised action data into the trained transitional action instruction judgment model to predict its corresponding instruction content category, which may be expressed as a numerical value or a label describing the instruction content of that piece of data. The system then screens the unsupervised action data by generation category; specifically, it groups together the unsupervised action data sharing the same instruction content category. For each piece of target unsupervised action data, the system performs same-class conversion, generating action data similar to it in the action feature space; for example, if the target unsupervised action data is a running action, the system may generate other variants of the running action. Each piece of target unsupervised action data and its corresponding similar action data then form an unsupervised action data set, offering the user a more varied choice. Through these steps, the technical solution trains a transitional action instruction judgment model with the supervised action data, predicts and screens the unsupervised action data with that model, and then converts and assembles the similar data to provide multiple unsupervised action data sets, enriching the action choices in motion auto-generation applications.
In the embodiment of the present invention, the step of screening each unsupervised action data according to the generation category to obtain the target unsupervised action data set may be implemented in the following manner.
(1) Counting, within the generation categories, the pieces of data belonging to the target category and to non-target categories, to obtain a target category count and a non-target category count;
(2) Taking the smaller of the target category count and the non-target category count as the demand category count, and taking the instruction content category corresponding to that smaller count as the demand category;
(3) Acquiring a first number of pieces of unsupervised action data, from the pieces of unsupervised action data having the demand category, as target unsupervised action data, where the first number is lower than the demand category count;
(4) Acquiring a second number of pieces of unsupervised action data, from the pieces of unsupervised action data not having the demand category, as target unsupervised action data;
(5) Obtaining the target unsupervised action data set from the pieces of target unsupervised action data. A small counting sketch is given below.
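Reading these counts as a class-balancing rule, a sketch might look as follows; the boolean target flag and the truncation-based sampling are assumptions of the sketch.

```python
from collections import Counter

def screen_targets(samples, first_number, second_number):
    """samples: list of (data_id, generation_category, is_target_category)."""
    counts = Counter(is_target for _, _, is_target in samples)
    target_count, non_target_count = counts[True], counts[False]
    # The scarcer side becomes the demand category.
    demand_is_target = target_count <= non_target_count
    demand = [s for s in samples if s[2] == demand_is_target]
    others = [s for s in samples if s[2] != demand_is_target]
    # first_number is kept below the demand category count.
    first_number = min(first_number, len(demand))
    return demand[:first_number] + others[:second_number]

data = [(i, "swing", i % 3 == 0) for i in range(30)]  # 10 target, 20 non-target
print(len(screen_targets(data, first_number=5, second_number=8)))  # 13 kept
```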
In an embodiment of the present invention, the system counts, based on the predicted instruction content categories, the target-category and non-target-category data within each generation category; for example, a certain generation category may contain 10 pieces of target-category data and 20 pieces of non-target-category data. The smaller of the target category count and the non-target category count is selected as the demand category count, which is used to screen the unsupervised action data, and the corresponding instruction content category is taken as the demand category. From the pieces of unsupervised action data having the demand category, a first number of pieces are acquired as target unsupervised action data; for example, if the demand category count is 5, the system selects up to 5 pieces from the unsupervised action data with the demand category, noting that the first number actually available may be lower than the demand category count. Meanwhile, the system acquires a second number of pieces from the unsupervised action data without the demand category as target unsupervised action data; these may serve other purposes. Through these steps, the technical solution screens and classifies the unsupervised action data using the generation categories and the demand category, forming the target unsupervised action data set; this enables grouping and selection of unsupervised action data, for example in video classification applications, so that users can find relevant content more quickly.
In the embodiment of the present invention, the step of performing same-class conversion on each piece of target unsupervised action data in the target unsupervised action data set to obtain the similar action data corresponding to each piece of target unsupervised action data may be implemented as in the following example.
(1) Performing reverse conversion on each piece of target unsupervised action data in the target unsupervised action data set to obtain the reverse conversion action data corresponding to each piece of target unsupervised action data, and taking the reverse conversion action data corresponding to each piece of target unsupervised action data as the similar action data corresponding to that piece of target unsupervised action data.
In an embodiment of the present invention, the system performs reverse conversion on each piece of target unsupervised action data in the target unsupervised action data set. Reverse conversion refers to reversing the original action data, for example converting normal walking into backward walking; through it, reverse conversion action data similar to the target unsupervised action data in the action feature space can be generated. The reverse conversion action data corresponding to a piece of target unsupervised action data is taken as its similar action data: it has the same instruction content category as the target unsupervised action data, but with reversed action features. For example, if the target unsupervised action data is a running action, its reverse conversion action data will be a reversed running action. In this way, the system can generate similar action data close to the target unsupervised action data in the action feature space, and each piece of target unsupervised action data together with its reverse conversion action data forms a similar action data set. Through these steps, the technical solution reverse-converts the target unsupervised action data to generate similar action data, enriching the action choices in motion auto-generation applications and offering users more diversified options, as sketched below.
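If an action is stored as a time-ordered pose sequence, the reverse conversion can be as simple as reversing the time axis, as in this sketch with an assumed (frames, joints, coordinates) layout:

```python
import numpy as np

def reverse_conversion(action_frames: np.ndarray) -> np.ndarray:
    """Reverse a (T, joints, 3) pose sequence along time, e.g. turning a
    forward-walking clip into a backward-walking one."""
    return action_frames[::-1].copy()

walk = np.cumsum(np.ones((60, 17, 3)) * 0.01, axis=0)  # toy forward walk
backward_walk = reverse_conversion(walk)
# The pair (walk, backward_walk) forms one unsupervised action data set
# sharing the same instruction content category with reversed features.
assert np.allclose(backward_walk[0], walk[-1])
```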
In the embodiment of the invention, the method further comprises the following steps:
(1) Inputting the supervised action data, and the first action data and second action data in each unsupervised action data set, into the basic action instruction judgment model to obtain the predicted instruction content categories respectively corresponding to the supervised action data, the first action data, and the second action data;
(2) Obtaining a first cost according to the deviation between the predicted instruction content category corresponding to the supervised action data and its supervised instruction content category, and obtaining a second cost according to the deviation between the predicted instruction content categories corresponding to the first action data and the second action data in the same unsupervised action data set;
(3) Updating the model structure parameters of the basic action instruction judgment model according to the first cost and the second cost until a preset cost parameter function is satisfied, so as to obtain the target instruction content action instruction judgment model, as in the sketch below.
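This supervised-plus-consistency objective resembles standard semi-supervised training; the following hedged PyTorch sketch shows one plausible reading, with the architecture, the mean-squared consistency term, and the 0.5 loss weight all assumed rather than specified.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

sup_x = torch.randn(64, 8)                 # supervised action data
sup_y = torch.randint(0, 4, (64,))         # configured instruction categories
pair_a = torch.randn(64, 8)                # first action data per set
pair_b = torch.randn(64, 8)                # similar second action data per set

for step in range(100):
    opt.zero_grad()
    first_cost = F.cross_entropy(model(sup_x), sup_y)
    # Second cost: deviation between the predictions for the two similar
    # pieces of data in the same unsupervised action data set.
    second_cost = F.mse_loss(model(pair_a).softmax(-1),
                             model(pair_b).softmax(-1))
    loss = first_cost + 0.5 * second_cost  # 0.5 weight is an assumption
    loss.backward()
    opt.step()                             # update model structure parameters
print(f"final combined cost: {loss.item():.4f}")
```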
In the present embodiment, assume by way of example a virtual exercise coaching application based on action instructions, which needs to train and generate a target instruction content action instruction judgment model from user-supplied supervised action data and unsupervised action data sets. The system inputs the supervised action data, and the first and second action data in each unsupervised action data set, into the basic action instruction judgment model, which predicts each of them and obtains the corresponding predicted instruction content categories; these may be represented as numerical values or labels describing the instruction content of each piece of data. The first cost is calculated from the deviation between the predicted instruction content category of the supervised action data and the actual supervised instruction content category; similarly, the second cost is calculated from the deviation between the predicted instruction content categories of the first and second action data in the same unsupervised action data set. Based on the first cost and the second cost, the system updates the model structure parameters (e.g., weights and biases) of the basic action instruction judgment model. These steps are iterated, continually updating the model structure parameters until a preset cost parameter function is satisfied; the cost parameter function may be expressed as a minimized error or another measure of model performance. After repeated iterative updates, the basic action instruction judgment model becomes the target instruction content action instruction judgment model, used to judge the instruction content of unknown action data. Through these steps, the technical solution trains and generates the target instruction content action instruction judgment model from the supervised action data and the unsupervised action data sets, improving the accuracy of recognizing and giving feedback on the user's action instruction content in the virtual exercise coaching application and enabling more precise personalized fitness guidance.
In a detailed embodiment, both the supervised action data and the unsupervised action data sets are prepared first: the supervised action data are user-provided action data tagged with instruction content categories, while the unsupervised action data sets contain action data without such tags. The supervised action data, and the first and second action data in the unsupervised action data sets, are input into the basic action instruction judgment model for prediction; this model may be a classification model that predicts the corresponding instruction content category from the input action data. The first cost is calculated by comparing the deviation between the predicted instruction content category and the actual supervised instruction content category, and the second cost is obtained analogously from the deviation between the predictions for the paired data in each unsupervised action data set; these deviations may be computed with different loss functions or distance measures. With the first cost and the second cost calculated, the model structure parameters of the basic action instruction judgment model, such as weights and biases, can be updated with an optimization algorithm such as gradient descent; iteratively updating these parameters gradually adjusts the model to reduce the error in instruction content category prediction. Steps 2, 3, and 4 are executed repeatedly, continually and iteratively updating the model structure parameters until a preset cost parameter function is satisfied; this function may be an indicator of model performance, such as a cross entropy loss or mean squared error. After repeated iterative updates, the basic action instruction judgment model becomes the target instruction content action instruction judgment model, trained with both supervised and unsupervised data and able to predict the corresponding instruction content category from input action data. Through these steps, the technical solution can effectively train and generate the target instruction content action instruction judgment model, which can be used in intelligent systems, virtual coaches, or other scenarios to provide more accurate instruction content understanding and personalized feedback by recognizing and categorizing action data.
In a detailed embodiment, the step of obtaining the first cost according to the deviation between the category of the predicted instruction content and the category of the supervised instruction content corresponding to the supervised action data may be implemented by the following example.
(1) Screening out, from the pieces of supervised action data, those whose predicted instruction content category value is greater than a preset value threshold, and filtering away the rest;
(2) Among the retained pieces of supervised action data, obtaining the first cost according to the deviation between the predicted instruction content category and the supervised instruction content category corresponding to the same piece of supervised action data.
In the present embodiment, suppose the application is an exercise training application based on action instructions, where the user provides supervised action data for personalized instruction content classification, alongside a set of unsupervised action data. First, the supervised action data are input into the basic action instruction judgment model for prediction, obtaining the predicted instruction content category for each piece of supervised action data. Next, the pieces whose predicted instruction content category value does not exceed the preset value threshold are filtered out, so that only data whose predictions are sufficiently credible are retained and data with unreliable predictions are removed. The remaining supervised action data are then evaluated according to the deviation between the corresponding predicted instruction content category and the actual supervised instruction content category; this deviation may be computed with different loss functions or distance measures, such as a cross entropy loss or mean squared error. Through these steps, the predicted instruction content category of each piece of supervised action data is obtained, unreliable predictions are filtered out, and the first cost is calculated on the retained data to evaluate model performance and guide the training process.
In a detailed embodiment, the user records through an application a series of supervised action data that contains different kinds of athletic movements and has been tagged with corresponding instruction content categories, alongside an unsupervised action data set without such tags. First, the supervised action data are input into the basic action instruction judgment model for prediction, obtaining the predicted instruction content category for each piece. Next, a preset value threshold is applied to the value of each predicted instruction content category: only data whose predicted value is greater than the threshold, i.e., whose prediction reaches a certain credibility, are retained. For example, if the preset value threshold is 0.8, a piece of supervised action data is retained if the value of its predicted instruction content category exceeds 0.8, and filtered out otherwise. For each retained piece of supervised action data, the first cost is calculated by comparing the deviation between the predicted instruction content category and the actual supervised instruction content category, using a suitable loss function or distance measure. As an example, assume a user provides a set of supervised action data including squats, push-ups, and pull-ups. In the prediction stage, the model predicts the corresponding instruction content category value for each action: squat 0.9, push-up 0.6, pull-up 0.85. Against the preset value threshold of 0.8, the values for the squat and the pull-up exceed the threshold, so both pieces of action data are retained, while the push-up's value is below the threshold and it is filtered out. The first cost is then calculated from the deviation between the predicted and actual supervised instruction content categories of the retained squat and pull-up data.
Through the above steps, the first cost is calculated according to the method in the technical solution, and data with insufficient credibility of the prediction results are filtered out. Thus, the accuracy and the reliability of the model can be improved, and more accurate action instruction content classification and personalized training suggestions can be provided for the user.
In the embodiment of the invention, the preset value threshold is not lower than a standard value, where the standard value is determined by dividing the value range corresponding to the predicted instruction content category according to the number of category classifications of the supervised instruction content categories; and the preset value threshold is positively correlated with the model training epoch.
In an embodiment of the present invention, the user records through an application a series of supervised action data tagged with instruction content categories, alongside a set of unsupervised action data. First, the supervised action data are input into the basic action instruction judgment model for prediction, obtaining the predicted instruction content category for each piece. Next, the standard value is determined from the number of category classifications of the supervised instruction content categories; for example, if the supervised instruction content categories are divided into three categories (e.g., squat, push-up, pull-up), the value range of the predicted instruction content category may be divided into three intervals, from which the standard value is derived. The preset value threshold is then set according to the determined standard value, ensuring it is not lower than that value, which helps the predicted instruction content category match the actual instruction content category more reliably. Furthermore, because the preset value threshold is positively correlated with the model training epoch, the threshold may increase as training progresses; for instance, if the model performs 10 training epochs, the threshold can be raised step by step across those epochs so that progressively more confident predictions are required. A schedule sketch is given below.
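One way to realize an epoch-dependent threshold that never drops below the standard value is a simple linear schedule, as in this sketch; the linear form and the 0.9 ceiling are assumptions.

```python
def standard_value(num_categories: int) -> float:
    # Dividing the [0, 1] prediction range by the number of categories;
    # e.g. three categories give a standard value of 1/3.
    return 1.0 / num_categories

def preset_threshold(epoch: int, total_epochs: int, num_categories: int,
                     ceiling: float = 0.9) -> float:
    """Positively correlated with the epoch, never below the standard value."""
    floor = standard_value(num_categories)
    return floor + (ceiling - floor) * epoch / max(total_epochs - 1, 1)

for epoch in range(0, 10, 3):
    t = preset_threshold(epoch, total_epochs=10, num_categories=3)
    print(f"epoch {epoch}: keep supervised data with prediction > {t:.2f}")
```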
In the embodiment of the present invention, the step of obtaining the second cost according to the deviation between the content categories of the prediction instruction corresponding to the first action data and the second action data in the unsupervised action data set may be implemented by the following example.
(1) Filtering out, from the unsupervised action data sets, those whose predicted instruction content category values fall within a predetermined range, where the predetermined range is the central segment of the value range corresponding to the predicted instruction content category;
(2) In each of the remaining unsupervised action data sets, obtaining the second cost according to the deviation between the predicted instruction content categories respectively corresponding to the first action data and the second action data in the same unsupervised action data set.
In an embodiment of the present invention, the user records through an application a series of supervised action data tagged with instruction content categories, alongside a set of unsupervised action data. The supervised action data are input into the basic action instruction judgment model for prediction, and the deviation between the predicted and actual instruction content categories of each piece yields the first cost. For example, assume a user provides a set of supervised action data including squats, push-ups, and pull-ups; in the prediction phase, the model predicts the corresponding instruction content category for each piece and the deviation from the actual category is computed. According to the method in this technical solution, the second cost is calculated from the deviation between the predicted instruction content categories corresponding to the first and second action data in each unsupervised action data set. First, in each unsupervised action data set, the predetermined range, i.e., the central segment of the category value range, is determined from the value range corresponding to the predicted instruction content category; the unsupervised action data sets whose predictions fall within this range are then filtered out. For example, if the value range of the predicted instruction content category is [0, 1], the predetermined range may be (0.2, 0.8): unsupervised action data sets whose predictions fall inside this central segment are ambiguous and are filtered out, while confident predictions near the ends of the range are retained. Next, in each of the remaining unsupervised action data sets, the second cost is calculated from the deviation between the predicted instruction content categories corresponding to the first and second action data in the same set. Through these steps, the second cost is calculated and the unsupervised action data sets with ambiguous predictions in the predetermined range are filtered out, further improving the accuracy and reliability of the model and supporting more precise action instruction content classification and personalized training suggestions.
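A small sketch of this confidence-band filter; the binary-style scores and the (0.2, 0.8) band come from the example above, while the data layout and the absolute-difference deviation are assumptions.

```python
def second_cost(pairs, band=(0.2, 0.8)):
    """pairs: list of (pred_first, pred_second) scores in [0, 1] for the two
    similar pieces of data in one unsupervised action data set."""
    kept = [
        (a, b) for a, b in pairs
        # Drop sets whose predictions sit in the ambiguous central segment.
        if not (band[0] < a < band[1]) and not (band[0] < b < band[1])
    ]
    if not kept:
        return 0.0
    # Deviation between the two predictions of the same set.
    return sum(abs(a - b) for a, b in kept) / len(kept)

sets = [(0.95, 0.9), (0.5, 0.6), (0.1, 0.15)]  # middle pair is filtered out
print(second_cost(sets))  # averages |0.95-0.9| and |0.1-0.15| -> 0.05
```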
In the embodiment of the invention, the predicted instruction content category includes probabilities corresponding to the target category and the non-target category respectively; the step of obtaining the second cost according to the deviation between the content categories of the prediction instructions corresponding to the first action data and the second action data in the unsupervised action data set can be implemented through the following example.
(1) Obtaining a category cost according to the deviation between the predicted instruction content categories respectively corresponding to the first action data and the second action data in the unsupervised action data set;
(2) Performing a cross entropy calculation on the probabilities contained in the same predicted instruction content category, to obtain the cross entropy corresponding to each unsupervised action data set respectively;
(3) Obtaining a cross entropy cost according to the cross entropy corresponding to each unsupervised action data set;
(4) And obtaining a second cost according to the category cost and the cross entropy cost.
In an embodiment of the present invention, the user records through an application a series of supervised action data tagged with instruction content categories, alongside a set of unsupervised action data. The action data are input into the basic action instruction judgment model for prediction, obtaining for each predicted instruction content category the probabilities of the target category and the non-target category. For example, if for a certain unsupervised action data set the model predicts that the instruction content categories corresponding to the first and second action data are squat and push-up, with probabilities 0.8 and 0.2, then 0.8 is the probability assigned to the target category and 0.2 the probability assigned to the non-target category. According to the method in the technical solution, the category cost is calculated from the deviation between the predicted instruction content categories corresponding to the first and second action data in the same unsupervised action data set; for example, if the predicted categories are squat and push-up and the deviation between them is large, the category cost is correspondingly high. From the probabilities of the predicted instruction content category of each unsupervised action data set, the cross entropy corresponding to each set can be calculated; for example, if the probability distribution of a predicted instruction content category is [0.8, 0.2], the cross entropy measures how far this distribution is from a fully confident prediction of the target category. Finally, the category cost and the cross entropy cost are combined to obtain the second cost, as in the sketch below. Through these steps, the second cost is calculated with both the category cost and the cross entropy cost taken into account, allowing the performance and accuracy of the model to be evaluated more comprehensively and more accurate action instruction content classification and personalized training suggestions to be provided.
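Reading step (2) as an entropy-style measure over each predicted distribution, a combined second cost could look like the following sketch; the combination weights alpha and beta are assumptions.

```python
import math

def entropy(dist):
    """Entropy of one predicted instruction content distribution; low entropy
    means a confident target/non-target split such as [0.8, 0.2]."""
    return -sum(p * math.log(p + 1e-12) for p in dist)

def second_cost(first_dist, second_dist, alpha=1.0, beta=0.5):
    # Category cost: deviation between the two predicted distributions
    # for the similar pieces of data in one unsupervised action data set.
    category_cost = sum(abs(a - b) for a, b in zip(first_dist, second_dist))
    # Cross entropy cost: average sharpness of the two predictions.
    entropy_cost = 0.5 * (entropy(first_dist) + entropy(second_dist))
    return alpha * category_cost + beta * entropy_cost

print(second_cost([0.8, 0.2], [0.7, 0.3]))  # small deviation, fairly confident
```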
In the embodiment of the present invention, the step of pairing the action data corresponding to the current user action with the service scenario database to obtain the current service scenario associated with the current user action may be implemented as in the following example.
(1) Acquiring a service knowledge graph and a service description; the service knowledge graph is generated according to the service scenario database, and the service description comprises the distinguishing service features of each service attribute in the service scenario database;
(2) Determining a target service feature from the action data corresponding to the current user action;
(3) When the target service feature is matched in the service description, pairing the target associated service feature set corresponding to the target service feature with the service knowledge graph; the target associated service feature set corresponding to the target service feature comprises the target service feature and the set of service features strongly associated with the target service feature in the action data corresponding to the current user action;
(4) When the target associated service feature set corresponding to the target service feature is paired to a node in the service knowledge graph, taking the service scenario corresponding to the paired node as a current service scenario associated with the current user action;
(5) Acquiring the next service feature from the action data corresponding to the current user action as the target service feature, and repeatedly executing the step of pairing the target associated service feature set corresponding to the target service feature with the service knowledge graph when the target service feature is matched in the service description, until all service features in the action data corresponding to the current user action have participated in pairing, so as to obtain each current service scenario associated with the current user action. A pairing sketch follows.
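A toy version of this pairing loop, with the service knowledge graph reduced to a dictionary from feature-set nodes to scenario names; a real system would query an actual graph store, and all table contents here are hypothetical.

```python
# Hypothetical service knowledge graph: node -> scenario, where each node
# is identified by its associated service feature set.
KNOWLEDGE_GRAPH = {
    frozenset({"left_turn", "hip_hop"}): "hip-hop turn",
    frozenset({"jump", "spotlight"}): "stage jump highlight",
}
SERVICE_DESCRIPTION = {"left_turn", "hip_hop", "jump", "spotlight"}
STRONG_ASSOCIATIONS = {"left_turn": {"hip_hop"}, "jump": {"spotlight"}}

def current_scenarios(action_features):
    scenarios = []
    for feature in action_features:              # take each feature in turn
        if feature not in SERVICE_DESCRIPTION:   # must match the description
            continue
        assoc = {feature} | (STRONG_ASSOCIATIONS.get(feature, set())
                             & set(action_features))
        scenario = KNOWLEDGE_GRAPH.get(frozenset(assoc))
        if scenario:
            scenarios.append(scenario)
    return scenarios

print(current_scenarios(["left_turn", "hip_hop"]))  # ['hip-hop turn']
```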
In the embodiment of the invention, assume the user plays a virtual reality somatosensory dance game, operating the game by wearing a virtual reality head-mounted display and gloves fitted with motion capture sensors. The system pairs the user's action data with the service scenario database to determine the user's current service scenario. The system first acquires a pre-constructed service knowledge graph containing the relations between different dance actions and dance scenes, and also extracts the service features between dance actions and scenes, such as dance postures and the rhythm of the dance music. From the user's action data, the system extracts the current user's action features, such as gestures and body postures, and matches them with the dance action features in the service description. If the matching succeeds, the system can find the other service feature sets strongly associated with the target service feature, such as the dance scene and the style of the dance music, and from the paired nodes it determines the specific dance scene the user is currently in. For example, if the user's action features match the two target service features "left turn" and "hip-hop" and the associated service scenario node is found, the system can determine that the user is currently in the "hip-hop turn" service scenario. The system then acquires the next service feature from the action data corresponding to the current user action as the target service feature and repeats the matching and pairing steps until all service features have participated in pairing, obtaining each current service scenario associated with the current user action. Through these steps, the system can accurately recognize the user's actions in the virtual reality environment and associate them with the corresponding service scenarios, so that the game can provide personalized services such as real-time feedback, scoring, and game recommendations according to the user's actions and scenarios, enhancing the user's sense of participation and game experience.
In a more detailed embodiment, first, the system obtains a business knowledge graph and a business description. The business knowledge graph is generated according to a business scene database, wherein the business knowledge graph comprises relations among different business scenes. The service description includes the distinction between each service attribute in the service scenario database, i.e., the service characteristics. Next, the system extracts target business features from the user's current action data. Such motion data may be captured by an interactive device (e.g., a handle, head display, sensor, etc.), such as a user's gestures, motion trajectories, etc.
The system matches the target business characteristics with the business description. The system compares each service feature in the target service feature and the service description one by one, and finds the matched feature. If a matching feature is found, the system proceeds to the next step.
And then, the system pairs the target associated service feature set corresponding to the target service feature with the service knowledge graph. The set of target associated business features includes other business features that are strongly related to the target business feature. The system will find the nodes in the traffic knowledge graph that match the set of target associated traffic features.
Once a matching node is found, the system takes the business scene corresponding to that node as the current business scene associated with the current user action. For example, if the user's action features match the target business features "left turn" and "hip-hop" and the associated business scene node is found, the system determines that the user is currently in the "hip-hop-turn" business scene.
The system then obtains the next service feature from the action data of the current user action as the target service feature and repeats the steps above, continuously matching target business features against the business description and finding the business scenes associated with them.
Finally, once every service feature in the user's action data has participated in pairing, the system has obtained each current business scene associated with the user's current action.
In this way, the system can accurately determine the user's current virtual reality action scene from the pairing relation between the user's actions on the interactive device and the business scene database. This helps provide a more immersive, personalized virtual reality experience and enables corresponding services and recommendations to be offered according to the user's needs.
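As a non-authoritative illustration of the pairing flow above, the following Python sketch models the business knowledge graph as a mapping from strongly associated feature sets to scene names. The names `BUSINESS_DESCRIPTION`, `BUSINESS_GRAPH`, and `match_scenes`, and all sample features, are hypothetical stand-ins rather than data from the patent.

```python
# Minimal sketch of the feature-to-scene pairing loop described above. The
# business description, graph nodes, and feature names are illustrative
# assumptions, not data taken from the patent.

BUSINESS_DESCRIPTION = {"left turn", "hip-hop", "arm wave", "spin"}

# Each graph node maps a strongly associated business feature set to a scene.
BUSINESS_GRAPH = {
    frozenset({"left turn", "hip-hop"}): "hip-hop-turn",
    frozenset({"arm wave", "spin"}): "freestyle-spin",
}

def match_scenes(action_features):
    """Pair each target feature and its associated set against graph nodes."""
    scenes = []
    for target in action_features:                 # next target business feature
        if target not in BUSINESS_DESCRIPTION:     # no match in the description
            continue
        # Target associated feature set: the target plus the other features
        # from the same action data that also appear in the description.
        associated = frozenset(f for f in action_features
                               if f in BUSINESS_DESCRIPTION)
        for node_features, scene in BUSINESS_GRAPH.items():
            if node_features <= associated and scene not in scenes:
                scenes.append(scene)               # node matched by the set
    return scenes

print(match_scenes(["left turn", "hip-hop"]))      # -> ['hip-hop-turn']
```

A subset test (`node_features <= associated`) stands in for graph-node pairing here; a real implementation would traverse an actual graph store.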
In an embodiment of the present invention, the method further includes the following implementation.
(1) Acquiring a business linkage knowledge graph, where the business linkage knowledge graph is generated from a preset business linkage data set that contains the business linkage data identifiers of the business linkage data corresponding to each business scene in the business scene database; the service description comprises the distinguishing service features associated with the business scene database and the preset business linkage data set;
(2) When the action data corresponding to the current user action cannot be paired with the business knowledge graph, pairing that action data with the business linkage knowledge graph to obtain each target business linkage data associated with the current user action;
(3) Taking the business scene corresponding to each target business linkage data as each current business scene associated with the current user action.
In an embodiment of the invention, the system first obtains the preset business scene database and business linkage data set. The business scene database contains the relations among different game scenes, and the business linkage data set contains the business linkage data identifiers corresponding to each scene. If the system cannot match the current user's action data with the business knowledge graph, that is, it cannot find the business features and scene nodes corresponding to the actions, it tries to pair the action data with the business linkage knowledge graph instead. By doing so, the system obtains the business scenes associated with each target business linkage data and uses them as the current business scenes associated with the current user action. For example, if the business knowledge graph cannot be matched but business scenes related to the action data can be found through the business linkage knowledge graph, those scenes are marked as the current business scenes associated with the current user action. In this way, even when the business knowledge graph cannot be matched, the system can still determine the user's current business scene and the related business linkage data by means of the business linkage knowledge graph, which helps realize diversified game scenes, provides a richer game experience, and further enhances the players' immersion and cooperative interaction.
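The fallback just described can be pictured with a short Python sketch; the two graphs, the feature sets, and the scene names below are invented for illustration and are not the patent's data model.

```python
# Illustrative fallback flow: try the business knowledge graph first, then
# the business linkage knowledge graph. All names here are assumptions.

BUSINESS_GRAPH = {frozenset({"aim", "trigger pull"}): "city street fight scene"}
LINKAGE_GRAPH = {frozenset({"crouch", "crawl"}): "desert fight scene"}

def resolve_scenes(features):
    feats = frozenset(features)
    for graph in (BUSINESS_GRAPH, LINKAGE_GRAPH):  # linkage graph as fallback
        hits = [scene for node, scene in graph.items() if node <= feats]
        if hits:                                   # stop at the first graph that pairs
            return hits
    return []

print(resolve_scenes({"crouch", "crawl"}))         # -> ['desert fight scene']
```

Trying the linkage graph only after the business graph fails to pair mirrors step (2) above.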
In a detailed embodiment, suppose the user is playing a virtual reality cooperative shooting game, teaming up with other players on a virtual battlefield, with each player wearing a virtual reality head-mounted display and motion-capture gloves to perform game operations. The system first obtains the preset business scene database and business linkage data set: the database contains the relations among different game scenes, and the linkage data set contains the business linkage data identifiers corresponding to each scene. For example, the database may define game scenes such as a "desert fight scene" and a "city street fight scene" and associate them with corresponding business linkage data. If the system cannot match the current user's action data with the business knowledge graph, that is, it cannot find the business features and scene nodes corresponding to the actions, it tries to pair the action data with the business linkage knowledge graph, which extends the system's recognition capability to cover more business scenes. By pairing the current user's action data with the business linkage knowledge graph, the system finds the business scene associated with each target business linkage data; in a game, these scenes may represent different combat environments or mission settings. For example, if the user's action data cannot be matched with a specific feature in the business knowledge graph but the related "desert fight scene" is found through the business linkage knowledge graph, that scene is marked as the current business scene associated with the current user action. With this scheme, the system can accurately determine the user's current virtual reality action scene from the user's action data on the interactive device by combining the pairing relations of the business knowledge graph and the business linkage knowledge graph, and associate it with the relevant business linkage data. The game can then provide real-time personalized services such as cooperative shooting experiences, mission objectives, and game recommendations according to the users' actions and business scenes, enhancing player immersion and promoting cooperative interaction among players.
An embodiment of the invention provides a computer device 100, which includes a processor and a nonvolatile memory storing computer instructions; when the computer instructions are executed by the processor, the computer device 100 performs the foregoing virtual reality action recognition method based on the interactive device. As shown in fig. 2, fig. 2 is a block diagram of the computer device 100 according to an embodiment of the invention. The computer device 100 comprises a memory 111, a processor 112, and a communication unit 113.
To enable data transmission or interaction, the memory 111, the processor 112, and the communication unit 113 are electrically connected to one another, directly or indirectly. For example, these elements may be electrically connected through one or more communication buses or signal lines.
The foregoing description, for purpose of explanation, has been presented with reference to particular embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the disclosure and its practical application, to thereby enable others skilled in the art to best utilize the disclosure and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (10)

1. The virtual reality action recognition method based on the interaction equipment is characterized by comprising the following steps of:
in response to a current user entering the action acquisition area designated by the interaction equipment, identifying the trigger action of the current user based on a pre-trained trigger action recognition model;
acquiring action data corresponding to the current user action when the target recognition result for the current user indicates that a trigger action exists, and determining an action execution result corresponding to the current user action;
and displaying the action execution result in a virtual reality interaction area designated by the interaction equipment.
2. The method of claim 1, wherein the identifying the trigger action of the current user based on a pre-trained trigger action recognition model comprises:
performing wavelet transformation on the video containing the interference action based on a pre-trained trigger action recognition model to obtain the frequency distribution of the video containing the interference action;
filtering the frequency distribution of the video containing the interference action to obtain a corresponding frequency spectrum feature vector serving as a first action recognition feature;
performing a cyclic update operation on the first action recognition feature, wherein the cyclic update operation is as follows: determining an interference action feature vector according to the first action recognition feature, obtaining the interference elimination task details corresponding to the current cycle according to the past interference action feature vector determined for a past interference-containing action video, and performing an interference elimination operation on the interference action feature vector according to the interference elimination task details to obtain the pending feature vector corresponding to the current cycle, wherein in the first round of the cyclic updating operation the interference action feature vector is the first action recognition feature, and in each round other than the first the interference action feature vector is obtained by combining the first action recognition feature and the pending feature vectors already determined;
taking the pending feature vector determined by the last round of cyclic updating as a demand feature vector, and carrying out feature space mapping on the demand feature vector to obtain a process feature vector whose feature space matches the first action recognition feature;
performing a normalization operation on the process feature vector, and taking the output of the normalization operation as a disturbance elimination parameter;
performing a scalar multiplication operation on the first action recognition feature according to the interference elimination parameter to obtain a second action recognition feature;
performing action recognition feature analysis on the second action recognition feature to obtain recognition confidence that trigger action exists in the second action recognition feature;
and when the recognition confidence reaches a preset trigger threshold, determining that the trigger action exists in the video containing the interference action.
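Purely as an informal illustration (not part of the claims), the following Python sketch walks through the claim-2 pipeline end to end. An FFT magnitude stands in for the wavelet-derived frequency distribution, the interference elimination is a simple subtraction of a past-interference estimate, and the feature-space mapping is the identity; the function names, the 0.1 damping weight, and the 0.6 threshold are all assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recognize_trigger(signal, past_interference, rounds=3, threshold=0.6):
    """Toy walk-through of the claim-2 pipeline (all specifics assumed)."""
    spectrum = np.abs(np.fft.rfft(signal))            # stand-in for the wavelet spectrum
    first_feature = spectrum / (np.linalg.norm(spectrum) + 1e-8)  # first action recognition feature

    pending = []                                      # pending feature vectors per cycle
    for r in range(rounds):                           # cyclic update operation
        if r == 0:
            interference_vec = first_feature          # first round: the first feature itself
        else:                                         # later rounds: merge with prior pending vectors
            interference_vec = np.mean([first_feature, *pending], axis=0)
        # "interference elimination task details" derived from the past estimate
        pending.append(interference_vec - 0.1 * past_interference)

    demand = pending[-1]                              # demand feature vector (last cycle)
    process = demand                                  # identity feature-space mapping (assumed)
    elim_param = sigmoid(process)                     # normalization -> interference elimination parameter
    second_feature = elim_param * first_feature       # scalar multiplication
    confidence = float(second_feature.mean() / (first_feature.mean() + 1e-8))
    return confidence >= threshold, confidence        # trigger decision and its confidence

rng = np.random.default_rng(0)
sig = np.sin(np.linspace(0.0, 20.0, 256)) + 0.1 * rng.standard_normal(256)
print(recognize_trigger(sig, past_interference=np.zeros(129)))  # 129 = rfft bins of 256 samples
```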
3. The method of claim 2, wherein the loop update operation comprises:
if the operation is the first round of cyclic updating operation, taking the first action recognition feature as the interference action feature vector, obtaining corresponding interference elimination task details according to the corresponding past interference action feature vectors determined for past interference-containing action video, and performing an interference elimination operation on the interference action feature vector according to the interference elimination task details to obtain the pending feature vector corresponding to the first round of cyclic updating operation;
if the operation is the second round of cyclic updating operation, merging the first action recognition feature and the pending feature vector obtained by the previous round of cyclic updating operation into an interference action feature vector, obtaining corresponding interference elimination task details according to the corresponding past interference action feature vector determined for the past interference-containing action video, and performing an interference elimination operation on the interference action feature vector according to the interference elimination task details to obtain the pending feature vector corresponding to the second round of cyclic updating operation;
and if the operation is the third round of cyclic updating operation, merging the first action recognition feature and the two pending feature vectors obtained by the previous two rounds of cyclic updating operation into an interference action feature vector, obtaining corresponding interference elimination task details according to the corresponding past interference action feature vectors determined for the past interference-containing action video, and performing an interference elimination operation on the interference action feature vector according to the interference elimination task details to obtain the pending feature vector corresponding to the third round of cyclic updating operation.
4. The method of claim 2, wherein the combining of the interference action feature vectors is performed by at least one of:
according to a preset arrangement sequence, performing an integration operation on each determined pending feature vector and the first action recognition feature to obtain an interference action feature vector;
according to the determined merging adjustment factors respectively associated with the pending feature vectors and the merging adjustment factor associated with the first action recognition feature, weighting and merging the pending feature vectors and the first action recognition feature to obtain an interference action feature vector, wherein each merging adjustment factor characterizes the contribution degree of the corresponding pending feature vector or the first action recognition feature to the interference action feature vector.
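The two merging strategies of claim 4 can be sketched in a few lines of Python; the weights below are illustrative merging adjustment factors, not values from the patent.

```python
import numpy as np

def merge_weighted(first_feature, pending_vectors, factors):
    """Weighted merge: each merging adjustment factor is the contribution of
    its vector to the interference action feature vector (weights assumed)."""
    vectors = [first_feature, *pending_vectors]
    assert len(vectors) == len(factors)
    return sum(w * v for w, v in zip(factors, vectors))

def merge_concat(first_feature, pending_vectors):
    """Integration in a preset arrangement sequence: plain concatenation."""
    return np.concatenate([first_feature, *pending_vectors])

first = np.array([1.0, 0.0])
pending = [np.array([0.0, 1.0])]
print(merge_weighted(first, pending, factors=[0.7, 0.3]))  # -> [0.7 0.3]
print(merge_concat(first, pending))                        # -> [1. 0. 0. 1.]
```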
5. The method of claim 2, wherein the trigger action recognition model is derived by:
obtaining a training example array, wherein each training example in the training example array comprises a first sample action identification feature extracted for one sample interference-containing action video and a corresponding target value, the target value at least comprises a sample trigger action mark, and the sample trigger action mark represents whether the trigger action really exists in the corresponding sample interference-containing action video;
selecting a training example from the training example array, and inputting the corresponding first sample action recognition features into the trigger action recognition model to obtain an observation trigger action mark for trigger action recognition; the target value also comprises standard action recognition features extracted from the corresponding sample interference-containing action video, wherein the standard action recognition features are interference-free spectral feature vectors;
optimizing the elimination factor according to a first error between the second sample action recognition feature and the corresponding standard action recognition feature;
obtaining a second error between the observed trigger action mark and the corresponding actual trigger action mark, and obtaining a correlation error according to a correlation coefficient between the first error and the second error;
and respectively optimizing the elimination factor and the detection parameter according to the association error.
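As a hedged sketch of the claim-5 error terms, the following Python computes a per-sample first error, a trigger-mark second error, and an association error from their correlation coefficient; the mean-squared-error forms and the `1 - corr` penalty are assumptions about how the errors might be realized.

```python
import numpy as np

def claim5_errors(second_feats, standard_feats, observed_marks, actual_marks):
    """Per-sample errors and their association (all loss forms assumed)."""
    first_error = np.mean((second_feats - standard_feats) ** 2, axis=1)  # vs standard features
    second_error = (observed_marks - actual_marks) ** 2                  # trigger-mark error
    corr = np.corrcoef(first_error, second_error)[0, 1]                  # correlation coefficient
    association_error = 1.0 - corr        # assumed penalty: push the errors to co-vary
    return first_error.mean(), second_error.mean(), association_error

rng = np.random.default_rng(1)
second = rng.standard_normal((8, 16))
standard = second + 0.1 * rng.standard_normal((8, 16))
observed = rng.random(8)
actual = (rng.random(8) > 0.5).astype(float)
print(claim5_errors(second, standard, observed, actual))
```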
6. The method of claim 5, wherein the observed trigger action markers identified for the trigger action are obtained by:
performing interference elimination operation on the first sample action recognition feature according to the elimination factor to obtain a corresponding second sample action recognition feature;
And according to the detection parameters, performing action recognition feature analysis on the second sample action recognition feature to obtain an expected probability evaluation value of the trigger action, and according to check feedback between the expected probability evaluation value and a preset trigger action threshold, obtaining an observation trigger action mark.
7. The method according to claim 1, wherein the obtaining the action data corresponding to the current user action, and determining the action execution result corresponding to the current user action, comprises:
acquiring action data corresponding to the current user action;
inputting the motion data corresponding to the current user motion into a target instruction content motion instruction judgment model to obtain a predicted instruction content category corresponding to the current user motion; the target instruction content action instruction judging model is obtained after the basic action instruction judging model is trained according to supervised action data and unsupervised action data sets, the supervised action data are action data corresponding to user actions of pre-configured instruction content, the action data in the unsupervised action data sets are action data corresponding to user actions of non-configured instruction content, and each action data in the unsupervised action data sets is similar action data;
acquiring a service knowledge graph and a service description when the predicted instruction content category corresponding to the current user action indicates that the current user action represents target instruction content; the business knowledge graph is generated according to a business scene database, and the business description comprises the business features that distinguish each business attribute in the business scene database;
determining target service characteristics from action data corresponding to the current user action;
when the service description is matched with the target service feature, matching a target associated service feature set corresponding to the target service feature with the service knowledge graph; the target associated service feature set corresponding to the target service feature comprises a target service feature and a strong associated service feature set of the target service feature in the action data corresponding to the current user action;
when a target associated service feature set corresponding to a target service feature is paired to a node in the service knowledge graph, taking a service scene corresponding to the paired node as a current service scene associated in the current user action;
acquiring a next service feature from the action data corresponding to the current user action as a target service feature, and repeatedly executing the step of pairing a target associated service feature set corresponding to the target service feature with the service knowledge graph when the target service feature is matched in the service description until all service features in the action data corresponding to the current user action participate in pairing, so as to obtain each current service scene associated with the current user action;
And determining an action execution result corresponding to the current user action according to the current service scene.
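Informally, the claim-7 flow can be pictured as a classifier gating the scene-pairing step. The stand-in `judge_instruction` model, its probabilities, and the 0.5 gate are invented for this sketch.

```python
# Stand-in classifier and gating for the claim-7 flow; every name and
# probability here is a hypothetical illustration.

def judge_instruction(action_data):
    """Toy action instruction judgment model: (target, non-target) probabilities."""
    return (0.9, 0.1) if "raise hand" in action_data else (0.2, 0.8)

def action_execution_result(action_data, match_scenes):
    p_target, _ = judge_instruction(action_data)
    if p_target < 0.5:                    # predicted category: no target instruction content
        return None
    scenes = match_scenes(action_data)    # the claim-7 pairing steps
    return {"scenes": scenes, "result": "executed" if scenes else "ignored"}

print(action_execution_result(["raise hand", "left turn", "hip-hop"],
                              lambda feats: ["hip-hop-turn"]))
```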
8. The method of claim 7, wherein the method further comprises:
acquiring a plurality of pieces of action data to be supervised, and extracting the action feature vector corresponding to each piece of action data to be supervised;
determining action data to be processed from each piece of action data to be supervised;
according to the vector distance between the action feature vector corresponding to the action data to be processed and the action feature vectors of each action data in a first action data set, determining initial similar action data corresponding to the action data to be processed from the action data contained in the first action data set;
when the similarity measurement of the associated service feature set between the action data to be processed and the corresponding initial similar action data is not lower than a preset measurement threshold, taking the initial similar action data as the target similar action data corresponding to the action data to be processed, and obtaining a matching output;
when the similarity measurement of the associated service feature set between the action data to be processed and the corresponding initial similar action data is lower than the preset measurement threshold, or the action data to be processed has no corresponding initial similar action data, determining that the matching output is that no target similar action data corresponds to the action data to be processed;
when the matching output is that target similar action data corresponds to the action data to be processed, adding the action data to be processed to the first action data set, and when the matching output is that no target similar action data corresponds to the action data to be processed, adding the action data to be processed to the first action data set and the second action data set respectively;
obtaining the next action data to be supervised from each piece of action data to be supervised as the action data to be processed, and repeatedly executing the steps from determining the initial similar action data corresponding to the action data to be processed through determining the matching output, until all the action data to be supervised have been matched;
and configuring instruction content for the user actions corresponding to each piece of action data to be supervised in the finally determined second action data set to obtain a plurality of pieces of supervised action data.
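The claim-8 loop can be sketched as follows; Euclidean distance serves as the vector distance and Jaccard similarity as a stand-in for the associated-service-feature-set similarity measurement, with both thresholds assumed.

```python
import numpy as np

def group_similar(feature_vecs, assoc_sets, dist_thresh=1.0, sim_thresh=0.5):
    """Sketch of the claim-8 matching loop (metrics and thresholds assumed)."""
    first_set, second_set = [], []                # indices into feature_vecs
    for i, vec in enumerate(feature_vecs):
        best, matched = None, False
        if first_set:
            dists = [np.linalg.norm(vec - feature_vecs[j]) for j in first_set]
            k = int(np.argmin(dists))
            if dists[k] <= dist_thresh:
                best = first_set[k]               # initial similar action data
        if best is not None:
            a, b = assoc_sets[i], assoc_sets[best]
            jaccard = len(a & b) / max(len(a | b), 1)
            matched = jaccard >= sim_thresh       # similarity measurement check
        first_set.append(i)                       # always joins the first set
        if not matched:
            second_set.append(i)                  # needs supervised labelling
    return second_set

vecs = [np.array([0.0]), np.array([0.1]), np.array([5.0])]
sets = [{"left turn"}, {"left turn", "hip-hop"}, {"crouch"}]
print(group_similar(vecs, sets))                  # -> [0, 2]
```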
9. The method of claim 7, wherein the method further comprises:
acquiring a supervised action data set and an unsupervised action data set;
training a to-be-determined action instruction judging model according to the supervised action data set to obtain a transitional action instruction judging model;
inputting each piece of unsupervised action data in the unsupervised action data set into the transition action instruction judgment model to obtain a predicted instruction content category corresponding to each piece of unsupervised action data respectively; the content category of the prediction instruction corresponding to the unsupervised action data set is used as the generation category corresponding to the unsupervised action data set;
counting the number of target categories and non-target categories in each generated category to obtain the number of target categories and the number of non-target categories;
acquiring fewer category numbers from the target category numbers and the non-target category numbers as demand category numbers, and taking instruction content categories corresponding to the demand category numbers as demand categories;
acquiring a first number of unsupervised action data from each unsupervised action data having the demand category as target unsupervised action data; the first number is lower than the number of demand categories;
acquiring a second number of unsupervised action data from each unsupervised action data not having the demand category as target unsupervised action data;
obtaining a target non-supervision action data set according to each target non-supervision action data;
respectively performing reverse conversion on each target non-supervision action data in the target non-supervision action data set to obtain reverse conversion action data corresponding to each target non-supervision action data respectively, and taking the reverse conversion action data corresponding to the target non-supervision action data set as similar action data corresponding to the target non-supervision action data set;
forming an unsupervised action data group from the target unsupervised action data and the corresponding similar action data to obtain a plurality of unsupervised action data groups;
the method further comprises the steps of:
respectively inputting the supervised action data, and the first action data and second action data in each unsupervised action data group, into the basic action instruction judgment model to obtain the predicted instruction content categories corresponding to the supervised action data, the first action data, and the second action data respectively;
filtering out, from each supervised action data, the supervised action data whose predicted instruction content category value is larger than a preset numerical threshold; the predicted instruction content category comprises the probabilities corresponding to the target category and the non-target category respectively;
in each piece of remaining supervised action data, obtaining a first cost according to the deviation between the predicted instruction content category and the supervised instruction content category corresponding to the same supervised action data, and obtaining a category cost according to the deviation between the predicted instruction content categories respectively corresponding to the first action data and the second action data in the unsupervised action data group;
performing cross entropy calculation according to each probability contained in the same prediction instruction content category to respectively obtain cross entropy corresponding to each unsupervised action data set;
obtaining a cross entropy cost according to the cross entropy corresponding to each unsupervised action data group;
obtaining a second cost according to the category cost and the cross entropy cost;
updating model structure parameters of the basic action instruction judgment model according to the first cost and the second cost until the model structure parameters accord with a preset cost parameter function, and obtaining the target instruction content action instruction judgment model;
the method further comprises the steps of:
acquiring a business linkage knowledge graph; the business linkage knowledge graph is generated according to a preset business linkage data set, wherein the preset business linkage data set comprises business linkage data identifiers of business linkage data corresponding to each business scene respectively corresponding to the business scene database; the service description comprises a service characteristic which is different from each other and is associated with the service scene database and the preset service linkage data set;
when the pairing of the action data corresponding to the current user action and the business knowledge graph is unsuccessful, pairing the action data corresponding to the current user action with the business linkage knowledge graph to obtain each target business linkage data associated with the current user action;
and taking the service scene corresponding to each target service linkage data as each current service scene associated with the current user action.
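Finally, a hedged sketch of the claim-9 cost combination: a supervised first cost over the samples that survive the confidence filter, plus a second cost built from a consistency (category) term and a cross entropy term over the unsupervised groups. The exact loss forms, the 0.95 filter threshold, and this reading of the filtering step are assumptions.

```python
import numpy as np

def semi_supervised_costs(sup_probs, sup_labels, unsup_pairs, conf_thresh=0.95):
    """First and second costs for updating the basic model (forms assumed)."""
    keep = sup_probs.max(axis=1) <= conf_thresh       # filter over-confident samples (assumed reading)
    p, y = sup_probs[keep], sup_labels[keep]
    first_cost = -np.mean(np.log(p[np.arange(len(y)), y] + 1e-8))   # supervised cross entropy

    category_cost = np.mean([np.sum((a - b) ** 2) for a, b in unsup_pairs])      # pair consistency
    cross_entropy_cost = np.mean([-np.sum(a * np.log(a + 1e-8)) for a, _ in unsup_pairs])  # entropy term
    second_cost = category_cost + cross_entropy_cost
    return first_cost, second_cost

sup_probs = np.array([[0.7, 0.3], [0.99, 0.01], [0.4, 0.6]])  # (target, non-target)
sup_labels = np.array([0, 0, 1])
unsup_pairs = [(np.array([0.6, 0.4]), np.array([0.55, 0.45]))]
print(semi_supervised_costs(sup_probs, sup_labels, unsup_pairs))
```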
10. A server system comprising a server for performing the method of any of claims 1-9.
CN202311647127.8A 2023-12-04 2023-12-04 Virtual reality action recognition method and system based on interaction equipment Active CN117762250B (en)
