CN116820250B - User interaction method and device based on meta universe, terminal and readable storage medium


Publication number
CN116820250B
Authority
CN
China
Prior art keywords
interaction
interaction mode
target object
information
matching degree
Prior art date
Legal status
Active
Application number
CN202311093781.9A
Other languages
Chinese (zh)
Other versions
CN116820250A (en)
Inventor
胡方扬
魏彦兆
唐海波
Current Assignee
Xiaozhou Technology Co ltd
Original Assignee
Xiaozhou Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Xiaozhou Technology Co ltd filed Critical Xiaozhou Technology Co ltd
Priority to CN202311093781.9A
Publication of CN116820250A
Application granted
Publication of CN116820250B


Abstract

The embodiment of the invention provides a user interaction method, device, terminal and readable storage medium based on the meta-universe, belonging to the technical field of the meta-universe. The method comprises the following steps: constructing an interaction mode library and acquiring user data of at least one target object, wherein the interaction mode library comprises at least one interaction mode; detecting state information corresponding to the target object based on the user data; determining the trigger probability corresponding to each interaction mode according to the state information; generating an interaction strategy corresponding to the target object based on the trigger probability corresponding to each interaction mode; and executing the corresponding interaction operation according to the interaction strategy. By analyzing the user data of the target object, the method determines the interaction strategy corresponding to the target object, solves the problem of poor user experience in the prior art, recommends the most natural and best-matched social mode for the target object, realizes a smooth interaction experience, and improves the user experience.

Description

User interaction method and device based on meta universe, terminal and readable storage medium
Technical Field
The invention relates to the technical field of meta-universe, in particular to a user interaction method, device, terminal and readable storage medium based on meta-universe.
Background
The meta universe is an emerging artificial intelligence technology following virtual reality. Through means such as holographic projection and brain-computer interfaces, it gives the user a fully immersive physical and mental experience in a virtual scene and creates a hyper-realistic three-dimensional virtual world. In contrast to virtual reality technology, the meta universe has no viewing-angle limitation and users can move and interact freely, which makes it better suited to group and social experiences.
Taking a bar as an example: as an important social venue, it carries rich social interaction, cultural exchange, and leisure and entertainment functions, and a digital virtual representation of the bar is therefore an ideal choice for a meta-universe platform seeking to realize complex social interaction and cultural exchange. However, the virtual bar scenes established in existing meta-universe virtual worlds cannot efficiently realize effective interaction and cooperation among different groups, so the users participating in the virtual world cannot obtain an immersive bar experience and user requirements cannot be effectively met.
Disclosure of Invention
The invention mainly aims to provide a meta-universe-based user interaction method, device, terminal and readable storage medium, in order to solve the problem in the prior art that users cannot interact effectively in the virtual scenes established by a virtual world, and therefore cannot obtain an immersive experience, which reduces the user experience.
In a first aspect, an embodiment of the present invention provides a meta-universe-based user interaction method, including:
constructing an interaction mode library and acquiring user data of at least one target object, wherein the interaction mode library comprises at least one interaction mode;
detecting state information corresponding to a target object based on the user data;
determining the triggering probability corresponding to each interaction mode according to the state information;
generating an interaction strategy corresponding to the target object based on the triggering probability corresponding to each interaction mode;
and executing corresponding interaction operation according to the interaction strategy.
In a second aspect, an embodiment of the present invention provides a meta-universe-based user interaction device, including:
the data acquisition module is used for constructing an interaction mode library and acquiring user data of at least one target object, wherein the interaction mode library comprises at least one interaction mode;
the data processing module is used for detecting state information corresponding to a target object based on the user data;
the data analysis module is used for determining the triggering probability corresponding to each interaction mode according to the state information;
the strategy determining module is used for generating an interaction strategy corresponding to the target object based on the triggering probability corresponding to each interaction mode;
And the strategy execution module is used for executing corresponding interaction operation according to the interaction strategy.
In a third aspect, an embodiment of the present invention further provides a terminal device, where the terminal device includes a processor, a memory, a computer program stored on the memory and executable by the processor, and a data bus for implementing a connection communication between the processor and the memory, where the computer program, when executed by the processor, implements the steps of any of the metauniverse-based user interaction methods provided in the present specification.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where the storage medium stores one or more programs, and the one or more programs are executable by one or more processors to implement the steps of any of the metauniverse-based user interaction methods provided in the present specification.
The embodiment of the invention provides a user interaction method, device, terminal and readable storage medium based on the meta universe. The method comprises: after the virtual environment corresponding to the meta universe is built, constructing an interaction mode library, wherein the interaction mode library comprises at least one interaction mode such as clapping and beating; obtaining user data of a target object in the virtual environment; detecting and analyzing the user data to obtain the state information corresponding to the target object, and determining, according to the state information, the trigger probability of the target object for each interaction mode in the interaction mode library; and finally generating, based on the trigger probability corresponding to each interaction mode, the interaction strategy corresponding to the target object and executing the corresponding interaction operation on the target object according to the interaction strategy. This solves the problem in the prior art that users cannot interact effectively in the virtual scenes established by a virtual world and therefore cannot obtain an immersive experience, which reduces the user experience. The state information of the target object is obtained through analysis, the trigger probability corresponding to each interaction mode is obtained according to the state information, and the target interaction mode matched with the target object under that state information is then obtained, so that the target object can interact effectively, obtain a more realistic experience, and user satisfaction and experience are improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a user interaction method based on meta-universe provided by an embodiment of the application;
FIG. 2 is a flow chart of sub-step S104 of the meta-universe based user interaction method of FIG. 1;
FIG. 3 is a schematic diagram of a module of a meta-universe-based user interaction device according to the present embodiment;
fig. 4 is a schematic block diagram of a structure of a terminal device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.
It is to be understood that the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The embodiment of the application provides a user interaction method, device, terminal and readable storage medium based on the meta universe. The meta-universe-based user interaction method can be applied to a terminal device, where the terminal device can be a tablet computer, a notebook computer, a personal digital assistant, a wearable device or a server, and the server can be an independent server or a server cluster.
The embodiment of the application provides a user interaction method, device, terminal and readable storage medium based on the meta universe. The method comprises: after the virtual environment corresponding to the meta universe is built, constructing an interaction mode library, wherein the interaction mode library comprises at least one interaction mode such as clapping and beating; obtaining user data of a target object in the virtual environment; detecting and analyzing the user data to obtain the state information corresponding to the target object, and determining, according to the state information, the trigger probability of the target object for each interaction mode in the interaction mode library; and finally generating, based on the trigger probability corresponding to each interaction mode, the interaction strategy corresponding to the target object and executing the corresponding interaction operation on the target object according to the interaction strategy. This solves the problem in the prior art that users cannot interact effectively in the virtual scenes established by a virtual world and therefore cannot obtain an immersive experience, which reduces the user experience. The state information of the target object is obtained through analysis, the trigger probability corresponding to each interaction mode is obtained according to the state information, and the target interaction mode matched with the target object under that state information is then obtained, so that the target object can interact effectively, obtain a more realistic experience, and user satisfaction and experience are improved.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
The meta universe is an emerging artificial intelligence technology following virtual reality. Through means such as holographic projection and brain-computer interfaces, it gives the user a fully immersive experience and creates a hyper-realistic three-dimensional virtual world. In contrast to virtual reality technology, the meta universe has no viewing-angle limitation and users can move and interact freely, which makes it better suited to group and social experiences.
On the other hand, the bar is an important social place and carries rich social interaction, cultural communication and leisure and entertainment functions. The virtual representation of the digital bar can be an ideal choice for the metauniverse platform to realize complex social interactions and cultural communication. However, how to construct a virtual bar scene with complete functions in a meta-universe environment, so as to realize effective interaction and collaboration of different groups, and enable each user to obtain an immersive bar experience, which is still a great technical problem.
Referring to fig. 1, fig. 1 is a flow chart of a user interaction method based on meta-universe according to an embodiment of the present invention.
As shown in fig. 1, the meta-universe-based user interaction method includes steps S101 to S105.
Step S101, an interaction mode library is built and user data of at least one target object is obtained, wherein the interaction mode library comprises at least one interaction mode.
Illustratively, after the metaverse-based virtual scene is determined, an interaction mode library matching the virtual scene is determined from the virtual scene. Interactive video data corresponding to the current virtual scene are then collected, covering different interaction groups, interaction depths and interaction processes under different interaction scenes (such as a normal performance scene, a climax scene and the like). For the interaction groups, interaction videos of different groups, such as young groups, middle-aged groups and family groups, are collected in various virtual reality scenes, since different groups differ in interaction mode, communication content and interaction frequency. For the interaction depth, interaction videos of different degrees of intimacy, such as nodding, kissing, gigging and hugging, are collected; the higher the interaction depth, the richer the emotional intention expressed. For the normal performance scene, audience interaction during a normal performance, such as clapping, singing along and raising hands, is collected to learn the natural interaction modes and timing of the audience in that scene. For the climax scene, intense audience interaction when the performance reaches its climax, such as jumping, shouting and hugging, is collected; such climax interaction can drive the atmosphere of the whole venue and achieve a certain interaction synchronization effect.
Optionally, an interaction mode space supporting the complex interaction scene is established by using a deep learning technology, and the intrinsic relations among different interaction groups, interaction depths and interaction scenes are mapped, so that an interaction mode library is formed.
For example, the group mode space reflects the difference and the association of interaction modes among different groups through clustering and mapping the representation of different interaction groups; the depth mode space reflects the common characteristics of high-depth interaction by gathering the characterization representing different interaction affinities according to different interaction depths; the scene mode space represents the representation of interaction modes under different interaction scenes by gathering, so that the inherent relevance of interaction under the same scene is embodied. After the three mode spaces are constructed, cross-space mapping is further established, corresponding relations among different interaction variables (such as groups, depths and scenes) are revealed, and the tendency and strategies of group interaction are summarized on the basis.
In an exemplary embodiment, the target object and the interaction mode corresponding to the target object are determined. When the interaction mode is single-person interaction, only the user data of the target object itself needs to be obtained; when the interaction mode is multi-person interaction, the user data of the target object and the user data of the other target objects possibly involved in the interaction mode both need to be obtained.
For example, when the interaction mode is clapping or shouting, only the user data of the target object itself needs to be obtained to support the subsequent function execution; but when the interaction mode is kissing, palm-clapping, hugging or nodding in greeting, the user data respectively corresponding to the target object and to the other target objects involved in executing the interaction mode need to be obtained, so as to determine whether the target object can execute the interaction mode with the other target objects.
If the interaction strategy corresponding to a first target object is to be judged, the first user data corresponding to the first target object need to be obtained, and the likelihood that the first target object will shout at or applaud the performer is then obtained by analyzing the first user data; if the interaction strategy between the first target object and a second target object needs to be judged, such as kissing or hugging, the first user data corresponding to the first target object and the second user data corresponding to the second target object need to be obtained respectively.
Step S102, detecting state information corresponding to the target object based on the user data.
Illustratively, the current state information of the target object is determined in multiple respects by analyzing the user data.
For example, the user data is obtained by collecting real-time interaction data of the user and the virtual environment, and then the user data is identified to obtain group characteristics, calculate interaction time and frequency, judge environment types and the like, and then various interaction elements in the scene are comprehensively tracked, so that the current state information of the target object is judged.
In an embodiment, the user data includes expression information, voice information, physiological information and/or distance information, and detecting the state information corresponding to the target object based on the user data includes: obtaining facial image data and voice data corresponding to the target object, performing feature analysis on the facial image data to obtain the expression information, and performing voice analysis on the voice data to obtain the voice information; collecting physiological signals corresponding to the target object, and performing signal analysis on the physiological signals to obtain the physiological information corresponding to the target object; and/or calculating position information between the target objects and determining the distance information according to the position information; and determining the state matching degree corresponding to the target object based on the expression information, the voice information, the physiological information and/or the distance information, and further determining the state information according to the state matching degree. The state matching degree of the target object is calculated according to the following formula: M = W1×S1 + W2×S2 + ... + Wn×Sn, where i = 1, 2, ..., n, Wi represents the weight of the i-th scoring item, and Si represents the score of the i-th scoring item calculated from the expression information, voice information, physiological information and/or distance information corresponding to the target object.
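As a minimal illustrative sketch (not part of the claimed method), the weighted sum above could be computed as follows; the function name and the example weights are assumptions, and the scoring item scores Si are assumed to have already been normalized to the range 0-1.

```python
def state_matching_degree(scores, weights):
    """Compute M = W1*S1 + W2*S2 + ... + Wn*Sn.

    scores  -- scoring item scores Si in [0, 1], e.g. derived from
               expression, voice, physiological and/or distance information
    weights -- scoring item weights Wi, expected to sum to 1
    """
    if len(scores) != len(weights):
        raise ValueError("each scoring item needs exactly one weight")
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("scoring item weights should sum to 1")
    return sum(w * s for w, s in zip(weights, scores))

# Single-person mode (e.g. shouting): expression, voice and physiological items only.
M_single = state_matching_degree([0.8, 0.5, 1.0], [0.3, 0.2, 0.5])   # -> 0.84

# Multi-person mode (e.g. hugging): a fourth item for distance information is added.
M_multi = state_matching_degree([0.8, 0.5, 1.0, 0.6], [0.25, 0.15, 0.4, 0.2])
```

With three items and weights 0.3/0.2/0.5 this reproduces the shouting example discussed later (M = 0.84).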
For example, the target object is mounted with an imaging device when participating in the virtual scene, and the imaging device captures a face image of the target object to obtain the face image. And a sound receiving device is arranged for collecting the speaking content of the target object, so that the speaking content of the target object is collected according to the sound receiving device to obtain voice data.
For example, when a target object participates in a virtual scene, an RGB camera is mounted on the head of the target object, and a face image of the target object is captured according to a capturing frequency to obtain face image data. And classifying the facial image data by using the expression classification model to obtain expression information corresponding to the facial image data.
For example, the capture frequency is 30 frames per second, and then the face image data is obtained by capturing the face image of the target object at a frequency of 30 frames per second from the RGB camera.
Illustratively, the expression classification model covers categories such as pleasure, happiness, upset and sadness, and the facial image data are classified according to the expression classification model to obtain the expression information corresponding to the facial image data. A facial feature point extraction model is pre-trained on the Facial Landmarks in the Wild (FLW) data set, and feature point detection is performed on the facial image data according to the facial feature point extraction model to obtain the corresponding facial feature points in the facial image data. After the facial image data of the target object are obtained, the facial image at time t and the facial image at time t+1 are selected, the time-t feature points corresponding to the facial image at time t and the time-(t+1) feature points corresponding to the facial image at time t+1 are extracted, and the time-t feature points are compared with the time-(t+1) feature points, so that the expression information corresponding to the target object is determined according to the comparison result.
For example, the characteristic point change mode is to compare the characteristic points of the face image of the target object at the 5 th second with the characteristic points of the face image at the 6 th second, and if the eyebrow rises by more than 3 points and the mouth angle rises by more than 5 points, the facial expression is judged to be a pleasant expression; if the eyes are opened more than 10 points and the mouth is opened more than 8 points, the expression of surprise is judged.
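The comparison rules above can be sketched as simple thresholds on landmark displacements between the two frames; this is an illustrative assumption about how the rules might be coded (the displacement unit is taken to be pixels, and the function and parameter names are hypothetical).

```python
def classify_expression(brow_rise, mouth_corner_rise, eye_open_delta, mouth_open_delta):
    """Rule-of-thumb expression classification from landmark changes
    between the frame at time t and the frame at time t+1 (in pixels)."""
    if brow_rise > 3 and mouth_corner_rise > 5:
        return "pleasure"
    if eye_open_delta > 10 and mouth_open_delta > 8:
        return "surprise"
    return "neutral"

# e.g. comparing the 5th-second frame with the 6th-second frame:
label = classify_expression(brow_rise=4, mouth_corner_rise=6,
                            eye_open_delta=2, mouth_open_delta=1)  # -> "pleasure"
```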
Optionally, the target objects include, but are not limited to, users for whom an interaction policy is to be determined and other users associated with the user interaction pattern.
For example, a BERT (Bidirectional Encoder Representations from Transformers) model is used to encode the voice data of the target object at time t to obtain a 768-dimensional voice feature vector as the time-t feature vector; the cosine similarity between the feature vector from a preset time before time t (i.e., the time t−N feature vector) and the time-t feature vector is calculated to obtain a similarity result, and the language information corresponding to the target object is then determined according to the similarity result.
For example, the cosine similarity between the current feature vector and the feature vector before 5 seconds is calculated, and if the cosine similarity is smaller than 0.6, the voice feature is judged to have a large change, and the voice also shows excitation signs.
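A sketch of this check is given below, under the assumptions that the speech has first been transcribed to text, that a Hugging Face Transformers BERT encoder is available, and that the concrete model name (bert-base-uncased) and mean pooling are illustrative choices rather than what the patent specifies.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

def encode_utterance(text):
    """Return a 768-dimensional feature vector for a transcribed utterance."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)  # mean-pool tokens

def speech_excitation_detected(vec_now, vec_past, threshold=0.6):
    """Flag a large change in the voice features when the cosine similarity
    between the current vector and the one from N seconds earlier (e.g. 5 s)
    drops below the threshold (0.6 in the example above)."""
    sim = torch.nn.functional.cosine_similarity(vec_now, vec_past, dim=0)
    return sim.item() < threshold
```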
By way of example, the physiological detection device is installed on the target object, so that the physiological signal corresponding to the target object is collected, and the corresponding physiological information is obtained by analyzing the physiological signal.
For example, the physiological detection device is a heart rate monitoring device, and further, physiological signals corresponding to the target object are collected according to the heart rate monitoring device, and further, physiological information corresponding to the target object is determined according to heart rate variation of the target object.
For example, electrocardiogram (ECG), electrodermal activity, acceleration and electroencephalogram (EEG) signal data of the target object, in particular changes in gamma and theta brain waves, are acquired using an Empatica E4 wearable device, and the ECG information, electrodermal activity information, acceleration information, gamma brain wave information and theta brain wave information of the target object are used as the physiological signals of the target object. The information in the physiological signals is averaged or differentiated to obtain the rate of change corresponding to each physiological signal, so that the physiological information corresponding to the target object is determined by classification according to the rate of change.
For example, the electrocardiogram signal is denoised and its R-peak features are extracted using a wavelet transform, and an increased heart rate is judged if the R-R interval decreases by more than 15%; for the electrodermal activity signal, increased electrodermal activity is judged if the signal amplitude increases by more than 20 microvolts within 1 second; for the EEG signal, attention is focused on the gamma band (25-100 Hz), and a sign of high excitation is judged if the relative gamma power increases by more than 15% within 1 second.
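These three threshold rules could be sketched as follows; the R-peak detection and band-power estimation are assumed to be performed upstream, and the function names and window conventions are illustrative assumptions.

```python
import numpy as np

def heart_rate_rising(rr_prev_s, rr_curr_s):
    """Judge a rising heart rate when the mean R-R interval (seconds),
    taken from the detected ECG R peaks, shrinks by more than 15%."""
    return (rr_prev_s - rr_curr_s) / rr_prev_s > 0.15

def eda_increased(eda_window_uv):
    """Judge increased electrodermal activity when the amplitude rises by
    more than 20 microvolts within the 1-second window."""
    eda_window_uv = np.asarray(eda_window_uv, dtype=float)
    return (eda_window_uv.max() - eda_window_uv[0]) > 20.0

def gamma_excitation(gamma_power_prev, gamma_power_curr):
    """Judge a sign of high excitation when the relative gamma-band
    (25-100 Hz) power rises by more than 15% within 1 second."""
    return (gamma_power_curr - gamma_power_prev) / gamma_power_prev > 0.15
```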
For example, when it is necessary to determine whether the target object interacts with other objects, the distance information between the first target object and the second target object needs to be obtained. Taking as an example the first position of the first target object in the target image and the second position of the second target object in the target image, the distance information between the first target object and the second target object is determined from the first position and the second position and is calculated according to the following formula:

d = ( |P1(t) − P2(t)| + |P1(t+1) − P2(t+1)| ) / 2

wherein d is the distance information, P1(t+1) represents the position information of the first target object in the (t+1)-th frame target image, P1(t) represents the position information of the first target object in the t-th frame target image, P2(t+1) represents the position information of the second target object in the (t+1)-th frame target image, P2(t) represents the position information of the second target object in the t-th frame target image, and |·| denotes the Euclidean distance between two pixel positions.

For example, a continuous sequence of target images or a video stream containing two or more interacting users, for example a first target object and a second target object, is acquired. The target images are captured at a certain time interval (e.g. one frame every 0.5 seconds) using an RGB camera or a depth camera. The positions of the first target object and the second target object are detected and identified on every two frames of images to obtain their position information in the image pixel coordinate system; the user positions may be identified using a target detection algorithm such as YOLO (You Only Look Once) or SSD (Single Shot MultiBox Detector). If the position of the first target object on the t-th frame target image is P1(t) and the position of the second target object is P2(t), and on the (t+1)-th frame target image the positions are P1(t+1) and P2(t+1) respectively, the distance information d between the first target object and the second target object can be obtained from the formula above.
In addition, the d obtained above is the distance information between the first target object and the second target object in the virtual environment (in pixels). If the distance information between the first target object and the second target object in the real bar environment is required, a size scaling relationship k between the image and the actual size can be calculated from the known size information of the first target object or the second target object in the target image (such as the height of the human body). For example, if the height of the human body of the target object in the target image is h pixels and the actual height of the human body is H meters, the size scaling relationship is k = H/h, and the distance information of the first target object and the second target object in the actual physical space is k×d.
In addition, to account for the distance moved by the first target object and the second target object during the image acquisition interval, so that the final result is closer to the distance between them during the actual interaction, the distance information can be accumulated over multiple acquisitions and the average value used as the distance information of the first target object and the second target object at the current moment.
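A minimal sketch of this pipeline, under the assumption that the distance for a pair of frames is taken as the average of the per-frame Euclidean pixel distances and that the pixel-to-metre scaling uses k = H/h; the example positions and heights are hypothetical.

```python
import math

def frame_distance(p1, p2):
    """Euclidean distance (in pixels) between two detected positions."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])

def pairwise_distance(pos1_t, pos1_t1, pos2_t, pos2_t1):
    """Average the per-frame pixel distances over frames t and t+1 so that
    movement during the acquisition interval is smoothed out."""
    return 0.5 * (frame_distance(pos1_t, pos2_t) + frame_distance(pos1_t1, pos2_t1))

def to_physical_distance(d_pixels, body_height_pixels, body_height_m):
    """Scale a pixel distance to metres using k = H / h, where h is the
    person's height in the image (pixels) and H the real height (metres)."""
    k = body_height_m / body_height_pixels
    return k * d_pixels

d_px = pairwise_distance((120, 310), (130, 305), (400, 300), (390, 310))
d_m = to_physical_distance(d_px, body_height_pixels=350, body_height_m=1.75)
```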
Illustratively, the interaction mode library contains both single-person interaction modes and multi-person interaction modes. Single-person modes such as shouting and clapping only need the user information of the first target object and do not need to consider distance information, while multi-person modes such as kissing and hugging need at least the user information of both the first target object and the second target object and must consider distance information. Therefore, the state matching degree needs to be calculated differently depending on the selected interaction mode.
For example, in the case of a single interaction mode such as shouting or clapping, the state matching degree m=w1×s1+w2×s2+w3×s3, where W1 represents a scoring item weight corresponding to expression information, W2 represents a scoring item weight corresponding to speech information, W3 represents a scoring item weight corresponding to physiological information, S1 represents a scoring item score calculated from expression information, S2 represents a scoring item score calculated from speech information, and S3 represents a scoring item score calculated from physiological information.
For example, in the case of a multi-user interaction mode such as kissing, hugging, etc., the state matching degree m=w1×s1+w2×s2+w3×s3+w4×s4, where W1 represents a scoring item weight corresponding to expression information, W2 represents a scoring item weight corresponding to speech information, W3 represents a scoring item weight corresponding to physiological information, W4 represents a scoring item weight corresponding to distance information, S1 represents a scoring item score calculated from expression information, S2 represents a scoring item score calculated from speech information, S3 represents a scoring item score calculated from physiological information, and S4 represents a scoring item score calculated from distance information.
Wherein W1, W2, W3 and/or W4 are the scoring item weights used in calculating the state matching degree of the target object. The scoring item weights are set according to the importance of the scoring items; the greater the weight, the greater the contribution of that scoring item to the matching degree, and the sum of the weights is 1. S1, S2, S3 and/or S4 are the scoring item scores, obtained by quantizing the relevant user information of the target object to a score between 0 and 1.
For example, the state matching degree when the interaction mode is shouting is calculated as follows: scoring item 1 is the expression information, with weight coefficient W1 = 0.3 and score S1 = 0.8; scoring item 2 is the language information, with weight coefficient W2 = 0.2 and score S2 = 0.5; scoring item 3 is the physiological information, with weight coefficient W3 = 0.5 and score S3 = 1. Then M = 0.3×0.8 + 0.2×0.5 + 0.5×1 = 0.84. It can be seen that the physiological information in this example is the most important scoring item (it has the largest weight), and its score directly determines the higher value of the state matching degree M. This combination of scoring items and weights describes well how an interaction mode fits the current interaction state and provides a basis for mode recommendation.
In addition, the information of the target object includes, but is not limited to, expression information, language information, physiological information, and distance information, and can be adjusted according to actual requirements, which is not particularly limited in the present application.
Step S103, determining the triggering probability corresponding to each interaction mode according to the state information.
The state information includes expression information, language information, physiological information and/or distance information, each represented by feature vectors; the fused features are then input into an interaction mode classification model to obtain the trigger probability corresponding to each interaction mode.
In some embodiments, the determining the trigger probability corresponding to each interaction mode according to the state information includes: calculating the trigger probability corresponding to each interaction mode according to the following formula: P = α×M ± (1−α)×T, where P represents the trigger probability corresponding to each interaction mode, α represents the weight parameter corresponding to each interaction mode, M represents the state matching degree corresponding to each interaction mode, and T represents the threshold value corresponding to each interaction mode; the + sign is used when the threshold is proportional to the trigger probability, and the − sign is used when the threshold is inversely proportional to the trigger probability. The threshold value corresponding to each interaction mode is calculated according to the following formula: T = Mc/Mm, where Mc represents the actual parameter value of the target object under the triggering condition threshold corresponding to each interaction mode, and Mm represents the ideal matching threshold corresponding to each interaction mode.
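A hedged sketch of these two formulas; the parameter values and the per-example sign choices shown below are assumptions for illustration only.

```python
def trigger_threshold(mc, mm):
    """T = Mc / Mm: actual parameter value of the target object under the
    mode's triggering condition divided by the mode's ideal matching threshold."""
    return mc / mm

def trigger_probability(m, t, alpha, threshold_proportional):
    """P = alpha*M + (1-alpha)*T when the threshold is proportional to the
    trigger probability, and P = alpha*M - (1-alpha)*T when it is inversely
    proportional (e.g. distance for hugging: closer means more likely)."""
    sign = 1.0 if threshold_proportional else -1.0
    return alpha * m + sign * (1.0 - alpha) * t

# Hugging example: threshold built from distance, inversely proportional.
T_hug = trigger_threshold(mc=0.6, mm=1.0)   # actual distance / ideal distance
P_hug = trigger_probability(m=0.84, t=T_hug, alpha=0.7, threshold_proportional=False)

# Nodding example: threshold built from gaze-contact time, proportional.
T_nod = trigger_threshold(mc=3.0, mm=2.0)   # actual seconds / required seconds
P_nod = trigger_probability(m=0.7, t=T_nod, alpha=0.7, threshold_proportional=True)
```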
For example, different interaction modes are included in the interaction mode library, and each interaction mode may further set a trigger threshold to indicate how easily the interaction mode is triggered in a certain interaction scene.
Alternatively, the trigger threshold may be set by an expert or obtained through deep learning; for example, it may be a trigger difficulty index set for each interaction mode according to expert experience, which serves as a reference for later determining Mm. The trigger threshold is generally expressed as an interval; for example, when the interaction mode is hugging, the trigger threshold can be set as a distance in the range [0, 1] and a heart rate increase in the range [10%, 50%], so that the Mm for the distance information is determined to be 1 and the Mm corresponding to the heart rate information is a 10% increase.
Different interaction modes have different trigger threshold settings: the higher the threshold, the stricter the conditions required for the interaction mode to be activated and the harder it is to trigger. The threshold setting thus reflects how difficult the interaction mode is to trigger.
For example, when the interaction mode is nodding in greeting, its trigger threshold is low and it can be activated at a long distance; the main conditions are that the two parties' lines of sight meet and friendly signals such as a smile are shown, and the threshold can be set as gaze contact lasting more than 2 s with a smile detected. When the interaction mode is hugging, the trigger threshold is higher and requires greater intimacy and interaction depth; the main conditions are a closer distance, an emotion of excitement release or celebration, and an intention of body contact, and the threshold can be set as a distance of less than 1 m, a heart rate increase of more than 10%, and open arms. When the interaction mode is kissing, the trigger threshold is the highest, requiring extremely high emotional intention and interaction depth; the main conditions are a hugging posture, highly intimate eye contact and head approach, and the threshold can be set as simultaneously satisfying the hugging threshold elements, eye contact >80% and head distance <30 cm. When the interaction mode is palm-clapping (high-five), the trigger threshold is moderate and requires a certain friendly interaction intention; the main conditions are a raised arm, an open palm and the gaze focused on the other party's palm, and the threshold can be set as an arm-raising angle of more than 60 degrees, palm recognition, and gaze concentration of more than 50%. When the interaction mode is clapping, the trigger threshold is low and it is easily activated while watching a performance; the main condition is that clapping audio is detected or the user's arms move at a certain frequency, and the threshold can be set as an arm movement frequency of 15 times per minute. When the interaction mode is shouting, the trigger threshold difficulty is average and an obvious performance-interaction intention is required; the main condition is that shouting speech or an open mouth shape of the user is detected, and the threshold can be set as detecting shouting speech matching a certain volume, or an open mouth shape, for more than 3 seconds. When the interaction mode is singing along, the trigger threshold is higher and requires a higher performance-interaction intention and emotional experience; the main condition is that the user's singing is detected and matches the rhythm of the performed lyrics, and the threshold can be set as the user's singing being detected for more than 10 seconds and matching the rhythm of the performed lyrics. When the interaction mode is raising hands, the trigger threshold difficulty is average and a basic performance-interaction intention is required; the main condition is that one or both of the user's hands are detected raised above the head, and the threshold can be set as an arm-raising angle of more than 135 degrees detected for more than 3 seconds.
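For illustration only, the trigger conditions listed above could be collected in a configuration table such as the one below; the key names, units and structure are assumptions, not the patent's own encoding.

```python
# Illustrative encoding of the per-mode trigger conditions described above.
TRIGGER_THRESHOLDS = {
    "nod":        {"gaze_contact_s": 2, "smile_detected": True},
    "hug":        {"max_distance_m": 1.0, "heart_rate_rise_pct": 10, "arms_open": True},
    "kiss":       {"requires": "hug", "eye_contact_pct": 80, "max_head_distance_cm": 30},
    "high_five":  {"arm_raise_deg": 60, "palm_detected": True, "gaze_on_palm_pct": 50},
    "clap":       {"arm_movements_per_min": 15},
    "shout":      {"shout_audio_or_mouth_open_s": 3},
    "sing_along": {"singing_detected_s": 10, "matches_lyrics_rhythm": True},
    "raise_hand": {"arm_raise_deg": 135, "hold_s": 3},
}
```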
Illustratively, the data corresponding to the triggering condition threshold of the target object under an interaction mode are obtained and compared with the ideal matching threshold of that interaction mode, so as to obtain the corresponding threshold value.
For example, taking an interaction mode as a hug, assuming that a threshold is set as distance information, obtaining distance information Mc of a first target object and a second target object in a virtual scene, and obtaining an ideal matching threshold (preset distance information) Mm corresponding to the interaction mode from an interaction mode library, and further using a ratio between the distance information Mc and the preset distance information Mm as a triggering condition or a triggering threshold corresponding to the interaction mode. Wherein the closer the distance information between the first target object and the second target object is, the larger the trigger probability is, i.e. the threshold value is inversely proportional to the trigger probability, then p=α×m- (1- α) ×t.
For example, taking an interaction mode as a nodding example, assuming that a threshold is set as an interaction time, the interaction time Mc of the first target object and the second target object in the virtual scene is obtained, and an ideal matching threshold (preset interaction time) Mm corresponding to the interaction mode is obtained from an interaction mode library, and then the triggering condition or the triggering threshold corresponding to the interaction mode is used according to the ratio between the interaction time Mc and the interaction time Mm. Wherein the longer the interaction time between the first target object and the second target object, the larger the trigger probability, i.e. the threshold value is proportional to the trigger probability, p=α×m+ (1- α) ×t.
Optionally, when setting the triggering condition or the triggering threshold corresponding to the interaction mode, the triggering condition or the triggering threshold can be set according to the distance information, can also be set as a threshold value of indexes such as interaction time, emotion intensity and the like between target objects, and can also be set as an arm lifting angle of >60 degrees.
Illustratively, the weight parameter α, set for each interaction mode to balance the state matching degree and the triggering condition, is obtained, and the trigger probability of the target object in that interaction mode is then obtained according to P = α×M ± (1−α)×T.
For example, the different interaction modes have different degrees of matching with the interaction state of the current scene, and the higher the state matching degree is, the greater the triggering probability of the interaction mode is. Because a high state match indicates that the interaction mode is well suited to be activated in the current interaction state.
In addition, a series of interaction modes which are highly matched with the current interaction mode can be searched in the interaction mode library when the triggering probability of the interaction mode is calculated. Modes with too high trigger conditions are filtered out, and modes with moderate trigger conditions are selected. And then dividing the selected modes into different clusters according to the association degree, and finding out which mode clusters are most matched with the current virtual scene. The patterns in the cluster are used as a matching pattern set, and can drive the subsequent virtual character interaction strategy.
Wherein α is a weight parameter with 0 ≤ α ≤ 1, used to balance the influence of the two factors. When α approaches 1, the state matching degree M has a greater influence on the probability calculation; when α approaches 0, the influence of the threshold T is greater. The choice of α requires a trade-off based on the specific application scenario and data characteristics.
In some embodiments, before determining the trigger probability corresponding to each interaction mode according to the state information, the method further includes: calculating the state matching degree corresponding to each interaction mode according to the following formula: M(t) = M + β×M(t−1), where M represents the state matching degree calculated from the user data at time t, M(t−1) represents the state matching degree at time t−1, and β represents a time attenuation factor characterizing the degree of influence of the state matching degree M(t−1) at time t−1 on time t.
For example, the state matching degree of the target object corresponding to each interaction mode at the time t is related to not only expression information, voice information, physiological information and/or distance information at the time t, but also the state matching degree corresponding to the time t-1.
For example, if the interaction mode is shouting, the state matching degree is related to the expression information, voice information and physiological information, and the state matching degree of the target object for the shouting mode at time t is M(t) = W1×S1(t) + W2×S2(t) + W3×S3(t) + β×M(t−1), where M(t) is the state matching degree at time t, S1(t) is the feature matching degree corresponding to the expression information at time t, S2(t) is the voice feature matching degree corresponding to the voice information at time t, S3(t) is the physiological feature matching degree corresponding to the physiological information at time t, W1, W2 and W3 are the feature weights corresponding to the expression information, voice information and physiological information respectively, and β is a time attenuation factor that controls the influence of the matching degree M(t−1) at the previous time t−1 on the current time t.
The notation (t) indicates that the calculation is performed at a particular time; the matching degree M(t−1) of the previous time is introduced and its influence on the current time is controlled by β. In this way the continuity and time dependence of the matching degree can be modeled, avoiding the large discontinuous fluctuations that would occur if each calculation were independent, and fusing historical information helps improve the accuracy of the current calculation.
Alternatively, the time interval t may be set to be preferably 5-10 seconds, or a variable time interval t may be set according to different interaction modes, scenes and users. Shorter intervals may be selected at the beginning and then suitably lengthened step by step. The application is not particularly limited, and can be set according to the requirements.
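A minimal sketch of this time-decayed update, with an assumed β of 0.3 and an assumed 5-10 second update interval:

```python
def decayed_matching_degree(m_current, m_previous, beta):
    """M(t) = M + beta * M(t-1): fuse the matching degree computed from the
    user data at time t with the previous value so that the result evolves
    smoothly instead of fluctuating independently at each step."""
    return m_current + beta * m_previous

# Recomputed at each interval (e.g. every 5-10 seconds) for the shouting mode:
m_t1 = decayed_matching_degree(m_current=0.84, m_previous=0.0, beta=0.3)   # 0.84
m_t2 = decayed_matching_degree(m_current=0.70, m_previous=m_t1, beta=0.3)  # 0.952
```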
In some embodiments, where the at least one target object includes a first target object and a second target object, the determining of the trigger probability corresponding to each interaction mode according to the state information further includes: when the interaction mode is a multi-person interaction type, acquiring a first trigger probability corresponding to the first target object and a second trigger probability corresponding to the second target object; and performing fusion analysis on the first trigger probability and the second trigger probability to obtain the target trigger probability corresponding to the first target object and the second target object in the interaction mode.
For example, when the interaction mode is a multi-person interaction mode, such as hugging, kissing or palm-clapping, the target object includes a first target object and a second target object. To take the willingness of both parties into account, the first trigger probability corresponding to the first target object and the second trigger probability corresponding to the second target object are obtained using the method above, and the two are fused to determine the target trigger probability for the first target object interacting with the second target object in that interaction mode, which equally serves as the target trigger probability for the second target object interacting with the first target object.
For example, if the interaction mode is hug, according to the above manner, the first trigger probability corresponding to the first target object is p1, and the second trigger probability corresponding to the second target object is p2, then the average value of the first trigger probability p1 and the second trigger probability p2 may be calculated to be the target trigger probability, and further the trigger probability corresponding to the hug of the first target object in the interaction mode library is (p1+p2)/2, and similarly the trigger probability corresponding to the hug of the second target object in the interaction mode library is (p1+p2)/2.
In addition, when calculating the target trigger probability, the different trigger probabilities can be given corresponding weights to adjust for the differing degrees of willingness of the target objects, so as to better meet user requirements.
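A small sketch of this fusion, covering both the plain average from the hugging example and the optional weighting just mentioned (the weight values are illustrative):

```python
def fused_trigger_probability(p1, p2, w1=0.5, w2=0.5):
    """Fuse the two parties' trigger probabilities for a multi-person mode.
    With equal weights this reduces to the (p1 + p2) / 2 average used in the
    hugging example; unequal weights let one party's willingness count more."""
    return (w1 * p1 + w2 * p2) / (w1 + w2)

p_target = fused_trigger_probability(p1=0.8, p2=0.6)                  # -> 0.7
p_biased = fused_trigger_probability(p1=0.8, p2=0.6, w1=0.7, w2=0.3)  # -> 0.74
```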
Step S104, generating an interaction strategy corresponding to the target object based on the triggering probability corresponding to each interaction mode.
When the trigger probability corresponding to each interaction mode is obtained, the trigger probabilities are sequenced to obtain the interaction mode corresponding to the maximum trigger probability, and the interaction strategy corresponding to the target object is generated according to the interaction mode.
In some embodiments, the generating the interaction policy corresponding to the target object based on the triggering probability corresponding to each interaction mode, specifically referring to fig. 2, step S104 includes: substep S1041 to substep S1044.
Substep S1041, sorting according to the trigger probability corresponding to each interaction mode in the interaction mode library to obtain the interaction mode set corresponding to the maximum trigger probability, wherein the interaction mode set is used for storing the interaction mode(s) corresponding to the maximum trigger probability.
Illustratively, the trigger probabilities corresponding to the interaction modes are sorted from high to low, the maximum trigger probability is obtained, and the interaction modes corresponding to that maximum value form the interaction mode set; the number of interaction modes in the set may be 1 or more.
In the substep S1042, when the number of the interaction modes in the interaction mode set is equal to 1, determining the corresponding interaction mode in the interaction mode set as the target interaction mode.
For example, if the number of interaction modes in the interaction mode set is 1, the single interaction mode with the highest trigger probability is ranked first: its matching degree with the current interaction state is highest and its threshold condition is easy to reach, so it is given the highest priority and is determined as the target interaction mode.
Substep S1043, when the number of interaction modes in the interaction mode set is greater than or equal to 2, obtaining the threshold value corresponding to each interaction mode in the interaction mode set, and determining the target interaction mode according to the threshold values.
For example, when the number of interaction modes in the interaction mode set is greater than or equal to 2, several interaction modes share the same maximum trigger probability and the difficulty of their trigger thresholds needs to be considered. The mode with the lower threshold should be prioritized, because its conditions are easier to reach and it can be triggered to advance the interaction sooner; the target interaction mode can therefore be determined according to how easily the thresholds are reached.
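A sketch of this selection logic (maximum trigger probability first, then the lowest threshold as the tie-break); the dictionary-based interface and the example numbers are assumptions.

```python
def select_target_mode(trigger_probabilities, thresholds, eps=1e-9):
    """Pick the target interaction mode from a {mode: P} dict.

    1. Keep the mode(s) whose trigger probability equals the maximum.
    2. If exactly one remains, return it.
    3. Otherwise prefer the tied mode with the lowest threshold T,
       since its conditions are the easiest to reach.
    """
    p_max = max(trigger_probabilities.values())
    candidates = [m for m, p in trigger_probabilities.items() if abs(p - p_max) < eps]
    if len(candidates) == 1:
        return candidates[0]
    return min(candidates, key=lambda m: thresholds[m])

mode = select_target_mode({"hug": 0.82, "high_five": 0.82, "nod": 0.64},
                          {"hug": 0.9, "high_five": 0.6, "nod": 0.3})  # -> "high_five"
```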
In some embodiments, after obtaining the threshold value corresponding to each interaction mode in the interaction mode set when the number of interaction modes in the interaction mode set is greater than or equal to 2, the method further includes: acquiring the interaction characteristics of the target object in the interaction scene, and determining the scene matching degree between the interaction mode and the interaction scene according to the interaction characteristics; acquiring the historical trigger frequency corresponding to the interaction mode and the trigger time interval corresponding to the interaction mode, determining the frequency matching degree corresponding to the interaction mode according to the historical trigger frequency, and determining the time interval matching degree corresponding to the interaction mode according to the trigger time interval; and determining the scene fitting degree corresponding to the interaction mode according to the scene matching degree, the frequency matching degree and the time interval matching degree, and taking the interaction mode corresponding to the maximum scene fitting degree as the target interaction mode. The scene fitting degree corresponding to each interaction mode is calculated according to the following formula:

Score(t) = θ*P(t) + γ*F(t) + δ*D(t) + η*H(t)

wherein P(t) represents the trigger probability corresponding to the interaction mode at time t, F(t) represents the scene matching degree corresponding to the interaction mode at time t, D(t) represents the frequency matching degree corresponding to the interaction mode at time t, H(t) represents the time interval matching degree corresponding to the interaction mode at time t, θ represents the weight parameter corresponding to the trigger probability, γ represents the weight parameter corresponding to the scene matching degree, δ represents the weight parameter corresponding to the frequency matching degree, and η represents the weight parameter corresponding to the time interval matching degree.
For example, if both the trigger probability and the threshold condition are the same and the number of interaction modes in the interaction mode set is still greater than or equal to 2, the degree of fit between each interaction mode and the current interaction scene needs to be determined further. The mode that fits the scene better is ranked first to ensure the consistency and suitability of the interaction. The scene fitting degree refers to how well an interaction mode matches and suits the current virtual scene or interaction state; in other words, whether the interaction mode is matched and coordinated with the elements, atmosphere and theme of the scene, and whether it accords with the user's expectations and interaction habits in that scene. If an interaction mode can be naturally embedded in the current scene and is recognized and actively triggered by users, its scene fitting degree is high.
For example, in a virtual bar scene, interaction modes such as clinking glasses, chatting and toasting have a relatively high scene fitting degree, whereas interaction modes such as inviting someone for a drive, going out or exercising together have a low scene fitting degree, since they do not match the scene characteristics or the user's expectations. Therefore, when recommending interaction modes in this scene, modes with a higher scene fitting degree should be prioritized.
Illustratively, the interaction characteristics of the interaction scene where the target object is located, such as the place, the participating users and the topics, are obtained; the interaction characteristics of the current interaction scene are analyzed; and the degree to which each interaction mode matches the current scene is calculated, where a better match yields a larger scene matching degree F(t).
For example, a feature representation of the interaction scene suitable for each interaction mode in the interaction mode library is obtained, and the resulting vector is taken as a first representation result of the interaction scene for that interaction mode; the features of the current interaction scene are obtained and represented as a vector to obtain a second representation result of the current interaction scene; the similarity between the first representation result and the second representation result is calculated and converted into the scene matching degree as F(t) = a×sim + b, where a and b are scaling parameters that control the mapping of the value range to the interval 0-1, thereby obtaining the scene matching degree.
For example, for each possible interaction scene, representative features are extracted, such as the place (indoor/outdoor, formal/casual, etc.), the identities of participating users (friends/strangers, etc.) and the discussion topics (formal/casual, etc.), and these features are represented as vectors using one-hot coding, word vectors or the like. For each interaction mode, a corresponding feature vector is extracted according to its suitable scenes, using the feature dimensions related to the interaction mode, such as place, time, user relationship and topic, and analyzing the suitable values of each dimension. Taking a chatting mode as an example: the place may be indoor or outdoor, the time daytime or night, the user relationship acquaintance or friend, and the topic a daily topic. The categories of place, time, user relationship and topic are one-hot encoded, for example indoor: [1,0,...], daytime: [1,0], friend: [1,0], daily topic: [1,0], and the encoding results are concatenated to obtain the feature vector corresponding to the interaction mode, for example [1,0,...,1,0,1,0,...] for the chatting mode.
The feature vector Fm of each interaction mode m and the feature vector Fs of the current scene s are obtained by one-hot coding, word vectors or the like, and their cosine similarity is calculated as sim = Fs·Fm / (|Fs|·|Fm|); the larger the sim value, the better scene s matches interaction mode m. The similarity sim is then converted into the scene matching degree using F(t) = a×sim + b, where a and b are scaling parameters that map F(t) to between 0 and 1.
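A minimal sketch of the scene matching computation described above, assuming hand-picked one-hot feature dimensions and scaling parameters a and b chosen so that F(t) lands in [0, 1] (all names and values are illustrative):

```python
import numpy as np

# Illustrative one-hot feature layout:
# [indoor, outdoor, daytime, night, acquaintance, friend, daily_topic, formal_topic]
chat_mode     = np.array([1, 0, 1, 0, 0, 1, 1, 0], dtype=float)  # suitable scene of the chatting mode
current_scene = np.array([1, 0, 0, 1, 0, 1, 1, 0], dtype=float)  # features of the current bar scene

def cosine_similarity(fs: np.ndarray, fm: np.ndarray) -> float:
    """sim = Fs·Fm / (|Fs|·|Fm|)."""
    return float(fs @ fm / (np.linalg.norm(fs) * np.linalg.norm(fm)))

def scene_matching_degree(sim: float, a: float = 0.5, b: float = 0.5) -> float:
    # F(t) = a*sim + b; with a = b = 0.5 a similarity in [-1, 1] maps into [0, 1]
    return a * sim + b

sim = cosine_similarity(current_scene, chat_mode)
print(round(sim, 3), round(scene_matching_degree(sim), 3))  # -> 0.75 0.875
```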
For example, the historical trigger frequency corresponding to an interaction mode is obtained; the smaller the historical trigger frequency, the larger the frequency matching degree D(t) corresponding to that interaction mode.

For example, for interaction mode A and interaction mode B, mode A was triggered 10 times in the past hour and mode B 5 times in the past hour, so the historical trigger frequencies are 10 times / 60 minutes = 0.17 times per minute for mode A and 5 times / 60 minutes = 0.08 times per minute for mode B. If the maximum historical trigger frequency is set to 0.2 times per minute and the matching degree is 1 when the frequency is 0, the matching degree function can be defined as D(f) = 1 - f/0.2, where f is the historical trigger frequency of the interaction mode. The frequency matching degree of mode A is then D(0.17) = 1 - 0.17/0.2 = 0.15 and that of mode B is D(0.08) = 1 - 0.08/0.2 = 0.6, so the frequency matching degree corresponding to interaction mode B is larger.
The trigger time interval corresponding to the interaction mode is obtained by counting, for each interaction mode, the time elapsed since it was last triggered; the larger the trigger time interval, the larger the time interval matching degree H(t).

For example, with the current time defined as t, the last trigger time of each interaction mode is counted: interaction mode A was last triggered at t - 5 minutes and interaction mode B at t - 10 minutes, so the time intervals are t - (t - 5) = 5 minutes for mode A and t - (t - 10) = 10 minutes for mode B. A time interval matching degree function H(x) = k×x is defined, where k is a scaling factor and x is the time interval; the longer the time interval, the higher the matching degree.

The time interval matching degrees of the two modes are therefore interaction mode A: H(5) = k×5 and interaction mode B: H(10) = k×10; comparing them, mode B has the longer interval of 10 minutes and thus the higher time interval matching degree H(10), where k is greater than or equal to 1.
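A small sketch reproducing the two worked examples above; the cap of 0.2 triggers per minute and the scaling factor k are the illustrative values used in the text:

```python
def frequency_matching_degree(freq_per_min: float, max_freq: float = 0.2) -> float:
    """D(f) = 1 - f / max_freq, clamped to [0, 1]."""
    return max(0.0, min(1.0, 1.0 - freq_per_min / max_freq))

def interval_matching_degree(minutes_since_last: float, k: float = 1.0) -> float:
    """H(x) = k * x: the longer a mode has been idle, the better it matches."""
    return k * minutes_since_last

# Mode A: 10 triggers in the last hour, last triggered 5 minutes ago.
# Mode B: 5 triggers in the last hour, last triggered 10 minutes ago.
print(round(frequency_matching_degree(10 / 60), 2),
      round(frequency_matching_degree(5 / 60), 2))
# -> 0.17 0.58 (the text pre-rounds the frequencies, giving 0.15 and 0.6)
print(interval_matching_degree(5), interval_matching_degree(10))  # -> 5.0 10.0
```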
For example, the triggering probability and the weight parameter θ corresponding to the triggering probability are multiplied to obtain a first product value, the scene matching degree and the weight parameter γ corresponding to the scene matching degree are multiplied to obtain a second product value, the frequency matching degree and the weight parameter δ corresponding to the frequency matching degree are multiplied to obtain a third product value, and the time interval matching degree and the weight parameter η corresponding to the time interval matching degree are multiplied to obtain a fourth product value. And adding the first product value, the second product value, the third product value and the fourth product value to obtain the scene fitting degree corresponding to the interaction mode, and taking the interaction mode corresponding to the maximum scene fitting degree as the target interaction mode.
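A compact sketch of this weighted combination, assuming the per-mode values have already been computed and the weights θ, γ, δ, η are supplied by the caller (all mode names and numbers are illustrative):

```python
def scene_fitting_degree(p: float, f: float, d: float, h: float,
                         theta: float, gamma: float, delta: float, eta: float) -> float:
    """Score(t) = θ*P(t) + γ*F(t) + δ*D(t) + η*H(t)."""
    return theta * p + gamma * f + delta * d + eta * h

modes = {
    # mode: (trigger probability P, scene match F, frequency match D, interval match H)
    "toast":     (0.7, 0.9, 0.6, 0.5),
    "high_five": (0.7, 0.6, 0.2, 0.8),
}
weights = dict(theta=0.4, gamma=0.3, delta=0.2, eta=0.1)

scores = {m: scene_fitting_degree(*vals, **weights) for m, vals in modes.items()}
target_mode = max(scores, key=scores.get)
print(scores, target_mode)  # the mode with the largest Score(t) becomes the target
```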
And step S1044, determining an interaction strategy corresponding to the target object according to the target interaction mode.
Illustratively, the interaction strategy corresponding to the target object is determined according to the target interaction mode.

For example, when the target interaction mode is shouting, the interaction strategy is to place both hands around the mouth to amplify the voice, and so on; when the target interaction mode is hugging, the interaction strategy is for the target object to move to the target position and open both arms, and so on.
Step S105, executing corresponding interaction operation according to the interaction strategy.
By way of example, the target object executes the corresponding interaction operation according to the interaction strategy, so that the target object can experience the virtual scene more immersively and obtain a good experience.
In some embodiments, after performing the corresponding interaction operation according to the interaction policy, the method further includes: updating the interaction strategy of the target object.
By way of example, user information corresponding to the target object is collected again and the interaction enters a new stage: the trigger probability corresponding to each interaction mode is recalculated and the interaction modes are re-ranked, thereby obtaining a new interaction strategy, which reflects the continuity and development of the interaction to a certain extent. Further, according to the re-ranking result, the next coherent interaction mode with a high trigger probability is recommended, so that the current interaction can be continued more smoothly and the interaction evolves coherently. Repeating this process achieves dynamic optimization and collaborative development of the interaction process, as outlined in the sketch below.
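An illustrative outline of this update loop; `collect_user_data`, `compute_trigger_probabilities` and `derive_strategy` are hypothetical placeholders for the steps described earlier, not functions from the original:

```python
from typing import Callable

def interaction_loop(collect_user_data: Callable[[], dict],
                     compute_trigger_probabilities: Callable[[dict], dict],
                     derive_strategy: Callable[[str], str],
                     execute: Callable[[str], None],
                     rounds: int = 3) -> None:
    """After each executed interaction, re-collect data, re-rank modes and recommend the next one."""
    for _ in range(rounds):
        user_data = collect_user_data()                    # new stage of the interaction
        probabilities = compute_trigger_probabilities(user_data)
        ranked = sorted(probabilities, key=probabilities.get, reverse=True)
        strategy = derive_strategy(ranked[0])              # recommend the top-ranked mode
        execute(strategy)

# Usage with stub callables:
interaction_loop(lambda: {"state": "active"},
                 lambda d: {"hug": 0.3, "high_five": 0.7},
                 lambda mode: f"perform {mode}",
                 print)
```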
In some implementations, for a more immersive experience of the virtual scene, content found in the real scene may be recreated in the virtual environment.
Taking a bar scene as an example, when creating a virtual bar scene, virtual layouts of areas including a stage, a booth area, a bar counter and a dance floor may be set as follows: the stage area is a semicircular area, slightly higher than the other areas, equipped with lighting and sound equipment, and the virtual stage hosts band performances or DJ activities; the booth area consists of several booths arranged in a semicircular layout around the stage area, each booth including sofa seats and a tabletop, separated by wooden or stone screens, with small decorative items added to make the booth area more realistic; the bar counter area is a strip-shaped area with several high bar stools, where guests interact with virtual bartenders through dialogue, and small props such as drinking vessels and cocktail shakers are placed on the counter; the dance floor area is an open area close to the sound equipment for guests to dance, surrounded by LED screens or laser lights that match the music to create a lively dance-floor atmosphere. The bar virtual scene also contains other rich virtual objects such as tables and chairs, sound equipment and lights, and the physical properties of these objects are set to make the scene realistic.
In some embodiments, in order to improve the realism experienced by the target object in the virtual bar scene, the application also supports setting the avatar of the target object: rich avatar templates are provided, and the target object can personalize and customize a favorite avatar. The avatar corresponding to the target object at least includes performer avatars and audience avatars.
For avatar template design, three-dimensional artists design avatar templates of various styles, including performer avatars such as singers, dancers and band members, and audience avatars such as young men and women and middle-aged and elderly people. Each avatar template is provided with different facial features, hairstyles, clothing and accessories.
For modeling of the avatar corresponding to the target object, each avatar can be modeled with high precision according to the design draft, including the head, upper body, lower body, hands, feet, hairstyle and so on. All avatar templates share a unified animation skeleton for subsequent animation mapping. When an avatar is set for a target object, each modeled part is matched with realistic materials and high-definition textures: the hair material is lifelike, the texture of the clothing fabric is fine and accurate, and the colors of all elements match real skin tones, achieving highly realistic avatar materials.
In addition, when setting the avatar, the target object can select a favorite template in the display area according to personalized settings, and the system provides rich personalization options for modification, such as changing facial features, hairstyles and clothing styles. The target object can keep modifying according to its own preferences to customize its own avatar. When the target object finishes the personalized customization and generates its own avatar, the system automatically extracts all of the customized three-dimensional geometric resources, materials, textures and other information, synchronizes the specification of the virtual character to the subsequent virtual reality scene, and drives the target object's virtual character to match the surrounding virtual environment visually and to interact with it vividly.
In some embodiments, to ensure the authenticity of the target object, the application sets multiple states and actions for the avatars of different target objects; for example, performer avatars need rich performance actions and audience avatars need various interaction actions. Through rich and varied state settings and action designs, different avatars can exhibit lifelike gestures and interactions in the virtual reality scene: the performer avatars drive the atmosphere of the whole scene, and the audience avatars produce a coherent interactive experience with them.
For example, the performer avatar's states and actions include standing, walking, performing and interacting, wherein:

the standing state is divided into standing still, standing and waving, standing and nodding, and the like; the walking actions include different gait animations such as normal walking, slow walking and fast walking; the performance actions are common stage actions such as raising a handheld microphone, raising both fists, spinning the microphone, clapping both hands, jumping and turning; and the interaction actions are gesture actions representing interaction, such as looking in a specific direction, pointing a finger at a specific person, grabbing the hair with both hands, shrugging the shoulders and beating the chest.
For example, the audience avatars are divided into standing states, walking actions, interaction actions and transaction actions. The standing states include standing still, standing and listening, standing and watching, and other states; the walking actions include slow walking, pacing in place and other walking states; the interaction actions include clapping with both hands, raising both hands, nodding, shaking the head, shrugging, waving to the performer and other gesture actions representing interaction between the audience and the performer; and the transaction actions include holding a drink cup, simulated drinking, picking up nearby snacks to eat, taking items out of a pocket or wallet to make a purchase, and the like.
Therefore, the target object in the application can freely select an avatar and related actions; the higher the degree of personalization, the stronger the experience. Meanwhile, corresponding special effects can be set for the avatar's appearance, movement, state switching and so on to enhance the sense of realism.
In some embodiments, when the trigger probability of one interaction mode in the interaction mode library is far greater than those of the other interaction modes, that interaction mode is the best choice in the current interaction state and should be explicitly recommended to the target object for interaction, which improves the continuity of the target object's interaction experience and the sense of immersion.
For example, when detecting whether the trigger probability of one interaction mode in the interaction mode library is far higher than those of the other selectable interaction modes, if the trigger probability P(i) of interaction mode i exceeds the trigger probability of every other interaction mode by more than 50%, interaction mode i is considered far higher than the others. Prompt information is then sent to the target object asking whether to start an active recommendation strategy: if the target object prefers to explore the experience autonomously, a lower-intervention prompt mode is maintained; if the continuity of the target object's interaction experience is poor, a more active recommendation strategy is selected for guidance; otherwise, the current ranking-based prompt strategy is maintained. A sketch of this dominance check follows.
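A minimal sketch of the dominance check, interpreting "exceeds by more than 50%" as P(i) being at least 1.5 times every other mode's probability (that reading, and the mode names, are assumptions):

```python
def dominant_mode(probabilities: dict[str, float], margin: float = 0.5) -> str | None:
    """Return the mode whose trigger probability exceeds every other mode's by more than `margin`."""
    for mode, p in probabilities.items():
        others = [q for m, q in probabilities.items() if m != mode]
        if others and all(p > q * (1.0 + margin) for q in others):
            return mode
    return None

print(dominant_mode({"high_five": 0.7, "kiss": 0.2, "hug": 0.05}))  # -> high_five
print(dominant_mode({"high_five": 0.5, "hug": 0.4}))                # -> None
```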
When determining how to recommend interaction mode i, a visual cue (e.g., highlighting or arrow prompts within the virtual environment), an audible cue (e.g., a spoken voice prompt), a tactile cue (e.g., a device vibration prompt), or a combination of modalities (audio-visual or audio-tactile) may be selected. The amount of prompt information also needs to be controlled to avoid overload.
After the recommended prompt mode for the interaction mode is selected, the prompt content can be designed and implemented in the virtual environment, for example by highlighting the active area of mode i and broadcasting a voice prompt for the interaction mode in the scene, or by generating a vibration alert corresponding to interaction mode i on the haptic device.
In some embodiments, after the interaction mode selected for the target object is determined, the target object's interaction response to that interaction mode is detected. If the target object does not perform the interaction operation of the interaction mode within a certain time (for example, within 10 seconds), the recommendation prompt has not achieved the expected effect, and the probability distribution corresponding to each interaction mode in the interaction mode library is detected again. If the target object responds to the interaction mode, the recommendation is successful, and the probability calculation between the interaction state and the interaction mode is further updated. The recommendation strategy is then updated according to the detected recommendation effect: if the recommendation succeeds, the tendency to use that recommendation strategy is correspondingly increased, and if it fails, the next recommendation strategy is adjusted (for example, the recommendation strength is weakened), as in the sketch below.
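An illustrative sketch of this feedback step, assuming a simple numeric "recommendation strength" that is nudged up on success and down on failure; the adjustment step and the `wait_for_response` placeholder are assumptions, not part of the original:

```python
import time
from typing import Callable

def update_recommendation_strength(strength: float, responded: bool,
                                   step: float = 0.1) -> float:
    """Increase the tendency to recommend actively on success, weaken it on failure."""
    strength = strength + step if responded else strength - step
    return max(0.0, min(1.0, strength))

def detect_response(wait_for_response: Callable[[], bool],
                    timeout_s: float = 10.0) -> bool:
    """Poll for the user's interaction within the timeout window."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if wait_for_response():
            return True
        time.sleep(0.1)
    return False

# Usage with a stub that responds immediately:
responded = detect_response(lambda: True, timeout_s=1.0)
print(update_recommendation_strength(0.5, responded))  # -> 0.6
```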
In some embodiments, each target object browses the virtual scene through its own virtual reality head-mounted display, and the viewing angle of the image is calculated in real time according to the position and gaze direction of the virtual character corresponding to that target object in the scene; if two target objects are located at different positions in the scene, the views and details they see through their head-mounted displays will be completely different.
In some embodiments, when any target object interacts in a way that changes the scene content, such as operating a switch to turn on a light, the change is presented in real time and simultaneously on the head-mounted display of every target object; the scene detail changes seen by all target objects are uniform and instantaneous. Besides the viewing angle, each target object hears three-dimensional stereo sound that matches its position in the scene, deepening the sense of immersion. When multiple target objects appear in the dance floor area at the same time, they can also see each other's virtual characters and interact, creating a strong multi-user social atmosphere. Through advanced computer graphics technology, the virtual reality scene can support the simultaneous presence of multiple target objects and render a personalized perspective and experience for each of them in real time. A target object can explore the various areas and interact with other virtual guests to obtain a sense of social immersion.
For example, a first target object A selects a young male avatar and enters the bar scene. It is detected that A is located in the booth area of the bar scene and is communicating to some extent with other virtual guests, so the first target object A is judged to be in an ordinary social interaction state. In the interaction mode library, the system searches for interaction modes matching this state, including nodding in greeting, hand kissing, high-fiving and the like. Calculated from the mode thresholds and matching degrees, the probability of recommending the high-five mode is 70%, the probability of the hand-kiss mode is 20%, and the hug mode is only 5%. A visual prompt is therefore adopted: an arrow is displayed above the arm of another virtual young male avatar (a second target object B) in the booth area, prompting A to perform a high-five interaction. Meanwhile, it is judged that A responds better to active recommendation, so the interaction prompt information is displayed intuitively. A sees the prompt and performs the high-five interaction with B's avatar; the interaction is detected as successful and the state is updated to an active interaction state. The system recalculates the interaction mode probabilities and finds that the hug mode has risen to 30%, making it the next recommendation preference; a visual prompt is adopted showing B's avatar opening both arms, together with the voice prompt "Hug!", prompting A to perform a hug interaction. The avatars of A and B hug and the interaction succeeds again, which strengthens the reliance on the active recommendation strategy. In subsequent interactions, matching interaction modes with high probability continue to be recommended and are continuously adjusted according to their effect, giving A a coherent, high-quality social experience in the virtual reality bar scene.
For example, a target object C selects a middle-aged female avatar and enters the bar scene. It is detected that a singer is currently performing in the bar and C is seated in the audience area, so C is judged to be in a watching-performance interaction state, and interaction modes matching this state are searched for in the interaction mode library, including singing along, applauding, raising hands, shouting and the like. Calculated from the mode thresholds and matching degrees, the applause mode has the highest recommendation probability of 90%, the shouting mode 70%, and the raising-hands mode 60%. As the performance builds, an audible prompt is adopted and a prompt tone is played in C's earphones: "Cheer the singer on with some applause!". Meanwhile, C is considered to lean towards passive interaction, so a lighter prompt mode is selected; C hears the prompt and applauds enthusiastically along with the audience. The interaction is detected as successful and the state is updated to a highly interactive state; the mode probabilities are recalculated, and it is found that the shouting mode has risen to 85% and the singing-along mode to 75%, making them the next recommendation preferences. As the performance reaches a new climax, voice prompts guide C to shout and sing along; C participates, shouting and singing along with the other virtual audience members, and the interaction succeeds again, further updating the reliance on active recommendation. The performance rhythm continues to be monitored until the performance ends, and matching interaction modes keep being recommended, guiding C to a fully immersive performance experience and achieving a realistic experience for the target object in the virtual reality scene.
The meta-universe-based user interaction method provided by the application creates a meta-universe application scene, a virtual bar, that supports large-scale social and cultural interaction. It obtains the expression information, language information, physiological information and/or distance information corresponding to target objects in the virtual environment, analyzes in real time the interaction mode between a first target object and a second target object in the virtual bar, and recommends the most natural and best-matching social interaction strategy, achieving a smooth interaction experience. Target objects further appear in the same virtual space and interact cooperatively with the avatars of other target objects, producing a strong immersive effect and a sense of shared experience. The method can therefore be applied to various meta-universe application scenes that require a high degree of social and cultural interaction, enabling high-level interaction between people, and has high application value and market potential.
Referring to fig. 3, fig. 3 is a meta-universe-based user interaction device 200 provided by an embodiment of the present application, where the meta-universe-based user interaction device 200 includes a data acquisition module 201, a data processing module 202, a data analysis module 203, a policy determination module 204, and a policy execution module 205, where the data acquisition module 201 is configured to construct an interaction pattern library and acquire user data of at least one target object, and the interaction pattern library includes at least one interaction pattern; a data processing module 202, configured to detect status information corresponding to a target object based on the user data; the data analysis module 203 is configured to determine a trigger probability corresponding to each interaction mode according to the state information; the policy determining module 204 is configured to generate an interaction policy corresponding to the target object based on the trigger probability corresponding to each interaction mode; the policy execution module 205 is configured to execute a corresponding interaction operation according to the interaction policy.
In some embodiments, the user data includes expression information, voice information, physiological information, and/or distance information, and the data processing module 202 performs, in the process of detecting the state information corresponding to the target object based on the user data:
facial image data and voice data corresponding to the target object are obtained, feature analysis is carried out on the facial image data to obtain expression information, and voice analysis is carried out on the voice data to obtain voice information;
collecting physiological signals corresponding to the target object, and performing signal analysis on the physiological signals to obtain physiological information corresponding to the target object;
and/or calculating position information between the target objects and determining the distance information according to the position information;
determining a state matching degree corresponding to the target object based on the expression information, the voice information, the physiological information and/or the distance information, and further determining the state information according to the state matching degree;
wherein the state matching degree of the target object is calculated according to the following formula:
M = W1×S1 + W2×S2 + ... + Wn×Sn,
wherein i = 1, 2, ..., n, Wi represents the weight of the i-th scoring item, Si represents the score of the i-th scoring item, and the scoring items are calculated according to the expression information, voice information, physiological information and/or distance information corresponding to the target object.
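A small sketch of this weighted scoring, assuming the scoring items and weights are supplied as parallel lists (the concrete items and weights are illustrative):

```python
def state_matching_degree(weights: list[float], scores: list[float]) -> float:
    """M = W1*S1 + W2*S2 + ... + Wn*Sn."""
    if len(weights) != len(scores):
        raise ValueError("weights and scores must have the same length")
    return sum(w * s for w, s in zip(weights, scores))

# Scoring items: expression, voice, physiological signal, distance (illustrative values in [0, 1]).
weights = [0.4, 0.3, 0.2, 0.1]
scores  = [0.8, 0.6, 0.7, 0.9]
print(state_matching_degree(weights, scores))  # -> 0.73
```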
In some embodiments, the data analysis module 203 performs, in the determining the trigger probability corresponding to each interaction mode according to the state information:
calculating the triggering probability corresponding to each interaction mode according to the following formula:
P=α×M±(1-α)×T,
wherein P represents the triggering probability corresponding to each interaction mode, alpha represents the weight parameter corresponding to each interaction mode, M represents the state matching degree corresponding to each interaction mode, T represents the threshold value corresponding to each interaction mode, when the threshold value is in direct proportion to the triggering probability, the positive number is adopted, and when the threshold value is in inverse proportion to the triggering probability, the negative number is adopted;
calculating a threshold value corresponding to each interaction mode according to the following formula:
T = Mc / Mm ,
wherein Mc represents an actual parameter value corresponding to the target object under the triggering condition threshold corresponding to each interaction mode, and Mm represents an ideal matching threshold corresponding to each interaction mode.
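A minimal sketch of the trigger probability calculation P = α×M ± (1-α)×T with T = Mc/Mm, assuming the sign is selected by a boolean flag indicating whether the threshold is positively correlated with the trigger probability (the flag name and the numeric values are assumptions):

```python
def threshold(actual_value: float, ideal_value: float) -> float:
    """T = Mc / Mm: the target object's actual parameter relative to the ideal matching threshold."""
    return actual_value / ideal_value

def trigger_probability(alpha: float, m: float, t: float,
                        threshold_positively_correlated: bool = True) -> float:
    """P = alpha*M + (1 - alpha)*T, or alpha*M - (1 - alpha)*T when the threshold works against triggering."""
    sign = 1.0 if threshold_positively_correlated else -1.0
    return alpha * m + sign * (1.0 - alpha) * t

t = threshold(actual_value=0.6, ideal_value=0.8)   # T = 0.75
print(trigger_probability(alpha=0.7, m=0.9, t=t))  # 0.7*0.9 + 0.3*0.75 = 0.855
```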
In some embodiments, the data analysis module 203 further performs, before determining the trigger probability corresponding to each interaction mode according to the state information:
calculating the state matching degree corresponding to each interaction mode according to the following formula:
M(t) = M+β*M(t-1),
wherein M represents the state matching degree calculated according to the user data at the time t, and M (t-1) represents the state matching degree at the time t-1; and beta represents a time attenuation factor and is used for representing the influence degree of the state matching degree M (t-1) corresponding to the t-1 moment on the t moment.
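A short sketch of this temporal smoothing, assuming the matching degrees arrive as a time-ordered sequence; the β value is illustrative:

```python
def smoothed_matching_degrees(raw: list[float], beta: float = 0.5) -> list[float]:
    """M(t) = M + beta * M(t-1): each step keeps a decayed contribution of the previous state."""
    smoothed: list[float] = []
    previous = 0.0
    for m in raw:
        current = m + beta * previous
        smoothed.append(current)
        previous = current
    return smoothed

print(smoothed_matching_degrees([0.6, 0.4, 0.8]))  # -> approximately [0.6, 0.7, 1.15]
```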
In some embodiments, the at least one target object is a first target object and a second target object, and the data analysis module 203 further performs, in the determining the trigger probability corresponding to each interaction mode according to the state information:
when the interaction mode is a multi-person interaction type, acquiring a first trigger probability corresponding to the first target object and a second trigger probability corresponding to the second target object;
and carrying out fusion analysis according to the first trigger probability and the second trigger probability to obtain target trigger probabilities corresponding to the first target object and the second target object in the interaction mode.
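The document does not specify the fusion rule, so the sketch below simply averages the two trigger probabilities as one plausible choice; the averaging is an assumption, not the claimed fusion analysis:

```python
def fused_trigger_probability(p_first: float, p_second: float) -> float:
    """One possible fusion of the two users' trigger probabilities for a multi-person interaction mode."""
    return (p_first + p_second) / 2.0

# Both participants should be reasonably willing before a multi-person mode is recommended.
print(fused_trigger_probability(0.8, 0.4))  # -> 0.6
```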
In some embodiments, the policy determining module 204 performs, in the process of generating the interaction policy corresponding to the target object based on the trigger probability corresponding to each interaction mode:
sorting according to the triggering probability corresponding to each interaction mode in the interaction mode library to obtain an interaction mode set corresponding to the maximum triggering probability, wherein the interaction mode set is used for storing the interaction mode corresponding to the maximum triggering probability;
when the number of the interaction modes in the interaction mode set is equal to 1, determining the corresponding interaction mode in the interaction mode set as a target interaction mode;
When the number of the interaction modes in the interaction mode set is more than or equal to 2, a threshold value corresponding to each interaction mode in the interaction mode set is obtained, and a target interaction mode is determined according to the threshold value;
and determining an interaction strategy corresponding to the target object according to the target interaction mode.
In some embodiments, the policy determining module 204 further performs, after the obtaining the threshold value corresponding to each interaction mode in the interaction mode set when the number of interaction modes in the interaction mode set is greater than or equal to 2:
acquiring corresponding interaction characteristics of the target object in an interaction scene, and determining scene matching degree of the interaction mode and the interaction scene according to the interaction characteristics;
acquiring a historical trigger frequency corresponding to the interaction mode and a trigger time interval corresponding to the interaction mode, determining a frequency matching degree corresponding to the interaction mode according to the historical trigger frequency, and determining a time interval matching degree corresponding to the interaction mode according to the trigger time interval;
determining a scene fitting degree corresponding to the interaction mode according to the scene matching degree, the frequency matching degree and the time interval matching degree, and taking the interaction mode corresponding to the maximum value of the scene fitting degree as a target interaction mode;
And calculating the scene fitting degree corresponding to each interaction mode according to the following formula:
Score(t) =γ*F(t) +δ*D(t) +η*H(t)
wherein F (t) represents scene matching degree corresponding to the interaction mode at the moment t, D (t) represents frequency matching degree corresponding to the interaction mode at the moment t, H (t) represents time interval matching degree corresponding to the interaction mode at the moment t, gamma represents weight parameter corresponding to the scene matching degree, delta represents weight parameter corresponding to the frequency matching degree, and eta represents weight parameter corresponding to the time interval matching degree.
Alternatively, the meta-universe based user interaction device 200 may be used for a terminal device.
Referring to fig. 4, fig. 4 is a schematic block diagram of a structure of a terminal device according to an embodiment of the present invention.
As shown in fig. 4, the terminal device 300 includes a processor 301 and a memory 302, the processor 301 and the memory 302 being connected by a bus 303, such as an I2C (Inter-integrated Circuit) bus.
In particular, the processor 301 is used to provide computing and control capabilities, supporting the operation of the entire terminal device. The processor 301 may be a central processing unit (Central Processing Unit, CPU), the processor 301 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. Wherein the general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Specifically, the Memory 302 may be a Flash chip, a Read-Only Memory (ROM) disk, an optical disk, a U-disk, a removable hard disk, or the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 4 is merely a block diagram of a portion of the structure related to the embodiment of the present invention, and does not constitute a limitation of the terminal device to which the embodiment of the present invention is applied; a specific terminal device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
The processor is used for running a computer program stored in the memory, and implementing any one of the user interaction methods based on the metauniverse provided by the embodiment of the invention when the computer program is executed.
In an embodiment, the processor is configured to run a computer program stored in a memory and to implement the following steps when executing the computer program:
constructing an interaction mode library and acquiring user data of at least one target object, wherein the interaction mode library comprises at least one interaction mode;
detecting state information corresponding to a target object based on the user data;
Determining the triggering probability corresponding to each interaction mode according to the state information;
generating an interaction strategy corresponding to the target object based on the triggering probability corresponding to each interaction mode;
and executing corresponding interaction operation according to the interaction strategy.
In some embodiments, the user data includes expression information, voice information, physiological information, and/or distance information, and the processor 301 performs, in the process of detecting the state information corresponding to the target object based on the user data:
facial image data and voice data corresponding to the target object are obtained, feature analysis is carried out on the facial image data to obtain expression information, and voice analysis is carried out on the voice data to obtain voice information;
collecting physiological signals corresponding to the target object, and performing signal analysis on the physiological signals to obtain physiological information corresponding to the target object;
and/or calculating position information between the target objects and determining the distance information according to the position information;
determining a state matching degree corresponding to the target object based on the expression information, the voice information, the physiological information and/or the distance information, and further determining the state information according to the state matching degree;
Wherein the state matching degree of the target object is calculated according to the following formula:
M = W1×S1 + W2×S2 + ... + Wn×Sn,
wherein i = 1, 2, ..., n, Wi represents the weight of the i-th scoring item, Si represents the score of the i-th scoring item, and the scoring items are calculated according to the expression information, voice information, physiological information and/or distance information corresponding to the target object.
In some embodiments, the processor 301 performs, in the determining the trigger probability corresponding to each interaction mode according to the state information:
calculating the triggering probability corresponding to each interaction mode according to the following formula:
P=α×M±(1-α)×T,
wherein P represents the triggering probability corresponding to each interaction mode, alpha represents the weight parameter corresponding to each interaction mode, M represents the state matching degree corresponding to each interaction mode, T represents the threshold value corresponding to each interaction mode, when the threshold value is in direct proportion to the triggering probability, the positive number is adopted, and when the threshold value is in inverse proportion to the triggering probability, the negative number is adopted;
calculating a threshold value corresponding to each interaction mode according to the following formula:
T = Mc / Mm ,
wherein Mc represents an actual parameter value corresponding to the target object under the triggering condition threshold corresponding to each interaction mode, and Mm represents an ideal matching threshold corresponding to each interaction mode.
In some embodiments, before the determining the trigger probability corresponding to each interaction mode according to the state information, the processor 301 further performs:
Calculating the state matching degree corresponding to each interaction mode according to the following formula:
M(t) = M+β*M(t-1),
wherein M represents the state matching degree calculated according to the user data at the time t, and M (t-1) represents the state matching degree at the time t-1; and beta represents a time attenuation factor and is used for representing the influence degree of the state matching degree M (t-1) corresponding to the t-1 moment on the t moment.
In some embodiments, the at least one target object is a first target object and a second target object, and the processor 301 further performs, in the determining the trigger probability corresponding to each interaction mode according to the state information:
when the interaction mode is a multi-person interaction type, acquiring a first trigger probability corresponding to the first target object and a second trigger probability corresponding to the second target object;
and carrying out fusion analysis according to the first trigger probability and the second trigger probability to obtain target trigger probabilities corresponding to the first target object and the second target object in the interaction mode.
In some embodiments, the processor 301 performs, in the process of generating the interaction policy corresponding to the target object based on the trigger probability corresponding to each interaction mode:
sorting according to the triggering probability corresponding to each interaction mode in the interaction mode library to obtain an interaction mode set corresponding to the maximum triggering probability, wherein the interaction mode set is used for storing the interaction mode corresponding to the maximum triggering probability;
When the number of the interaction modes in the interaction mode set is equal to 1, determining the corresponding interaction mode in the interaction mode set as a target interaction mode;
when the number of the interaction modes in the interaction mode set is more than or equal to 2, a threshold value corresponding to each interaction mode in the interaction mode set is obtained, and a target interaction mode is determined according to the threshold value;
and determining an interaction strategy corresponding to the target object according to the target interaction mode.
In some embodiments, the processor 301 further performs, after the obtaining the threshold value corresponding to each of the interaction modes in the interaction mode set when the number of interaction modes in the interaction mode set is greater than or equal to 2:
acquiring corresponding interaction characteristics of the target object in an interaction scene, and determining scene matching degree of the interaction mode and the interaction scene according to the interaction characteristics;
acquiring a historical trigger frequency corresponding to the interaction mode and a trigger time interval corresponding to the interaction mode, determining a frequency matching degree corresponding to the interaction mode according to the historical trigger frequency, and determining a time interval matching degree corresponding to the interaction mode according to the trigger time interval;
Determining a scene fitting degree corresponding to the interaction mode according to the scene matching degree, the frequency matching degree and the time interval matching degree, and taking the interaction mode corresponding to the maximum value of the scene fitting degree as a target interaction mode;
and calculating the scene fitting degree corresponding to each interaction mode according to the following formula:
Score(t) =γ*F(t) +δ*D(t) +η*H(t)
wherein F (t) represents scene matching degree corresponding to the interaction mode at the moment t, D (t) represents frequency matching degree corresponding to the interaction mode at the moment t, H (t) represents time interval matching degree corresponding to the interaction mode at the moment t, gamma represents weight parameter corresponding to the scene matching degree, delta represents weight parameter corresponding to the frequency matching degree, and eta represents weight parameter corresponding to the time interval matching degree.
It should be noted that, for convenience and brevity of description, specific working processes of the terminal device described above may refer to corresponding processes in the foregoing meta-universe-based user interaction method embodiment, and are not described herein again.
Embodiments of the present invention also provide a storage medium for computer readable storage, where the storage medium stores one or more programs that can be executed by one or more processors to implement any of the steps of the metauniverse-based user interaction method as provided in the embodiments of the present invention.
The storage medium may be an internal storage unit of the terminal device according to the foregoing embodiment, for example, a hard disk or a memory of the terminal device. The storage medium may also be an external storage device of the terminal device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the terminal device.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, functional modules/units in the apparatus, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware embodiment, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
It should be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments. While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (8)

1. A meta-universe based user interaction method, the method comprising:
constructing an interaction mode library and acquiring user data of at least one target object, wherein the interaction mode library comprises at least one interaction mode; the interaction mode is a single interaction mode or a multi-person interaction mode;
detecting state information corresponding to a target object based on the user data;
determining the triggering probability corresponding to each interaction mode according to the state information;
generating an interaction strategy corresponding to the target object based on the triggering probability corresponding to each interaction mode;
executing corresponding interaction operation according to the interaction strategy;
the user data includes expression information, voice information, physiological information and/or distance information, and the detecting state information corresponding to a target object based on the user data includes:
facial image data and voice data corresponding to the target object are obtained, feature analysis is carried out on the facial image data to obtain expression information, and voice analysis is carried out on the voice data to obtain voice information;
collecting physiological signals corresponding to the target object, and performing signal analysis on the physiological signals to obtain physiological information corresponding to the target object;
And/or calculating position information between the target objects and determining the distance information according to the position information;
determining a state matching degree corresponding to the target object based on the expression information, the voice information, the physiological information and/or the distance information, and further determining the state information according to the state matching degree;
wherein the state matching degree of the target object is calculated according to the following formula:
M=W1×S1+W2×S2+...+Wn×Sn,
wherein i = 1, 2, ..., n, Wi represents the weight of the i-th scoring item, Si represents the score of the i-th scoring item, and the scoring items are calculated according to expression information, voice information, physiological information and/or distance information corresponding to a target object;
determining the triggering probability corresponding to each interaction mode according to the state information, wherein the triggering probability comprises the following steps:
calculating the triggering probability corresponding to each interaction mode according to the following formula:
P=α×M±(1-α)×T,
wherein P represents the triggering probability corresponding to each interaction mode, alpha represents the weight parameter corresponding to each interaction mode, M represents the state matching degree corresponding to each interaction mode, T represents the threshold value corresponding to each interaction mode, when the threshold value is in direct proportion to the triggering probability, the positive number is adopted, and when the threshold value is in inverse proportion to the triggering probability, the negative number is adopted;
Calculating a threshold value corresponding to each interaction mode according to the following formula:
T=Mc/Mm,
wherein Mc represents an actual parameter value corresponding to the target object under the triggering condition threshold corresponding to each interaction mode, and Mm represents an ideal matching threshold corresponding to each interaction mode.
2. The method of claim 1, wherein before determining the trigger probability corresponding to each interaction mode according to the state information, the method further comprises:
calculating the state matching degree corresponding to each interaction mode according to the following formula:
M(t)=M+β*M(t-1),
wherein M represents the state matching degree calculated according to the user data at the time t, and M (t-1) represents the state matching degree at the time t-1; and beta represents a time attenuation factor and is used for representing the influence degree of the state matching degree M (t-1) corresponding to the t-1 moment on the t moment.
3. The method of claim 1, wherein the at least one target object is a first target object and a second target object, and wherein the determining the trigger probability corresponding to each interaction mode according to the state information further comprises:
when the interaction mode is a multi-person interaction type, acquiring a first trigger probability corresponding to the first target object and a second trigger probability corresponding to the second target object;
And carrying out fusion analysis according to the first trigger probability and the second trigger probability to obtain target trigger probabilities corresponding to the first target object and the second target object in the interaction mode.
4. The method of claim 1, wherein generating the interaction strategy corresponding to the target object based on the trigger probability corresponding to each interaction pattern comprises:
sorting according to the triggering probability corresponding to each interaction mode in the interaction mode library to obtain an interaction mode set corresponding to the maximum triggering probability, wherein the interaction mode set is used for storing the interaction mode corresponding to the maximum triggering probability;
when the number of the interaction modes in the interaction mode set is equal to 1, determining the corresponding interaction mode in the interaction mode set as a target interaction mode;
when the number of the interaction modes in the interaction mode set is more than or equal to 2, a threshold value corresponding to each interaction mode in the interaction mode set is obtained, and a target interaction mode is determined according to the threshold value;
and determining an interaction strategy corresponding to the target object according to the target interaction mode.
5. The method of claim 4, wherein when the number of interaction modes in the interaction mode set is greater than or equal to 2, after obtaining the threshold value corresponding to each interaction mode in the interaction mode set, the method further comprises:
Acquiring corresponding interaction characteristics of the target object in an interaction scene, and determining scene matching degree of the interaction mode and the interaction scene according to the interaction characteristics;
acquiring a historical trigger frequency corresponding to the interaction mode and a trigger time interval corresponding to the interaction mode, determining a frequency matching degree corresponding to the interaction mode according to the historical trigger frequency, and determining a time interval matching degree corresponding to the interaction mode according to the trigger time interval;
determining a scene fitting degree corresponding to the interaction mode according to the scene matching degree, the frequency matching degree and the time interval matching degree, and taking the interaction mode corresponding to the maximum value of the scene fitting degree as a target interaction mode;
and calculating the scene fitting degree corresponding to each interaction mode according to the following formula:
Score(t)=γ*F(t)+δ*D(t)+η*H(t)
wherein F (t) represents scene matching degree corresponding to the interaction mode at the moment t, D (t) represents frequency matching degree corresponding to the interaction mode at the moment t, H (t) represents time interval matching degree corresponding to the interaction mode at the moment t, gamma represents weight parameter corresponding to the scene matching degree, delta represents weight parameter corresponding to the frequency matching degree, and eta represents weight parameter corresponding to the time interval matching degree.
6. A meta-universe based user interaction device, comprising:
the data acquisition module is used for constructing an interaction mode library and acquiring user data of at least one target object, wherein the interaction mode library comprises at least one interaction mode; the interaction mode is a single interaction mode or a multi-person interaction mode;
the data processing module is used for detecting state information corresponding to a target object based on the user data;
the data analysis module is used for determining the triggering probability corresponding to each interaction mode according to the state information;
the strategy determining module is used for generating an interaction strategy corresponding to the target object based on the triggering probability corresponding to each interaction mode;
the strategy execution module is used for executing corresponding interaction operation according to the interaction strategy;
the user data comprises expression information, voice information, physiological information and/or distance information, and the data processing module comprises:
the data acquisition sub-module is used for acquiring facial image data and voice data corresponding to the target object, performing feature analysis on the facial image data to obtain expression information, and performing voice analysis on the voice data to obtain voice information;
The data acquisition sub-module is used for acquiring physiological signals corresponding to the target object and carrying out signal analysis on the physiological signals to obtain physiological information corresponding to the target object;
and/or a data calculation sub-module, which is used for calculating the position information between the target objects and determining the distance information according to the position information;
the data determining submodule is used for determining state matching degree corresponding to the target object based on the expression information, the voice information, the physiological information and/or the distance information, and further determining the state information according to the state matching degree;
wherein the state matching degree corresponding to the target object is calculated according to the following formula:
M=W1×S1+W2×S2+...+Wn×Sn,
wherein i = 1, 2, ..., n, Wi represents the weight of the i-th scoring item, Si represents the score of the i-th scoring item, and the scoring items are calculated according to the expression information, voice information, physiological information and/or distance information corresponding to the target object;
the data analysis module is specifically configured to:
calculating the triggering probability corresponding to each interaction mode according to the following formula:
P=α×M±(1-α)×T,
wherein P represents the triggering probability corresponding to each interaction mode, α represents the weight parameter corresponding to each interaction mode, M represents the state matching degree corresponding to each interaction mode, and T represents the threshold value corresponding to each interaction mode; the plus sign is used when the threshold value is directly proportional to the triggering probability, and the minus sign is used when the threshold value is inversely proportional to the triggering probability;
Calculating a threshold value corresponding to each interaction mode according to the following formula:
T=Mc/Mm,
wherein Mc represents an actual parameter value corresponding to the target object under the triggering condition threshold corresponding to each interaction mode, and Mm represents an ideal matching threshold corresponding to each interaction mode.
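A sketch tying together the three formulas recited in claim 6: M = W1×S1 + ... + Wn×Sn, T = Mc/Mm, and P = α×M ± (1−α)×T. The sign is passed explicitly because the claim only states that it depends on whether the threshold is directly or inversely proportional to the triggering probability; all parameter values in the example are illustrative.

```python
def state_matching_degree(weights, scores) -> float:
    """M = W1*S1 + W2*S2 + ... + Wn*Sn over the scoring items."""
    return sum(w * s for w, s in zip(weights, scores))

def mode_threshold(actual_value: float, ideal_value: float) -> float:
    """T = Mc / Mm: actual parameter value under the trigger condition
    divided by the ideal matching threshold of the interaction mode."""
    return actual_value / ideal_value

def trigger_probability(m: float, t: float, alpha: float,
                        threshold_raises_probability: bool) -> float:
    """P = alpha*M + (1-alpha)*T if the threshold is directly proportional to P,
    P = alpha*M - (1-alpha)*T if it is inversely proportional."""
    sign = 1.0 if threshold_raises_probability else -1.0
    return alpha * m + sign * (1.0 - alpha) * t

# e.g. M = 0.8 from the weighted scoring items, T = 0.75, alpha = 0.6:
# trigger_probability(0.8, 0.75, 0.6, True) -> 0.78
```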
7. A terminal device, characterized in that the terminal device comprises a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and to implement the meta-universe based user interaction method as claimed in any one of claims 1 to 5 when the computer program is executed.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by one or more processors, causes the one or more processors to perform the steps of the meta-universe based user interaction method of any one of claims 1 to 5.
CN202311093781.9A 2023-08-29 2023-08-29 User interaction method and device based on meta universe, terminal and readable storage medium Active CN116820250B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311093781.9A CN116820250B (en) 2023-08-29 2023-08-29 User interaction method and device based on meta universe, terminal and readable storage medium

Publications (2)

Publication Number Publication Date
CN116820250A CN116820250A (en) 2023-09-29
CN116820250B true CN116820250B (en) 2023-11-17

Family

ID=88118792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311093781.9A Active CN116820250B (en) 2023-08-29 2023-08-29 User interaction method and device based on meta universe, terminal and readable storage medium

Country Status (1)

Country Link
CN (1) CN116820250B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368609A (en) * 2018-12-26 2020-07-03 深圳Tcl新技术有限公司 Voice interaction method based on emotion engine technology, intelligent terminal and storage medium
CN111276146A (en) * 2020-03-27 2020-06-12 上海乂学教育科技有限公司 Teaching training system based on voice recognition
CN115016648A (en) * 2022-07-15 2022-09-06 大爱全息(北京)科技有限公司 Holographic interaction device and processing method thereof
CN114995657A (en) * 2022-07-18 2022-09-02 湖南大学 Multimode fusion natural interaction method, system and medium for intelligent robot
CN116363911A (en) * 2023-06-02 2023-06-30 厦门奇翼科技有限公司 Learning machine based on holographic display

Also Published As

Publication number Publication date
CN116820250A (en) 2023-09-29

Similar Documents

Publication Publication Date Title
US11600033B2 (en) System and method for creating avatars or animated sequences using human body features extracted from a still image
CN110850983B (en) Virtual object control method and device in video live broadcast and storage medium
US11656680B2 (en) Technique for controlling virtual image generation system using emotional states of user
CN105005777B (en) Audio and video recommendation method and system based on human face
KR101306221B1 (en) Method and apparatus for providing moving picture using 3d user avatar
KR20230158638A (en) Contextual-based rendering of virtual avatars
KR20160012902A (en) Method and device for playing advertisements based on associated information between audiences
US10423978B2 (en) Method and device for playing advertisements based on relationship information between viewers
JP2020039029A (en) Video distribution system, video distribution method, and video distribution program
US20210350604A1 (en) Audiovisual presence transitions in a collaborative reality environment
KR101913811B1 (en) A method for analysing face information, and an appratus for analysing face information to present faces, identify mental status or compensate it
CN116782986A (en) Identifying a graphics interchange format file for inclusion with content of a video game
CN109074679A (en) The Instant Ads based on scene strengthened with augmented reality
CN108415561A (en) Gesture interaction method based on visual human and system
CN116820250B (en) User interaction method and device based on meta universe, terminal and readable storage medium
CN108470206A (en) Head exchange method based on visual human and system
Walker Contradictions of the Body: How Billie Eilish Negotiates Gender, Power and Embodiment as a Teenage Pop Icon
McSeveny et al. ‘You could, couldn’t you?’: A preliminary investigation of older people’s interaction with a bespoke virtual environment using a gesture interface
CN116841436A (en) Video-based interaction method, apparatus, device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant