CN117908683B - Intelligent mobile AI digital human interaction method and system based on transparent display equipment

Info

Publication number
CN117908683B
Authority
CN
China
Prior art keywords
intelligent mobile
digital
response
execution
expression
Prior art date
Legal status
Active
Application number
CN202410314018.2A
Other languages
Chinese (zh)
Other versions
CN117908683A (en)
Inventor
张启红
Current Assignee
Shenzhen Qili Technology Co., Ltd.
Original Assignee
Shenzhen Qili Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Shenzhen Qili Technology Co., Ltd.
Priority to CN202410314018.2A
Publication of CN117908683A
Application granted
Publication of CN117908683B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/25: Fusion techniques
    • G06F 18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/092: Reinforcement learning


Abstract

The application relates to the technical field of intelligent mobile AI digital human interaction, and discloses an intelligent mobile AI digital human interaction method and system based on transparent display equipment. The method comprises the following steps: performing interactive recognition and tracking on a target user through the transparent display equipment to obtain voice data, expression data and action data; performing feature extraction and feature association relation analysis to generate digital person dialogue response parameters, digital person expression response parameters and digital person action response parameters; generating an initial intelligent mobile AI digital person interaction execution parameter combination; generating a first intelligent mobile AI digital person and performing multi-dimensional digital person execution strategy adjustment to obtain a voice response execution strategy, an expression response execution strategy and an action response execution strategy; and determining a target intelligent mobile AI digital person interaction execution parameter combination and performing response optimization to obtain a second intelligent mobile AI digital person.

Description

Intelligent mobile AI digital human interaction method and system based on transparent display equipment
Technical Field
The application relates to the technical field of intelligent mobile AI digital human interaction, in particular to an intelligent mobile AI digital human interaction method and system based on transparent display equipment.
Background
Under the current technical background, the combination of artificial intelligence (AI) and transparent display technology has opened up a new mode of interaction and ushered in a new wave of digital human interaction technology. Owing to its unique visual transparency, transparent display equipment makes more natural and intuitive human-computer interaction possible, and has broad application potential in fields such as retail, exhibition and home automation in particular. However, while this combination of technologies brings an unprecedented interactive experience, many challenges remain in practical applications.
Existing intelligent mobile AI digital human interaction methods often rely on traditional display technology, which limits the naturalness and immersion of the user experience. The physical constraints and operating manner of conventional display devices hinder natural communication between users and digital content to some extent, resulting in a limited interactive experience. In the prior art, it is often difficult to achieve high accuracy and real-time performance when processing complex background noise, subtle expression changes and rapid, irregular actions, which limits the natural smoothness of interaction and the timeliness of responses. How to adjust the dialogue content, expressions and action responses of the digital person in real time according to the specific needs and emotional changes of the user, so as to achieve more personalized and emotionally aware interaction, is an important direction of current technological development. This requires not only in-depth analysis of user behavior data, but also real-time processing and response based on complex AI algorithms, so as to meet diverse and dynamically changing user interaction requirements.
Disclosure of Invention
The application provides an intelligent mobile AI digital human interaction method and system based on transparent display equipment.
In a first aspect, the present application provides an intelligent mobile AI digital human interaction method based on a transparent display device, where the intelligent mobile AI digital human interaction method based on the transparent display device includes:
performing interactive recognition and tracking on a target user through transparent display equipment to obtain voice data, expression data and action data of the target user;
respectively carrying out feature extraction on the voice data, the expression data and the action data to obtain a voice feature set, an expression feature set and an action feature set, and carrying out feature association relation analysis on the voice feature set, the expression feature set and the action feature set based on a three-layer Bayesian network to obtain a feature association relation set;
performing digital human self-adaptive dialogue response analysis on the voice feature set to generate digital human dialogue response parameters, performing emotion perception and digital human expression response analysis on the expression feature set to generate digital human expression response parameters, and performing user behavior prediction and action response analysis on the action feature set to generate digital human action response parameters;
performing intelligent mobile AI digital human interaction execution synchronous analysis and parameter fusion on the digital human dialogue response parameters, the digital human expression response parameters and the digital human action response parameters according to the characteristic association relation set to generate an initial intelligent mobile AI digital human interaction execution parameter combination;
generating a first intelligent mobile AI digital person corresponding to the target user according to the initial intelligent mobile AI digital person interaction execution parameter combination through the transparent display equipment, and performing multidimensional digital person execution strategy adjustment on the first intelligent mobile AI digital person through a reinforcement learning algorithm to obtain a voice response execution strategy, an expression response execution strategy and an action response execution strategy;
and determining a target intelligent mobile AI digital person interaction execution parameter combination of the first intelligent mobile AI digital person according to the voice response execution strategy, the expression response execution strategy and the action response execution strategy, and carrying out response optimization on the first intelligent mobile AI digital person based on the target intelligent mobile AI digital person interaction execution parameter combination to obtain a second intelligent mobile AI digital person.
In a second aspect, the present application provides an intelligent mobile AI digital human interaction system based on a transparent display device, where the intelligent mobile AI digital human interaction system based on the transparent display device includes:
The recognition module is used for carrying out interactive recognition and tracking on the target user through the transparent display equipment to obtain voice data, expression data and action data of the target user;
The extraction module is used for carrying out feature extraction on the voice data, the expression data and the action data respectively to obtain a voice feature set, an expression feature set and an action feature set, and carrying out feature association relation analysis on the voice feature set, the expression feature set and the action feature set based on a three-layer Bayesian network to obtain a feature association relation set;
The analysis module is used for carrying out digital human self-adaptive dialogue response analysis on the voice feature set to generate digital human dialogue response parameters, carrying out emotion perception and digital human expression response analysis on the expression feature set to generate digital human expression response parameters, and carrying out user behavior prediction and action response analysis on the action feature set to generate digital human action response parameters;
The fusion module is used for carrying out intelligent mobile AI digital human interaction execution synchronous analysis and parameter fusion on the digital human dialogue response parameters, the digital human expression response parameters and the digital human action response parameters according to the characteristic association relation set to generate an initial intelligent mobile AI digital human interaction execution parameter combination;
The adjusting module is used for generating a first intelligent mobile AI digital person corresponding to the target user according to the initial intelligent mobile AI digital person interaction execution parameter combination through the transparent display equipment, and performing multidimensional digital person execution strategy adjustment on the first intelligent mobile AI digital person through a reinforcement learning algorithm to obtain a voice response execution strategy, an expression response execution strategy and an action response execution strategy;
And the optimization module is used for determining a target intelligent mobile AI digital person interaction execution parameter combination of the first intelligent mobile AI digital person according to the voice response execution strategy, the expression response execution strategy and the action response execution strategy, and carrying out response optimization on the first intelligent mobile AI digital person based on the target intelligent mobile AI digital person interaction execution parameter combination to obtain a second intelligent mobile AI digital person.
A third aspect of the present application provides a computer apparatus comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the computer device to perform the intelligent mobile AI digital human interaction method described above based on the transparent display device.
A fourth aspect of the present application provides a computer-readable storage medium having instructions stored therein that, when executed on a computer, cause the computer to perform the above-described intelligent mobile AI digital human interaction method based on a transparent display device.
According to the technical scheme provided by the application, the application of the transparent display equipment can seamlessly fuse digital content with the visual experience of the real world, and the user can interact with the intelligent mobile AI digital person visually, making the user experience smoother and more vivid. By efficiently capturing and analyzing the voice, expression and action data of the target user, accurate recognition and real-time tracking of the user's interaction intention are achieved. Through feature extraction and a three-layer Bayesian network, subtle differences in user behavior can be effectively analyzed, so that more accurate interaction responses are provided. The fluency and response speed of the interaction process are ensured, and user satisfaction is greatly improved. Based on AI analysis capability, the dialogue content, expression performance and action responses of the digital person can be adjusted in real time according to multidimensional data such as the user's voice, expressions and actions. Through natural language processing and reinforcement learning, the specific demands of users can be understood and their emotional changes perceived, achieving personalized and emotionally aware responses. Interaction between the user and the intelligent mobile AI digital person becomes more natural and emotional, enhancing the user's sense of participation and satisfaction. Combining the transparent display equipment with intelligent mobile AI digital human technology can provide information display and interactive consultation as well as personalized service according to scene requirements, further improving the control accuracy of the intelligent mobile AI digital person.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained based on these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an embodiment of an intelligent mobile AI digital human interaction method based on a transparent display device in an embodiment of the application;
FIG. 2 is a schematic diagram of an embodiment of an intelligent mobile AI digital human interaction system based on a transparent display device according to an embodiment of the present application.
Detailed Description
The embodiment of the application provides an intelligent mobile AI digital human interaction method and system based on transparent display equipment. The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For easy understanding, the following describes a specific flow of an embodiment of the present application, referring to fig. 1, and an embodiment of an intelligent mobile AI digital human interaction method based on a transparent display device in an embodiment of the present application includes:
Step 101, performing interactive recognition and tracking on a target user through transparent display equipment to obtain voice data, expression data and action data of the target user;
It can be understood that the execution subject of the present application may be an intelligent mobile AI digital human interaction system based on a transparent display device, or may be a terminal or a server, which is not limited herein. The embodiments of the present application are described by taking a server as the execution subject as an example.
Specifically, a voice signal is collected from the target user through an audio sensor built into the transparent display device, capturing the acoustic wave signal emitted by the user. A short-time Fourier transform is then applied to the voice signal, converting the time-domain signal into a frequency-domain signal so that its frequency components can be analyzed more accurately. The short-time Fourier transform segments the signal by introducing a window function and represents the frequency-domain twiddle factor with a complex exponential function, thereby obtaining the frequency and time correlation of the voice data. An image sensor in the transparent display device performs visual interactive recognition on the user and captures user image data. Depth calculation is performed on the image data by analyzing differences between pixel points, yielding the depth information of each pixel point in the image. Based on the pixel depth data, pixel pairs are matched and disparity values are calculated; a disparity value reflects the difference in distance between each part of the image and the camera. By analyzing the disparity values, face correction can be carried out on the user image to generate a corrected face image, and expression recognition is carried out on the corrected image to obtain expression data. The motion state of the user is estimated through a preset state estimation function based on Kalman filtering theory, a state estimation method for linear dynamic systems. The state estimation function takes into account the difference between the actual measured value and the predicted value, and the motion state estimate is optimized by adjusting the Kalman gain. Based on the motion state estimation data, the user's acceleration, current speed and position are determined, from which accurate motion data are calculated.
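The embodiment names the pipeline (pixel-pair matching, disparity calculation, depth) but no concrete implementation. As a hedged illustration only, a minimal sketch of the disparity-to-depth step might look like the following, assuming a calibrated stereo pair and OpenCV's block matcher; the matcher choice, focal length and baseline are placeholders, not values from the patent:

```python
# Hypothetical sketch of the pixel-pair matching / disparity / depth step.
# StereoBM, focal_px and baseline_m are assumptions; the patent only
# describes matching pixel pairs and computing disparity values.
import cv2
import numpy as np

def estimate_depth(left_gray: np.ndarray, right_gray: np.ndarray,
                   focal_px: float = 700.0, baseline_m: float = 0.06) -> np.ndarray:
    """left_gray/right_gray: 8-bit single-channel images from the stereo pair."""
    # Match pixel pairs between the two views and compute a disparity map.
    matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0

    # A disparity value reflects the distance difference between an image
    # region and the camera: depth = focal_length * baseline / disparity.
    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```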
Step 102, respectively extracting features of voice data, expression data and action data to obtain a voice feature set, an expression feature set and an action feature set, and analyzing feature association relations of the voice feature set, the expression feature set and the action feature set based on a three-layer Bayesian network to obtain a feature association relation set;
Specifically, Mel-frequency cepstrum coefficients are adopted to extract features from the voice data, effectively capturing the essential characteristics of the voice signal. By simulating the human ear's different perceptions of sounds at different frequencies, Mel-frequency cepstrum coefficients transform a complex speech signal into a set of eigenvalues that characterize the unique properties of the speech, i.e. the voice feature set. Useful feature representations are automatically learned from the image data by analyzing the expression data through a convolutional neural network (CNN). Through multi-level convolution operations and nonlinear mappings on the image, the CNN can extract key features of the expression, such as facial keypoints and expression intensity, forming the expression feature set. For feature extraction from the motion data, a skeleton tracking technique is adopted, capturing the user's motion and posture information by identifying and tracking the positions of key points of the human body. A feature set of the actions is extracted by analyzing the movement trajectories and speed changes of the human skeleton. The extracted voice, expression and action feature sets are then comprehensively analyzed based on the three-layer Bayesian network. As a probabilistic graphical model, the three-layer Bayesian network can effectively establish causal and dependency relationships among different data. By constructing nodes and edges in the network, which represent the feature sets and the relationships between them respectively, the inherent relations between voice and expression, voice and action, and expression and action are analyzed. The association strength of these relations is quantified through a preset feature association system, and association strength values between voice and expression, voice and action, and expression and action are calculated. These association strength values reflect how tightly different features interact, providing a quantitative decision basis for the interactive execution of the intelligent mobile AI digital person. A feature association relation set is created from the calculated association strength values, providing scientific guidance for the response generation and behavior adjustment of the intelligent mobile AI digital person.
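To make the speech feature-extraction step concrete, here is a minimal sketch of MFCC extraction; the library and parameter choices (librosa, 16 kHz, 13 coefficients) are assumptions for illustration, since the patent names only the Mel-frequency cepstrum coefficient technique itself:

```python
# Illustrative MFCC speech-feature extraction; library and parameter
# choices (librosa, 16 kHz, 13 coefficients) are assumptions.
import numpy as np
import librosa

def extract_speech_features(wav_path: str, n_mfcc: int = 13) -> np.ndarray:
    signal, sr = librosa.load(wav_path, sr=16000)  # load and resample
    # MFCCs warp the spectrum onto the Mel scale, mimicking the ear's
    # unequal sensitivity to different frequencies.
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    # Average over time to obtain one fixed-length vector per utterance.
    return mfcc.mean(axis=1)
```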
Step 103, carrying out digital human self-adaptive dialogue response analysis on the voice feature set to generate digital human dialogue response parameters, carrying out emotion perception and digital human expression response analysis on the expression feature set to generate digital human expression response parameters, and carrying out user behavior prediction and action response analysis on the action feature set to generate digital human action response parameters;
Specifically, user intention analysis is performed on the captured voice feature set through a natural language processing model, and the intention information of the user is identified. The natural language processing model converts complex language input into a clear representation of user intent by understanding the lexical, syntactic and semantic information in the speech. According to the user intention information, the dialogue strategy of the digital person is dynamically adjusted and a targeted answer or action is generated, thereby realizing an adaptive dialogue response. For the analysis of the expression feature set, emotion perception classification is carried out using a support vector machine model. Through efficient classification of the expression features, the support vector machine model can accurately identify the current emotional state of the user, so that the digital person can respond with a corresponding expression or tone, enhancing the emotional depth and sense of reality of the interaction. User behavior prediction is performed on the action feature set through the graph neural network, which can predict the user's next behavior or intent by analyzing a structured representation of the user's actions, such as their temporal and spatial relationships.
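For the emotion-perception step, a hedged sketch of the support-vector-machine classifier is given below; the emotion label set and scikit-learn pipeline are illustrative assumptions, since the patent specifies the model type but not its configuration:

```python
# Sketch of SVM emotion-perception classification over expression features.
# The emotion label set and RBF/C settings are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

EMOTIONS = ["happy", "sad", "angry", "neutral"]  # assumed label set

def train_emotion_classifier(features: np.ndarray, labels: np.ndarray):
    # An RBF-kernel SVM separates emotion classes with maximum-margin
    # hyperplanes in the (scaled) expression-feature space.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    model.fit(features, labels)
    return model

# Usage (with hypothetical training data):
# clf = train_emotion_classifier(train_features, train_labels)
# emotion = clf.predict(expression_vector.reshape(1, -1))[0]
```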
Step 104, performing intelligent mobile AI digital human interaction execution synchronous analysis and parameter fusion on the digital human dialogue response parameters, the digital human expression response parameters and the digital human action response parameters according to the characteristic association relation set to generate an initial intelligent mobile AI digital human interaction execution parameter combination;
Specifically, intelligent mobile AI digital human interaction execution synchronization analysis is performed on the digital human dialogue response parameters, the digital human expression response parameters and the digital human action response parameters to generate interaction execution synchronization information, and the information describes time sequence relations and interactions among different response parameters. And according to the interactive execution synchronization information, aligning response parameters of the dialogue response parameters, the expression response parameters and the action response parameters, and ensuring that the behavior and the performance of the intelligent mobile AI digital person at any given moment are consistent. And comprehensively fusing the response parameters subjected to the synchronous analysis and alignment based on the characteristic association relation set to generate an initial intelligent mobile AI digital human interaction execution parameter combination. In this process, the feature association set contains detailed information about how the different response parameters complement and augment each other. Through analysis and application of the information, dialogue, expression and action parameters are integrated to form a unified interactive execution scheme. The scheme considers the specific requirements and situations of users, and also fully utilizes the internal connection among different features, thereby realizing a finer and personalized interaction mode.
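The patent does not disclose a fusion formula; as one plausible reading of "parameter fusion weighted by the feature association relation set", the sketch below combines the three response parameter vectors with weights derived from the pairwise association strengths. The common vector dimension and the weighting scheme are assumptions:

```python
# Hedged sketch of association-weighted parameter fusion. Assumes the three
# response parameter vectors were already projected to a common dimension;
# the weighting scheme is an illustrative assumption, not the patent's formula.
import numpy as np

def fuse_response_parameters(dialogue: np.ndarray,
                             expression: np.ndarray,
                             action: np.ndarray,
                             assoc: dict) -> np.ndarray:
    # assoc e.g. {"speech-expression": 0.8, "speech-action": 0.5,
    #             "expression-action": 0.6} from the feature relation set.
    w_d = assoc["speech-expression"] + assoc["speech-action"]
    w_e = assoc["speech-expression"] + assoc["expression-action"]
    w_a = assoc["speech-action"] + assoc["expression-action"]
    total = w_d + w_e + w_a
    # Channels that correlate strongly with the others get more influence.
    return (w_d * dialogue + w_e * expression + w_a * action) / total
```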
Step 105, generating a first intelligent mobile AI digital person corresponding to the target user according to the initial intelligent mobile AI digital person interaction execution parameter combination through the transparent display equipment, and performing multi-dimensional digital person execution strategy adjustment on the first intelligent mobile AI digital person through a reinforcement learning algorithm to obtain a voice response execution strategy, an expression response execution strategy and an action response execution strategy;
Specifically, a first intelligent mobile AI digital person corresponding to the target user is generated through the transparent display equipment according to the initial intelligent mobile AI digital person interaction execution parameter combination. The parameters are converted into specific behaviors, expressions and language outputs of the intelligent mobile AI digital person, so that it can interact with the user in a manner meeting the user's expectations and situational demands. A plurality of initial agents are created for the first intelligent mobile AI digital person by a reinforcement learning algorithm, and a set of specific agent parameters is set for each agent. These agents learn how to improve their execution strategies through interactions with the environment, so as to improve the effectiveness of the interactions and user satisfaction. In the reinforcement learning framework, each agent adjusts and optimizes its behavior strategy by maximizing the cumulative reward it obtains. Agent parameter configuration is carried out on the initial agents to obtain execution strategy agents capable of executing specific strategies. These agents comprise an input layer, an encoding network, a decoding network and an output layer capable of processing and analyzing the interaction execution parameters of the intelligent mobile AI digital person. The input layer of each agent receives the initial intelligent mobile AI digital person interaction execution parameter combination and encodes and normalizes it to generate a standardized interaction execution parameter vector. Hidden features of the standardized parameter vector are extracted through the encoding network of each execution strategy agent, capturing the key information and patterns in the interaction execution parameters. The hidden interaction execution parameter vector output by each encoding network contains a deep understanding and abstract representation of the original parameters. The decoding network of each execution strategy agent analyzes and processes the hidden interaction execution parameter vector and predicts a series of execution control parameters. Finally, the execution control parameters are converted into specific execution control strategies, including a voice response execution strategy, an expression response execution strategy and an action response execution strategy, through the output layer of each execution strategy agent.
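The embodiment describes each execution strategy agent as an input layer, encoding network, decoding network and output layer. A minimal PyTorch sketch of that structure follows; all layer sizes are illustrative assumptions:

```python
# Sketch of one execution-strategy agent (input layer -> encoder ->
# decoder -> output layer). Dimensions are assumed for illustration.
import torch
import torch.nn as nn

class ExecutionPolicyAgent(nn.Module):
    def __init__(self, param_dim: int = 32, hidden_dim: int = 64,
                 control_dim: int = 16):
        super().__init__()
        # Input layer: encode and normalize the interaction execution parameters.
        self.input_layer = nn.Sequential(
            nn.Linear(param_dim, hidden_dim), nn.LayerNorm(hidden_dim))
        # Encoding network: extract hidden interaction-execution features.
        self.encoder = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        # Decoding network: predict execution control parameters.
        self.decoder = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                     nn.Linear(hidden_dim, control_dim))
        # Output layer: turn control parameters into a strategy distribution.
        self.output_layer = nn.Softmax(dim=-1)

    def forward(self, params: torch.Tensor) -> torch.Tensor:
        hidden = self.encoder(self.input_layer(params))
        control = self.decoder(hidden)
        return self.output_layer(control)
```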
Step 106, determining a target intelligent mobile AI digital person interaction execution parameter combination of the first intelligent mobile AI digital person according to the voice response execution strategy, the expression response execution strategy and the action response execution strategy, and carrying out response optimization on the first intelligent mobile AI digital person based on the target intelligent mobile AI digital person interaction execution parameter combination to obtain a second intelligent mobile AI digital person.
Specifically, the current voice response execution strategy, expression response execution strategy and action response execution strategy of the intelligent mobile AI digital person are subjected to strategy collaborative analysis to obtain correlations and potential collaborative effects among execution strategies with different dimensions, so that a comprehensive collaborative response execution strategy is generated. And determining a target intelligent mobile AI digital person interaction execution parameter combination of the first intelligent mobile AI digital person based on the cooperative response execution strategy. The parameter combination is based on the result of collaborative strategy analysis, and covers the optimization targets and expected behavior patterns of multiple dimensions such as voice, expression, action and the like. These parameters reflect the behavior and the manner in which the intelligent mobile AI digital person should take in future interactions, and also consider how to make these behavior and reactions more consistent and coordinated in different interaction scenarios. And performing response optimization on the first intelligent mobile AI digital person based on the target intelligent mobile AI digital person interaction execution parameter combination, and generating a second intelligent mobile AI digital person. The process involves the adjustment and improvement of the behavior pattern, response logic, interaction policy, etc. of the intelligent mobile AI digital person. Through optimization, the first intelligent mobile AI digital person can better adapt to the interaction requirement and preference of the user while maintaining the core interaction capability of the first intelligent mobile AI digital person, and more personalized and efficient interaction experience is realized. Response optimization is based on analysis and evaluation of existing execution strategies, and also includes prediction and preparation of various interaction scenarios that may be encountered by the intelligent mobile AI digital person in the future.
In the embodiment of the application, the application of the transparent display equipment can seamlessly integrate digital content with the visual experience of the real world, and the user can interact with the intelligent mobile AI digital person visually, making the user experience smoother and more vivid. By efficiently capturing and analyzing the voice, expression and action data of the target user, accurate recognition and real-time tracking of the user's interaction intention are achieved. Through feature extraction and a three-layer Bayesian network, subtle differences in user behavior can be effectively analyzed, so that more accurate interaction responses are provided. The fluency and response speed of the interaction process are ensured, and user satisfaction is greatly improved. Based on AI analysis capability, the dialogue content, expression performance and action responses of the digital person can be adjusted in real time according to multidimensional data such as the user's voice, expressions and actions. Through natural language processing and reinforcement learning, the specific demands of users can be understood and their emotional changes perceived, achieving personalized and emotionally aware responses. Interaction between the user and the intelligent mobile AI digital person becomes more natural and emotional, enhancing the user's sense of participation and satisfaction. Combining the transparent display equipment with intelligent mobile AI digital human technology can provide information display and interactive consultation as well as personalized service according to scene requirements, further improving the control accuracy of the intelligent mobile AI digital person.
In a specific embodiment, the process of executing step 101 may specifically include the following steps:
(1) Voice signal acquisition is carried out on the target user through the audio sensor in the transparent display equipment to obtain a voice signal, and short-time Fourier transform is carried out on the voice signal to obtain voice data, wherein the short-time Fourier transform is: $X(t,f)=\sum_{\tau} x(\tau)\, w(\tau-t)\, e^{-j2\pi f\tau}$, where $X(t,f)$ is the speech data, $t$ represents time, $f$ represents the frequency domain, $x(\tau)$ is the speech signal, $w(\tau-t)$ is a window function used to localize the speech signal at each time point $t$, $e^{-j2\pi f\tau}$ is a complex exponential function representing the frequency-domain twiddle factor, $\tau$ is the time index, and $j$ represents the imaginary unit;
(2) User interaction identification is carried out on a target user through an image sensor in the transparent display device, user image data are obtained, and image depth calculation is carried out on the user image data, so that pixel depth data are obtained;
(3) Performing pixel pair matching on user image data through pixel depth data to obtain a plurality of target pixel pairs, and performing parallax value calculation on each target pixel pair to obtain a plurality of parallax values;
(4) Performing face correction generation on the user image data through a plurality of parallax values to obtain a corresponding face correction image, and performing expression recognition on the face correction image to obtain expression data;
(5) Performing motion state estimation on the user image data through a preset state estimation function to obtain motion state estimation data, wherein the state estimation function is: $\hat{x}_k = \hat{x}_k^{-} + K_k\,(z_k - H\hat{x}_k^{-})$, where $\hat{x}_k$ represents the motion state estimation data, $K_k$ represents the Kalman gain, $z_k$ represents the actual measured value, $H$ represents the measurement matrix, and $\hat{x}_k^{-}$ is the predicted (prior) state estimate;
(6) And determining acceleration data, current speed and current position of the target user according to the motion state estimation data, and calculating motion data of the target user according to the acceleration data, the current speed and the current position.
Specifically, the voice signal of the target user is captured by the audio sensor in the transparent display device and converted into frequency-domain voice data through the short-time Fourier transform. The short-time Fourier transform converts a time-domain signal into a representation of its frequency components, through which the frequency characteristics of the speech signal are analyzed. In this process, the signal at each time point is localized by a window function and then converted to the frequency domain by a complex exponential function, capturing the frequency components of the signal while preserving time information, so that subsequent speech recognition and processing are more accurate and efficient. The target user is visually captured through the image sensor in the transparent display device to acquire user image data. Depth calculation is performed on the image data to generate pixel depth data, which provides the relative distance of each pixel point in the image from the sensor; three-dimensional spatial reconstruction and analysis then follow through pixel-pair matching and disparity value calculation. Face correction is performed on the user image data using the disparity values, optimizing image quality. The corrected face image is analyzed during expression recognition; the user's facial changes are identified and converted into expression data. Motion state estimation is performed on the user image data through a preset state estimation function, such as a Kalman filter, to obtain motion state estimation data. The Kalman filter is an effective method of estimating the state of a dynamic system; it combines past estimates, current observations and a dynamic model of the system to predict the system's future state. The motion state estimation data includes the user's current position and speed, and future motion can also be predicted on this basis.
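For the state-estimation function reconstructed above, a numeric sketch of the Kalman measurement update follows; matrix shapes and the position/velocity/acceleration state layout are assumptions consistent with the description, not values from the patent:

```python
# Kalman measurement update matching the reconstructed formula
# x_est = x_pred + K (z - H x_pred). Shapes and state layout are assumed.
import numpy as np

def kalman_update(x_pred: np.ndarray, P_pred: np.ndarray,
                  z: np.ndarray, H: np.ndarray, R: np.ndarray):
    # Innovation: difference between the actual measurement and the prediction.
    y = z - H @ x_pred
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_est = x_pred + K @ y                # optimized state estimate
    P_est = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_est, P_est
```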
In a specific embodiment, the process of executing step 102 may specifically include the following steps:
(1) Carrying out feature extraction on voice data by adopting Mel frequency cepstrum coefficients to obtain a voice feature set;
(2) Carrying out convolution characteristic operation on the expression data through a convolution neural network to obtain an expression characteristic set;
(3) Extracting characteristics of the motion data by adopting a skeleton tracking technology to obtain a motion characteristic set;
(4) Performing feature relation construction on the voice feature set, the expression feature set and the action feature set based on a three-layer Bayesian network to obtain a first feature relation between the voice feature set and the expression feature set, a second feature relation between the voice feature set and the action feature set and a third feature relation between the expression feature set and the action feature set;
(5) According to a preset characteristic association system, calculating association strength data of a first characteristic relation to obtain a first association strength value, calculating association strength data of a second characteristic relation to obtain a second association strength value, and calculating association strength data of a third characteristic relation to obtain a third association strength value;
(6) And creating a corresponding characteristic association relation set according to the first association strength value, the second association strength value and the third association strength value.
Specifically, Mel-frequency cepstrum coefficients are adopted to extract features from the voice data, yielding the voice feature set. By simulating the auditory properties of the human ear, Mel-frequency cepstrum coefficients can effectively extract a representative feature set from a complex speech signal, capturing basic properties of speech such as pitch, volume and rhythm. Convolution feature operations are performed on the expression data through the convolutional neural network. Through multi-layer convolution operations, the convolutional neural network can automatically learn key features of facial expressions, such as keypoint positions and expression intensity, and extract a significant expression feature set from a large amount of complex image data; it can also adapt to different environments and facial changes, ensuring the accuracy and robustness of expression recognition. Skeleton tracking technology is adopted to extract features from the motion data. By identifying and tracking the movements of key parts of the human body, skeleton tracking can accurately capture the user's motion information; by analyzing the dynamic changes of the human skeleton model, a feature set of the actions is extracted, covering the type, speed, amplitude and the like of each action. Feature relations among the voice, expression and action feature sets are then constructed based on the three-layer Bayesian network. The three-layer Bayesian network can effectively establish associations and dependencies between these different types of features, revealing the inherent links between them through probabilistic inference. In this process, the Bayesian network analyzes the direct relationships between voice and expression, voice and action, and expression and action, and also considers how these relationships work together during interaction, revealing more complex feature relations. The association strength of the feature relations is quantified through the preset feature association system, and the association strength values of the first, second and third feature relations are calculated. The magnitude of an association strength value directly reflects how tightly the corresponding features interact. Based on the calculated association strength values, a corresponding feature association relation set is created. The set contains the relations among the various features and is optimized and adjusted according to the association strengths, so that the intelligent mobile AI digital person can exhibit more coordinated and natural behavior during interaction.
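The "preset feature association system" that quantifies association strength is not given a formula in the patent. As one plausible stand-in, the sketch below scores each feature-pair relation by the mutual information between discretized feature streams:

```python
# Illustrative association-strength scoring via mutual information; this is
# a stand-in, not the patent's (undisclosed) feature association system.
import numpy as np
from sklearn.metrics import mutual_info_score

def association_strength(feat_a: np.ndarray, feat_b: np.ndarray,
                         bins: int = 10) -> float:
    # Discretize each 1-D feature stream, then measure shared information.
    a_binned = np.digitize(feat_a, np.histogram_bin_edges(feat_a, bins))
    b_binned = np.digitize(feat_b, np.histogram_bin_edges(feat_b, bins))
    return mutual_info_score(a_binned, b_binned)

# relation_set = {
#     "speech-expression": association_strength(speech_f, expression_f),
#     "speech-action":     association_strength(speech_f, action_f),
#     "expression-action": association_strength(expression_f, action_f),
# }
```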
In a specific embodiment, the process of executing step 103 may specifically include the following steps:
(1) User intention analysis is carried out on the voice feature set through a natural language processing model to obtain user intention information, and digital human self-adaptive dialogue response analysis is carried out through the user intention information to generate digital human dialogue response parameters;
(2) Carrying out emotion perception classification on the emotion feature set through a support vector machine model to obtain emotion perception classification results, and carrying out digital human expression response analysis through the emotion perception classification results to generate digital human expression response parameters;
(3) And predicting the user behavior of the action feature set through the graph neural network to obtain a user behavior prediction result, and analyzing the action response through the user behavior prediction result to generate digital human action response parameters.
Specifically, user intention analysis is performed on the voice feature set through the natural language processing model to obtain user intention information. The speech signal is converted into text, and the natural language processing model then analyzes the text content to identify keywords, phrases and sentence structures, thereby inferring the user's intent. Through training, the model learns to recognize the diversity and complexity of language, including different phrasings, tones and contextual meanings, and thus accurately captures the needs and desires of the user. Based on the result of the user intention analysis, the intelligent mobile AI digital person generates corresponding dialogue response parameters, which guide it in responding to the user in the most appropriate way, including selecting suitable vocabulary, constructing suitable sentences, or adjusting the dialogue strategy to better meet the user's needs. Emotion perception classification is performed on the expression feature set through the support vector machine model, mapping the expression features to specific emotion categories. A support vector machine is a supervised learning model that distinguishes between different classes of data points by constructing one or more hyperplanes. When processing expression data, the support vector machine model can recognize subtle facial expression changes and classify them into basic emotions such as happiness, sadness and anger. The emotion perception classification result provides the intelligent mobile AI digital person with important information about the user's current emotional state; based on this information, it can generate corresponding expression response parameters, adjust its own expression to match the user's emotion, or adopt a specific language style and content in its response, improving the emotional depth and authenticity of the interaction. User behavior prediction is performed on the action feature set through the graph neural network, which analyzes the user's action data and predicts the user's likely next behavior or action intention. By operating on graph-structured data, the graph neural network can capture complex relationships between motion features, such as the temporal and spatial relationships of actions. The user behavior prediction result enables the intelligent mobile AI digital person to adjust its behavior in advance and generate corresponding action response parameters, which guide it in responding to the user through actions, such as mimicking certain user actions, giving anticipatory action cues, or interacting with the user's movements, thereby achieving more natural and fluid human-computer interaction.
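For the graph-neural-network behavior-prediction step, a minimal sketch follows: skeleton keypoints are graph nodes, skeletal connections are edges, and one round of adjacency-normalized message passing precedes a graph-level behavior prediction. The single-layer design and all dimensions are assumptions:

```python
# Minimal GNN sketch for action-based behavior prediction; the one-layer
# message-passing design and all sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ActionGNN(nn.Module):
    def __init__(self, feat_dim: int = 8, hidden_dim: int = 32,
                 n_behaviors: int = 5):
        super().__init__()
        self.msg = nn.Linear(feat_dim, hidden_dim)       # per-node transform
        self.readout = nn.Linear(hidden_dim, n_behaviors)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: [nodes, feat_dim] keypoint features over a short time window;
        # adj: [nodes, nodes] skeleton adjacency (temporal/spatial relations).
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        h = torch.relu(self.msg((adj @ x) / deg))        # neighbor aggregation
        return self.readout(h.mean(dim=0))               # behavior logits
```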
In a specific embodiment, the process of executing step 104 may specifically include the following steps:
(1) Performing intelligent mobile AI digital human interactive execution synchronization analysis on the digital human dialogue response parameters, the digital human expression response parameters and the digital human action response parameters to generate interactive execution synchronization information;
(2) According to the interactive execution synchronization information, aligning response parameters of the digital human dialogue response parameters, the digital human expression response parameters and the digital human action response parameters;
(3) And carrying out parameter fusion on the digital human dialogue response parameters, the digital human expression response parameters and the digital human action response parameters according to the characteristic association relation set to generate an initial intelligent mobile AI digital human interaction execution parameter combination.
Specifically, intelligent mobile AI digital human interaction execution synchronization analysis is performed on the digital human dialogue response parameters, the expression response parameters and the action response parameters, so that the interrelationship and time sequence alignment requirements among three response dimensions are obtained, and interaction execution synchronization information is generated. The synchronization information records the timing and logic relationships between the different response parameters and also indicates how to optimize the combination and execution order of these response parameters under a specific interaction scenario to achieve a smoother and natural interaction experience. According to the interactive execution synchronization information, the dialogue, the expression and the action response parameters of the digital person are aligned, so that the language expression, the facial expression and the body action of the intelligent mobile AI digital person can accurately reflect the interactive intention and the emotion state of the user at any given interaction moment, and meanwhile, the consistency of internal logic is maintained. The response parameter alignment work covers fine work of voice adjustment, expression fine adjustment and action adaptation, and the response details such as the intonation and speed of voice, the expression strength and the action amplitude and speed are adjusted to achieve a highly consistent and natural interaction effect. And according to the feature association relation set, parameter fusion is carried out on dialogue, expression and action response parameters of the digital person, results of synchronous analysis and parameter alignment are integrated, and an initial intelligent mobile AI digital person interaction execution parameter combination is generated. The feature association relation set guides how to effectively combine different response parameters together based on understanding of complex relations between user behaviors and intelligent mobile AI digital person responses, and forms a unified and coordinated interaction strategy. This includes how to adjust the voice response, expression change and action selection of the intelligent mobile AI digital person according to the user's language, emotion and behavior patterns, and how to flexibly apply these policies in different interaction scenarios to achieve a personalized and contextualized interaction experience.
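To illustrate the synchronization-information and alignment step, a small sketch follows; the event schema (channel, onset, duration) is an assumption introduced only to show how the three response channels can be shifted onto one shared timeline:

```python
# Hedged sketch of response-parameter alignment on a shared timeline.
# The ResponseEvent schema is hypothetical, not from the patent.
from dataclasses import dataclass

@dataclass
class ResponseEvent:
    channel: str       # "dialogue", "expression" or "action"
    onset_ms: int      # planned start time
    duration_ms: int

def align_responses(events: list[ResponseEvent]) -> list[ResponseEvent]:
    # Use the earliest planned onset as the common reference point so that
    # speech, expression and action start coherently.
    t0 = min(e.onset_ms for e in events)
    return [ResponseEvent(e.channel, e.onset_ms - t0, e.duration_ms)
            for e in events]
```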
In a specific embodiment, the process of executing step 105 may specifically include the following steps:
(1) Generating a first intelligent mobile AI digital person corresponding to the target user according to the initial intelligent mobile AI digital person interaction execution parameter combination through the transparent display equipment;
(2) Creating a plurality of initial agents for the first intelligent mobile AI digital person through a reinforcement learning algorithm, and setting an agent parameter set of each initial agent;
(3) Performing intelligent agent parameter configuration on a plurality of initial intelligent agents according to the intelligent agent parameter set to obtain a plurality of execution strategy intelligent agents, wherein each execution strategy intelligent agent comprises an input layer, a coding network, a decoding network and an output layer;
(4) Respectively inputting the initial intelligent mobile AI digital human interaction execution parameter combination into a plurality of execution strategy agents, receiving the initial intelligent mobile AI digital human interaction execution parameter combination through the input layer of each execution strategy agent, and coding and standardizing the initial intelligent mobile AI digital human interaction execution parameter combination to obtain a standard interaction execution parameter vector corresponding to each input layer;
(5) Extracting hidden characteristics of the standard interaction execution parameter vector corresponding to each input layer through the coding network of each execution strategy agent to obtain the hidden interaction execution parameter vector of each coding network;
(6) Performing execution control parameter prediction on the hidden interaction execution parameter vector of each coding network through the decoding network of each execution strategy agent to obtain a plurality of execution control parameters of each decoding network;
(7) And generating an execution control strategy for a plurality of execution control parameters of each decoding network through an output layer of each execution strategy agent, and outputting a voice response execution strategy, an expression response execution strategy and an action response execution strategy.
Specifically, a first intelligent mobile AI digital person corresponding to the target user is generated through the transparent display equipment according to the initial intelligent mobile AI digital person interaction execution parameter combination, which comprises the basic response strategies for dialogue, expression and action and provides a basic interaction framework. A plurality of initial agents are created for the first intelligent mobile AI digital person by the reinforcement learning algorithm, and a unique parameter set is configured for each agent. Each agent is designed to explore and learn optimal behavior strategies under different interaction scenarios, and each agent's parameter set defines its learning objectives and behavioral constraints, guiding it to self-optimize in a simulated interaction environment. Agent parameter configuration is carried out on the plurality of initial agents according to the agent parameter sets, ensuring that each agent has a certain initial strategy and learning capacity. Each execution strategy agent comprises an input layer, an encoding network, a decoding network and an output layer, forming a neural network structure. The initial intelligent mobile AI digital person interaction execution parameter combination is input into each agent, received through the input layer, subjected to preliminary encoding and standardization, and converted into a standard interaction execution parameter vector, ensuring the consistency and processability of the input data. Deep analysis is carried out on the standard interaction execution parameter vector through the encoding network, extracting key hidden features. The hidden feature vector contains a deep understanding of the initial interaction execution parameters, reflecting the inherent relationships and patterns between the different parameters. The hidden feature vector is converted into specific execution control parameters through the decoding network, and these parameters directly influence the actual behavior and response strategies of the intelligent mobile AI digital person. Through the decoding network, the agent can generate specific behavior instructions according to the extracted features, such as adjusting the intonation of the voice, changing the facial expression and executing specific actions. Finally, the execution control parameters are converted into actual execution control strategies, including a voice response execution strategy, an expression response execution strategy and an action response execution strategy, through the output layer of each agent. These strategies are the result of the agents' learning and optimization process and are applied directly to the interactive behavior of the intelligent mobile AI digital person, enabling it to interact with the user in a more natural, intelligent and personalized manner.
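The patent names reinforcement learning but no specific algorithm. As one hedged possibility, a REINFORCE-style update over the agent sketched earlier could look like this, with the reward standing in for interaction effectiveness or user satisfaction:

```python
# Assumed REINFORCE-style strategy adjustment for an execution-strategy
# agent (e.g. the ExecutionPolicyAgent sketched earlier); the reward signal
# is a placeholder for measured interaction quality.
import torch

def reinforce_step(agent, optimizer, params: torch.Tensor, reward: float) -> int:
    probs = agent(params)                          # strategy distribution
    dist = torch.distributions.Categorical(probs=probs)
    action = dist.sample()                         # sampled control choice
    # Scale negative log-probability by reward: rewarded strategies become
    # more likely, maximizing the cumulative reward over time.
    loss = -dist.log_prob(action) * reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return int(action.item())
```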
In a specific embodiment, the process of executing step 106 may specifically include the following steps:
(1) Performing policy collaborative analysis on the voice response execution policy, the expression response execution policy and the action response execution policy to generate a collaborative response execution policy;
(2) Determining a target intelligent mobile AI digital person interaction execution parameter combination of the first intelligent mobile AI digital person through a cooperative response execution strategy;
(3) Performing response optimization on the first intelligent mobile AI digital person based on the target intelligent mobile AI digital person interaction execution parameter combination to obtain a second intelligent mobile AI digital person.
Specifically, policy collaborative analysis is performed on the voice response execution policy, the expression response execution policy and the action response execution policy to understand the interrelation and interaction among them, revealing how the policies can be integrated more effectively to achieve a more coordinated interaction experience. Based on the analysis result of the collaborative response execution strategy, the target intelligent mobile AI digital person interaction execution parameter combination of the first intelligent mobile AI digital person is determined. This parameter combination reflects not only the optimization objective of each single response dimension but also the synergy and integration among multiple response dimensions. The system must therefore consider not only the optimization within each response strategy but also the balance and coordination between different response strategies, ensuring that the overall interaction behavior of the intelligent mobile AI digital person is consistent and coherent. For example, the intonation and speed of speech are matched to the changes in expression and the rhythm of actions to create a unified interactive experience. Response optimization is then performed on the first intelligent mobile AI digital person based on the target intelligent mobile AI digital person interaction execution parameter combination: the interaction mode, behavior logic and reaction mechanism of the intelligent mobile AI digital person are comprehensively optimized, and its interaction capability undergoes deep learning and self-optimization through algorithms and models such as deep learning and reinforcement learning, to generate the second intelligent mobile AI digital person.
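A minimal sketch of the policy collaborative analysis step follows, in Python. The scoring functions, the tempo-matching coordination term and the candidate values are assumptions introduced to show how per-dimension strategy quality could be balanced against cross-dimension coordination; they are not prescribed by this embodiment:

```python
from itertools import product

def coordination(voice, expression, action):
    # Penalize tempo mismatch between speech speed, expression change
    # rate and action rhythm (all assumed normalized to [0, 1]); the
    # value is 0 when perfectly matched and negative otherwise.
    tempos = [voice["speed"], expression["rate"], action["rhythm"]]
    return min(tempos) - max(tempos)

def collaborative_score(voice, expression, action, w_coord=0.5):
    # Per-dimension quality plus a weighted cross-dimension term.
    individual = voice["score"] + expression["score"] + action["score"]
    return individual + w_coord * coordination(voice, expression, action)

# Candidate execution strategies per response dimension (illustrative).
voice_opts = [{"score": 0.90, "speed": 0.70}, {"score": 0.80, "speed": 0.40}]
expr_opts  = [{"score": 0.85, "rate": 0.60}, {"score": 0.70, "rate": 0.40}]
act_opts   = [{"score": 0.80, "rhythm": 0.65}, {"score": 0.75, "rhythm": 0.40}]

# The target interaction execution parameter combination is the candidate
# triple with the best collaborative score.
best = max(product(voice_opts, expr_opts, act_opts),
           key=lambda combo: collaborative_score(*combo))
```

Under such a toy objective, a slightly lower-scoring strategy in one dimension can win overall if its tempo matches the other dimensions better, which is exactly the balance-versus-coordination trade-off described above.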
The intelligent mobile AI digital human interaction method based on the transparent display device in the embodiments of the present application has been described above; the intelligent mobile AI digital human interaction system based on the transparent display device in the embodiments of the present application is described below. Referring to fig. 2, one embodiment of the intelligent mobile AI digital human interaction system based on the transparent display device includes:
The recognition module 201 is configured to interactively recognize and track the target user through the transparent display device, so as to obtain voice data, expression data and action data of the target user;
the extraction module 202 is configured to perform feature extraction on the voice data, the expression data and the action data to obtain a voice feature set, an expression feature set and an action feature set, and to perform feature association relation analysis on the voice feature set, the expression feature set and the action feature set based on a three-layer Bayesian network to obtain a feature association relation set;
the analysis module 203 is configured to perform digital human adaptive dialogue response analysis on the voice feature set to generate digital human dialogue response parameters, perform emotion perception and digital human expression response analysis on the emotion feature set to generate digital human expression response parameters, and perform user behavior prediction and action response analysis on the action feature set to generate digital human action response parameters;
The fusion module 204 is configured to perform intelligent mobile AI digital human interaction execution synchronization analysis and parameter fusion on the digital human dialogue response parameters, the digital human expression response parameters and the digital human action response parameters according to the feature association relation set, to generate an initial intelligent mobile AI digital human interaction execution parameter combination;
The adjustment module 205 is configured to generate, through the transparent display device, a first intelligent mobile AI digital person corresponding to the target user according to the initial intelligent mobile AI digital person interaction execution parameter combination, and perform multi-dimensional digital person execution policy adjustment on the first intelligent mobile AI digital person through the reinforcement learning algorithm, so as to obtain a voice response execution policy, an expression response execution policy, and an action response execution policy;
The optimizing module 206 is configured to determine a target intelligent mobile AI digital person interaction execution parameter combination of the first intelligent mobile AI digital person according to the voice response execution policy, the expression response execution policy, and the action response execution policy, and perform response optimization on the first intelligent mobile AI digital person based on the target intelligent mobile AI digital person interaction execution parameter combination, so as to obtain a second intelligent mobile AI digital person.
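The overall data flow through modules 201-206 can be sketched as follows; every function body is a stub and all names are illustrative assumptions rather than the system's actual interfaces:

```python
def recognize(sensor_input):                 # module 201: recognition and tracking
    return {"voice": "...", "expression": "...", "action": "..."}

def extract(frame):                          # module 202: features + Bayesian associations
    features = {k: k + "_features" for k in frame}
    associations = {("voice", "expression"): 0.8,
                    ("voice", "action"): 0.6,
                    ("expression", "action"): 0.7}
    return features, associations

def analyze(features):                       # module 203: per-modality response analysis
    return "dialogue_params", "expression_params", "action_params"

def fuse(dialogue, expression, action, associations):   # module 204: sync + fusion
    return {"dialogue": dialogue, "expression": expression,
            "action": action, "weights": associations}

def adjust(initial_combo):                   # module 205: RL policy adjustment
    return {"voice": "policy", "expression": "policy", "action": "policy"}

def optimize(policies):                      # module 206: collaborative optimization
    return "second intelligent mobile AI digital person"

frame = recognize("camera + microphone stream")
features, associations = extract(frame)
initial_combo = fuse(*analyze(features), associations)
second_digital_person = optimize(adjust(initial_combo))
```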
Through the cooperative operation of the above components, the application of the transparent display device seamlessly fuses digital content with the real-world visual experience, allowing the user to interact with the intelligent mobile AI digital person visually and making the experience smoother and more lively. By efficiently capturing and analyzing the voice, expression and action data of the target user, accurate recognition and real-time tracking of the user's interaction intention are achieved. Through feature extraction and the three-layer Bayesian network, subtle differences in user behavior can be effectively analyzed, providing more accurate interaction responses; fluency and response speed during interaction are ensured, and user satisfaction is greatly improved. Based on AI analysis capabilities, the dialogue content, expression performance and action response of the digital person can be adjusted in real time according to multidimensional data such as the user's voice, expression and action. Through natural language processing and reinforcement learning, the specific demands of users can be understood and their emotional changes perceived, achieving personalized and emotionally aware responses, so that interaction between the user and the intelligent mobile AI digital person is more natural and emotive, enhancing the user's sense of participation and satisfaction. Combining the transparent display device with intelligent mobile AI digital human technology provides both information display and interactive consultation, supports personalized service according to scene requirements, and further improves the control accuracy of intelligent mobile AI digital human interaction.
The application also provides a computer device, which comprises a memory and a processor, wherein the memory stores computer readable instructions that, when executed by the processor, cause the processor to execute the steps of the intelligent mobile AI digital human interaction method based on the transparent display device in the above embodiments.
The present application also provides a computer readable storage medium, which may be a non-volatile or a volatile computer readable storage medium, storing instructions that, when run on a computer, cause the computer to perform the steps of the transparent display device-based intelligent mobile AI digital human interaction method.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (9)

1. The intelligent mobile AI digital human interaction method based on the transparent display equipment is characterized by comprising the following steps of:
performing interactive recognition and tracking on a target user through transparent display equipment to obtain voice data, expression data and action data of the target user;
Respectively carrying out feature extraction on the voice data, the expression data and the action data to obtain a voice feature set, an expression feature set and an action feature set, and carrying out feature association relation analysis on the voice feature set, the expression feature set and the action feature set based on a three-layer Bayesian network to obtain a feature association relation set;
performing digital human self-adaptive dialogue response analysis on the voice feature set to generate digital human dialogue response parameters, performing emotion perception and digital human expression response analysis on the expression feature set to generate digital human expression response parameters, and performing user behavior prediction and action response analysis on the action feature set to generate digital human action response parameters;
Performing intelligent mobile AI digital human interaction execution synchronous analysis and parameter fusion on the digital human dialogue response parameters, the digital human expression response parameters and the digital human action response parameters according to the characteristic association relation set to generate initial intelligent mobile AI digital human interaction execution parameter combination; the method specifically comprises the following steps: performing intelligent mobile AI digital human interactive execution synchronization analysis on the digital human dialogue response parameters, the digital human expression response parameters and the digital human action response parameters to generate interactive execution synchronization information; according to the interactive execution synchronization information, carrying out response parameter alignment on the digital human dialogue response parameter, the digital human expression response parameter and the digital human action response parameter; carrying out parameter fusion on the digital person dialogue response parameters, the digital person expression response parameters and the digital person action response parameters according to the characteristic association relation set to generate an initial intelligent mobile AI digital person interaction execution parameter combination;
Generating a first intelligent mobile AI digital person corresponding to the target user according to the initial intelligent mobile AI digital person interaction execution parameter combination through the transparent display equipment, and performing multidimensional digital person execution strategy adjustment on the first intelligent mobile AI digital person through a reinforcement learning algorithm to obtain a voice response execution strategy, an expression response execution strategy and an action response execution strategy;
and determining a target intelligent mobile AI digital person interaction execution parameter combination of the first intelligent mobile AI digital person according to the voice response execution strategy, the expression response execution strategy and the action response execution strategy, and carrying out response optimization on the first intelligent mobile AI digital person based on the target intelligent mobile AI digital person interaction execution parameter combination to obtain a second intelligent mobile AI digital person.
2. The transparent display device-based intelligent mobile AI digital human interaction method of claim 1, wherein the interactive recognition and tracking of the target user through the transparent display device to obtain the voice data, the expression data and the action data of the target user comprises:
The voice signal acquisition is carried out on a target user through an audio sensor in the transparent display equipment to obtain a voice signal, and short-time Fourier transform is carried out on the voice signal to obtain voice data, wherein the short-time Fourier transform is: $S(t,f)=\sum_{n=-\infty}^{\infty} x(n)\, w(n-t)\, e^{-j 2\pi f n}$, where $S(t,f)$ is the voice data, $t$ represents time, $f$ represents the frequency domain, $x(n)$ is the voice signal, $w(n-t)$ is a window function used to localize the voice signal at each time point $t$, $e^{-j 2\pi f n}$ is a complex exponential function representing the twiddle factor of the frequency domain $f$, $n$ is the time index, and $j$ represents the imaginary unit;
user interaction identification is carried out on the target user through an image sensor in the transparent display equipment, user image data are obtained, and image depth calculation is carried out on the user image data, so that pixel depth data are obtained;
Performing pixel pair matching on the user image data through the pixel depth data to obtain a plurality of target pixel pairs, and performing parallax value calculation on each target pixel pair to obtain a plurality of parallax values;
Performing face correction generation on the user image data through a plurality of parallax values to obtain a corresponding face correction image, and performing expression recognition on the face correction image to obtain expression data;
Performing motion state estimation on the user image data through a preset state estimation function to obtain motion state estimation data, wherein the state estimation function is: $\hat{x}_k=\hat{x}_k^{-}+K_k\left(z_k-H\hat{x}_k^{-}\right)$, where $\hat{x}_k$ represents the motion state estimation data, $\hat{x}_k^{-}$ represents the prior state estimate, $K_k$ represents the Kalman gain, $z_k$ represents the actual measured value, and $H$ represents a measurement matrix;
And determining acceleration data, current speed and current position of the target user according to the motion state estimation data, and calculating motion data of the target user according to the acceleration data, the current speed and the current position.
3. The transparent display device-based intelligent mobile AI digital human interaction method of claim 1, wherein the feature extraction is performed on the voice data, the expression data, and the action data to obtain a voice feature set, an expression feature set, and an action feature set, and the feature association relation analysis is performed on the voice feature set, the expression feature set, and the action feature set based on a three-layer Bayesian network to obtain a feature association relation set, including:
carrying out feature extraction on the voice data by adopting Mel frequency cepstrum coefficients to obtain a voice feature set;
performing convolution feature operation on the expression data through a convolution neural network to obtain an expression feature set;
extracting characteristics of the motion data by adopting a skeleton tracking technology to obtain a motion characteristic set;
Performing feature relation construction on the voice feature set, the expression feature set and the action feature set based on a three-layer Bayesian network to obtain a first feature relation between the voice feature set and the expression feature set, a second feature relation between the voice feature set and the action feature set and a third feature relation between the expression feature set and the action feature set;
calculating the association strength data of the first characteristic relation according to a preset characteristic association system to obtain a first association strength value, calculating the association strength data of the second characteristic relation to obtain a second association strength value, and calculating the association strength data of the third characteristic relation to obtain a third association strength value;
And creating a corresponding characteristic association relation set according to the first association strength value, the second association strength value and the third association strength value.
4. The transparent display device-based intelligent mobile AI digital human interaction method of claim 1, wherein the performing digital human adaptive dialogue response analysis on the speech feature set to generate digital human dialogue response parameters, performing emotion perception and digital human expression response analysis on the expression feature set to generate digital human expression response parameters, and performing user behavior prediction and action response analysis on the action feature set to generate digital human action response parameters, comprises:
user intention analysis is carried out on the voice feature set through a natural language processing model to obtain user intention information, and digital human self-adaptive dialogue response analysis is carried out through the user intention information to generate digital human dialogue response parameters;
carrying out emotion perception classification on the expression feature set through a support vector machine model to obtain an emotion perception classification result, and carrying out digital human expression response analysis through the emotion perception classification result to generate digital human expression response parameters;
And predicting the user behavior of the action feature set through the graph neural network to obtain a user behavior prediction result, and performing action response analysis through the user behavior prediction result to generate digital human action response parameters.
5. The transparent display device-based intelligent mobile AI digital person interaction method of claim 1, wherein generating, by the transparent display device, a first intelligent mobile AI digital person corresponding to the target user according to the initial intelligent mobile AI digital person interaction execution parameter combination, and performing multi-dimensional digital person execution policy adjustment on the first intelligent mobile AI digital person by a reinforcement learning algorithm to obtain a voice response execution policy, an expression response execution policy, and an action response execution policy, includes:
Generating a first intelligent mobile AI digital person corresponding to the target user according to the initial intelligent mobile AI digital person interaction execution parameter combination through the transparent display equipment;
creating a plurality of initial agents for the first intelligent mobile AI digital person through a reinforcement learning algorithm, and setting an agent parameter set of each initial agent;
Performing intelligent agent parameter configuration on the plurality of initial intelligent agents according to the intelligent agent parameter set to obtain a plurality of execution strategy intelligent agents, wherein each execution strategy intelligent agent comprises an input layer, a coding network, a decoding network and an output layer;
Inputting the initial intelligent mobile AI digital human interaction execution parameter combination into the execution strategy agents respectively, receiving the initial intelligent mobile AI digital human interaction execution parameter combination through the input layers of each execution strategy agent, and coding and standardizing the initial intelligent mobile AI digital human interaction execution parameter combination to obtain a standard interaction execution parameter vector corresponding to each input layer;
Extracting hidden characteristics of the standard interaction execution parameter vector corresponding to each input layer through the coding network of each execution strategy agent to obtain the hidden interaction execution parameter vector of each coding network;
Performing execution control parameter prediction on the hidden interaction execution parameter vector of each coding network through the decoding network of each execution strategy agent to obtain a plurality of execution control parameters of each decoding network;
And generating an execution control strategy for a plurality of execution control parameters of each decoding network through an output layer of each execution strategy agent, and outputting a voice response execution strategy, an expression response execution strategy and an action response execution strategy.
6. The transparent display device-based intelligent mobile AI digital person interaction method of claim 1, wherein determining a target intelligent mobile AI digital person interaction execution parameter combination of the first intelligent mobile AI digital person according to the voice response execution policy, the expression response execution policy, and the action response execution policy, and performing response optimization on the first intelligent mobile AI digital person based on the target intelligent mobile AI digital person interaction execution parameter combination, to obtain a second intelligent mobile AI digital person, comprises:
performing policy collaborative analysis on the voice response execution policy, the expression response execution policy and the action response execution policy to generate a collaborative response execution policy;
determining a target intelligent mobile AI digital person interaction execution parameter combination of the first intelligent mobile AI digital person through the cooperative response execution strategy;
and performing response optimization on the first intelligent mobile AI digital person based on the target intelligent mobile AI digital person interaction execution parameter combination to obtain a second intelligent mobile AI digital person.
7. An intelligent mobile AI digital human interaction system based on transparent display equipment is characterized in that the intelligent mobile AI digital human interaction system based on transparent display equipment comprises:
The recognition module is used for carrying out interactive recognition and tracking on the target user through the transparent display equipment to obtain voice data, expression data and action data of the target user;
The extraction module is used for carrying out feature extraction on the voice data, the expression data and the action data respectively to obtain a voice feature set, an expression feature set and an action feature set, and carrying out feature association relation analysis on the voice feature set, the expression feature set and the action feature set based on a three-layer Bayesian network to obtain a feature association relation set;
the analysis module is used for carrying out digital human self-adaptive dialogue response analysis on the voice feature set to generate digital human dialogue response parameters, carrying out emotion perception and digital human expression response analysis on the expression feature set to generate digital human expression response parameters, and carrying out user behavior prediction and action response analysis on the action feature set to generate digital human action response parameters;
The fusion module is used for carrying out intelligent mobile AI digital human interaction execution synchronous analysis and parameter fusion on the digital human dialogue response parameters, the digital human expression response parameters and the digital human action response parameters according to the characteristic association relation set to generate initial intelligent mobile AI digital human interaction execution parameter combination; the method specifically comprises the following steps: performing intelligent mobile AI digital human interactive execution synchronization analysis on the digital human dialogue response parameters, the digital human expression response parameters and the digital human action response parameters to generate interactive execution synchronization information; according to the interactive execution synchronization information, carrying out response parameter alignment on the digital human dialogue response parameter, the digital human expression response parameter and the digital human action response parameter; carrying out parameter fusion on the digital person dialogue response parameters, the digital person expression response parameters and the digital person action response parameters according to the characteristic association relation set to generate an initial intelligent mobile AI digital person interaction execution parameter combination;
The adjusting module is used for generating a first intelligent mobile AI digital person corresponding to the target user according to the initial intelligent mobile AI digital person interaction execution parameter combination through the transparent display equipment, and performing multidimensional digital person execution strategy adjustment on the first intelligent mobile AI digital person through a reinforcement learning algorithm to obtain a voice response execution strategy, an expression response execution strategy and an action response execution strategy;
And the optimization module is used for determining a target intelligent mobile AI digital person interaction execution parameter combination of the first intelligent mobile AI digital person according to the voice response execution strategy, the expression response execution strategy and the action response execution strategy, and carrying out response optimization on the first intelligent mobile AI digital person based on the target intelligent mobile AI digital person interaction execution parameter combination to obtain a second intelligent mobile AI digital person.
8. A computer device, the computer device comprising: a memory and at least one processor, the memory having instructions stored therein;
The at least one processor invoking the instructions in the memory to cause the computer device to perform the transparent display device-based intelligent mobile AI digital human interaction method of any of claims 1-6.
9. A computer readable storage medium having instructions stored thereon, which when executed by a processor, implement the transparent display device-based intelligent mobile AI digital human interaction method of any of claims 1-6.