CN117788235A - Personalized talent training method, system, equipment and medium - Google Patents

Info

Publication number: CN117788235A
Application number: CN202311700258.8A
Authority: CN (China)
Prior art keywords: data, talent, model, brain wave, personalized
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 李翔, 赵璧, 吴美玲, 詹歆, 吴云川
Current assignee: Xinlicheng Education Technology Co ltd
Original assignee: Xinlicheng Education Technology Co ltd
Application filed by Xinlicheng Education Technology Co ltd
Priority claimed from application CN202311700258.8A

Landscapes

  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention provides a personalized talent training method, a system, equipment and a medium, wherein the method comprises the following steps: collecting perception data corresponding to a talent expressive person and/or a viewer in the talent training process, wherein the perception data is at least one of voice data, brain wave data, eye movement data or facial data; performing deep learning and analysis on the perception data based on a pre-established deep learning model to obtain the corresponding performance characteristics of different perception data; the performance characteristics include speech characteristics, psychological characteristics, attention characteristics, line of sight characteristics, and emotional characteristics; and calling an information base, and screening the suggestion information corresponding to different performance characteristics. The invention can comprehensively and accurately analyze the expression of the talent expressive person, give individualized feedback to the students and improve the efficiency of talent training.

Description

Personalized talent training method, system, equipment and medium
Technical Field
The invention relates to the technical field of talent training, in particular to a personalized talent training method, a system, equipment and a medium.
Background
Traditional talent training mainly adopts a teacher-teaching mode: a talent expressive person faces a teacher to conduct expression communication, and the teacher subjectively judges the expression communication effect by observing the performance of the talent expressive person during the expression communication process. However, the teacher-teaching mode lacks interactivity and is low in efficiency; the traditional talent training mode can hardly provide personalized guidance for the specific situation of an individual, and the learning effect of students is difficult to evaluate. Meanwhile, in the field of talent training, evaluation generally focuses only on the spoken language of the talent expressive person, while perception data such as brain wave data collected when the talent expressive person communicates are ignored, so that analysis of other dimensions is lacking, the talent analysis result is too one-sided, and targeted advice and guidance cannot be provided for the talent expressive person.
Disclosure of Invention
The embodiment of the invention provides a personalized talent training method, a personalized talent training system, personalized talent training equipment and a personalized talent training medium, which are used for solving the problems of the related technologies and have the following technical scheme:
in a first aspect, an embodiment of the present invention provides a personalized talent training method, including:
collecting perception data corresponding to a talent expressive person and/or a viewer in a talent training process, wherein the perception data is at least one of voice data, brain wave data, eye movement data or facial data;
performing deep learning and analysis on the perception data based on a pre-established deep learning model to obtain the corresponding performance characteristics of different perception data; the deep learning model comprises an emotion analysis model and a behavior prediction model; the performance characteristics include speech characteristics, psychological characteristics, attention characteristics, line of sight characteristics, and emotional characteristics;
and calling an information base, and screening the suggestion information corresponding to different performance characteristics.
In one embodiment, when the perception data is voice data, the method further comprises:
preprocessing voice data to generate preprocessed voice data;
performing text conversion on the preprocessed voice data to generate first display information for displaying text;
Carrying out semantic analysis on the text to obtain a semantic analysis result;
judging whether semantic errors exist in the semantic analysis result, and determining semantic error contents;
and generating second display information for displaying the content of the semantic error and the semantic analysis result according to the content of the semantic error and the semantic analysis result.
In one embodiment, analyzing the voice data based on the deep learning model includes:
inputting the preprocessed voice data, as a model input, into a pre-established voice deep learning model for learning and training, and outputting voice characteristics, wherein the voice characteristics comprise voice definition, voice speed and voice mood.
In one embodiment, the brain wave data includes brain wave signals and cortex signals; if brain wave data is acquired, the method for analyzing the brain wave data comprises the following steps:
extracting and marking signal characteristics of brain wave data, wherein the signal characteristics comprise frequency, amplitude and spectral density;
inputting the brain wave data marked with signal characteristics, as a model input, into a pre-established brain wave deep learning model for learning and training, and outputting psychological characteristics and attention characteristics, wherein the attention characteristics comprise attention concentration degree, memory capacity and response speed.
In one embodiment, if eye movement data is collected, the method for analyzing the eye movement data comprises the following steps:
the eye movement data is used as a model input and fed into a pre-established eye movement deep learning model for analysis, and the sight line characteristics are output; the sight line characteristics include a point of interest location and a gazing time.
In one embodiment, if facial data is collected, the method for analyzing the facial data is as follows:
the facial data is used as a model input, analyzed based on a pre-established facial deep learning model, and emotional characteristics are output.
In one embodiment, after screening the suggestion information corresponding to the performance characteristics, the method further includes:
visually displaying the performance characteristics and the suggestion information; visual presentation includes presentation using at least one of charts, animations and interactive games.
In a second aspect, an embodiment of the present invention provides a personalized talent training system, which executes the personalized talent training method described above; the system comprises:
the acquisition module is used for acquiring perception data corresponding to the talent expressive person and/or the viewer in the talent training process, wherein the perception data is one or a combination of two or more of voice data, brain wave data, eye movement data and facial data;
The model analysis module is used for performing deep learning and analysis on the perception data according to a pre-established deep learning model and outputting the performance characteristics corresponding to different perception data; the performance characteristics include speech characteristics, psychological characteristics, attention characteristics, line of sight characteristics, and emotional characteristics;
and the suggestion pushing module is used for calling an information base to screen and push suggestion information corresponding to different performance characteristics.
In a third aspect, an embodiment of the present invention provides an electronic device, including: memory and a processor. Wherein the memory and the processor are in communication with each other via an internal connection, the memory is configured to store instructions, the processor is configured to execute the instructions stored by the memory, and when the processor executes the instructions stored by the memory, the processor is configured to perform the method of any one of the embodiments of the above aspects.
In a fourth aspect, embodiments of the present invention provide a computer readable storage medium storing a computer program, the method of any one of the above embodiments being performed when the computer program is run on a computer.
The advantages or beneficial effects in the technical scheme at least comprise:
the perception data of the spoken expressive person in the expression communication process is collected by adopting multi-perception technologies such as brain wave, eye movement tracking, facial expression recognition and the like, so that the expression of the spoken expressive person can be comprehensively and accurately analyzed; meanwhile, the deep learning technology is adopted to analyze the expression communication data of the students, so that personalized feedback can be given to the students; the interaction between a student and a teacher is realized by adopting the Internet of things technology, and the efficiency of talent training can be improved.
The foregoing summary is for the purpose of the specification only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present invention will become apparent by reference to the drawings and the following detailed description.
Drawings
In the drawings, the same reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily drawn to scale. It is appreciated that these drawings depict only some embodiments according to the disclosure and are not therefore to be considered limiting of its scope.
FIG. 1 is a flow chart of the personalized talent training method of the present invention;
FIG. 2 is a block diagram of a personalized talent training system of the present invention;
fig. 3 is a block diagram schematically illustrating a block structure of the electronic device of the present invention.
Detailed Description
Hereinafter, only certain exemplary embodiments are briefly described. As will be recognized by those of skill in the pertinent art, the described embodiments may be modified in various different ways without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive.
Example 1
The embodiment provides a personalized talent training method, as shown in fig. 1, specifically comprising the following steps:
step S1: responding to a talent training request, activating the perception equipment of the talent expressive person and/or the viewer to acquire perception data corresponding to the talent expressive person and/or the viewer in the talent training process, wherein the perception data is at least one of voice data, brain wave data, eye movement data or facial data;
step S2: performing deep learning and analysis on the perception data based on a pre-established deep learning model, and outputting the performance characteristics corresponding to different perception data; the performance characteristics include speech characteristics, psychological characteristics, attention characteristics, line of sight characteristics, and emotional characteristics;
Step S3: and calling the information base to screen the suggestion information corresponding to the different performance characteristics and pushing the suggestion information.
The oral expressive person is a student who needs to express communication, and the viewer is a person who views the expression communication; wherein the talent expressive person can be a teacher for teaching, and the viewer can be a student; in addition, the identities of the students and the teacher can be exchanged according to the actual demands, namely, the spoken expressive person can also be the student, and the viewer is the teacher; personalized feedback is provided to the teacher or student who is expressing the communication by observing the teacher's and/or student's perception data.
The perception equipment of the talent expressive person and/or the viewer is activated after the talent training request is sent. That is, when the talent expressive person is expressing and communicating, the perception equipment of the talent expressive person can be activated to collect perception data of the talent expressive person, and the performance of the talent expressive person is directly and automatically evaluated; alternatively, when the talent expressive person is expressing and communicating, the perception equipment of the viewer can be activated, and the perception data of the viewer indirectly reflects the talent of the current talent expressive person, provides the talent expressive person with feedback from the viewer's perspective, and allows the expression communication performance of the talent expressive person to be comprehensively analyzed.
The sensing equipment comprises voice acquisition equipment, brain wave acquisition equipment, eye movement tracking equipment and a camera; the voice collecting device is mainly a microphone, which can be a moving coil microphone, a capacitor microphone, a crystal microphone and the like, and the microphone is responsible for converting sound into an electric signal; the perceptual data acquired by the microphone is then speech data.
The brain wave acquisition device mainly comprises brain wave sensors, such as an electrode type brain wave sensor, a photoelectric type brain wave sensor and a magneto-electric type brain wave sensor, wherein electrodes are attached to the scalp of a user to acquire brain wave signals, and weak electric signals generated by the brain can be converted into digital signals so as to be analyzed and processed; the brain wave acquisition device can also be a brain wave-cortex signal combined acquisition device, and the brain wave-cortex signal combined acquisition device can acquire brain wave signals and cortex signals at the same time so as to better know the brain activities; the sensing data acquired by the brain wave acquisition device is brain wave data, wherein the brain wave data comprises brain wave signals and cortex signals.
The eye movement tracking device is mainly an eye movement instrument and can collect information such as eyeball position, movement speed, movement direction and the like of a user; eye movement instruments typically employ infrared optical techniques to collect eye position information and visual inertial navigation techniques to calculate eye movement speed and direction. The perceived data collected by the eye tracking device is eye movement data.
The camera is mainly used for capturing facial features of a user so as to perform facial analysis; the perceived data acquired by the camera is therefore facial data.
It should be noted that the above-mentioned sensing device can be activated according to actual conditions, and under the best condition, the voice acquisition device, the brain wave acquisition device, the eye movement tracking device and the camera are all opened and put into use, so that the various features of the expressive person and/or the viewer in the expression and communication process can be most comprehensively analyzed from different angles, and the comprehensiveness and accuracy of analysis can be improved. In addition, one of voice acquisition equipment, brain wave acquisition equipment, eye movement tracking equipment or cameras can be used according to actual demands, or two or more of voice acquisition equipment, brain wave acquisition equipment, eye movement tracking equipment or cameras are used for acquiring perception of a user in the process of expression communication so as to obtain corresponding perception data, so that corresponding talent training suggestions are generated after analysis.
In the step S2 of this embodiment, the analysis of the perception data is implemented by using a deep learning model including an emotion analysis model and a behavior prediction model in a specific field of talents, and specifically includes the following steps:
1. Emotion analysis model.
The general approach to emotion analysis is to convert the perception data from digital signals into corresponding text data and map the corresponding text data to an emotion polarity space, such as positive, negative and neutral. The perception data comprises voice data, brain wave data, eye movement data and facial data; the digital signals of the perception data are converted into text data, for example, voice data are converted into text, or the frequency and amplitude signals corresponding to the brain wave data are converted into text data that can be identified by the model. The emotion of the talent expressive person is then analyzed through the model, the corresponding characteristics are obtained, and appropriate suggestions are provided for the talent expressive person according to the characteristics. The model may employ the following mathematical formulation:
when higher level text representations are involved, some more complex concepts such as attentive mechanisms, concentration, tiredness, exclamation, long tail words, frequency, etc. may be introduced to achieve higher level text representations. The following is an updated high-level data operation formula:
(one) input representation (Advanced):
given a text sequence X, it can be represented as a word embedding matrix E, where each row corresponds to a word embedding of a word. The length of X is T, and the input of each time step T is a word embedding vector x_t, so x= [ x_1, x_2, ], x_t ].
In this more advanced model, complex attention mechanisms are introduced to capture information in text more finely. Specifically, consider the following concept:
1. attention mechanism (Attention Mechanism), expressed as:
α_t = Attention(E, h_{t-1})
α_t (Attention): this is a weight of attention that represents the degree of attention to the embedding of different words in the text at time step t. In particular, it is used to dynamically weight different word embeddings to capture important information in text. This weight can be calculated from the previous hidden state h_{t-1} and the word embedding matrix E to ensure that different words are focused at different time steps.
Wherein the Attention() function determines how the attention weight is calculated, and h_{t-1} is the hidden state of the previous time step. The attention weight may then be used to calculate an input representation of the current time step:
x_t (input representation): this is an input representation at time step t that represents a weighted sum of each word embedding in the text. It is dynamically calculated from the above-described attention weights α_t and word embedding E to more accurately represent text information.
2. Concentration (Focus): a concentration variable γ_t is introduced, representing the concentration level in the text at time step t. This concentration may be dynamically adjusted based on context and history information. This process can be expressed as:
γ_t = Focus(X, h_{t-1}, context);
γ_t (Focus): this is a variable representing the level of concentration in the text at time step t. It is used for dynamically adjusting the degree of attention based on context and history information. Concentration can affect the fluency and consistency of spoken expression, as different parts of the text may require different degrees of attention at different time steps. The Focus() function determines, among other things, how the concentration is calculated.
3. Feel tired (Fatigue): considering the tiredness of the user or text reader, the sensation may be dynamically changed by time. The introduction of a fatigue sensing variable phit can affect the attentiveness mechanism and concentration. This process can be expressed as:
φ_t = Fatigue(t);
φ_t (Fatigue): this is a variable that represents the tiredness of the user or text reader. It may change dynamically over time and is used to affect the attention mechanism and concentration. The feeling of fatigue may be calculated from the time t to ensure a moderate adjustment of the strength of the talent training during long talent training. Wherein the Fatigue() function calculates the feeling of fatigue from the time t.
4. Exclamation (Exclamations): considering an interjective in text, it is common to represent a portion of strong emotion. We introduce an interjective weight βt that is dynamically adjusted according to the occurrence of an interjective in the text. This process can be expressed as:
β_t = Exclamations(X);
β_t (Exclamations): this is a variable that represents the weight of an exclamation word in the text. The exclamation word generally indicates a portion where emotion is strong. The weights are dynamically adjusted based on the occurrence of interjections in the text. This ensures that the model is more focused on the parts with rich emotion and improves the infectivity of the spoken expression. Wherein the Exclamations() function calculates the weights from the occurrence of exclamation words in the text.
5. Long Tail Words (Long-Tail Words): some words appear less frequently in text, but may have important information. The long-tailed word weight ωt is introduced for weighting the word embedding of these unusual words. This process can be expressed as:
ω_t = LongTailWords(X);
ω_t (LongTailWords): this is a variable representing the weight of a long-tail word. Some words appear less frequently in text, but may contain important information, particularly in field-specific or professional text. The long-tail word weight may be dynamically adjusted based on the frequency of occurrence of the word. This helps the model better capture the meaning of unusual words, improving the accuracy of the spoken utterance. Wherein the LongTailWords() function calculates weights according to the frequency of occurrence of words.
Finally, we can combine the above elements, calculate the input representation x_t of the current time step t, to more accurately capture the information in the text:
x_t = Σ_i α_{t,i} · γ_t · φ_t · β_t · ω_t · E_i
The following is a specific definition explanation for each index:
αt (Attention): this is a weight of attention that represents the degree of attention to the embedding of different words in the text at time step t.
xt (input representation): this is an input representation at time step t that represents a weighted sum of each word embedding in the text. It is dynamically calculated from the above-described attention weights αt and word embedding E to more accurately represent text information.
γt (Focus): this is a variable representing the level of concentration in the text at time step t. It is used to dynamically adjust the degree of attention based on context and history information. Concentration can affect fluency and consistency of spoken expressions, as different parts of the text may require different degrees of attention at different time steps.
Phi t (Fatigue): this is a variable that represents the tiredness of the user or text reader. May change dynamically over time for affecting the attentive mechanisms and concentration. The feeling of fatigue may be calculated from the time $t$ to ensure a moderate adjustment of the strength of the spoken training over a long period of spoken training.
Beta t (Exclamations): this is a variable that represents the weight of an exclamation word in the text. The exclamation word generally indicates a portion where emotion is strong. It dynamically adjusts the weights based on the occurrence of interjections in the text. This ensures that the model is more focused on the parts with rich emotion and improves the infectivity expressed by the talents.
Omega t (LongTailWords): this is a variable representing the weight of the long-tailed word. Some words appear less frequently in text, but may contain important information, particularly in field-specific or professional text. The long-tailed word weight may be dynamically adjusted based on the frequency of occurrence of the word. This helps the model better capture the meaning of unusual words, improving the accuracy of the spoken utterance.
Finally, these indices are combined together for computing the input representation x_t of the current time step t to better capture the information in the text, providing more accurate and personalized spoken training advice. Dynamic adjustment and weighting of these metrics may improve the adaptability and expressive power of the spoken utterance model. This more advanced input representation model fully accounts for various factors in the text to provide a more accurate and informative representation of the text. This may help the model better understand the semantic and emotional information of the text.
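For illustration only (this sketch is not part of the original disclosure), the following Python code shows one way the weighted input representation x_t above could be assembled; the dot-product attention, the fatigue schedule, the toy weighting rules and every helper name are assumptions standing in for the Attention(), Focus(), Fatigue(), Exclamations() and LongTailWords() functions named in the text.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def input_representation(E, h_prev, t, exclaim_mask, word_freq):
    """Combine the per-word weights into x_t (illustrative stand-ins only).

    E            : (V, d) word-embedding matrix, one row per word of the text
    h_prev       : (d,)   previous hidden state h_{t-1}
    t            : int    current time step (drives the toy fatigue schedule)
    exclaim_mask : (V,)   1.0 where the word is an exclamation, else 0.0
    word_freq    : (V,)   corpus frequency of each word
    """
    alpha = softmax(E @ h_prev)            # alpha_t: toy dot-product attention over words
    gamma = 1.0                            # gamma_t: concentration (placeholder constant)
    phi = 1.0 / (1.0 + 0.01 * t)           # phi_t: fatigue slowly decays with time
    beta = 1.0 + exclaim_mask              # beta_t: boost exclamation words
    omega = 1.0 / np.log(2.0 + word_freq)  # omega_t: up-weight rare (long-tail) words
    weights = alpha * gamma * phi * beta * omega
    # x_t = sum_i alpha_{t,i} * gamma_t * phi_t * beta_i * omega_i * E_i
    return weights @ E

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V, d = 6, 4
    E = rng.normal(size=(V, d))
    x_t = input_representation(
        E, rng.normal(size=d), t=3,
        exclaim_mask=np.array([0, 0, 1, 0, 0, 0.0]),
        word_freq=np.array([100, 5, 1, 50, 2, 300.0]),
    )
    print(x_t.shape)  # (4,)
```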
The effects obtainable according to the above are:
1. concentration (Focus): introducing the concentration variable allows the model to assign different degrees of interest to different portions of text at different time steps. This is important for personalized spoken training in the inventive patent, as the text content at certain moments may be more critical than at other moments, requiring a higher attention. Thus, our invention can better adapt to the needs and situations of users, and provide more targeted talent training advice.
2. Feel tired (Fatigue): considering the tiredness of the user or text reader helps to improve the user experience. During long spoken training, users may feel tired and their concentration and interest may be lost. By introducing the fatigue sensing variable, the strength and content of the talent training can be adjusted to ensure that the user remains focused during the training period and the training effect is improved.
3. Exclamation (Exclamations): considering the interjections in the text allows the model to better identify the portion of strong emotion. In talent training, these strong emotional portions may be critical, as they may help to increase the infectivity and expressivity of talents. By introducing interjective weights, we can ensure that the model is more focused on these important emotion markers.
4. Long Tail Words (Long-Tail Words): some unusual words may contain critical information, especially in specific fields or professional text. The introduction of long-tailed word weights allows the model to better capture the meaning of these unusual words. In spoken training, this is very important for spoken expressions covering a wide range of topics and terminology.
By comprehensively considering the factors, the text can be more comprehensively analyzed, and semantic and emotion information in the text can be more accurately understood, so that more personalized and effective talent training suggestions are provided, and the talent expression capability of a user is improved. The method is not only beneficial to the fields of education and communication, but also has wide application prospects in a plurality of fields such as business and entertainment.
(two) Secondly, a higher-level model is introduced to fuse multiple characteristics of the spoken expression and dynamically weight the word embeddings of different words in the emotion classification layer. This more advanced model includes an emotion-behavior dual-attention RNN and attention mechanisms to provide finer emotion classification results. The following is an updated high-level data operation formula:
In this more advanced model, the output of the emotion-behavior dual attention RNN (Sentiment-Behavior Dual-Attention RNN) is introduced as input, fusing multiple characteristics of the spoken expression:
1. output of emotion-behavioural dual attention RNN:
the hidden state sequence H obtained by the emotion-behavior dual attention RNN will be divided into two parts, one for emotion classification and the other for considering the talent expression characteristics. This can be expressed as:
H_Sentiment = [h_{Sentiment,1}, h_{Sentiment,2}, ..., h_{Sentiment,T}]
H_Behavior = [h_{Behavior,1}, h_{Behavior,2}, ..., h_{Behavior,T}];
h Sentiment: this is an emotion-related sequence of hidden states representing emotion information for each time step in the text. It is calculated from emotion-Behavior Dual Attention RNN (Sentiment-Behavior Dual-Attention RNN). H of each time step Sentiment,t Emotion information of this time step is included, which is important in emotion analysis tasks.
H Behavior: this is a sequence of behavior-related hidden states representing behavior information for each time step in the text. It is also calculated from emotion-behavioral dual attention RNN. H of each time step Behavior,t The time step behavior information is included, which is important in the training of the spoken expression. Wherein H is Sentiment Comprises a hidden state sequence for emotion classification, H Behavior A sequence of hidden states for taking into account the characteristics of the talent expression is included.
2. Emotion classification attention (Sentiment Attention):
an emotion classification attention mechanism is introduced for dynamically weighting the emotion hidden state sequence H_Sentiment, so as to improve the sensitivity to emotion, while characteristics of emotion expression, such as positive emotion and negative emotion, are considered. This process can be expressed as:
α_Sentiment = SentimentAttention(H_Sentiment);
α_Sentiment: this is an emotion-related attention weight used to capture important information in the hidden state sequence H_Sentiment. This weight is calculated by the SentimentAttention() function, which ensures that emotion-related parts are more focused in the emotion analysis task.
3. Talent expression characteristic attention (Expression Feature Attention):
the characteristic attention mechanism is introduced for dynamically weighting the characteristic of the spoken utterance, such as sound production, loudness, frequency, timbre, consistency, speech and emotion fusion, etc. This process can be expressed as:
α_Expression = ExpressionFeatureAttention(X);
α_Expression: this is the attention weight of the spoken expression characteristics, used to capture important information in the input text sequence X. This weight is calculated by the ExpressionFeatureAttention() function, which ensures that expression-related parts are more focused in spoken expression training.
4. Emotion classification results:
and finally, comprehensively considering the emotion classification attention weight and the talent expression characteristic attention weight to fuse the emotion classification and the talent expression characteristic, and obtaining a final emotion classification result Y. This process can be expressed as:
Y = FC(α_Sentiment ⊙ H_Sentiment, α_Expression ⊙ X);
Y: this is the final emotion classification result, representing the emotion polarity of the text. It is calculated by the fully connected layer (FC), utilizing the emotion-related attention weight α_Sentiment with the corresponding emotion hidden state sequence H_Sentiment, and the expression-related attention weight α_Expression with the input text sequence X. By combining emotion and expression information, emotion classification can be performed more accurately, which is important for the personalized feedback of talent training. Wherein ⊙ denotes element-wise multiplication, and FC represents a multi-layer fully connected neural network.
This higher-level emotion classification model focuses not only on emotion information, but also fully considers multiple characteristics of spoken expressions, such as vocalization, loudness, frequency, timbre, consistency, speech-to-emotion fusion, positive emotion, negative emotion, etc., to provide more accurate emotion classification results. At the same time, the attention mechanism is used to dynamically adjust weights to better capture important information in the text.
(III) emotion classification layer:
and finally, mapping the hidden state sequence H to an emotion polarity space through a full connection layer to obtain an emotion classification result Y. In this advanced model, we can introduce multi-layer fully connected neural networks and batch normalization (Batch Normalization) techniques to increase the depth and expressive power of the model.
Y=FC(H)
Here FC stands for a multi-layer fully connected neural network.
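As an illustration of the fusion step Y = FC(α_Sentiment ⊙ H_Sentiment, α_Expression ⊙ X) described above, the following PyTorch sketch shows one possible realization; the linear attention scorers, hidden sizes, pooling over time and the three-class output are assumptions, not details taken from the disclosure.

```python
import torch
import torch.nn as nn

class DualAttentionSentimentHead(nn.Module):
    """Fusion of emotion hidden states and spoken-expression features into Y."""
    def __init__(self, hid_dim=64, emb_dim=32, n_classes=3):
        super().__init__()
        self.sent_attn = nn.Linear(hid_dim, 1)   # stands in for SentimentAttention()
        self.expr_attn = nn.Linear(emb_dim, 1)   # stands in for ExpressionFeatureAttention()
        self.fc = nn.Sequential(                 # multi-layer FC with batch normalization
            nn.Linear(hid_dim + emb_dim, 64),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, H_sentiment, X):
        # H_sentiment: (B, T, hid_dim) emotion hidden states; X: (B, T, emb_dim) inputs
        a_s = torch.softmax(self.sent_attn(H_sentiment), dim=1)  # alpha_Sentiment
        a_e = torch.softmax(self.expr_attn(X), dim=1)            # alpha_Expression
        h = (a_s * H_sentiment).sum(dim=1)  # alpha_Sentiment ⊙ H_Sentiment, pooled over time
        x = (a_e * X).sum(dim=1)            # alpha_Expression ⊙ X, pooled over time
        return self.fc(torch.cat([h, x], dim=-1))  # emotion polarity logits Y

if __name__ == "__main__":
    head = DualAttentionSentimentHead()
    Y = head(torch.randn(8, 20, 64), torch.randn(8, 20, 32))
    print(Y.shape)  # torch.Size([8, 3])
```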
The above has the following effects on the performance and function of the spoken utterance model:
1. output of emotion-behavioural dual attention RNN: dividing the output of emotion-behavior dual attention RNN into emotion and talent expression is beneficial to dividing the task of the model more clearly, thereby better capturing and understanding emotion and talent expression characteristics in the text. This helps provide more accurate training feedback, enabling the model to better adapt to different tasks.
2. Emotion classification attention and talent expression characteristic attention: the introduction of these two attention mechanisms helps the model to focus better on important parts of emotion and talent expression. Emotion classification attention improves emotion sensitivity, helping to classify emotion more accurately, while talent expression feature attention allows the model to dynamically focus on different talent features according to different tasks. The introduction of these attention mechanisms helps to improve the adaptation and performance of the model.
3. Comprehensive emotion classification results: by comprehensively considering emotion classification attention weight and talent expression characteristics attention weight, the model can integrate emotion classification and talent expression characteristics, so that richer emotion analysis results are provided. This is very important for personalized feedback of talent training, as it not only tells the user about the emotion of the text, but also provides information about the expression, such as sound characteristics and intonation.
4. Dynamically adjusting weights: the introduction of the attention mechanism allows the model to dynamically adjust the weights to better capture important information in the text. This helps to increase the flexibility and performance of the model, as it can adjust the focus of attention to different text and tasks.
The added contents enrich the expression model of the talents, so that the expression model can process text emotion and expression modes more comprehensively and provide more accurate and personalized training feedback. The method has important significance for the talent training application, and can improve the talent expression capability and emotion analysis accuracy of the user. The relevant technical innovation part of the inventive patent may include these matters to protect this innovation.
2. Behavior prediction model for spoken specific domain.
The behavior prediction model for the spoken specific domain can be further elaborated to more fully consider the multiple dimensions of the spoken utterance. The following is a more complex example model, including multiple sub-models to capture different spoken utterance features:
feature extraction: various features are extracted from the speech of the speaker, including text, audio, facial expressions, speech speed, voice tone, consistency, grammar correctness, etc. These features may be represented as X, where X = [text, audio, face, speech speed, tone, consistency, grammar].
Target value: a target value Y is defined that represents metrics for a number of speech-related dimensions, such as speech effects, sound quality, facial expression confidence, grammar scores, etc. These are the values that we want to predict, which can be expressed as Y = [y_1, y_2, ..., y_m].
Deep learning model: a multi-modal deep learning model is used, comprising the following components:
text analysis sub-model: a Recurrent Neural Network (RNN) or Convolutional Neural Network (CNN) is used to process the speech text, predicting the text-related spoken dimensions.
Audio analysis sub-model: the audio data is processed using a convolutional neural network or a recurrent neural network to capture audio-related spoken features.
Facial expression analysis submodel: a convolutional neural network or facial feature extractor is used to analyze the facial expression data, predicting the talent dimension to which the facial expression relates.
Continuity and grammar analysis sub-model: Natural Language Processing (NLP) techniques are used to analyze the continuity and grammar correctness of the speech text, and to predict continuity and grammar scores.
And (3) comprehensive model: and fusing the outputs of the submodels together to form the final spoken dimension prediction. This may be achieved by a multi-layer fully connected neural network or an attention mechanism.
Training a model: the model is trained using the data of the known spoken expressive and the corresponding metric values of the spoken related dimension, finding the optimal parameters so that the model can best predict the values of the spoken related dimension.
Predicting behavior: once the model training is complete, the speech features of a new spoken expressive person can be input into the model, which is used to predict the values of multiple spoken related dimensions, i.e., Y.
This more complex spoken behavior prediction model takes into account aspects of the spoken utterance including text, sound, facial expression, consistency, grammar, etc., thereby providing more comprehensive and accurate training feedback. The model structure and mathematical formulas will depend on the specific tasks and data and can be further designed and optimized according to the requirements. This comprehensive model helps the speaker to fully improve his/her talent ability and better understand the various aspects of his/her performance.
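The following PyTorch sketch illustrates, under stated assumptions, how such a multi-modal behavior prediction model might be wired together; the sub-model architectures, feature dimensions and the four example target metrics are illustrative choices only, since the text leaves the concrete structure open.

```python
import torch
import torch.nn as nn

class SpokenBehaviorPredictor(nn.Module):
    """Text, audio and facial sub-models fused to predict M spoken-expression metrics."""
    def __init__(self, text_dim=32, audio_dim=40, face_dim=128, m_targets=4):
        super().__init__()
        self.text_rnn = nn.GRU(text_dim, 64, batch_first=True)    # text analysis sub-model
        self.audio_cnn = nn.Sequential(                           # audio analysis sub-model
            nn.Conv1d(audio_dim, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.face_mlp = nn.Sequential(nn.Linear(face_dim, 64), nn.ReLU())  # facial features
        self.fusion = nn.Sequential(                              # comprehensive model
            nn.Linear(64 * 3, 64), nn.ReLU(), nn.Linear(64, m_targets)
        )

    def forward(self, text_seq, audio_feats, face_feats):
        # text_seq: (B, T, text_dim); audio_feats: (B, audio_dim, L); face_feats: (B, face_dim)
        _, h_text = self.text_rnn(text_seq)
        z_text = h_text[-1]                                # (B, 64)
        z_audio = self.audio_cnn(audio_feats).squeeze(-1)  # (B, 64)
        z_face = self.face_mlp(face_feats)                 # (B, 64)
        return self.fusion(torch.cat([z_text, z_audio, z_face], dim=-1))

if __name__ == "__main__":
    model = SpokenBehaviorPredictor()
    y = model(torch.randn(2, 50, 32), torch.randn(2, 40, 100), torch.randn(2, 128))
    print(y.shape)  # e.g. [speech effect, sound quality, expression confidence, grammar score]
```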
In one embodiment, after activating the voice acquisition device to obtain the voice data of the speaker, the method further includes:
after preprocessing the voice data, converting the voice data into text, and displaying the text converted by the voice as first display information, so that a talent expressive person can know the expression communication content of the talent expressive person;
the preprocessing module can reduce noise of the electric signal, remove noise and enhance definition of the voice signal through a voice enhancement algorithm;
carrying out semantic analysis on the text to obtain a semantic analysis result, judging whether semantic errors exist in the semantic analysis result, and determining semantic error contents;
according to the content of the semantic errors and the semantic analysis result, generating second display information for displaying the content of the semantic errors and the semantic analysis result and displaying the second display information, so that a talent expressive person can know the content of the semantic errors existing in the expression communication.
The semantic analysis can be to analyze the semantics after dividing the words in the text, and if the semantic analysis shows that the semantics are wrong, marking the wrong text position and displaying.
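As a small illustration of how the first and second display information could be packaged once hypothetical upstream speech-to-text and semantic-checking steps have produced a transcript and a list of flagged errors, consider the following sketch; the data layout and helper names are assumptions, not part of the disclosure.

```python
from dataclasses import dataclass

@dataclass
class SemanticIssue:
    start: int    # character offset of the flagged span in the transcript
    end: int
    message: str

def build_display_info(transcript, issues):
    """Package the first display information (the transcript itself) and the
    second display information (semantic errors with their marked positions)."""
    first_display = {"type": "transcript", "text": transcript}
    second_display = {
        "type": "semantic_analysis",
        "errors": [
            {"span": transcript[i.start:i.end],
             "position": [i.start, i.end],
             "message": i.message}
            for i in issues
        ],
    }
    return {"first": first_display, "second": second_display}

if __name__ == "__main__":
    text = "Today I want to talk about the importance of the communication skills."
    issues = [SemanticIssue(45, 48, "questionable article before 'communication skills'")]
    print(build_display_info(text, issues)["second"]["errors"][0]["span"])  # "the"
```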
In one embodiment, the method for analyzing the voice data based on the deep learning model is as follows:
the preprocessed voice data is used as a model input, and is input into a pre-established voice deep learning model for learning and training to output voice characteristics, wherein the voice characteristics comprise voice definition, voice speed and voice mood.
The pre-constructed deep learning model is trained with a large number of samples, with a large amount of voice data as the sample input and voice features as the sample output; the trained voice deep learning model can analyze the voice definition, voice speed and voice mood in the voice data, wherein the voice mood can be distinguished according to intonation, and mainly comprises statement intonation, imperative intonation, exclamatory intonation, interrogative intonation and the like.
Finally, the voice characteristics of the talent expressive person can be displayed in a mode combining voice playing with text display and graphic display, so that the talent expressive person can understand the problems in the expression communication and correct them in time according to the voice characteristics.
The voice analysis part can be used for recording and analyzing the voice expressed and communicated by the talent expressive person, and the voice recording and analyzing of the viewer can be omitted under the condition of no special requirement.
In one embodiment, an adaptive algorithm based on deep learning may further be provided for spoken training on the voice data, which specifically includes the following steps:
1. Data representation and feature extraction:
the data represents: let X denote the speech input, where X is the time sequence of a sound signal, the sampling frequency being fs; converting the voice signal into a frequency spectrum representation by using Short-time Fourier transform (Short-Time Fourier Transform, STFT) to obtain a frequency spectrum diagram of the voice; the spectrogram is divided into a plurality of time windows, and feature extraction is performed on each window. These features may include a sound spectrum envelope, a tone feature, and the like.
Wherein the continuous spectrum in the fourier transform is expressed as follows:
X(t, f) = ∫ x(τ) · w(t − τ) · e^(−j2πfτ) dτ;
wherein X (t, f) represents the spectral values at time t and frequency f;
x (τ) is the input speech signal, representing the signal value at time τ, and in the fourier transform we transform this function from the time domain (time domain) to the frequency domain to obtain information of the different frequency components;
w (t- τ) is a window function for windowing the signal; window functions are typically used to limit the effective portion of a signal over a period of time in order to avoid spectral leakage in spectral analysis; window functions typically have a specific shape, such as hanning windows or blackman windows, to accommodate different analysis requirements;
e^(−j2πfτ): this is a complex exponential function describing the frequency content of the signal; where f represents frequency and τ represents time delay; this complex exponential function represents the frequency information of the signal, and its amplitude and phase correspond to the amplitude and phase of the frequency components;
j represents the imaginary unit, which is commonly used in engineering and physics to represent the imaginary part of a complex number, satisfying j^2 = −1.
The formula as a whole transforms the input signal x(τ) into a frequency-domain representation X(t, f) by means of the windowed Fourier transform. At different times t and frequencies f, spectral values can be obtained, which represent the components of the signal at different frequencies. This is very useful in signal processing and spectral analysis for analyzing the frequency characteristics of a signal.
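A minimal example of computing the spectrogram X(t, f) with a Hann window, using SciPy's STFT; the sampling rate, window length and synthetic test tone are assumptions chosen for illustration.

```python
import numpy as np
from scipy.signal import stft

fs = 16_000                                # sampling frequency (Hz), assumed
t = np.arange(0, 1.0, 1 / fs)
x = 0.5 * np.sin(2 * np.pi * 220 * t)      # stand-in for the recorded speech x(τ)

# Windowed spectrum X(t, f): rows of Zxx are frequency bins, columns are time windows
f, frames, Zxx = stft(x, fs=fs, window="hann", nperseg=512, noverlap=384)
magnitude = np.abs(Zxx)
peak_bin = magnitude[:, magnitude.shape[1] // 2].argmax()
print(magnitude.shape, f[peak_bin])        # peak close to the 220 Hz test tone
```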
2. Deep learning model:
a convolutional neural network (Convolutional Neural Network, CNN) is used as part of the model for extracting speech features. CNNs can learn local and global features through the convolutional and pooling layers.
The operation of the convolution layer is as follows (assuming multiple convolution kernels):
Conv_i = σ(W_i * X + b_i);
where Conv_i denotes the output of the i-th convolutional layer; σ is an activation function, typically a sigmoid function or, more commonly, a ReLU (Rectified Linear Unit) function, which nonlinearly maps the output of the convolutional layer to introduce nonlinear properties; W_i and b_i are the weights and biases of the convolutional layer used in the convolution operation; X is the input data, such as image or time-series data, which in convolutional neural networks is typically the feature vector of pixel values or of series data; b_i, the bias of the convolutional layer, is used to offset the output of the convolutional layer.
A multi-layer Long Short-Term Memory network (LSTM) or gating loop unit (Gated Recurrent Unit, GRU) layer is added at the top of the model for capturing the time dependence of the speech. The LSTM operation is as follows:
h_t = LSTM(X_t, h_{t-1});
ht: this is a hidden state in Long Short-Term Memory network (LSTM), representing the model state at time step t; LSTM is a cyclic neural network (Recurrent Neural Network, RNN) variant suitable for sequence data for processing sequence data with long range dependencies;
xt: this is the input data or feature at time step t;
ht-1: this is the hidden state at time step t-1, which contains information of the past time step; LSTM uses hidden states to maintain and communicate information to handle long-term dependencies in sequence data.
3. Adaptive training algorithm:
during training, the model parameters are adjusted using an adaptive approach based on the user's speech performance and feedback. If the user's speech performance is poor, the model parameters may be updated using Gradient Descent (Gradient Descent) to reduce the loss function. The loss function may be a mean square error or the like.
Parameter updating:
θ_new = θ_old − η · ∇L(θ_old)
Wherein, θ_new: this is the new value of the model parameter, i.e. the result after a single parameter update;
θ_old: this is the old value of the model parameter, representing the value before the parameter update;
η: this is the learning rate, a hyperparameter that controls the step size or rate of the parameter update; the selection of the learning rate is very important to the training process of the model;
∇L(θ_old): this is the gradient of the loss function with respect to the parameter θ_old, representing the rate of change of the loss function in the parameter space. By calculating the gradient, it is possible to determine how the loss function responds to changes in the parameters and to update the parameters accordingly to minimize the loss function.
The above formulas are typically used in deep learning to describe the operation of convolutional and recurrent neural networks, as well as rules for parameter updating. They are key components of deep learning model training and optimization.
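A toy numerical illustration of the gradient descent update θ_new = θ_old − η · ∇L(θ_old); the loss, learning rate and single-parameter setup are chosen only to show the update rule, not to reflect the actual training configuration.

```python
def gradient_descent_step(theta_old, grad, lr=0.01):
    """One update: theta_new = theta_old - lr * grad."""
    return theta_old - lr * grad

# Toy usage: fit a single weight w so that w * x matches y, loss L(w) = (w*x - y)^2
x, y, w = 2.0, 6.0, 0.0
for _ in range(100):
    grad = 2 * (w * x - y) * x            # dL/dw of the squared-error loss above
    w = gradient_descent_step(w, grad, lr=0.05)
print(round(w, 3))                        # converges to 3.0, since y = 3 * x
```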
The above is presented as an example only, to illustrate the possibilities of a deep learning model and an adaptive algorithm for the specific field of spoken training. The actual model and algorithm design will depend on the specific spoken training task and data. The complexity and the specific mathematical formulas will vary from task to task.
In one embodiment, the brain wave data includes brain wave signals and cortex signals, and noise removal, such as removal of interference signals, may be performed on the acquired data after the brain wave data is acquired. If the brain wave acquisition equipment is activated, acquiring brain wave data, and analyzing the brain wave data comprises the following steps:
extracting and marking signal characteristics of brain wave data, wherein the signal characteristics comprise frequency, amplitude and spectral density;
the brain wave data marked with the signal features is input as a model, and is input into a brain wave deep learning model which is built in advance for learning and training to output psychological features and attention features, wherein the attention features comprise attention concentration degree, memory capacity and response speed.
Specifically, the oral expressive person wears an electrode type brain wave sensor, and electrodes are attached to the scalp to collect brain wave signals; brain waves are electrical signals generated by the electrical activity of brain nerve cells, which can be recorded on the scalp by electrodes, transmitted to a computer or other equipment, and analyzed using a deep learning algorithm. Wherein, a brain wave deep learning model is built in advance, the model uses a deep learning algorithm to learn and train a large amount of brain wave data in advance, and learns characteristic modes of brain wave signals under different physiological states, such as frequency, amplitude, spectral density, time domain or frequency domain distribution and the like, which are also characteristics for describing the modes and changes of brain waves.
After the brain wave data is analyzed by the brain wave deep learning model, the psychological state of the talent can be determined according to the signal characteristics, wherein the psychological state comprises tension, anxiety, tiredness and the like. For example, during expression communication, if the speaker feels stressed, a specific brain wave pattern may be exhibited, which may be different from a calm state or an anxiety state; by learning the brain wave signals of these modes, the current psychological state of the user can be recognized.
Since brain wave signals and cortex signals may have different performances in different physiological states, the characteristics are learned from a large amount of data through a deep learning algorithm, and the algorithm uses labeled data, namely the data of the known psychological states of the user, as a reference for learning in a training stage; the label data are used for establishing a corresponding relation between the brain wave signal mode and the psychological state, so that the algorithm can accurately identify the unknown data.
However, the psychological state of a person is greatly different from one another, and thus in practical applications, the model may need to take the individual differences into account, for example, to adjust the discrimination threshold of the model according to the history data of each user, so as to achieve more personalized recognition.
Similarly, based on the brain wave data, the attention characteristics of the expressive person such as the attention concentration degree, the memory capacity and the reaction speed can be analyzed.
The concentration refers to the level of attention a user has to a particular task or content in expressing a communication or other activity. The concentration of the user can be assessed by analyzing relevant features in the brain wave signal of the user, such as the frequency to amplitude ratio of the alpha wave and the beta wave. At higher levels of attention, the amplitude of the alpha wave may be lower and the amplitude of the beta wave may be higher.
Memory capability refers to the ability of a user to memorize and store information. The memory of the user may be assessed by analyzing patterns of brain wave signals, such as Event Related Potentials (ERP), etc., of the user during the presentation of the communication. Different memory tasks may cause different brain wave patterns that can be correlated to the memory capabilities of the user by means of deep learning algorithms.
The reaction rate refers to the rate at which the user reacts to external stimuli. The speed of a user's response may be assessed by recording Event Related Potentials (ERP) of the spoken utterance in the expression communication or other activity. Event-related potentials may appear to be a specific brain wave response within hundreds of milliseconds after stimulation, and users with rapid response may exhibit more pronounced event-related potentials.
The above process relies primarily on deep learning algorithms and data analysis techniques. The brain wave deep learning model can train a large amount of brain wave data by using a deep learning algorithm, and learn the association between brain wave signals and psychological states, concentration degree, memory, reaction speed and the like, so that the states are identified. The data analysis technology is used for extracting features from brain wave signals, classifying and predicting the features to obtain corresponding results.
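As one concrete illustration of the frequency, amplitude and spectral-density features mentioned above, the following sketch estimates alpha- and beta-band power from a single EEG channel with Welch's method; the band limits are common conventions and the synthetic signal is an assumption, not patent data.

```python
import numpy as np
from scipy.signal import welch

def band_power(freqs, psd, lo, hi):
    """Approximate power in [lo, hi) Hz by summing the spectral density bins."""
    mask = (freqs >= lo) & (freqs < hi)
    return float(np.sum(psd[mask]) * (freqs[1] - freqs[0]))

def eeg_features(channel, fs=256):
    """Frequency-domain features of one EEG channel; alpha 8-13 Hz and beta 13-30 Hz
    are common conventions rather than values taken from the patent."""
    freqs, psd = welch(channel, fs=fs, nperseg=fs * 2)    # spectral density estimate
    alpha = band_power(freqs, psd, 8, 13)
    beta = band_power(freqs, psd, 13, 30)
    return {"alpha_power": alpha, "beta_power": beta,
            "beta_alpha_ratio": beta / (alpha + 1e-12)}   # higher ratio ~ higher attention

if __name__ == "__main__":
    fs = 256
    t = np.arange(0, 10, 1 / fs)
    # synthetic channel: strong 10 Hz (alpha) plus a weaker 20 Hz (beta) component
    x = np.sin(2 * np.pi * 10 * t) + 0.3 * np.sin(2 * np.pi * 20 * t)
    print(eeg_features(x, fs)["beta_alpha_ratio"] < 1.0)  # True for this signal
```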
The brain wave data is subjected to deep learning to obtain psychological characteristics and attention characteristics, personalized feedback is provided for the talent expressive person according to the psychological characteristics and the attention characteristics, and the user is helped to improve expression communication skills and improve talent level. For example, the brain wave collector recognizes brain wave data corresponding to the tension emotion of the spoken utterance, analyzes and obtains that the spoken utterance feels tension when expressing communication, and can provide some relaxation methods, such as deep breath or meditation, for the user according to the tension emotion of the spoken utterance.
If the talent expressive person feels tired when expressing and communicating, the brain wave collector can recognize the tired emotion of the talent expressive person; at this moment, some refreshing methods, such as drinking water or taking a short rest, can be provided for the talent expressive person according to the tired emotion.
If the talent expressive person's attention wanders when expressing and communicating, the brain wave collector can recognize the inattentive state of the talent expressive person. At this time, some methods for focusing attention can be provided for the talent expressive person according to the inattentive state, for example prompting the talent expressive person to review the key points of the expression communication, or reminding the talent expressive person to breathe deeply and relax. In this way, personalized feedback can be provided for the talent expressive person through the brain wave acquisition module, helping the talent expressive person improve the expression communication level.
The brain wave analysis part can be used for collecting and analyzing brain wave data mainly aiming at the brain wave data when the oral expressive person expresses and communicates, and the brain wave data of the viewer can be not required to be collected under no special requirement.
In one embodiment, if eye movement data is collected, the method for analyzing the eye movement data comprises the following steps:
the eye movement data is used as a model to be input into a pre-established eye movement deep learning model for analysis so as to output sight line characteristics; the gaze feature includes a point of interest location and a gaze time.
The eye movement tracking apparatus measures the concentration of the user by tracking the eye movement locus of the user. When a user focuses on a certain task or information, their gaze point will be focused on a specific area and the gaze stay longer. Eye tracking techniques may record such information, including gaze point location, gaze time, and gaze sequence. By analyzing this data, the user's level of attention in different tasks or contexts can be derived.
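The following sketch illustrates how gaze-point and gaze-time data of this kind could be turned into a simple dwell-time-per-region attention measure; the rectangular areas of interest, sampling rate and random gaze samples are assumptions for illustration.

```python
import numpy as np

def dwell_time_by_region(gaze_xy, timestamps, regions):
    """Accumulate gaze dwell time (seconds) per rectangular area of interest.

    gaze_xy    : (N, 2) gaze points in screen coordinates from the eye tracker
    timestamps : (N,)   sample times in seconds
    regions    : dict name -> (x_min, y_min, x_max, y_max)
    """
    dt = np.diff(timestamps, append=timestamps[-1])   # duration attributed to each sample
    totals = {name: 0.0 for name in regions}
    for (x, y), d in zip(gaze_xy, dt):
        for name, (x0, y0, x1, y1) in regions.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                totals[name] += d
    return totals

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ts = np.arange(0, 5, 0.02)                                  # 50 Hz tracker, assumed
    gaze = rng.uniform([0, 0], [1920, 1080], size=(len(ts), 2))
    aoi = {"speaker": (800, 200, 1100, 700), "slides": (1200, 100, 1900, 900)}
    print(dwell_time_by_region(gaze, ts, aoi))                  # seconds of attention per region
```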
Based on the principle, eye movement data of the spoken expressive person are collected, visual line characteristics such as visual search paths, gazing positions, gazing time and the like of the spoken expressive person are analyzed based on an eye movement deep learning module, eye mind conditions of the spoken expressive person in the expression communication process are known according to the visual line characteristics, the positions where the attention of the spoken expressive person is concentrated are known, further the spoken expressive person is helped to improve visual line contact, eye mind communication and the like, the spoken expressive person can be guided to improve eye mind contact, interaction with audiences is enhanced, and the effect and the attraction of expression communication are improved, so that the spoken expression level is improved.
Since the viewer also looks at the talent expressive person, or at the content the talent expressive person is presenting, during the expression communication, the viewer's eyes likewise perform a series of saccades and fixations. Therefore, eye movement data of the viewer can be acquired during the expression communication and analyzed by the eye movement deep learning model to obtain the viewer's line-of-sight characteristics. According to characteristics such as the viewer's gaze point and gaze time, the viewer's learning performance and viewing interest during the expression communication can be estimated; the expression communication effect of the talent expressive person can then be assessed indirectly from the learning performance fed back by the viewer, and corresponding guidance suggestions can be provided to the talent expressive person.
For example, eye movement data of the viewers is collected while the talent expressive person is expressing and communicating. If analysis of the eye movement data shows that the viewers' attention points are neither on the talent expressive person nor on the content being displayed, the expression communication effect of the talent expressive person is judged to be poor, and the talent expressive person is advised to make the expression communication more engaging.
In addition, in order to achieve more natural and intuitive user interaction, the eye movement tracking technology can be combined with virtual reality technology. For example, in a virtual expression communication environment, the talent expressive person can turn the pages of a virtual slide deck through eye movement, or make eye contact with a virtual audience, which enhances the immersion and realism of the virtual expression communication.
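One plausible way to implement such eye-controlled page turning is a dwell trigger: when the gaze stays inside a designated hotspot long enough, a page-turn event fires. The hotspot position, dwell threshold and event mechanism below are assumptions for illustration, not part of the disclosed system.

```python
# Hedged sketch of dwell-based page turning: a sustained gaze inside the
# "next page" hotspot yields a page-turn event at that sample index.
DWELL_SAMPLES = 45                     # ~0.75 s at an assumed 60 Hz tracker
NEXT_HOTSPOT = (0.9, 0.8, 1.0, 1.0)    # normalised (x0, y0, x1, y1) corner region

def page_turn_events(gaze_samples):
    """Yield the sample index at which each dwell-triggered page turn occurs."""
    run = 0
    for i, (x, y) in enumerate(gaze_samples):
        x0, y0, x1, y1 = NEXT_HOTSPOT
        run = run + 1 if (x0 <= x <= x1 and y0 <= y <= y1) else 0
        if run == DWELL_SAMPLES:       # trigger once per continuous dwell
            yield i
            run = 0

# Example: gaze parked in the corner long enough triggers exactly one turn.
samples = [(0.5, 0.5)] * 30 + [(0.95, 0.9)] * 50
print(list(page_turn_events(samples)))  # -> [74]
```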
In one embodiment, if the camera is activated to collect facial data, the method for analyzing the facial data is as follows:
the facial data is used as model input, and the facial data is analyzed based on a pre-established facial deep learning model to output emotional characteristics.
To build the facial deep learning model, a large number of facial pictures are taken as model input and the emotional states represented by the users' facial expressions in those pictures as model output, and the model is learned and trained to obtain the facial deep learning model. The facial deep learning model can then analyze the user's facial expression and identify emotional characteristics such as happiness, sadness, and anger, so that the talent expression level of the talent expressive person can be evaluated based on the emotional characteristics.
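The sketch below shows the general shape of such a facial emotion classifier in PyTorch. The architecture, the 48x48 greyscale input size and the three-class label set are illustrative assumptions; the document does not specify a concrete network.

```python
# A minimal facial emotion classifier sketch (illustrative, not the patent's model).
import torch
import torch.nn as nn

EMOTIONS = ["happy", "sad", "angry"]  # assumed label set

class FaceEmotionNet(nn.Module):
    def __init__(self, num_classes: int = len(EMOTIONS)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 12 * 12, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)              # (N, 32, 12, 12) for 48x48 inputs
        return self.classifier(x.flatten(1))

model = FaceEmotionNet()
faces = torch.randn(4, 1, 48, 48)         # placeholder batch of face crops
predicted = model(faces).argmax(dim=1)    # indices into EMOTIONS
print([EMOTIONS[i] for i in predicted.tolist()])
```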
Similarly, the facial data of the viewer can be collected during the expression communication, and the viewer's emotional expression while listening can be analyzed to reflect, from the viewer's side, the talent expression level of the talent expressive person, so that corresponding guidance and comments can be given.
It should be noted that the training process of the above deep learning model is already disclosed in the prior art, and the specific training process of the model is not described in detail here, as long as the model can achieve the corresponding purpose.
After the performance characteristics of the talent expressive person and/or the viewer are analyzed by the different deep learning models, the performance characteristic results are converted into feedback and output to the user. For example, the following feedback can be provided to the user:
the degree of concentration of the user;
a line-of-sight movement trajectory of the user;
facial expression of the user changes;
the content of the user's expression communication.
Such feedback helps the talent expressive person understand his or her own expression communication performance and improve it according to the suggestion information. The method for pushing corresponding improvement suggestions according to the performance characteristics comprises: calling the information base to extract the suggestion information corresponding to the different performance characteristics and pushing it. The information base stores suggestion information associated with different performance characteristics, so that personalized feedback can be given to the talent expressive person.
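As a sketch of this lookup step, the snippet below maps detected performance features to stored suggestion texts; the feature names and advice wording are placeholders standing in for the information base, whose contents the document does not enumerate.

```python
# Sketch of the "information base" lookup: detected features -> stored advice.
SUGGESTION_BASE = {
    "attention_unfocused": "Prepare the material in advance and rehearse the talk.",
    "gaze_off_script": "Keep eye contact with the audience rather than the notes.",
    "stiff_expression": "Use facial expressions and gestures to convey emotion.",
}

def push_suggestions(performance_features: list[str]) -> list[str]:
    """Return the stored advice for every recognised feature, in order."""
    return [SUGGESTION_BASE[f] for f in performance_features if f in SUGGESTION_BASE]

print(push_suggestions(["attention_unfocused", "gaze_off_script", "stiff_expression"]))
```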
The following is a specific case:
during the expression communication, a user's attention is not focused, the user's line of sight often leaves the presentation script, and the user's facial expression is relatively stiff. Analysis of the user's performance characteristics provides the following feedback:
the user's attention is not focused;
the user's line of sight often leaves the presentation;
the facial expression of the user is relatively stiff;
based on the feedback, the user can take the following measures to improve his own expression communication performance:
preparing before the expression communication by becoming familiar with the content and rehearsing it;
keeping concentration during the expression communication and making eye contact with the audience;
using expressions and gestures during the expression communication to convey the user's emotion;
through feedback, the user can increase his own expression communication level.
The performance characteristics analyzed by the deep learning models and the suggestion information screened for those characteristics can be displayed in a visual manner; the visualization means includes presentation using at least one of charts, animations, and interactive games. The animations and interactive games can be realized through virtual reality technology, that is, a virtual talent expressive person or viewer is generated, and the virtual person presents the current talent expressive person's performance characteristics and suggestion information through animation. Alternatively, if the talent expressive person's performance in any aspect such as voice, concentration, memory, reaction speed, or emotion meets a preset good standard during the expression communication, a corresponding score is accumulated, and the game is won once a preset score is reached. Presenting results through animations, games, and similar means can improve the efficiency of talent training and make the training process more engaging.
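A minimal sketch of the score-accumulation game just described is shown below; the aspect names, the "good" standards, the points per aspect and the winning score are all assumed values.

```python
# Hedged sketch: aspects meeting their preset standard add points; the game is
# won once the accumulated score reaches a preset total.
GOOD_STANDARDS = {          # aspect -> (minimum acceptable value, points awarded)
    "clarity":        (0.8, 20),
    "concentration":  (0.7, 20),
    "reaction_speed": (0.6, 10),
}
WIN_SCORE = 40

def play_round(measurements: dict[str, float]) -> tuple[int, bool]:
    score = sum(points
                for aspect, (minimum, points) in GOOD_STANDARDS.items()
                if measurements.get(aspect, 0.0) >= minimum)
    return score, score >= WIN_SCORE

print(play_round({"clarity": 0.9, "concentration": 0.75, "reaction_speed": 0.5}))
```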
Example two
The embodiment provides a personalized talent training system, which executes the personalized talent training method in the first embodiment; as shown in fig. 2, the personalized talent training system comprises:
the acquisition module is used for activating perception equipment of a talent expressive person and/or a viewer after receiving a training request to acquire perception data corresponding to the talent expressive person and/or the viewer in the talent training process of the talent expressive person, wherein the perception data is one or a combination of two or more of voice data, brain wave data, eye movement data and facial data;
the model analysis module is used for performing deep learning and analysis on the perception data according to a pre-established deep learning model and outputting the corresponding performance characteristics of different perception data; the performance features include speech features, psychological features, attention features, line of sight features, and emotional features;
and the suggestion pushing module is used for calling an information base to screen and push the suggestion information corresponding to the different performance characteristics.
The functions of each module in the system of the embodiment of the present invention may be referred to the corresponding descriptions in the above method, and will not be repeated here.
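For concreteness, a structural sketch of how these three modules might be wired together is shown below in Python. All class names, method signatures, and the placeholder analysis step are assumptions for illustration, not the prescribed implementation of this embodiment.

```python
# Structural sketch only: acquisition, model analysis and suggestion pushing.
from typing import Callable

class PersonalizedTalentTrainingSystem:
    def __init__(self, suggestion_base: dict[str, str]):
        self.suggestion_base = suggestion_base  # stands in for the information base

    def acquire(self, sensors: dict[str, Callable[[], object]]) -> dict[str, object]:
        """Acquisition module: read raw perception data from each activated sensor."""
        return {modality: read() for modality, read in sensors.items()}

    def analyse(self, perception_data: dict[str, object]) -> dict[str, str]:
        """Model analysis module: a real system would call the per-modality deep
        learning models here; this placeholder just labels each modality."""
        return {modality: f"{modality}_feature" for modality in perception_data}

    def push(self, features: dict[str, str]) -> list[str]:
        """Suggestion pushing module: look up stored advice for each feature."""
        return [self.suggestion_base.get(f, "no stored advice") for f in features.values()]

system = PersonalizedTalentTrainingSystem({"speech_feature": "slow down and articulate clearly"})
raw = system.acquire({"speech": lambda: b"\x00\x01", "eye_movement": lambda: [(0.4, 0.5)]})
print(system.push(system.analyse(raw)))
```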
Example III
Fig. 3 shows a block diagram of an electronic device according to an embodiment of the invention. As shown in fig. 3, the electronic device includes: memory 910 and processor 920, memory 910 stores a computer program executable on processor 920. The processor 920, when executing the computer program, implements the personalized talent training method in the above embodiments. The number of memories 910 and processors 920 may be one or more.
The electronic device further includes:
and the communication interface 930 is used for communicating with external equipment and carrying out data interaction transmission.
If the memory 910, the processor 920, and the communication interface 930 are implemented independently, the memory 910, the processor 920, and the communication interface 930 may be connected to and communicate with each other through a bus. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be classified as an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 3, but this does not mean that there is only one bus or only one type of bus.
Alternatively, in a specific implementation, if the memory 910, the processor 920, and the communication interface 930 are integrated on a chip, the memory 910, the processor 920, and the communication interface 930 may communicate with each other through internal interfaces.
The embodiment of the invention provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the method provided in the embodiment of the invention.
The embodiment of the invention also provides a chip, which comprises a processor and is used for calling and running instructions stored in a memory, so that a communication device provided with the chip executes the method provided by the embodiments of the invention.
The embodiment of the invention also provides a chip, which comprises: the input interface, the output interface, the processor and the memory are connected through an internal connection path, the processor is used for executing codes in the memory, and when the codes are executed, the processor is used for executing the method provided by the embodiment of the invention.
It should be appreciated that the processor may be a central processing unit (CPU), or may be another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor or any conventional processor. It is noted that the processor may be a processor supporting an advanced RISC machines (ARM) architecture.
Further, optionally, the memory may include a read-only memory and a random access memory, and may further include a nonvolatile random access memory. That is, the memory may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may include a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may include random access memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, for example static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate synchronous DRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions in accordance with the present invention are fully or partially produced. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. Computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
Any process or method description in a flowchart or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process. And the scope of the preferred embodiments of the present invention includes additional implementations in which functions may be performed in a substantially simultaneous manner or in an opposite order from that shown or discussed, including in accordance with the functions that are involved.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. All or part of the steps of the methods of the embodiments described above may be performed by a program that, when executed, comprises one or a combination of the steps of the method embodiments, instructs the associated hardware to perform the method.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules described above, if implemented in the form of software functional modules and sold or used as a stand-alone product, may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that various changes and substitutions are possible within the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (10)

1. A personalized talent training method, comprising:
collecting perception data corresponding to a talent expressive person and/or a viewer in a talent training process of the talent expressive person, wherein the perception data is at least one of voice data, brain wave data, eye movement data or facial data;
performing deep learning and analysis on the perception data based on a pre-established deep learning model to obtain different expression characteristics corresponding to the perception data; the deep learning model comprises an emotion analysis model and a behavior prediction model; the performance features include speech features, psychological features, attention features, line of sight features, and emotional features;
and calling an information base, and screening the suggestion information corresponding to different performance characteristics.
2. The personalized talent training method of claim 1, wherein when the perception data is voice data, the method further comprises:
preprocessing the voice data to generate preprocessed voice data;
performing text conversion on the preprocessed voice data to generate first display information for displaying text;
carrying out semantic analysis on the text to obtain a semantic analysis result;
judging whether semantic errors exist in the semantic analysis result, and determining semantic error contents;
and generating second display information for displaying the content of the semantic error and the semantic analysis result according to the content of the semantic error and the semantic analysis result.
3. The personalized talent training method according to claim 1, wherein the method of analyzing the voice data comprises:
using the preprocessed voice data as model input, inputting it into a pre-established voice deep learning model for learning and training, and outputting voice characteristics, wherein the voice characteristics comprise voice clarity, speech speed, and language.
4. The personalized talent training method of claim 1, wherein the brain wave data comprises brain wave signals and cortical signals; if the brain wave data is acquired, the method for analyzing the brain wave data comprises the following steps:
Extracting and marking signal features of the brain wave data, wherein the signal features comprise frequency, amplitude and spectral density;
and inputting the brain wave data marked with the signal characteristics as model input into a pre-established brain wave deep learning model for learning and training, and outputting psychological characteristics and attention characteristics, wherein the attention characteristics comprise attention concentration degree, memory capacity, and response speed.
5. The personalized talent training method of claim 1, wherein if the eye movement data is collected, the method of analyzing the eye movement data comprises:
inputting the eye movement data as model input to a pre-established eye movement deep learning model for analysis, and outputting the line of sight features; the line of sight features comprise a gaze point location and a gaze time.
6. The personalized talent training method of claim 1, wherein if the facial data is collected, the method of analyzing the facial data comprises:
inputting the facial data as model input, analyzing the facial data based on a pre-established facial deep learning model, and outputting the emotional features.
7. The personalized talent training method according to claim 1, further comprising, after screening out the suggestion information corresponding to the performance characteristics:
visually displaying the performance characteristics and the suggestion information; the visual presentation includes presentation using at least one of charts, animations and interactive games.
8. A personalized talent training system, characterized in that a personalized talent training method according to any of claims 1-7 is performed; the system comprises:
the acquisition module is used for acquiring perception data corresponding to a talent expressive person and/or a viewer in the talent training process of the talent expressive person, wherein the perception data is one or a combination of two or more of voice data, brain wave data, eye movement data and facial data;
the model analysis module is used for performing deep learning and analysis on the perception data according to a pre-established deep learning model and outputting the corresponding performance characteristics of different perception data; the performance features include speech features, psychological features, attention features, line of sight features, and emotional features;
and the suggestion pushing module is used for calling an information base, screening the suggestion information corresponding to different performance characteristics, and pushing the suggestion information.
9. An electronic device, comprising: a processor and a memory in which instructions are stored, the instructions being loaded and executed by the processor to implement the personalized talent training method of any of claims 1-7.
10. A computer readable storage medium having stored therein a computer program which, when executed by a processor, implements the personalized talent training method of any of claims 1-7.
CN202311700258.8A 2023-12-11 2023-12-11 Personalized talent training method, system, equipment and medium Pending CN117788235A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311700258.8A CN117788235A (en) 2023-12-11 2023-12-11 Personalized talent training method, system, equipment and medium

Publications (1)

Publication Number Publication Date
CN117788235A 2024-03-29

Family

ID=90399132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311700258.8A Pending CN117788235A (en) 2023-12-11 2023-12-11 Personalized talent training method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN117788235A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015177908A1 (en) * 2014-05-22 2015-11-26 株式会社日立製作所 Training system
KR20180052382A (en) * 2016-11-10 2018-05-18 (주)티디엘 Method of training speech disorder by emotional inference and speech disorder traning system using thereof
US20180247549A1 (en) * 2017-02-21 2018-08-30 Scriyb LLC Deep academic learning intelligence and deep neural language network system and interfaces
CN116343824A (en) * 2023-05-29 2023-06-27 新励成教育科技股份有限公司 Comprehensive evaluation and solution method, system, device and medium for talent expression capability
CN116484318A (en) * 2023-06-20 2023-07-25 新励成教育科技股份有限公司 Lecture training feedback method, lecture training feedback device and storage medium
CN116543445A (en) * 2023-06-29 2023-08-04 新励成教育科技股份有限公司 Method, system, equipment and storage medium for analyzing facial expression of speaker

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination