CN117788239A - Multi-mode feedback method, device, equipment and storage medium for talent training - Google Patents

Multi-mode feedback method, device, equipment and storage medium for talent training

Info

Publication number
CN117788239A
CN117788239A
Authority
CN
China
Prior art keywords
feedback
learning
talent
emotion
score
Prior art date
Legal status
Granted
Application number
CN202410201444.5A
Other languages
Chinese (zh)
Other versions
CN117788239B (en)
Inventor
李翔
赵璧
詹歆
刘慧
Current Assignee
Xinlicheng Education Technology Co ltd
Original Assignee
Xinlicheng Education Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Xinlicheng Education Technology Co ltd filed Critical Xinlicheng Education Technology Co ltd
Priority to CN202410201444.5A priority Critical patent/CN117788239B/en
Priority claimed from CN202410201444.5A external-priority patent/CN117788239B/en
Publication of CN117788239A publication Critical patent/CN117788239A/en
Application granted granted Critical
Publication of CN117788239B publication Critical patent/CN117788239B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The multi-modal feedback method for talent training includes: obtaining input information and generating an original learning plan from it; obtaining first talent training data while the original learning plan is executed; analyzing the first talent training data across a plurality of talent dimensions to obtain a talent score for each talent dimension; and adjusting the original learning plan according to the talent scores and the input information to obtain a target learning plan, so that the target learning plan is determined from personalized input information and the per-dimension talent scores. Emotion analysis processing is then performed on the target talent training data to obtain emotion analysis results, and multi-modal feedback is determined from those results. Performing multi-modal feedback that incorporates emotional factors helps improve the accuracy and diversity of the feedback and the training effect.

Description

Multi-mode feedback method, device, equipment and storage medium for talent training
Technical Field
The present application relates to the field of talent training, and in particular, to a method, an apparatus, a device, and a storage medium for multimodal feedback of talent training.
Background
Traditional talent training methods rely primarily on human coaches or simple self-training. They generally fail to provide accurate feedback and personalized advice, are limited by time and place, and cannot support learning anywhere and anytime. They have the following problems:
1. Accuracy: traditional methods cannot provide accurate evaluation of and feedback on talent expression, so learners cannot identify their own weaknesses and points for improvement.
2. Individualization: different learners have different talent dimensions and learning requirements, yet traditional methods can only produce a fixed learning plan and cannot provide personalized, targeted learning plans and suggestions.
3. Multimodality: emotion analysis and multi-modal feedback see limited application in talent training, and traditional methods can only give text feedback based on the textual content of the expression, which limits comprehensive analysis and improvement of a learner's speech performance.
4. Timeliness: traditional face-to-face training is constrained by time and place and cannot support learning anytime and anywhere.
5. Feedback quality: feedback from traditional training methods is subjective and inaccurate, and cannot provide clear guidance for improvement.
Disclosure of Invention
The embodiments of the present application provide a multi-modal feedback method, apparatus, device and storage medium for talent training, which are used to solve at least one of the problems in the related art. The technical solution is as follows:
in a first aspect, embodiments of the present application provide a method for multimodal feedback of talent training, including:
acquiring input information, and generating an original learning plan according to the input information;
acquiring first talent training data in the process of executing the original learning plan;
analyzing a plurality of talent dimensions of the first talent training data to obtain a talent score corresponding to each talent dimension, and adjusting the original learning plan according to the talent score and the input information to obtain a target learning plan;
acquiring second talent training data in the process of executing the target learning plan, and taking the first talent training data or the second talent training data as target talent training data;
and carrying out emotion analysis processing on the target talent training data to obtain emotion analysis processing results, and determining multi-modal feedback according to the emotion analysis processing results, wherein the multi-modal feedback comprises at least one of text feedback, sound feedback, visual feedback and tactile feedback.
In one embodiment, the input information includes learning objectives, learning paths, and available learning time, each learning path includes corresponding learning materials, and the adjusting the original learning plan according to the talent scores and the input information to obtain the target learning plan includes:
determining learning target parameters according to the learning targets, determining learning material parameters representing the learning material effects according to the learning materials, determining learning cost parameters according to the original learning plan, and determining behavior modeling parameters according to user feedback information or historical data of the original learning plan execution;
determining a plan score of the original learning plan according to the talent score, the learning objective parameter, the learning material parameter, the learning cost parameter, the available learning time and the behavior modeling parameter;
and when the plan score is larger than a score threshold value, adjusting the original learning plan to obtain a target learning plan.
In one embodiment, the determining the plan score of the original learning plan according to the talent score, the learning objective parameter, the learning material parameter, the learning cost parameter, the available learning time, and the behavior modeling parameter includes:
determining the weight parameters respectively corresponding to the talent score, the learning objective parameter, the learning material parameter, the learning cost parameter and the behavior modeling parameter;
and carrying out weighted calculation according to the weight parameters, the talent score, the learning target parameters, the learning material parameters, the learning cost parameters, the available learning time and the behavior modeling parameters to obtain the plan score of the original learning plan.
In one embodiment, the performing emotion analysis processing on the target talent training data to obtain emotion analysis processing results includes:
performing multi-mode emotion analysis on the target talent training data through an emotion analysis engine to obtain a text emotion analysis result, a sound emotion analysis result and an image emotion analysis result;
and converting the text emotion analysis result, the sound emotion analysis result and the image emotion analysis result through a multi-mode feedback generator to obtain emotion intensity and emotion type.
In one embodiment, the determining the multimodal feedback based on the emotion analysis processing result includes:
Determining a target talent score corresponding to each talent dimension of the target talent training data;
when the emotion type is a positive emotion, determining a target value of the emotion type as a first value, otherwise, determining the target value as a second value;
determining emotion weight parameters corresponding to multi-mode feedback according to the emotion intensity, wherein the emotion intensity and the emotion weight parameters are positively correlated;
weighting calculation is carried out according to each target talent score, each talent score weight parameter, each target numerical value, each emotion intensity and each emotion weight parameter, and a feedback score of multi-mode feedback is determined;
and at least one of text feedback, sound feedback, visual feedback and tactile feedback is performed according to the feedback score.
In one embodiment, the performing at least one of text feedback, sound feedback, visual feedback, and tactile feedback according to the feedback score comprises:
when the feedback score is greater than a feedback threshold, feeding back at least one of positive text, positive speech, a positive image or animation, and a first-intensity haptic sensation through the virtual mentor system;
when the feedback score is less than or equal to the feedback threshold, feeding back at least one of negative text, negative speech, a negative image or animation, and a second-intensity haptic sensation through the virtual mentor system; the first intensity is greater than the second intensity.
In one embodiment, the method further comprises:
extracting features of the target talent training data to obtain a plurality of talent dimension indexes and time sequence talent dimension indexes;
determining a first emotion perception index according to the talent dimension indexes and preset weights, and determining a second emotion perception index according to the time sequence talent dimension indexes and the preset weights;
determining emotion feedback suggestions according to the first emotion perception index and/or the second emotion perception index;
generating emotion feedback advice in real time, or determining target emotion feedback advice and generating target emotion feedback advice according to the emotion feedback advice and a reward function by using a reinforcement learning method;
or,
calculating the fluency score and the confidence score of the target talent training data;
and outputting improvement recommendations and/or strengths through the virtual mentor system according to the fluency score and the confidence score.
In a second aspect, embodiments of the present application provide a multi-modal feedback device for talent training, comprising:
the first acquisition module is used for acquiring input information and generating an original learning plan according to the input information;
the second acquisition module is used for acquiring the first talent training data in the process of executing the original learning plan;
the adjustment module is used for analyzing a plurality of talent dimensions of the first talent training data to obtain a talent score corresponding to each talent dimension, and adjusting the original learning plan according to the talent score and the input information to obtain a target learning plan;
the third acquisition module is used for acquiring second talent training data in the process of executing the target learning plan, and taking the first talent training data or the second talent training data as target talent training data;
and the feedback module is used for carrying out emotion analysis processing on the target talent training data to obtain emotion analysis processing results, and determining multi-modal feedback according to the emotion analysis processing results, wherein the multi-modal feedback comprises at least one of text feedback, sound feedback, visual feedback and tactile feedback.
In one embodiment, the feedback module is further configured to:
extracting features of the target talent training data to obtain a plurality of talent dimension indexes and time sequence talent dimension indexes;
determining a first emotion perception index according to the talent dimension indexes and preset weights, and determining a second emotion perception index according to the time sequence talent dimension indexes and the preset weights;
determining emotion feedback suggestions according to the first emotion perception index and/or the second emotion perception index;
generating emotion feedback advice in real time, or determining target emotion feedback advice and generating target emotion feedback advice according to the emotion feedback advice and a reward function by using a reinforcement learning method;
or,
calculating the fluency score and the confidence score of the target talent training data;
and outputting improvement recommendations and/or strengths through the virtual mentor system according to the fluency score and the confidence score.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory in which instructions are stored, the instructions being loaded and executed by the processor to implement the method of any of the embodiments of the above aspects.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program, where the computer program when executed implements a method in any one of the embodiments of the above aspects.
The beneficial effects in the technical scheme at least comprise:
According to the method, input information is acquired and an original learning plan is generated from it. First talent training data is acquired while the original learning plan is executed, and a plurality of talent dimensions of the first talent training data are analyzed to obtain a talent score for each talent dimension. The original learning plan is then adjusted according to the talent scores and the input information to obtain a target learning plan, so that the target learning plan is determined from personalized input information and the per-dimension talent scores, which helps the plan adapt to the needs of different users. Second talent training data is acquired while the target learning plan is executed, and the first or second talent training data is taken as target talent training data. Emotion analysis processing is performed on the target talent training data to obtain emotion analysis results, and multi-modal feedback including at least one of text feedback, sound feedback, visual feedback and tactile feedback is determined from those results. Performing multi-modal feedback that incorporates emotional factors helps improve the accuracy and diversity of the feedback provided to users.
The foregoing summary is for the purpose of the specification only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present application will become apparent by reference to the drawings and the following detailed description.
Drawings
In the drawings, the same reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily drawn to scale. It is appreciated that these drawings depict only some embodiments according to the disclosure and are not therefore to be considered limiting of its scope.
FIG. 1 is a flowchart illustrating steps of a multi-modal feedback method for talent training in accordance with an embodiment of the present application;
FIG. 2 is a block diagram of a multi-modal feedback device for talent training in accordance with one embodiment of the present application;
fig. 3 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Hereinafter, only certain exemplary embodiments are briefly described. As will be recognized by those of skill in the pertinent art, the described embodiments may be modified in various different ways without departing from the spirit or scope of the present application. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive.
Referring to fig. 1, a flowchart of a multi-modal feedback method for talent training according to an embodiment of the present application is shown, where the multi-modal feedback method for talent training may include at least steps S100-S500:
s100, acquiring input information, and generating an original learning plan according to the input information.
S200, acquiring first talent training data in the process of executing an original learning plan.
S300, analyzing the first talent training data in a plurality of talent dimensions to obtain a talent score corresponding to each talent dimension, and adjusting an original learning plan according to the talent score and input information to obtain a target learning plan.
S400, acquiring second talent training data in the process of executing the target learning plan, and taking the first talent training data or the second talent training data as target talent training data.
S500, emotion analysis processing is carried out on the target talent training data to obtain emotion analysis processing results, and multi-mode feedback is determined according to the emotion analysis processing results, wherein the multi-mode feedback comprises at least one of text feedback, sound feedback, visual feedback and tactile feedback.
The multimodal feedback method for talent training in the embodiment of the present application may be executed by an electronic control unit, a controller, a processor, etc. of a terminal such as a computer, a mobile phone, a tablet, a vehicle-mounted terminal, etc., or may be executed by a cloud server, for example, by a system of the cloud server.
According to the above technical solution, an original learning plan is generated from the acquired input information, first talent training data is acquired while the original learning plan is executed, a plurality of talent dimensions of the first talent training data are analyzed to obtain a talent score for each talent dimension, and the original learning plan is adjusted according to the talent scores and the input information to obtain a target learning plan. The target learning plan can thus be determined from personalized input information and the per-dimension talent scores, which helps adapt to the needs of different users. Second talent training data is acquired while the target learning plan is executed, and the first or second talent training data is taken as the target talent training data; emotion analysis processing is performed on the target talent training data to obtain emotion analysis results, and multi-modal feedback including at least one of text feedback, sound feedback, visual feedback and tactile feedback is determined from those results. Performing multi-modal feedback that incorporates emotional factors helps improve the accuracy and diversity of the feedback provided to the user.
In one embodiment, the user starts the system, logs in or registers, and then enters the system, where the user can perform input operations according to actual needs to provide input information to the system. The input information includes, but is not limited to, learning objectives, learning paths, and available learning time; each learning path in the system includes corresponding learning materials, and the learning paths include, but are not limited to, improving speaking confidence, improving speech skills, improving persuasiveness, and so on. After the user selects a certain learning path for personalized learning, the system determines the corresponding learning materials such as courses and teaching materials. There may be one or more learning objectives for the user to select, for example, including but not limited to improving speaking skills, advancing professional development goals, or improving language level. In the embodiment of the present application, after the system acquires the input information, a summary page is displayed for the user to confirm whether the input information is correct, and after the user confirms it, a personalized original learning plan is automatically generated from the input information. The plan includes a learning schedule, learning materials (such as video courses, articles, and speech cases), the content of exercises, and the practice frequency and difficulty level, which helps the user gradually improve their talent expression ability; the talent training is then carried out in the learning environment of the holographic virtual mentor system.
In one embodiment, when executing the original learning plan, the user trains according to the learning materials, and during this process the system acquires the user's training data as the first talent training data. It should be noted that the first talent training data may include captured pictures, videos, or audio recordings.
In one embodiment, the talent dimensions include, but are not limited to, fluency, confidence, language expression ability, and posture and body language. Fluency can be assessed by calculating the frequency and duration of speech pauses; Confidence can be derived through sound analysis and emotion analysis using natural language processing (NLP); language expression ability (Articulation) can be assessed by pronunciation accuracy and vocabulary diversity; and posture and body language (BodyLanguage) can be assessed by using holographic projection techniques to calculate the coordination of body movements and the diversity of facial expressions. For example, the talent score calculation formula for each talent dimension is as follows:
Fluency=TotalSpeechDuration/(1+PauseCount+PauseDuration)
Confidence=PitchRange+EmotionScore/2
Articulation=(PronunciationAccuracy×VocabularyDiversity)/100
BodyLanguage=(BodyCoordination+FacialExpressionDiversity)/2
wherein PauseCount is the number of pauses in the speech, PauseDuration is the total duration of the pauses, and TotalSpeechDuration is the total duration of the speech, all obtained by analyzing the first talent training data; PitchRange is the variation range of the voice pitch and EmotionScore is the emotion analysis score, both of which can be obtained by analyzing the first talent training data with a deep learning model; PronunciationAccuracy evaluates whether the user's pronunciation is accurate through a deep learning model, and VocabularyDiversity is the vocabulary diversity, obtained by analysis with a text analysis model; BodyCoordination is the coordination of body movements, assessed by monitoring the user's gestures and actions, and FacialExpressionDiversity is the diversity of facial expressions, assessed by analyzing the user's facial expressions; both can be obtained through an image processing algorithm.
In some embodiments, a total composite score may also be calculated: a talent dimension weight is assigned to each talent dimension, and the weighted talent scores are summed to obtain the total composite score across the talent dimensions.
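For illustration, a minimal Python sketch of these per-dimension scores and the weighted composite score follows. The helper names, sample measurements and dimension weights are assumptions for demonstration; the arithmetic simply follows the formulas above as written.

```python
# Illustrative sketch (assumed names and sample values), following the
# per-dimension talent score formulas above as written.

def fluency(total_speech_duration, pause_count, pause_duration):
    return total_speech_duration / (1 + pause_count + pause_duration)

def confidence(pitch_range, emotion_score):
    # Follows the text's formula literally: PitchRange + EmotionScore/2
    return pitch_range + emotion_score / 2

def articulation(pronunciation_accuracy, vocabulary_diversity):
    return (pronunciation_accuracy * vocabulary_diversity) / 100

def body_language(body_coordination, facial_expression_diversity):
    return (body_coordination + facial_expression_diversity) / 2

def composite_score(dimension_scores, dimension_weights):
    """Weighted sum of the per-dimension talent scores."""
    return sum(w * s for w, s in zip(dimension_weights, dimension_scores))

# Hypothetical measurements for one speech
scores = [
    fluency(total_speech_duration=300.0, pause_count=8, pause_duration=20.0),
    confidence(pitch_range=60.0, emotion_score=70.0),
    articulation(pronunciation_accuracy=85.0, vocabulary_diversity=40.0),
    body_language(body_coordination=75.0, facial_expression_diversity=65.0),
]
weights = [0.3, 0.3, 0.2, 0.2]  # assumed talent dimension weights
print(composite_score(scores, weights))
```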
In some embodiments, the step S300 of adjusting the original learning plan according to the talent scores and the input information to obtain the target learning plan includes steps S310-S330:
s310, determining learning target parameters according to learning targets, determining learning material parameters representing learning material effects according to learning materials, determining learning cost parameters according to an original learning plan, and determining behavior modeling parameters according to user feedback information or historical data of original learning plan execution.
Optionally, (1) the learning objective parameter gj is determined based on the learning objective(s). The learning objective can be quantified according to factors such as priority, relevance or difficulty to determine a score, namely the learning objective parameter gj. For example:
Priority: a priority score is given to each learning objective, and the user can select the corresponding priority when configuring the objective, e.g. 5 points for an urgent objective, 3 points for an important but not urgent objective, and 1 point for an ordinary objective;
Relevance: a value is assigned based on how relevant the learning objective is to the user's current ability level, i.e. whether it can improve a talent dimension (the lower the corresponding talent score, the higher the relevance of the objective to the user's current ability level); for example, 10 points for high relevance, 5 points for medium relevance, and 1 point for low relevance.
Difficulty: the difficulty of the objective can be quantified by estimating the time or effort required to complete it, e.g. 10 points for a high-difficulty objective, 5 points for medium difficulty, and 1 point for low difficulty.
Optionally, (2) the learning material parameter mk characterizing the effect of the learning material is determined from the learning material. The learning material can be quantified according to factors such as applicability, understandability and user feedback to determine a score, namely the learning material parameter mk. For example:
Applicability: the degree of match between the material and the learning objective is scored, e.g. 10 points for a complete match, 5 points for a partial match, and 0 points for no match.
Understandability: scored according to the complexity of the material; easy-to-understand material is given a high score and hard-to-understand material a low score, e.g. 10 points for simple material and 1 point for complex material.
User feedback: the effectiveness of the material is evaluated according to historical user feedback, e.g. 10 points for highly effective material, 5 points for average effectiveness, and 1 point for poor effectiveness.
Optionally, (3) the learning cost parameter cl is determined from the original learning plan (covering, for example, costs such as time and resources, the difficulty of the learning materials, the learning speed, and the time cost implied by how demanding the learning objective is). The learning cost can be quantified according to time cost, resource consumption, etc., to determine a score, namely the learning cost parameter cl. For example:
Time cost: scored according to the expected time required to complete the learning task; a short time is assigned a low score (e.g. 1 point for within 1 hour) and a long time a high score (e.g. 10 points for more than 10 hours), and scores corresponding to several time intervals can be set.
Resource consumption: scored according to the resources (such as money and materials) required to complete the learning task; taking money as an example, low consumption is assigned a low score (e.g. 1 point for within 10 yuan) and high consumption a high score (e.g. 10 points for more than 1000 yuan), and scores corresponding to several consumption intervals can be set.
Optionally, (4) the behavior modeling parameter bm is determined based on user feedback information or historical data from executing the original learning plan (for example, factors such as the user's learning preferences, speed and memory determined from historical data, or preference factors input directly by the user). Behavior modeling can be quantified according to the user's learning preferences, speed, memory, etc., to determine a score, namely the behavior modeling parameter bm. For example:
Learning preference: scores are assigned according to the learning mode the user selects in the learning path, e.g. 10 points for preferring video learning, 5 points for preferring reading, and 1 point for no specific preference.
Learning speed: the average time for the current user or all users to complete the same or similar learning tasks (learning objectives) is determined from historical data; a fast learner is assigned a high score (e.g. 10 points for mastering a new skill within 1 day) and a slow learner a low score (e.g. 1 point if it takes more than 1 week), and multiple threshold intervals can be set to distinguish fast from slow.
Memory: scored according to the user's ability to recall the learned content; good memory receives a high score (e.g. 10 points if more than 90% of the content can be recalled) and poor memory a low score (e.g. 1 point if less than 50% can be recalled).
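As a concrete illustration of how the point scales in items (1)-(4) above could be turned into the parameters gj, mk, cl and bm, a small Python sketch follows; the lookup tables mirror the example point values in the text, while summing the points into a single parameter value is an assumed aggregation, not something the text prescribes.

```python
# Illustrative sketch (assumptions, not the patent's own code): quantifying
# the described point scales into numeric parameters.

PRIORITY = {"urgent": 5, "important": 3, "ordinary": 1}
RELEVANCE = {"high": 10, "medium": 5, "low": 1}
DIFFICULTY = {"high": 10, "medium": 5, "low": 1}
PREFERENCE = {"video": 10, "reading": 5, "none": 1}

def objective_param(priority, relevance, difficulty):
    """g_j for one learning objective (assumed: sum of its point scores)."""
    return PRIORITY[priority] + RELEVANCE[relevance] + DIFFICULTY[difficulty]

def behavior_param(preference, speed_points, memory_points):
    """b_m from learning preference, speed and memory point scores."""
    return PREFERENCE[preference] + speed_points + memory_points

print(objective_param("urgent", "high", "medium"))                 # 5 + 10 + 5 = 20
print(behavior_param("video", speed_points=10, memory_points=5))   # 10 + 10 + 5 = 25
```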
S320, determining a plan score of the original learning plan according to the talent score, the learning target parameter, the learning material parameter, the learning cost parameter, the available learning time and the behavior modeling parameter.
Alternatively, the calculation formula of the plan score P of the original learning plan is:
P = Σi(w1i×Si) + Σj(w2j×gj) + Σk(w3k×mk) − Σl(w4l×cl) + Σm(w5m×bm) + f(P′,T)
Optionally, the talent score, the learning objective parameter, the learning material parameter, the learning cost parameter and the behavior modeling parameter each have a corresponding weight parameter w1i, w2j, w3k, w4l and w5m, and the plan score P of the original learning plan is obtained by weighted calculation with the above formula. Here Si is the talent score of the i-th talent dimension and w1i is the talent dimension weight of the i-th talent dimension; gj is the learning objective parameter of the j-th learning objective and w2j is its weight; mk is the learning material parameter of the k-th learning material (e.g. a course) and w3k is its weight; cl is the learning cost parameter of the l-th cost and w4l is its weight; bm is the behavior modeling parameter corresponding to the m-th factor and w5m is its weight; T is the available learning time (preferably in minutes or seconds); and f(P′,T) is a function for evaluating and optimizing the learning plan, used to assess the quality of the learning plan, for example the output function of a previously trained deep learning model. P′ denotes the combination of the talent scores, learning objective parameters, learning material parameters, learning cost parameters, behavior modeling parameters, etc.; together with T it is input to the deep learning model to obtain the score f(P′,T). The goal of this function is to ensure that the learning plan meets the user's needs, improves the learning effect and provides a personalized learning experience, taking into account factors such as the comprehensiveness of the learning plan, user satisfaction and goal attainment.
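The following Python sketch illustrates the weighted plan-score calculation of step S320. The weight values, sample parameters and the stand-in for the trained evaluation function f(P′, T) are all hypothetical; only the overall structure follows the formula above.

```python
# Illustrative sketch of the plan score P (assumed weights and sample values).

def plan_score(talent_scores, talent_weights,
               objective_params, objective_weights,
               material_params, material_weights,
               cost_params, cost_weights,
               behavior_params, behavior_weights,
               available_time, evaluate_fn):
    def weighted(weights, values):
        return sum(w * x for w, x in zip(weights, values))

    # P' is approximated here by the combined weighted terms; the text
    # describes it as the combination of parameters fed to a trained model.
    p_prime = (weighted(talent_weights, talent_scores)
               + weighted(objective_weights, objective_params)
               + weighted(material_weights, material_params)
               - weighted(cost_weights, cost_params)
               + weighted(behavior_weights, behavior_params))
    return p_prime + evaluate_fn(p_prime, available_time)

score = plan_score(
    talent_scores=[0.6, 0.7], talent_weights=[0.1, 0.1],
    objective_params=[20], objective_weights=[0.02],
    material_params=[15], material_weights=[0.02],
    cost_params=[6], cost_weights=[0.05],
    behavior_params=[12], behavior_weights=[0.02],
    available_time=90,
    evaluate_fn=lambda p, t: 0.1,  # placeholder for the trained model f(P', T)
)
print(score, score > 0.5)  # compared against the score threshold of step S330
```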
And S330, when the plan score is larger than the score threshold value, the original learning plan is adjusted to obtain the target learning plan.
Optionally, the score threshold can be set according to actual needs, for example 0.5. When the plan score is greater than 0.5, the overall direction of the current original learning plan is considered correct and the plan is basically effective and executable, so only some fine adjustment is needed, and the target learning plan can be obtained by adjusting the original learning plan. For example, a sub-item threshold may be set, and when the score of some part of the plan score is below that sub-item threshold, that part is adjusted: if a talent dimension score is below its threshold, related learning materials and exercises for training that talent dimension can be added, with a recommendation algorithm selecting suitable learning materials from the system's learning resource library; if the user's learning objective is, for example, to improve emotional appeal, the learning materials in the plan can focus on content related to emotional appeal and more materials relevant to the learning objective can be added, and other parts can be adjusted in the same way. Based on the learning materials and the user's available learning time, the system can determine which materials and exercises the learning plan contains and schedule the allocation of learning time; based on the learning cost of the plan and the behavior modeling parameters, the system can take into account the user's learning preferences, speed, memory and other factors to ensure the feasibility of the plan and the user's comfort, which is not repeated here. For example, it is also possible, without setting a threshold, to compare which part has a lower score and then determine the object and direction of adjustment. Illustratively:
Learning objective adjustment: if the analysis results show that the contribution of the learning objective parameter gj to the total score is low, this may be because the objective is not specific enough, not challenging enough, or does not match the user's actual needs. In that case, the adjustment may increase or decrease the number of learning objectives, adjust the difficulty of the objectives, or change their specific content;
Learning material adjustment: if the contribution of the learning material parameter mk to the total score is low, this may be because the selected learning materials do not match the learning objective, their difficulty is not appropriate, or the user is not interested. In that case, the type, difficulty and content of the learning materials should be changed or adjusted according to the user's feedback and learning effect.
Learning cost adjustment: if the contribution of the learning cost parameter cl to the total score is low, this may be because the time, effort or other resources required by the learning plan are outside the user's acceptable range. In that case, the cost is reduced by adjusting the intensity, frequency or duration of the learning plan.
Behavior modeling parameter adjustment: if the contribution of the behavior modeling parameter bm to the total score is low, this may be because the learning plan does not adapt well to the user's learning preferences, speed or memory. In that case, adaptability can be improved by adjusting the learning method, providing a personalized learning path, or adding review and practice sessions.
Through the above adjustments, targeted adjustment of the original learning plan is achieved, the target learning plan is obtained, and the new target learning plan is then implemented. User feedback and learning-effect data are continuously collected during execution of the target learning plan for further optimization of the learning plan.
It can be understood that if the plan score is less than or equal to 0.5, the original learning plan is ineffective and its effect falls far short of the standard, so a large-scale adjustment is required. In this case, in-depth communication with the user is needed to understand their real needs and expectations, the user is prompted to reset the learning objectives to ensure their relevance and attainability, and a comprehensive review and reconstruction is then carried out to adjust the original learning plan and obtain the target learning plan. For example:
Adjustment of learning objectives: when the plan score P is low, the learning objective is re-evaluated and redefined, probably because the original objective does not meet the user's actual needs or abilities. In this case, the system needs to communicate in depth with the user to understand their real needs and expectations, prompt the user to reset the learning objective, and ensure the objective's relevance and attainability.
Adjustment of learning materials: if the original learning materials cannot effectively support the learning objectives, or the user's feedback on the materials is negative, the materials need to be completely replaced; materials better suited to the user's current level and learning objectives are selected, and their diversity and interactivity are ensured to raise interest in learning.
Adjustment of the learning path (including the learning mode): a low plan score P may mean that the current learning methods and paths are unsuitable for the user; in that case the user may be prompted to reselect the learning path (including the learning mode), or new learning techniques may be introduced, such as adaptive learning systems, gamified learning or blended learning modes, to improve learning efficiency and user engagement.
Adjustment of learning cost: for a low plan score caused by excessive cost, adjustments in both time and resources are required; for example, the learning period may be shortened, the learning frequency reduced, or more free resources used to ease the burden on the user.
Adjustment of behavior modeling parameters: a low plan score may also arise because the learning plan does not adapt well to the user's personalized needs. In this case, more advanced data analysis and machine learning techniques may be employed to customize the learning path and content according to the user's learning history, preferences and feedback.
A new target learning plan is then determined based on the above adjustments, with continuous monitoring and further adjustment: a dynamic adjustment and feedback loop is built as a continuous monitoring and feedback mechanism, the effectiveness of the learning plan is evaluated periodically, and further adjustments are made based on the user's progress and feedback. This dynamic adjustment process helps ensure that the learning plan always meets the user's actual needs and can adapt to the user's continuously changing learning state.
In summary, the original learning plan can be further adjusted according to the user's first talent training data, a more complete target learning plan that meets the user's personalized requirements is generated, and the learning plan is continuously optimized to maximize the user's overall satisfaction and progress.
In one embodiment, the second talent training data is acquired during execution of the target learning plan, and either the first or the second talent training data is used as the target talent training data. Using the first talent training data as the target talent training data allows feedback from the subsequent analysis to reach the user more quickly, while using the second talent training data as the target talent training data helps further improve the accuracy of the feedback.
In one embodiment, performing emotion analysis processing on the target talent training data in step S500 to obtain the emotion analysis processing result includes steps S510-S530:
s510, performing multi-mode emotion analysis on the target talent training data through an emotion analysis engine to obtain a text emotion analysis result, a sound emotion analysis result and an image emotion analysis result.
Optionally, the system performs multi-mode emotion analysis on the target spoken training data through an emotion analysis engine comprising an NLP component, an acoustic model and a computer vision technology to obtain a text emotion analysis result, a sound emotion analysis result and an image emotion analysis result.
Optionally, the text emotion analysis result Sentiment(T) is obtained using natural language processing (NLP) technology combined with the talent dimension indexes; the formula can be:
Sentiment(T) = Σi(wi×Si) + Σj(vj×Kj)
where wi is the weight of the i-th emotion word, Si is the emotion score of that emotion word, vj is the talent dimension weight of the j-th talent dimension, and Kj is the talent score of the j-th talent dimension. Word segmentation and part-of-speech tagging can be performed on the speech text corresponding to the user's target talent training data to identify emotion words and other keywords, and each word in the text is then given an emotion score using an emotion dictionary.
Optionally, the sound emotion analysis result Sentiment(A) is obtained using an acoustic emotion model combined with the talent dimension indexes; the formula can be:
Sentiment(A) = Σi(ai×Fi) + Σj(vj×Kj)
where ai is the weight of the i-th sound feature (sound features are extracted with an acoustic model and include pitch, volume, speech speed, etc.), Fi is the score of the i-th sound feature, n is the number of sound features, J is the number of talent dimensions, vj is the talent dimension weight of the j-th talent dimension, and Kj is the talent score of the j-th talent dimension.
Optionally, the image emotion analysis result Sentiment(I) is obtained using an image analysis model combining facial expression and body language features with the talent dimension indexes; the formula can be:
Sentiment(I) = Σi(bi×Ii) + Σj(vj×Kj)
where bi is the weight of the i-th image feature (image features include facial expressions such as smiling, anger and sadness, body language features such as gestures and limb movements, and environmental features such as whether the speaker's surroundings are pleasant, quiet or noisy), Ii is the score of the i-th image feature, n is the number of image features, J is the number of talent dimensions, vj is the talent dimension weight of the j-th talent dimension, and Kj is the talent score of the j-th talent dimension.
In some embodiments, a first weight parameter α and a second weight parameter β may be configured to calculate a combined emotion analysis score Sentiment_Combined:
Sentiment_Combined = α×(Sentiment(T)+Sentiment(A)+Sentiment(I)) + β×(K1+K2+...+KJ).
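A minimal Python sketch of this multi-modal emotion scoring follows; the additive form of each modality score matches the formulas as reconstructed above, and all weights and feature values are assumptions.

```python
# Illustrative sketch of Sentiment(T)/Sentiment(A)/Sentiment(I) and the
# combined score (assumed weights and sample feature values).

def modality_sentiment(feature_weights, feature_scores, dim_weights, talent_scores):
    """Weighted feature term plus weighted talent-dimension term."""
    return (sum(w * s for w, s in zip(feature_weights, feature_scores))
            + sum(v * k for v, k in zip(dim_weights, talent_scores)))

def combined_sentiment(s_text, s_audio, s_image, talent_scores,
                       alpha=0.7, beta=0.3):  # assumed first/second weight parameters
    return alpha * (s_text + s_audio + s_image) + beta * sum(talent_scores)

talent = [0.6, 0.7, 0.5, 0.8]            # K1..KJ
dim_w = [0.25, 0.25, 0.25, 0.25]         # v1..vJ
s_t = modality_sentiment([0.5, 0.5], [0.8, 0.4], dim_w, talent)            # text
s_a = modality_sentiment([0.4, 0.3, 0.3], [0.7, 0.6, 0.5], dim_w, talent)  # sound
s_i = modality_sentiment([0.6, 0.4], [0.9, 0.3], dim_w, talent)            # image
print(combined_sentiment(s_t, s_a, s_i, talent))
```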
S520, converting the text emotion analysis result, the sound emotion analysis result and the image emotion analysis result through a multi-mode feedback generator to obtain emotion intensity and emotion type.
Optionally, the system converts the text emotion analysis result, the sound emotion analysis result and the image emotion analysis result through a multi-modal feedback generator to obtain the emotion intensity E and the emotion type T′.
In one embodiment, determining the multimodal feedback in step S500 based on the emotion analysis processing results includes steps S510-S550:
s510, determining target talent scores corresponding to each talent dimension of the target talent training data.
It should be noted that the target talent score corresponding to each talent dimension of the target talent training data is calculated on the same principle as the talent score calculation formula for that talent dimension; Dn is the target talent score for the n-th talent dimension, and the total number of talent dimensions is N.
S520, when the emotion type is a positive emotion, the target value of the emotion type is determined as a first value; otherwise, the target value is determined as a second value.
For example, when the emotion type is a positive emotion such as happiness or excitement, the target values β2, δ2, ν2 and σ2 of the emotion type are determined to be the first value; otherwise, the target values β2, δ2, ν2 and σ2 are determined to be the second value, the first value being greater than the second value.
S530, determining emotion weight parameters corresponding to the multi-mode feedback according to the emotion intensity.
Optionally, several intensity ranges may be set together with the emotion weight parameter β1 corresponding to each intensity range; the higher the intensity range, the higher the value of the emotion weight parameter, i.e. the emotion intensity and the emotion weight parameter are positively correlated.
S540, carrying out weighted calculation according to each target talent score, the talent score weight parameter, the target numerical value, the emotion intensity and the emotion weight parameter, and determining the feedback score of the multi-mode feedback.
Optionally, this embodiment takes multi-modal feedback that includes text feedback, sound feedback, visual feedback and tactile feedback as an example, so the feedback scores of the multi-modal feedback include a text feedback score, a sound feedback score, a visual feedback score and a tactile feedback score; other embodiments may include one or more of these, without limitation. For example:
Text feedback score: Text_Feedback = (α1×D1 + α2×D2 + ... + αN×DN) + β1×E + β2×T′
Sound feedback score: Speech_Feedback = (γ1×D1 + γ2×D2 + ... + γN×DN) + δ1×E + δ2×T′
Visual feedback score: Visual_Feedback = (μ1×D1 + μ2×D2 + ... + μN×DN) + ν1×E + ν2×T′
Haptic feedback score: Haptic_Feedback = (ρ1×D1 + ρ2×D2 + ... + ρN×DN) + σ1×E + σ2×T′
where αi, βi, γi, δi, μi, νi, ρi and σi are weight coefficients that can be adjusted according to the specific context.
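The sketch below illustrates the weighted calculation of step S540 under assumed weight values: each modality's feedback score is a weighted sum of the target talent scores plus an emotion-intensity term and an emotion-type term, mirroring the formulas above.

```python
# Illustrative sketch of the four feedback scores (assumed weights and values).

def feedback_score(dim_weights, target_scores,
                   emotion_weight, emotion_intensity,
                   type_weight, type_value):
    return (sum(w * d for w, d in zip(dim_weights, target_scores))
            + emotion_weight * emotion_intensity
            + type_weight * type_value)

target_scores = [0.7, 0.6, 0.8, 0.5]   # D1..DN
emotion_intensity = 0.9                 # E
type_value = 1.0                        # T' encoded numerically (assumption)

text_fb   = feedback_score([0.3, 0.2, 0.3, 0.2], target_scores, 0.4, emotion_intensity, 0.2, type_value)
speech_fb = feedback_score([0.25, 0.25, 0.25, 0.25], target_scores, 0.5, emotion_intensity, 0.2, type_value)
visual_fb = feedback_score([0.2, 0.3, 0.3, 0.2], target_scores, 0.3, emotion_intensity, 0.3, type_value)
haptic_fb = feedback_score([0.25, 0.25, 0.25, 0.25], target_scores, 0.6, emotion_intensity, 0.1, type_value)
print(text_fb, speech_fb, visual_fb, haptic_fb)
```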
S550, at least one of text feedback, sound feedback, visual feedback and tactile feedback is performed according to the feedback score. This embodiment takes text feedback, sound feedback, visual feedback and tactile feedback as examples, while other embodiments may include one or more types of feedback; the generated multi-modal feedback is integrated together to ensure consistency and coordination. Meanwhile, the system can collect the user's reaction to the feedback to learn how well the user accepts it and how satisfied they are, and adjust the weight coefficients based on this reaction and the user's learning progress in order to optimize the learning experience. Through this richer design, the system can generate multi-modal feedback from the multiple talent dimensions and the emotion analysis results, provide more personalized and highly accurate learning support, and, by integrating talent expression with emotion analysis, offer a unique learning experience to users.
Alternatively, S550 may include S5501-S5502:
s5501, when the feedback score is greater than the feedback threshold, feeding back at least one of forward text, forward speech, forward image or animation, and first intensity haptic sensation through the virtual mentor system.
Optionally, the feedback threshold may be adjusted according to actual conditions. When the feedback score is greater than the feedback threshold, positive text is fed back through the virtual mentor system, e.g. "Your speech content is very rich and leaves a deep impression!"; positive speech is fed back, for example in a happy, cheerful voice: "Your emotional delivery is great!"; a positive image or animation is fed back, for example a smiling portrait or a pleasant animation; and a first-intensity haptic sensation, including physical vibration or haptic simulation, is fed back through the haptic feedback device of the virtual mentor system.
In the virtual mentor system, the virtual mentor can be presented in front of the user as a holographic projection, so the user can see the mentor's three-dimensional holographic image, including its body language and facial expressions. The user can choose one of the virtual mentors provided by the system; the mentors usually have different personalities, expertise and styles, and the user can choose according to their own preferences and needs. By observing the virtual mentor's image, the user can receive emotional feedback and understand the mentor's emotional state and responses. In addition, the virtual mentor can provide emotional feedback through voice output, meaning that it can introduce or comment on the user's performance in speech according to the user's delivery and emotional condition; hearing this voice feedback lets the user understand the mentor's emotional responses and advice more deeply. The virtual mentor can also deliver physical sensations through the haptic feedback device to emphasize or highlight the important parts of the emotional feedback, so the user can feel the mentor's emotional responses through touch. Through these various means of interaction, the user receives comprehensive emotional feedback in visual, auditory, textual and tactile form, which helps the user better understand the virtual mentor's emotional responses and advice and thereby improve their own presentation.
S5502, when the feedback score is less than or equal to the feedback threshold, feeding back at least one of negative text, negative speech, a negative image or animation, and a second-intensity haptic sensation through the virtual mentor system; the first intensity is greater than the second intensity.
Conversely, when the feedback score is less than or equal to the feedback threshold, negative text, negative speech, a negative image or animation, and a second-intensity haptic sensation are fed back through the virtual mentor system, where the first intensity is greater than the second intensity.
In some embodiments, the feedback threshold may be further refined, and the corresponding text feedback, audio feedback, visual feedback, and tactile feedback may be further refined. For example, the feedback threshold may be divided into:
(1) Excellent threshold: score greater than or equal to 90, corresponding feedback:
Text feedback: "Your performance is excellent! Keep it up!" (positive text of a first emotion intensity);
Sound feedback: an encouraging and praising tone (first tone);
Visual feedback: displaying a victory or celebration animation (an animation with a first degree of change; for example, the animations can be trialled in advance by playing different animations to testers, detecting the testers' heartbeat and brain signals with equipment, and determining each animation's degree of change accordingly);
Haptic feedback: positive vibration feedback of a strong first vibration intensity, i.e. rapid continuous vibration.
(2) Good threshold: score between 80 and 89, corresponding feedback:
Text feedback: "Well done, with a few small areas to improve further." (positive text of a second emotion intensity, the second emotion intensity being less than the first);
Sound feedback: a mildly encouraging tone (second tone, milder than the first);
Visual feedback: displaying a smiling or nodding animation (an animation with a second degree of change, smaller than the first);
Haptic feedback: positive vibration, i.e. intermittent vibration of a second vibration intensity (less than the first).
(3) Medium threshold: score between 70 and 79, corresponding feedback:
Text feedback: "A solid performance, but there is room for improvement." (positive text of a third emotion intensity, the third emotion intensity being less than the second);
Sound feedback: suggestions in a mild tone (third tone, milder than the second);
Visual feedback: an animation with a neutral expression (an animation with a third degree of change, smaller than the second);
Haptic feedback: a vibration alert of a third vibration intensity (less than the second).
(4) Improvement threshold: score between 60 and 69, corresponding feedback:
Text feedback: "Improvement is needed in some respects." (negative text of a fourth emotion intensity);
Sound feedback: a slightly concerned tone (fourth tone, milder than the third);
Visual feedback: an animation with a slight frown or thinking expression (an animation with a fourth degree of change, smaller than the third);
Haptic feedback: a slow single-vibration cue of a fourth vibration intensity (less than the third).
(5) Attention threshold: score below 60, corresponding feedback:
Text feedback: "There are several key points that need special attention and improvement." (negative text of a fifth emotion intensity, the fifth emotion intensity being greater than the fourth);
Sound feedback: a concerned and careful tone (fifth tone, softer than the fourth);
Visual feedback: displaying a mark or animation indicating points that need attention (an animation with a fifth degree of change, smaller than the fourth);
Haptic feedback: repeated slow-vibration cues of a fifth vibration intensity (less than the fourth), indicating that attention is required.
In practical application, besides the score ranges and thresholds, the feedback strategy and content can be adjusted by considering factors such as the user's personal preferences, how well the user has historically accepted feedback, and the learning environment. The system can dynamically adjust the feedback strategy and content according to the user's actual reaction to the feedback (e.g. user satisfaction surveys and improvements in learning progress) to achieve the best teaching effect. In this way, the calculated score can be translated into specific, personalized feedback content and strategies, providing more efficient and targeted learning support for the user.
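As an illustration of how a computed feedback score could be mapped onto the graded tiers above, a small Python sketch follows; the message strings paraphrase the examples in the text, the tier boundaries follow the stated score ranges, and the function itself is an assumption rather than the patent's implementation.

```python
# Illustrative sketch: mapping a feedback score to the graded feedback tiers.

def graded_feedback(score):
    if score >= 90:   # excellent threshold
        return ("Your performance is excellent! Keep it up!",
                "celebration animation", "rapid continuous vibration")
    if score >= 80:   # good threshold
        return ("Well done, with a few small areas to improve further.",
                "smiling/nodding animation", "intermittent vibration")
    if score >= 70:   # medium threshold
        return ("A solid performance, but there is room for improvement.",
                "neutral-expression animation", "single vibration alert")
    if score >= 60:   # improvement threshold
        return ("Improvement is needed in some respects.",
                "slight frown/thinking animation", "slow single vibration")
    return ("There are several key points that need special attention and improvement.",
            "attention-needed mark or animation", "repeated slow vibration")

text, visual, haptic = graded_feedback(83)
print(text, "|", visual, "|", haptic)
```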
Optionally, the interactive interface of the virtual mentor system also includes a series of functions; for example, the user may ask the system for more detailed feedback, ask questions about speaking skills, or practice speeches. The user can hold a real-time dialogue with the system and interact according to their own needs. After the user's speech is completed, the system provides summary and assessment reports showing the user's performance across the different talent dimensions, which helps the user understand their speaking strengths and the aspects that need improvement. If the user chooses to continue learning, the system adjusts the personalized learning path and exercise program, based on the user's performance and needs, to help them keep improving their speaking skills.
In one implementation, the multi-mode feedback method for talent training in the embodiment of the present application may further include steps S610 to S640:
and S610, extracting features of the target talent training data to obtain a plurality of talent dimension indexes and time sequence talent dimension indexes.
Optionally, feature extraction is performed on the target talent training data through a feature extraction model to obtain a plurality of talent dimension indexes S1, S2, S3, …, Sm (covering dimensions such as confidence, speech speed, emotional state and facial expression) and time sequence talent dimension indexes S1,t, S2,t, S3,t, …, Sm,t, i.e. the talent dimension indexes additionally carry a corresponding time point t.
S620, determining a first emotion perception index according to the talent dimension indexes and preset weights, and determining a second emotion perception index according to the time sequence talent dimension indexes and the preset weights.
Optionally, a first emotion perception index (context awareness index) C1 is defined to represent the speech situation of the user (different situations carry different weights). The calculation formula is as follows:

C1 = Σ_{i=1}^{m} w_i · S_i

Optionally, a sliding window method may further be used to compute, from the time-series talent dimension indexes over a period of time, a second emotion perception index C2. The calculation formula is as follows:

C2 = (1/N) · Σ_{t'=t−N+1}^{t} Σ_{i=1}^{m} w_i · S_{i,t'}

where S_i (S_{i,t'}) is the i-th talent dimension index (at time t'), w_i is the corresponding preset weight, and N represents the size of the time window; by adjusting the window size, context changes over different time periods can be captured. It should be noted that the preset weights can be dynamically adjusted by a machine learning algorithm according to the user's feedback, learning progress, and system performance.
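As a minimal sketch of the above computation, assuming the weighted-sum forms given for C1 and C2, the following Python code illustrates how the two emotion perception indexes could be computed over a sliding window; the weights, window size, and example data are illustrative assumptions.

```python
import numpy as np

def first_emotion_perception_index(s: np.ndarray, w: np.ndarray) -> float:
    """C1: weighted sum of the current talent dimension indexes S1..Sm."""
    return float(np.dot(w, s))

def second_emotion_perception_index(s_t: np.ndarray, w: np.ndarray, n: int) -> float:
    """C2: sliding-window average of the weighted dimension indexes.

    s_t has shape (T, m): one row of m dimension indexes per time point.
    Only the last n time points (the window) are used.
    """
    window = s_t[-n:]                      # last N time steps
    weighted = window @ w                  # weighted sum per time step
    return float(weighted.mean())          # average over the window

# Illustrative data: 4 dimensions (confidence, speech speed, emotional state,
# facial expression), equal weights, and a window of 5 time steps.
w = np.array([0.25, 0.25, 0.25, 0.25])
s_now = np.array([0.8, 0.6, 0.7, 0.9])
s_series = np.random.default_rng(0).uniform(0.4, 0.9, size=(20, 4))

print(first_emotion_perception_index(s_now, w))
print(second_emotion_perception_index(s_series, w, n=5))
```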
Optionally, a deep learning model may also be introduced to improve the accuracy of context awareness. For example, a multi-layer neural network model can be constructed with the talent dimension indexes as input features, learning a deeper context model through nonlinear transformation and hierarchical abstraction. Assuming a deep neural network model F(·) whose input is the time-series talent dimension data {S1,t, S2,t, …, Sm,t} and whose output is a continuous context awareness score C3:
C3 = F({S1,t, S2,t, …, Sm,t})
The model can be trained by a back propagation algorithm to minimize the error between the predicted context awareness score and the actual feedback, thereby achieving accurate recognition of the user's speech context.
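As a hedged sketch under assumed dimensions and layer sizes, the following PyTorch code shows one possible form of such a network F(·), mapping a window of time-series dimension indexes to a continuous score C3 and training it by back propagation; the synthetic data and targets stand in for real user feedback.

```python
import torch
import torch.nn as nn

# Assumed setup: m = 4 talent dimensions and a window of N = 5 time steps,
# flattened into a single input vector of size m * N.
M_DIMS, WINDOW = 4, 5

class ContextAwarenessNet(nn.Module):
    """F(.): maps {S_{1,t}, ..., S_{m,t}} over a window to a score C3."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(M_DIMS * WINDOW, 32),
            nn.ReLU(),
            nn.Linear(32, 16),
            nn.ReLU(),
            nn.Linear(16, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

# Synthetic training data standing in for recorded dimension indexes and the
# feedback scores they actually received (both are placeholders).
x = torch.rand(256, M_DIMS * WINDOW)
y = x.mean(dim=1)  # placeholder target; a real system would use user feedback

model = ContextAwarenessNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)   # error between predicted C3 and feedback
    loss.backward()               # back propagation
    optimizer.step()

c3 = model(torch.rand(1, M_DIMS * WINDOW))  # continuous context awareness score
print(float(c3))
```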
S630, determining emotion feedback suggestions according to the first emotion perception index and/or the second emotion perception index.
For example, if the first emotion perception index is less than the index threshold and/or the second emotion perception index is less than the index threshold, dimensions such as confidence, speech speed, emotional state, and facial expression may be insufficient, and corresponding encouragement and advice are determined. Specifically, if the first emotion perception index is less than the index threshold, the dimension index S_i that is below its score threshold, or that is the smallest, is identified; for example, if this is the emotional-state dimension, an emotional feedback suggestion for improving the emotional state is determined, or the interaction strategy is adjusted to adapt to the new situation. Similarly, the dimension index S_{i,t} that is below its score threshold, or that is the smallest, within the second emotion perception index is identified, and the corresponding emotional feedback suggestion is determined.
S640, generating emotion feedback advice in real time, or determining target emotion feedback advice and generating target emotion feedback advice according to the emotion feedback advice and the rewarding function by using a reinforcement learning method.
Optionally, after the emotional feedback suggestions are determined, they are generated in real time in the virtual mentor system, for example, displayed on the page in the form of voice or text. In some embodiments, a reinforcement learning method may further be used: a state space S is defined as the set of all possible context awareness results, an action space A as the set of emotion feedback suggestions the system can execute, and a reward function R(s, a) as the instant reward obtained after taking action a (an emotion feedback suggestion) in state s. Through reinforcement learning methods such as Q-learning or a deep Q-network (DQN), the system iteratively optimizes itself in different environments and can find the optimal interaction strategy, that is, the target emotion feedback suggestion, which is then generated. In summary, the scheme combines multi-component talent dimension analysis, time-series modeling, deep learning, reinforcement learning, and other advanced technologies to form a context perception and real-time interaction module that can comprehensively understand the user's speech situation and flexibly provide personalized real-time feedback, embodying originality and advancement.
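The following tabular Q-learning sketch illustrates, under assumed discretized states, candidate suggestions, and a placeholder reward signal, how such self-iterative optimization of the interaction strategy could proceed; a deep Q-network would replace the table with a neural network.

```python
import random

# Assumed discretization: states are coarse context-awareness buckets and
# actions are candidate emotion feedback suggestions (names are illustrative).
STATES = ["low_confidence", "fast_speech", "flat_emotion", "good"]
ACTIONS = ["encourage", "suggest_slower_pace", "suggest_expressiveness", "praise"]

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2
q_table = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def choose_action(state: str) -> str:
    """Epsilon-greedy selection of an emotion feedback suggestion."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

def simulated_reward(state: str, action: str) -> float:
    """Placeholder for the instant reward R(s, a), e.g. user satisfaction."""
    preferred = {"low_confidence": "encourage",
                 "fast_speech": "suggest_slower_pace",
                 "flat_emotion": "suggest_expressiveness",
                 "good": "praise"}
    return 1.0 if action == preferred[state] else 0.0

for episode in range(2000):
    state = random.choice(STATES)
    action = choose_action(state)
    reward = simulated_reward(state, action)
    next_state = random.choice(STATES)   # placeholder transition
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                         - q_table[(state, action)])

# The learned greedy action per state is the target emotion feedback suggestion.
print({s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in STATES})
```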
In one implementation, the multi-modal feedback method for talent training in the embodiments of the present application may further include steps S710-S720:
S710, calculating the fluency score and the confidence score of the target talent training data.
Optionally, the fluency score F is calculated as follows:

F = (1/P) · Σ_{i=1}^{P} (1 − T_silence,i / T_i)

where P represents the number of speech paragraphs in the target talent training data, T_silence,i represents the silence duration of the i-th speech paragraph, and T_i represents the total duration of the i-th speech paragraph. The higher F is, the more fluent the presentation.
The confidence score C is calculated as follows:

C = (1/P) · Σ_{j=1}^{P} (v_j / v̄)

where v_j represents the speech rate of the j-th speech paragraph and v̄ represents the mean speech rate of the whole target talent training data. The higher C is, the more confident the presentation.
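A minimal Python sketch of the two scores, assuming the per-paragraph forms reconstructed above (silence ratio for fluency, speech-rate ratio for confidence); paragraph segmentation and speech-rate extraction are taken as given, and the names used are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Paragraph:
    total_duration: float    # T_i, in seconds
    silence_duration: float  # T_silence,i, in seconds
    speech_rate: float       # v_j, e.g. words per minute

def fluency_score(paragraphs: list[Paragraph]) -> float:
    """F = (1/P) * sum(1 - silence_i / total_i); higher means more fluent."""
    p = len(paragraphs)
    return sum(1.0 - seg.silence_duration / seg.total_duration
               for seg in paragraphs) / p

def confidence_score(paragraphs: list[Paragraph]) -> float:
    """C = (1/P) * sum(v_j / v_mean); higher means more confident.

    v_mean is taken here as the duration-weighted mean speech rate over the
    whole target talent training data (an assumption of this sketch).
    """
    p = len(paragraphs)
    total_time = sum(seg.total_duration for seg in paragraphs)
    v_mean = sum(seg.speech_rate * seg.total_duration
                 for seg in paragraphs) / total_time
    return sum(seg.speech_rate / v_mean for seg in paragraphs) / p

segments = [Paragraph(30.0, 4.0, 120.0),
            Paragraph(45.0, 2.5, 135.0),
            Paragraph(20.0, 1.0, 128.0)]
print(fluency_score(segments), confidence_score(segments))
```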
Optionally, the fluency score and the confidence score of the target talent training data may be stored in a talent dimension index file of the user, and the system continuously monitors the speech performance of the user and continuously updates the talent dimension index file.
And S720, outputting an improvement recommendation and/or an advantage through the virtual teacher system according to the fluency score and the confidence score.
Optionally, if the fluency score is greater than or equal to the score threshold, the virtual teacher system outputs fluency as an advantage, so that the user knows this strength and continues to maintain it; if the fluency score is less than the score threshold, the virtual teacher system outputs an improvement suggestion describing how to increase fluency. Further, on this basis, if the confidence score is greater than or equal to the score threshold, the virtual teacher system outputs confidence as an advantage, so that the user knows this strength and maintains it; if the confidence score is less than the score threshold, the virtual teacher system outputs an improvement suggestion describing how to improve confidence.
Optionally, the system of the embodiment of the present application further provides a virtual teacher community; the user may access the community to interact with other users, share experience, and participate in competitions:
1. Logging in to the community: after the user starts the virtual teacher system, they can choose to access the virtual teacher community. If the user has already logged into the system, they can enter the community directly. If not, the system will ask them to provide a user name and password or to log in using another means of authentication.
2. Community browsing: once users log into the virtual teacher community, they can browse different community sections and topics. These sections typically cover lecture skills, spoken training, speech experience sharing, competition discussions, and the like. The user may select topics of interest and browse related posts and discussions.
3. Interaction and sharing: users may interact with other community members, including posting comments, liking posts, sharing their own speech experience, asking questions, or answering questions from other users. Such interactions help users build connections with others, share knowledge, and obtain feedback.
4. Participation in competitions: the virtual teacher community also includes a speech competition section; users can participate in various speech competitions, display their speech skills, and compete with other users. Competitions can be classified according to different talent dimensions and topics so that users can select the competition suitable for them.
5. Learning resources: the virtual teacher community also provides rich learning resources, including courses, speech examples, professional advice, and learning materials. The user can browse these resources and acquire knowledge and skills related to the talent dimensions.
6. Establishing connections: users can connect with other speech enthusiasts through the community and become friends, learning partners, or collaborators. This helps users expand their personal network and make progress together.
7. Community management: the virtual teacher community is typically managed by administrators and editors to ensure the quality and order of content; administrators may monitor inappropriate content and take appropriate action to maintain a good community atmosphere.
Through the virtual teacher community, users can participate in interactions, share experience, obtain suggestions, participate in competitions, and expand their knowledge, thereby better improving their speech skills and talent expression. This process helps users grow in the talent area and connect with other speech enthusiasts. Before entering the virtual teacher community, an identity verification and user account management system is used to ensure that users in the community are legitimately registered users.
Optionally, the system of the embodiment of the application further provides a personalized learning resource library, and provides learning resources in the field of talent expression, including video courses, articles, instance lectures and exercise materials. Specifically:
1. Logging in to the resource library: users can access the personalized learning resource library in the virtual teacher system by choosing to enter the resource library section in the system.
2. Searching resources: once in the repository, the user may use the search function or browse through different categories of learning resources, such as video courses, articles, instance lectures, and exercise materials. The user can search the resources according to the learning requirement and the spoken dimension index.
3. Personalized recommendation: the system can also provide personalized learning resource recommendations through a recommendation algorithm, based on the user's talent dimension evaluation results and learning targets. These recommended resources target the user's weaknesses and needs to help improve their presentation skills (a simple matching sketch is given after this list).
4. Resource browsing: the user may click on a selected resource to view its detailed information. For example, if they choose to view a video course, the system will provide information such as the description, duration, and author of the video. The user may choose to view or download the resource.
5. Learning and exercise: the user may select learning and exercise resources according to his own learning plan. For example, they may view course videos, read articles, refer to instance lectures, or download exercise material. These resources will help the user to improve talent expressive power and speech skills.
6. Learning records: the system tracks the learning progress and activities of the user, including videos that have been watched, articles that have been read, exercises that have been completed, and so on. Users can view their learning records at any time to understand their own progress.
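As referenced in item 3 of this list, the following sketch shows one simple content-based way a recommendation algorithm could match resources to weak talent dimensions; the resource tags, dimension names, and scoring rule are illustrative assumptions rather than the recommendation algorithm of this embodiment.

```python
# Illustrative content-based recommendation: rank learning resources by how well
# their tagged talent dimensions match the user's weakest dimensions.

def recommend_resources(dimension_scores: dict[str, float],
                        resources: list[dict],
                        top_k: int = 3) -> list[dict]:
    # Weight each dimension by how weak it is (lower score -> higher need).
    need = {dim: 1.0 - score for dim, score in dimension_scores.items()}
    ranked = sorted(
        resources,
        key=lambda r: sum(need.get(dim, 0.0) for dim in r["dimensions"]),
        reverse=True,
    )
    return ranked[:top_k]

user_scores = {"fluency": 0.8, "confidence": 0.5, "body_language": 0.4}
library = [
    {"title": "Pausing and pacing drills", "dimensions": ["fluency"]},
    {"title": "Stage presence basics", "dimensions": ["confidence", "body_language"]},
    {"title": "Gesture workshop", "dimensions": ["body_language"]},
]
print(recommend_resources(user_scores, library, top_k=2))
```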
Through the personalized learning resource library, a user can conveniently access learning resources in the talent expression field, select proper resources according to own requirements and talent dimension indexes, and improve the talent and speech capacity of the user in a personalized manner. This process helps users to learn and advance continuously in the area of talents.
Optionally, the system of the embodiment of the present application further provides a talent dimension index tracking function, with which the user can track their own talent dimension indexes, compare them with those of other community members, and evaluate their progress. Specifically:
1. Accessing the talent dimension index tracking interface: the user can log in to the virtual teacher community and choose to enter the "talent dimension index tracking" section. This section provides tracking and comparison functions for the user's talent dimension indexes.
2. Viewing the personal talent dimension indexes: users can view their individual talent dimension index profiles, which include historical data for multiple talent dimension indexes such as fluency, confidence, language expression capability, and gestures and body language. These data show how the user's spoken performance varies over different time periods.
3. Comparison with other community members: the user may choose to compare his own spoken dimension index with other community members. They can select specific members and view the spoken dimension index data of those members. This helps the user to understand their relative performance in the area of spoken expressions.
4. Setting a learning target: the user can set the individual talent learning target based on the history data of the talent dimension index and the comparison result. They can determine the talent dimension that needs improvement and formulate a corresponding learning plan.
5. Tracking progress: the user may periodically return to the spoken dimension index tracking interface to see the progress of their spoken dimension index during learning and practice. They can use charts and graphics to visually demonstrate their own progress.
Through the talent dimension index tracking function, a user can better know the development condition of the user in the talent expression field and compare the development condition with other community members, so that the progress of the user is estimated and a learning target is set. This helps the user to continuously increase and grow in terms of spoken expressions.
By the method of the embodiment of the present application, at least the following effects can be achieved:
1. Accurate spoken assessment and feedback: by introducing complex talent mathematical operation formulas and multi-modal feedback generation technology, accurate assessment of the learner's spoken expression capability is realized. Compared with the traditional method, the present method can more accurately identify and quantify weaknesses and room for improvement in a speech, and provide high-quality feedback and improvement suggestions for the learner.
2. Personalized learning path: the individualized learning plan is created by using the talent dimension evaluation result and the learning target set by the user, and the individualized plan can better meet the demands of different learners and help them to make greater progress in the talent expression field;
3. Multi-modal emotion analysis: through the multi-modal feedback generation technology, emotion analysis is carried out not only on text but also on multiple modalities such as sound and images; such comprehensive emotion analysis helps the learner understand their speech performance more fully and improve their speech skills.
4. Real-time interaction and feedback: the real-time interactive interface and virtual teacher community enable learners to interact with the system and other learners at any time and any place. The real-time interaction is helpful to improve learning efficiency, solve learning problems in time and obtain more feedback and suggestions.
5. Higher learning efficiency and quality: by comprehensively utilizing the technologies and functions, a more efficient, more personalized and more comprehensive talent training method is provided. The learner can promote the expression ability of the talents more quickly, and simultaneously improve the pleasure and participation of learning.
In conclusion, the method improves the accuracy, degree of personalization, and comprehensiveness of talent training. By introducing complex talent mathematical operation formulas and multi-modal feedback technology, it brings an innovative and efficient solution to the field of talent training and is expected to improve learners' speech and expression skills, thereby having a positive influence in multiple fields.
Referring to fig. 2, a block diagram of a multi-modal feedback device for talent training in accordance with an embodiment of the present application is shown, which may include:
the first acquisition module is used for acquiring input information and generating an original learning plan according to the input information;
the second acquisition module is used for acquiring the first talent training data in the process of executing the original learning plan;
the adjustment module is used for analyzing a plurality of talent dimensions of the first talent training data to obtain a talent score corresponding to each talent dimension, and adjusting the original learning plan according to the talent score and the input information to obtain a target learning plan;
The third acquisition module is used for acquiring second talent training data in the process of executing the target learning plan, and taking the first talent training data or the second talent training data as target talent training data;
the feedback module is used for carrying out emotion analysis processing on the target talent training data to obtain emotion analysis processing results, and determining multi-mode feedback according to the emotion analysis processing results, wherein the multi-mode feedback comprises at least one of text feedback, sound feedback, visual feedback and tactile feedback.
In one embodiment, the feedback module is further configured to:
extracting features of the target talent training data to obtain a plurality of talent dimension indexes and time sequence talent dimension indexes;
determining a first emotion perception index according to the spoken dimension index and the preset weight, and determining a second emotion perception index according to the time sequence spoken dimension index and the preset weight;
determining emotion feedback suggestions according to the first emotion perception index and/or the second emotion perception index;
generating emotion feedback advice in real time, or determining target emotion feedback advice and generating target emotion feedback advice according to the emotion feedback advice and a reward function by using a reinforcement learning method;
Or,
calculating the fluency score and the confidence score of the target talent training data;
and outputting the improvement recommendation and/or the advantage through the virtual teacher system according to the fluency score and the confidence score.
The functions of each module in each apparatus of the embodiments of the present application may be referred to the corresponding descriptions in the above methods, which are not described herein again.
Referring to fig. 3, a block diagram of an electronic device according to an embodiment of the present application is shown, the electronic device including: memory 310 and processor 320, memory 310 stores instructions executable on processor 320, and processor 320 loads and executes the instructions to implement the multi-modal feedback method of oral training in the above embodiments. Wherein the number of memory 310 and processors 320 may be one or more.
In one embodiment, the electronic device further includes a communication interface 330 for communicating with an external device for data interactive transmission. If the memory 310, the processor 320 and the communication interface 330 are implemented independently, the memory 310, the processor 320 and the communication interface 330 may be connected to each other and communicate with each other through buses. The bus may be an industry standard architecture (Industry Standard Architecture, ISA) bus, an external device interconnect (Peripheral Component Interconnect, PCI) bus, or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in fig. 3, but not only one bus or one type of bus.
Alternatively, in a specific implementation, if the memory 310, the processor 320, and the communication interface 330 are integrated on a chip, the memory 310, the processor 320, and the communication interface 330 may communicate with each other through internal interfaces.
The present embodiment provides a computer readable storage medium storing a computer program which when executed by a processor implements the multimodal feedback method of talent training provided in the above embodiment.
The embodiment of the application also provides a chip, which comprises a processor and is used for calling the instructions stored in the memory from the memory and running the instructions stored in the memory, so that the communication device provided with the chip executes the method provided by the embodiment of the application.
The embodiment of the application also provides a chip, which comprises: the input interface, the output interface, the processor and the memory are connected through an internal connection path, the processor is used for executing codes in the memory, and when the codes are executed, the processor is used for executing the method provided by the application embodiment.
It should be appreciated that the processor may be a central processing unit (CPU), but may also be another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor or any conventional processor or the like. It is noted that the processor may be a processor supporting an advanced RISC machines (ARM) architecture.
Further, optionally, the memory may include a read-only memory and a random access memory, and may further include a nonvolatile random access memory. The memory may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may include a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory, among others. Volatile memory can include random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, for example, static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate synchronous DRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions in accordance with the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable devices. Computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another.
In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
Any process or method description in a flowchart or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process. And the scope of the preferred embodiments of the present application includes additional implementations in which functions may be performed in a substantially simultaneous manner or in an opposite order from that shown or discussed, including in accordance with the functions that are involved.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. All or part of the steps of the methods of the embodiments described above may be performed by a program that, when executed, comprises one or a combination of the steps of the method embodiments, instructs the associated hardware to perform the method.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules described above, if implemented in the form of software functional modules and sold or used as a stand-alone product, may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think of various changes or substitutions within the technical scope of the present application, and these should be covered in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A multimodal feedback method for talent training, comprising:
acquiring input information, and generating an original learning plan according to the input information;
acquiring first talent training data in the process of executing the original learning plan;
analyzing a plurality of talent dimensions of the first talent training data to obtain a talent score corresponding to each talent dimension, and adjusting the original learning plan according to the talent score and the input information to obtain a target learning plan;
acquiring second talent training data in the process of executing the target learning plan, and taking the first talent training data or the second talent training data as target talent training data;
and carrying out emotion analysis processing on the target talent training data to obtain emotion analysis processing results, and determining multi-modal feedback according to the emotion analysis processing results, wherein the multi-modal feedback comprises at least one of text feedback, sound feedback, visual feedback and tactile feedback.
2. The multi-modal feedback method of spoken training of claim 1, wherein the input information comprises learning targets, learning paths, and available learning time, each learning path comprising corresponding learning materials, and the adjusting the original learning plan according to the talent score and the input information to obtain the target learning plan comprises:
determining learning target parameters according to the learning targets, determining learning material parameters representing the learning material effects according to the learning materials, determining learning cost parameters according to the original learning plan, and determining behavior modeling parameters according to user feedback information or historical data of the original learning plan execution;
determining a plan score of the original learning plan according to the talent score, the learning objective parameter, the learning material parameter, the learning cost parameter, the available learning time and the behavior modeling parameter;
and when the plan score is larger than a score threshold value, adjusting the original learning plan to obtain a target learning plan.
3. The multi-modal feedback method of spoken training of claim 2, wherein: the determining the plan score of the original learning plan according to the talent score, the learning objective parameter, the learning material parameter, the learning cost parameter, the available learning time and the behavior modeling parameter includes:
determining weight parameters respectively corresponding to the talent score, the learning target parameter, the learning material parameter, the learning cost parameter, and the behavior modeling parameter;
and carrying out weighted calculation according to the weight parameters, the talent score, the learning target parameters, the learning material parameters, the learning cost parameters, the available learning time and the behavior modeling parameters to obtain the plan score of the original learning plan.
4. The multi-modal feedback method of spoken training of claim 1, wherein: the emotion analysis processing is carried out on the target talent training data, and the emotion analysis processing result is obtained, wherein the emotion analysis processing result comprises the following steps:
performing multi-mode emotion analysis on the target talent training data through an emotion analysis engine to obtain a text emotion analysis result, a sound emotion analysis result and an image emotion analysis result;
and converting the text emotion analysis result, the sound emotion analysis result and the image emotion analysis result through a multi-mode feedback generator to obtain emotion intensity and emotion type.
5. The multi-modal feedback method of spoken training of claim 4 wherein: the determining the multi-mode feedback according to the emotion analysis processing result comprises the following steps:
Determining a target talent score corresponding to each talent dimension of the target talent training data;
when the emotion type is a positive emotion, determining a target value of the emotion type as a first value, otherwise, determining the target value as a second value;
determining emotion weight parameters corresponding to multi-mode feedback according to the emotion intensity, wherein the emotion intensity and the emotion weight parameters are positively correlated;
weighting calculation is carried out according to each target talent score, each talent score weight parameter, each target numerical value, each emotion intensity and each emotion weight parameter, and a feedback score of multi-mode feedback is determined;
and at least one of text feedback, sound feedback, visual feedback and tactile feedback is performed according to the feedback score.
6. The multi-modal feedback method of spoken training of claim 5 wherein: the at least one of text feedback, audio feedback, visual feedback, and tactile feedback according to the feedback score includes:
when the feedback score is greater than a feedback threshold, feeding back at least one of forward text, forward speech, forward image or animation, and first intensity haptic sensation through the virtual mentor system;
When the feedback score is less than or equal to a feedback threshold, feeding back at least one of negative text, negative speech, negative image or animation, and second intensity of haptic sensation by the virtual mentor system; the first intensity is greater than the second intensity.
7. The method of multimodal feedback for spoken training of any of claims 1-6, wherein: the method further comprises the steps of:
extracting features of the target talent training data to obtain a plurality of talent dimension indexes and time sequence talent dimension indexes;
determining a first emotion perception index according to the spoken dimension index and the preset weight, and determining a second emotion perception index according to the time sequence spoken dimension index and the preset weight;
determining emotion feedback suggestions according to the first emotion perception index and/or the second emotion perception index;
generating emotion feedback advice in real time, or determining target emotion feedback advice and generating target emotion feedback advice according to the emotion feedback advice and a reward function by using a reinforcement learning method;
or,
calculating the fluency score and the confidence score of the target talent training data;
And outputting an improvement recommendation and/or an advantage through the virtual teacher system according to the fluency score and the confidence score.
8. A multi-modal feedback device for spoken training, comprising:
the first acquisition module is used for acquiring input information and generating an original learning plan according to the input information;
the second acquisition module is used for acquiring the first talent training data in the process of executing the original learning plan;
the adjustment module is used for analyzing a plurality of talent dimensions of the first talent training data to obtain a talent score corresponding to each talent dimension, and adjusting the original learning plan according to the talent score and the input information to obtain a target learning plan;
the third acquisition module is used for acquiring second talent training data in the process of executing the target learning plan, and taking the first talent training data or the second talent training data as target talent training data;
and the feedback module is used for carrying out emotion analysis processing on the target talent training data to obtain emotion analysis processing results, and determining multi-modal feedback according to the emotion analysis processing results, wherein the multi-modal feedback comprises at least one of text feedback, sound feedback, visual feedback and tactile feedback.
9. An electronic device, comprising: a processor and a memory in which instructions are stored, the instructions being loaded and executed by the processor to implement the method of any one of claims 1 to 7.
10. A computer readable storage medium having stored therein a computer program which when executed implements the method of any of claims 1-7.
CN202410201444.5A 2024-02-23 Multi-mode feedback method, device, equipment and storage medium for talent training Active CN117788239B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410201444.5A CN117788239B (en) 2024-02-23 Multi-mode feedback method, device, equipment and storage medium for talent training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410201444.5A CN117788239B (en) 2024-02-23 Multi-mode feedback method, device, equipment and storage medium for talent training

Publications (2)

Publication Number Publication Date
CN117788239A true CN117788239A (en) 2024-03-29
CN117788239B CN117788239B (en) 2024-05-31


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766173A (en) * 2021-01-21 2021-05-07 福建天泉教育科技有限公司 Multi-mode emotion analysis method and system based on AI deep learning
CN114187544A (en) * 2021-11-30 2022-03-15 厦门大学 College English speaking multi-mode automatic scoring method
CN115496077A (en) * 2022-11-18 2022-12-20 之江实验室 Multimode emotion analysis method and device based on modal observation and grading
US20230080660A1 (en) * 2021-09-07 2023-03-16 Kalyna Miletic Systems and method for visual-audio processing for real-time feedback
US11677575B1 (en) * 2020-10-05 2023-06-13 mmhmm inc. Adaptive audio-visual backdrops and virtual coach for immersive video conference spaces
CN116484318A (en) * 2023-06-20 2023-07-25 新励成教育科技股份有限公司 Lecture training feedback method, lecture training feedback device and storage medium
CN117057961A (en) * 2023-10-12 2023-11-14 新励成教育科技股份有限公司 Online talent training method and system based on cloud service
CN117457218A (en) * 2023-12-22 2024-01-26 深圳市健怡康医疗器械科技有限公司 Interactive rehabilitation training assisting method and system
CN117522643A (en) * 2023-12-04 2024-02-06 新励成教育科技股份有限公司 Talent training method, device, equipment and storage medium
CN117541445A (en) * 2023-12-11 2024-02-09 新励成教育科技股份有限公司 Talent training method, system, equipment and medium for virtual environment interaction


Similar Documents

Publication Publication Date Title
Bibauw et al. Discussing with a computer to practice a foreign language: Research synthesis and conceptual framework of dialogue-based CALL
Aufegger et al. Musicians’ perceptions and experiences of using simulation training to develop performance skills
Fung et al. ROC speak: semi-automated personalized feedback on nonverbal behavior from recorded videos
US20150206443A1 (en) Computing system with learning platform mechanism and method of operation thereof
Chen Effects of technology-enhanced language learning on reducing EFL learners’ public speaking anxiety
KR20160077200A (en) Computing technologies for diagnosis and therapy of language-related disorders
Yang et al. The current research trend of artificial intelligence in language learning: A systematic empirical literature review from an activity theory perspective
Barcomb et al. Rock or Lock? Gamifying an online course management system for pronunciation instruction
Zhai et al. A systematic review on cross-culture, humor and empathy dimensions in conversational chatbots: the case of second language acquisition
Karyotaki et al. Chatbots as cognitive, educational, advisory & coaching systems
Gačnik et al. User-centred app design for speech sound disorders interventions with tablet computers
Ferro et al. Readlet: Reading for understanding
CN117541444B (en) Interactive virtual reality talent expression training method, device, equipment and medium
Bahreini et al. Communication skills training exploiting multimodal emotion recognition
Paay et al. Can digital personal assistants persuade people to exercise?
Seeber et al. What’s load got to do with it? A cognitive-ergonomic training model of simultaneous interpreting
US11640767B1 (en) System and method for vocal training
CN117037552A (en) Intelligent classroom interaction system and method
Fucinato et al. Charismatic speech features in robot instructions enhance team creativity
CN117788239B (en) Multi-mode feedback method, device, equipment and storage medium for talent training
Cherner et al. AI-powered presentation platforms for improving public speaking skills: Takeaways and suggestions for improvement
CN117788239A (en) Multi-mode feedback method, device, equipment and storage medium for talent training
Barmaki Gesture assessment of teachers in an immersive rehearsal environment
Kamaghe Enhanced m-learning assistive technology to support visually impaired learners in Tanzania the case of higher learning institution
Rocha et al. The AppVox mobile application, a tool for speech and language training sessions

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant